WebAssembly (abbreviated Wasm) is a binary instruction format that allows you to compile application code written in over 50 languages (including Rust, C++, and Go) and run it inside sandboxed environments.
WebAssembly was initially designed to run native code in web browsers, improving the speed of web applications while benefiting from the safety of the sandboxed environment.
Since then, runtimes like WasmEdge or Wasmer have been developed to run WebAssembly modules either standalone or embedded within other languages such as C, Rust, Go or JavaScript.
Rust is a multi-paradigm, general-purpose programming language used for fast, low-resource and cross-platform solutions. Rust also emphasizes reliability, with a strong type system and an ownership model that guarantee memory safety and thread safety.
For these reasons, Rust and Wasm combined are extremely attractive for designing small, efficient and safe data processing functions.
Punch provides a new beta feature to do exactly that. First, Punch provides a Rust function API to simplify the development of data processing functions. That API is available on crates.io. Next, Punch also provides a Wasm runtime to run these functions and insert them in configurable data pipelines.
To illustrate these features, let us go through a simple use case.
Air Quality Anomaly Detection
Our goal is to detect anomalies in air quality. Input data comes from a BME680 sensor mounted on a Raspberry Pi. The sensor sends us air quality measurements in real time.
The punch library makes it easy to receive the data directly as serialized Rust structures. More precisely, the library defines:
- two structures that describe the input and output types (here InputValue and OutputValue). These types can be primitive or complex types, such as Vec<Timestamp> here, as long as they implement Serde’s Serialize and Deserialize traits.
use serde::{Deserialize, Serialize};
use iso8601_timestamp::Timestamp;

#[derive(Deserialize)]
pub struct InputValue {
    air_quality: Vec<f32>,
    timestamp: Vec<Timestamp>,
}

#[derive(Serialize)]
pub struct OutputValue {
    air_quality: Vec<f32>,
    timestamp: Vec<Timestamp>,
    is_anomaly: Vec<bool>,
}
- one type that implements punch_api’s Function trait. The Function trait has a single method, execute, that takes an input value and returns a Result holding an output value on success or an error on failure. punch_api defines an Error type to propagate errors to the pipeline runtime. Once the type is defined, it is registered with punch_api’s register macro.
use punch_api::{Error, Function, register};
use anomaly_detection;

pub struct AnomalyDetection;

impl Function<'_, InputValue, OutputValue> for AnomalyDetection {
    fn execute(&self, input: InputValue) -> Result<Box<OutputValue>, Box<Error>> {
        // One flag per measurement, all false until the detector says otherwise.
        let mut is_anomaly = vec![false; input.air_quality.len()];
        // Fit the detector on the series (seasonal period of 10 points).
        // Note: unwrap() panics if the fit fails, e.g. on a series that is too short.
        for idx in anomaly_detection::params()
            .alpha(1.0)
            .max_anoms(0.5)
            .fit(&input.air_quality, 10)
            .unwrap()
            .anomalies()
        {
            // anomalies() yields the indices of the anomalous points.
            is_anomaly[*idx] = true;
        }
        Ok(Box::new(OutputValue {
            air_quality: input.air_quality,
            timestamp: input.timestamp,
            is_anomaly,
        }))
    }
}

register!(AnomalyDetection);
As you can see, developers focus mostly on the data logic in the execute method. With minimal code, we import an off-the-shelf anomaly detection algorithm: the Seasonal Hybrid ESD, implemented in the anomaly_detection crate.
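To experiment with this pattern without any external crate, here is a minimal standard-library sketch. Everything in it is an assumption made for illustration: the Function trait below is a simplified stand-in (no lifetime, no boxed values, a String error) and not the real punch_api trait, the timestamp column is left out, and ZScoreDetection is a hypothetical detector using a plain z-score rule rather than the Seasonal Hybrid ESD algorithm.

```rust
// Hypothetical, simplified stand-in for the shape of punch_api's `Function`
// trait. The real trait has a lifetime parameter and boxed return values.
pub trait Function<I, O> {
    fn execute(&self, input: I) -> Result<O, String>;
}

pub struct InputValue {
    pub air_quality: Vec<f32>,
}

pub struct OutputValue {
    pub air_quality: Vec<f32>,
    pub is_anomaly: Vec<bool>,
}

// Flags points lying further than `threshold` standard deviations from the
// mean. This is a plain z-score rule, NOT Seasonal Hybrid ESD.
pub struct ZScoreDetection {
    pub threshold: f32,
}

impl Function<InputValue, OutputValue> for ZScoreDetection {
    fn execute(&self, input: InputValue) -> Result<OutputValue, String> {
        if input.air_quality.is_empty() {
            // Report the problem to the caller instead of panicking.
            return Err("empty input series".to_string());
        }
        let n = input.air_quality.len() as f32;
        let mean = input.air_quality.iter().sum::<f32>() / n;
        let variance = input
            .air_quality
            .iter()
            .map(|v| (v - mean).powi(2))
            .sum::<f32>()
            / n;
        let std_dev = variance.sqrt();
        // A constant series has a zero deviation and therefore no anomaly.
        let is_anomaly = input
            .air_quality
            .iter()
            .map(|v| std_dev > 0.0 && ((v - mean) / std_dev).abs() > self.threshold)
            .collect();
        Ok(OutputValue {
            air_quality: input.air_quality,
            is_anomaly,
        })
    }
}
```

Calling execute on a series of nine values of 10.0 followed by one value of 50.0, with a threshold of 2.0, flags only the last point. Returning an Err for the empty series, rather than unwrapping, lets the caller decide how to handle the failure.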
Go Production
Simply compile the function to WebAssembly with cargo:
cargo build --release --target wasm32-unknown-unknown
You can package the resulting WebAssembly module in a Punch artifact and upload it to the Punch Artifact Server. Think of the Punch artifact server as your FaaS function registry. All your functions (Java, Python, Rust, Spark, etc.) are packaged and published to that registry first. Once they are there, you decide which function to deploy on which type of data: real-time, stream, batch, the choice is yours.
Here is a capture of the artifact registry UI once our Rust function has been published. In addition to our anomaly function, you also see a Punch-provided Wasm connector that makes it possible to run Wasm functions as part of the existing Punch pipelines.
Here is the actual execution pipeline. This view is the Punch pipeline editor UI. Hopefully you can easily guess what that pipeline does: first, the data is read from a TCP socket. That raw log (in fact, CSV lines) is then parsed and transformed using a simple punchlet. Punchlets are small functions written in a special language called punchlang that makes it extra easy to code transformations of JSON, CSV or any other textual document. Of course we could have done that part in Rust as well, but the point here is to illustrate how to combine several functions to do the job.
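For comparison, here is a rough idea of what the same parsing step could look like in Rust, using only the standard library. The two-column layout (timestamp, then air quality), the Measure structure and the parse_line function are all assumptions made for this sketch; the actual CSV format sent by the sensor is not shown in this post.

```rust
// One parsed measurement. The two-column layout `timestamp,air_quality`
// is an assumption made for this sketch, not the actual sensor format.
#[derive(Debug, PartialEq)]
pub struct Measure {
    pub timestamp: String,
    pub air_quality: f32,
}

// Parses one CSV line, returning a descriptive error on malformed input.
pub fn parse_line(line: &str) -> Result<Measure, String> {
    let mut fields = line.trim().split(',');
    // First column: the timestamp, kept as text here.
    let timestamp = fields
        .next()
        .filter(|s| !s.is_empty())
        .ok_or_else(|| "missing timestamp".to_string())?;
    // Second column: the air quality value.
    let raw_value = fields
        .next()
        .ok_or_else(|| "missing air quality value".to_string())?;
    let air_quality = raw_value
        .trim()
        .parse::<f32>()
        .map_err(|e| format!("invalid air quality '{}': {}", raw_value, e))?;
    Ok(Measure {
        timestamp: timestamp.trim().to_string(),
        air_quality,
    })
}
```

A well-formed line yields a Measure, while an empty line, a line with a single column or a non-numeric value yields an Err that the caller can log or drop.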
The parsed logs are then forwarded to the WebAssembly function. The Punch-provided connector acts as the glue with the WebAssembly runtime. It is a stateful node that buffers input measurements to build time series long enough for the anomaly detection.
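The buffering idea can be sketched in a few lines of Rust. This is not the connector's implementation, which is not described here: SeriesBuffer is a hypothetical helper, and releasing fixed-size, non-overlapping batches is an assumption chosen for simplicity.

```rust
// Hypothetical buffering helper: collects measurements and releases them as
// one batch once `capacity` values have been accumulated.
pub struct SeriesBuffer {
    capacity: usize,
    values: Vec<f32>,
}

impl SeriesBuffer {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be positive");
        SeriesBuffer {
            capacity,
            values: Vec::with_capacity(capacity),
        }
    }

    // Adds one measurement. Returns the full batch when the buffer is
    // complete (and starts a new empty one), otherwise returns None.
    pub fn push(&mut self, value: f32) -> Option<Vec<f32>> {
        self.values.push(value);
        if self.values.len() >= self.capacity {
            let full = std::mem::replace(&mut self.values, Vec::with_capacity(self.capacity));
            Some(full)
        } else {
            None
        }
    }
}
```

Each batch returned by push could then be handed to the anomaly detection function as one time series. A sliding window that keeps part of the previous batch would be an alternative when anomalies near batch boundaries matter.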
Finally, the result of the anomaly detection is written to an Elasticsearch index. Here is a Kibana dashboard of the application.
Conclusions
Punch provides a first experimental version of punch_api, a FaaS Rust API. It allows you to write functions in Rust and integrate them into powerful data pipelines with minimal code, with support for generic input and output types.
We demonstrated its usage with a complete end-to-end data application: time series anomaly detection on an air quality index.
This year we are working seriously on this topic to provide Punch with a new pipeline engine written in Rust, with full support for this new class of data processing functions. If you are interested in collaborating with us, get in touch: we plan to work on this jointly with students, academics and friends.
To continue reading about our work, click over to the next page at Frugal and Robust Data Processing Using Rust and WebAssembly.