Introduction

The interplay among technologies in today’s software landscape can produce remarkable results. The convergence of WebAssembly, Rust, and Serverless/Function as a Service (FaaS) architectures lets us envision data processing platforms that are frugal, safe, and highly efficient.

In a previous exploration, we demonstrated the potential of using Rust and WebAssembly for efficient data transformation within Punch data processing pipelines. We prototyped a pipeline engine to run WebAssembly data processing functions on streaming data. Since then, the integration of Rust and WebAssembly has seen substantial growth. Rust’s expressive syntax and memory safety, together with WebAssembly’s binary format, have solidified their place as a dynamic duo for accelerated and secure code execution. The open-source community’s dedication to refining this integration has paved the way for smoother development experiences and improved runtime performance.

Our prototype only focused on WebAssembly. We decided to develop a robust industrial version with additional objectives:

  1. Frugality: There is no need to explain the dramatic contribution of data centers and data processing to ever-increasing worldwide carbon emissions. We look for the tiniest data processing engine to reduce energy costs wherever possible. This usually starts with attempts to reduce the required CPU and memory footprints; in this space, Rust is a wise choice. This is, however, not good enough; the proper solution is a holistic approach that processes only the required data at the right place. As we will see, such a smart architecture can benefit significantly from the Rust and WebAssembly pairing.
  2. Safety: We work in a company that provides essential mission-critical services. Auditability, security, and sandboxing are, of course, paramount.
  3. Performance: Frugality and safety must come with improved performance.

We implemented a robust industrial asset called REEF to achieve these goals. 

REEF stands for REsource Efficient Functions. It is (yet another) engine that allows you to run your business functions, themselves written in Rust or any programming language compatible with WebAssembly.

In the rest of this blog, we will explain how it works, why we do it, and how we plan to achieve frugal data processing. 

The case for serverless architecture

Serverless means you only write the business functions; the server that runs them is provided by someone else. REEF is an application designed to host your functions. At Punch, we have deployed functions for years in various runtime engines, small or big, leveraging Java, Python, (Py)Spark, or Flink capabilities and features. Exploring the same pattern using Rust and allowing developers to ship their functions in Rust or WebAssembly was thus a logical step.

Here is how it works concretely: to deploy your business functions on data streams, you insert them into graphs that you define using a simple configuration file (or a fancy editor). Such a graph chains data sources, (your) functions, and data sinks. Sources are responsible for reading data from somewhere useful, and sinks for writing it somewhere useful. You only focus on your business functions. Here is what it looks like.
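As a rough illustration of this source, function, sink model, here is a minimal sketch in plain Rust. It is not REEF's API: the Source and Sink traits, the BusinessFn alias, run_graph, and the two example functions are all hypothetical names, and a row is simplified to a map of strings.

```rust
use std::collections::BTreeMap;

/// A row modeled as column name -> value (a simplification for this sketch).
type Row = BTreeMap<String, String>;

/// Hypothetical source: yields rows until it is exhausted.
trait Source {
    fn next_row(&mut self) -> Option<Row>;
}

/// Hypothetical sink: receives the rows that made it through the graph.
trait Sink {
    fn write(&mut self, row: Row);
}

/// In-memory source, standing in for a real connector.
struct VecSource {
    rows: std::vec::IntoIter<Row>,
}

impl Source for VecSource {
    fn next_row(&mut self) -> Option<Row> {
        self.rows.next()
    }
}

/// In-memory sink that simply collects rows.
#[derive(Default)]
struct VecSink {
    rows: Vec<Row>,
}

impl Sink for VecSink {
    fn write(&mut self, row: Row) {
        self.rows.push(row);
    }
}

/// A business function: the only part a developer would write.
type BusinessFn = fn(Row) -> Result<Row, String>;

/// Example function: rejects rows that lack a "host" column.
fn require_host(row: Row) -> Result<Row, String> {
    if row.contains_key("host") {
        Ok(row)
    } else {
        Err("missing host column".to_string())
    }
}

/// Example function: adds a constant "site" column.
fn tag_site(mut row: Row) -> Result<Row, String> {
    row.insert("site".to_string(), "edge".to_string());
    Ok(row)
}

/// Pulls rows from the source, applies each function in order, and writes
/// successful rows to the sink. A failing row is dropped and its error kept.
fn run_graph(
    source: &mut dyn Source,
    functions: &[BusinessFn],
    sink: &mut dyn Sink,
) -> Vec<String> {
    let mut errors = Vec::new();
    while let Some(row) = source.next_row() {
        match functions.iter().try_fold(row, |acc, f| f(acc)) {
            Ok(out) => sink.write(out),
            Err(e) => errors.push(e),
        }
    }
    errors
}
```

The point of the design is that require_host and tag_site know nothing about where rows come from or go to; swapping the in-memory source and sink for real connectors would leave them unchanged.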

Note that REEF, as such, is not a serverless solution; it is only one of its main building blocks. An example of a Kubernetes-native serverless platform is Punch, which provides many management and configuration features to automate the deployment of functions inside REEF. This is, however, not the topic of this blog. Remember that REEF has no dependency on Punch, Kubernetes, or any other stack; you can use it as is to build your solution.

Why REEF?

Returning to REEF as a function processing engine, many similar engines exist on the market or in open-source communities. Some specialize in log management, others in artificial intelligence or big data. Presented this way, REEF is simply a new implementation in Rust with WebAssembly support. What are the alternatives? Here are some in the Rust ecosystem:

  • Vector: a lightweight observability runtime written in Rust. It provides many collectors. REEF is similar to Vector but focuses on two features not offered by Vector: an explicit node and function API, and support for WebAssembly.
  • RisingWave: a promising streaming SQL engine written in Rust. See it as an Apache Flink alternative. RisingWave is SQL-centric, which is excellent, and only allows you to plug in user-defined functions. It provides exactly-once semantics, scalability, and high availability. We plan to use it to implement cases requiring complex stateful processing, which is not REEF’s mission.
  • Spin: “Spin is an open source framework for building and running fast, secure, and composable cloud microservices with WebAssembly.” Spin shares REEF’s objective of making it easy to deploy WebAssembly user functions.

Why REEF, then? REEF has a precise focus that differs from these friend technologies:

  • Run WebAssembly and Rust functions in a stateless/at-least-once engine that fits small devices. This sounds simple but poses subtle issues in data exchange between Rust and WebAssembly, as explained below. 
  • Allow for lightweight ML use cases. We use REEF to run TensorFlow Lite models.
  • Offer device management features such as remote function updates. Our goal is to equip devices with REEF binaries that are, in turn, quickly and safely updated with new versions of WebAssembly functions. Here, WebAssembly’s sandboxing and portability bring critical advantages.

This combination of management, frugality, and flexibility positions REEF as an attractive choice for modern, containerized ecosystems, from large-scale cloud deployment to edge IoT devices.
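To make the at-least-once guarantee mentioned above concrete, here is a minimal sketch, assuming a simple retry loop; it does not describe REEF's internals. A record counts as delivered only once the sink has acknowledged it, so a lost acknowledgement leads to a retry and the sink may see the same record twice. The names deliver_at_least_once and FlakySink are hypothetical.

```rust
/// Tries to deliver `record` until `send` reports success or the attempts run
/// out. Returns the number of attempts used. The record is considered
/// delivered only after a successful acknowledgement.
fn deliver_at_least_once<F>(record: &str, max_attempts: u32, mut send: F) -> Result<u32, String>
where
    F: FnMut(&str) -> Result<(), String>,
{
    let mut last_error = String::from("no attempt made");
    for attempt in 1..=max_attempts {
        match send(record) {
            Ok(()) => return Ok(attempt),
            Err(e) => last_error = e,
        }
    }
    Err(format!("giving up after {} attempts: {}", max_attempts, last_error))
}

/// A sink whose first acknowledgement is lost: the record is stored, but the
/// caller sees an error and sends it again.
struct FlakySink {
    received: Vec<String>,
    calls: u32,
}

impl FlakySink {
    fn send(&mut self, record: &str) -> Result<(), String> {
        self.calls += 1;
        self.received.push(record.to_string());
        if self.calls == 1 {
            Err("acknowledgement lost".to_string())
        } else {
            Ok(())
        }
    }
}
```

With FlakySink, the record ends up stored twice. This is the price of at-least-once delivery in a stateless engine: no record is silently lost, but downstream consumers have to tolerate duplicates.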

REEF Connectors

REEF provides 20+ out-of-the-box and well-documented connectors, including HTTP, Kafka, and MQTT, each with security features to ensure data integrity. More importantly, REEF exposes a clean API and quick-start examples that allow developers to create custom native connectors, opening up possibilities for tailored integrations. This dual approach balances convenience and extensibility, addressing fast deployment needs and specialized use cases.

Custom or not, the provided connectors free the business developers from the intricacies of I/O operations, asynchronous style programming, and subtle memory allocation issues requiring fair expertise in system programming and Rust. 

The case for WebAssembly

Central to the capabilities of REEF is its support for WebAssembly. Developers can submit their functions as WebAssembly modules. This opens the possibility of coding functions in any of the 60+ programming languages compatible with WebAssembly. This portability comes at (almost) no performance penalty.

Functions are encapsulated within a secure and isolated sandbox. As a result, updating and managing these functions becomes simple and safe, even for a fleet of remote IoT devices: no more costly and dangerous firmware updates.

The integration of WebAssembly relies on two key points.

  • Typed Data Serialization Mechanism: REEF leverages a specific mechanism for serializing typed data. This facilitates seamless communication between the Rust engine and the WebAssembly module.
  • High-Level API: REEF exposes a high-level API that, under the hood, orchestrates the glue between the Rust engine and the WebAssembly function, as just explained.
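Why is a serialization mechanism needed at all? Core WebAssembly functions only exchange numbers, so structured data such as a row has to cross the boundary as bytes written into the module's linear memory. The sketch below shows one possible encoding of a typed row, a tagged, length-prefixed layout. It is an assumption made for illustration: the post does not describe REEF's actual wire format, and the Value, Row, encode_row, and decode_row definitions here are hypothetical stand-ins, not the rline_api types.

```rust
/// A typed cell value (only two variants, to keep the sketch short).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Integer(i64),
    Text(String),
}

/// A row as an ordered list of (column name, value) pairs.
type Row = Vec<(String, Value)>;

/// Appends a u32 little-endian length followed by the bytes themselves.
fn put_bytes(out: &mut Vec<u8>, bytes: &[u8]) {
    out.extend_from_slice(&(bytes.len() as u32).to_le_bytes());
    out.extend_from_slice(bytes);
}

/// Encodes a row as: column count, then for each column its name,
/// a one-byte type tag, and the value payload.
fn encode_row(row: &Row) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&(row.len() as u32).to_le_bytes());
    for (name, value) in row {
        put_bytes(&mut out, name.as_bytes());
        match value {
            Value::Integer(i) => {
                out.push(0);
                out.extend_from_slice(&i.to_le_bytes());
            }
            Value::Text(s) => {
                out.push(1);
                put_bytes(&mut out, s.as_bytes());
            }
        }
    }
    out
}

/// Consumes `n` bytes from the front of `input`, or fails if too short.
fn take<'a>(input: &mut &'a [u8], n: usize) -> Result<&'a [u8], String> {
    if input.len() < n {
        return Err("unexpected end of input".to_string());
    }
    let (head, tail) = input.split_at(n);
    *input = tail;
    Ok(head)
}

fn take_u32(input: &mut &[u8]) -> Result<usize, String> {
    let b = take(input, 4)?;
    Ok(u32::from_le_bytes([b[0], b[1], b[2], b[3]]) as usize)
}

fn take_string(input: &mut &[u8]) -> Result<String, String> {
    let len = take_u32(input)?;
    let bytes = take(input, len)?;
    String::from_utf8(bytes.to_vec()).map_err(|e| e.to_string())
}

/// Decodes bytes produced by `encode_row`, rejecting truncated input
/// and unknown type tags.
fn decode_row(mut input: &[u8]) -> Result<Row, String> {
    let count = take_u32(&mut input)?;
    let mut row = Row::new();
    for _ in 0..count {
        let name = take_string(&mut input)?;
        let tag = take(&mut input, 1)?[0];
        let value = match tag {
            0 => {
                let mut buf = [0u8; 8];
                buf.copy_from_slice(take(&mut input, 8)?);
                Value::Integer(i64::from_le_bytes(buf))
            }
            1 => Value::Text(take_string(&mut input)?),
            other => return Err(format!("unknown value tag {}", other)),
        };
        row.push((name, value));
    }
    Ok(row)
}
```

In such a scheme, the host would write the encoded bytes into the module's memory, call the function with a pointer and a length, and decode the returned bytes the same way. A high-level API can hide all of this behind a plain function signature.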

rline (pronounced “airline”) takes the spotlight as the name of REEF’s Rust FaaS libraries. Developers can use these libraries through two crates published on crates.io: rline_macro and rline_api.

The following code snippet demonstrates a simple data transformation use case using REEF’s WebAssembly capabilities. Say you want to label each record with the moment of the day (midnight, midday, night, or day) derived from its creation timestamp. Write the following function in Rust:

Cargo.toml

[package]
edition = "2021"
name = "enrichment"
version = "0.1.0"

[lib]
crate-type = ["cdylib"]

[dependencies]
rline_macro = "1.0"
rline_api = "1.0"
chrono = { version = "0.4.24", default-features = false, features = ["std"] }

lib.rs

use std::time::{Duration, UNIX_EPOCH};

use chrono::prelude::DateTime;
use chrono::{Timelike, Utc};

use rline_api::row::Row;
use rline_api::value::Value;
use rline_macro::rline_bindgen;

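#[rline_bindgen]
pub fn moment(row: Row) -> Result<Row, String> {
    // Read the input column; this panics if the column is missing
    // or does not hold an integer.
    let timestamp = row
        .get("creation_timestamp")
        .unwrap()
        .as_integer()
        .expect("Not an integer");

    // Interpret the timestamp as seconds since the Unix epoch, in UTC.
    let system_time = UNIX_EPOCH + Duration::from_secs(*timestamp as u64);
    let date_time = DateTime::<Utc>::from(system_time);

    // Classify the time of day from the UTC hour and minute.
    let moment = match (date_time.hour(), date_time.minute()) {
        (0, 0) => "midnight",
        (12, 0) => "midday",
        (h, _) if h <= 8 || h >= 20 => "night",
        _ => "day",
    };

    // Return a new row holding the single "moment" column.
    Ok(Row::from([(
        "moment".to_string(),
        Value::from(moment.to_string()),
    )]))
}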

Let us go through the important steps:

  1. Importing Dependencies:

    • The std::time module is imported for handling time-related operations. You can import any WASI-compatible standard libraries.
    • The chrono crate is imported to work with dates and times.
    • The rline_api::row::Row and rline_api::value::Value types are imported from the REEF API for data manipulation.
    • The rline_macro::rline_bindgen attribute is imported; it binds the function to the REEF engine.
  2. Defining the WebAssembly-Powered function:

    • The function moment is defined with a single parameter of type Row, representing a data row.
    • The function returns a Result<Row, String>, indicating success or failure. REEF is well-equipped to deal with errors.
    • The function is decorated with the rline_bindgen macro, which requires this signature.
  3. Extracting timestamp and determining moment of the day:

    • The timestamp is extracted from the input Row using the get method and read as an integer.
    • The system_time is calculated by adding the timestamp, taken as a number of seconds, to UNIX_EPOCH.
    • A DateTime object is created from the system_time, specifying that it is in the UTC timezone.
    • The variable moment is determined from the hour and minute of the DateTime object. It arbitrarily categorizes the time as “midnight,” “midday,” “night,” or “day.”
  4. Creating and returning the output:

    • A new Row is constructed containing a single column named “moment” with the determined moment value. The Value is created from the moment string.
    • The Row is wrapped in an Ok variant of the Result and returned.
  5. Deploying

    • Compile the Rust library with cargo, targeting wasm32-wasi (for example, cargo build --release --target wasm32-wasi). Rust supports WebAssembly as a compilation target. The standard library supports the WASI target, which is intended for producing standalone binaries.
    • Package the wasm module along with the REEF binary or, better, use the Punch platform, which provides advanced packaging and deployment for versioned functions.
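The classification logic from step 3 can also be checked on its own, without the REEF runtime. The sketch below is a dependency-free variant rather than the function shown above: it leaves out chrono and rline_api, derives the UTC hour and minute with integer arithmetic, and returns an Err instead of panicking when the timestamp is negative. The name moment_of_day is hypothetical.

```rust
const SECONDS_PER_DAY: i64 = 86_400;

/// Classifies a Unix timestamp (seconds, UTC) with the same arbitrary
/// rules as the `moment` function: midnight, midday, night, or day.
fn moment_of_day(timestamp: i64) -> Result<&'static str, String> {
    if timestamp < 0 {
        return Err(format!("negative timestamp: {}", timestamp));
    }
    // Unix time ignores leap seconds, so plain modular arithmetic
    // gives the UTC hour and minute.
    let seconds_into_day = timestamp % SECONDS_PER_DAY;
    let hour = seconds_into_day / 3_600;
    let minute = (seconds_into_day % 3_600) / 60;
    Ok(match (hour, minute) {
        (0, 0) => "midnight",
        (12, 0) => "midday",
        (h, _) if h <= 8 || h >= 20 => "night",
        _ => "day",
    })
}
```

Returning an error for bad input, rather than calling unwrap or expect, lets the Result<Row, String> signature carry the failure back to the engine instead of aborting the function.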

It is worth noting that REEF functions take rows of data as input parameters. In this case, the input row contains a single column with a timestamp; the function determines the moment of the day from that timestamp and produces a new row containing the computed moment.

Why rows? Because all Punch pipelines expose standard and well-known (SQL) concepts of tables, rows, and columns. These are widely used, are easy to understand, and allow additional services to be implemented, such as data lineage, SQL manipulation, schema sharing, etc.

REEF Use Cases

There are countless potential uses for such a Rust/WebAssembly function engine. Here are the ones we currently work on:

Carbon Calculation Platform

The first use case, Thalc, consists of collecting electricity measurements from various data centers and forwarding the data to a central platform hosted on the Google Cloud Platform. REEF is used here as a lightweight agent to grab data from servers (where it is pre-installed) or from applications (using its HTTP poller source). It can thus act as an agent or as a gateway. In both cases, it forwards the data using its HTTP sink to a platform where carbon and energy costs are further computed and exposed to the customers.

Biodiversity Tracking

REEF is used here on small devices to capture bird songs (audio signals), apply TensorFlow Lite song identification, and forward the results to a central platform where the findings can be further processed and visualized. This is, in a sense, a re-implementation of the well-known BirdNET application using Rust and WebAssembly technologies. We also explore the deployment of such applications onto RISC-V hardware. Here is a simple architectural view of this application:

A complete and separate blog describes this interesting experiment.  

Log Management and CyberSecurity Platforms

Lastly, we plan to use REEF as a unique processing engine for Punch-powered log management platforms to keep reducing their overall footprints. Today, a single Java Punch log processor can handle several tens of thousands of logs per second. The processing includes log autodiscovery, parsing, enrichment, normalization, error processing, and indexing. We do not expect to beat these numbers using Rust. We have already shown that this Java engine performs similarly to Vector, a comparable engine implemented in Rust.

However, REEF can reduce the required memory and CPU resources, significantly reducing the size of the underlying (Kubernetes) platform. This footprint reduction makes it easier to deploy small remote log collectors on a single server or a few servers, on top of lightweight Kubernetes instances, or, lighter still, directly as an autonomous application that provides remote management facilities. Think of updating parsers packaged as WebAssembly functions.
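The idea of updating a parser in place can be sketched as a small versioned registry. This is only an illustration under assumptions: the Registry type and its methods are hypothetical, and plain Rust closures stand in for WebAssembly modules, which a real engine would load through a WebAssembly runtime.

```rust
use std::collections::HashMap;

/// A parser function; in a real deployment this would wrap a loaded
/// WebAssembly module rather than a native closure.
type Parser = Box<dyn Fn(&str) -> Result<String, String>>;

/// Hypothetical registry mapping a function name to its installed
/// version and implementation.
struct Registry {
    functions: HashMap<String, (u32, Parser)>,
}

impl Registry {
    fn new() -> Self {
        Registry { functions: HashMap::new() }
    }

    /// Installs `parser` under `name` only if `version` is strictly newer
    /// than the installed one. Returns true when the update was applied.
    fn update(&mut self, name: &str, version: u32, parser: Parser) -> bool {
        let is_newer = self
            .functions
            .get(name)
            .map_or(true, |(installed, _)| version > *installed);
        if is_newer {
            self.functions.insert(name.to_string(), (version, parser));
        }
        is_newer
    }

    /// Runs the currently installed version of `name` on `input`.
    fn call(&self, name: &str, input: &str) -> Result<String, String> {
        match self.functions.get(name) {
            Some((_, parser)) => parser(input),
            None => Err(format!("no function named {}", name)),
        }
    }
}
```

The engine binary never changes in this scheme; only the registry entry does. That is what makes an update of this kind much lighter than a firmware update.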

Looking Ahead: A Roadmap to Uncharted Horizons

Our exploration into efficient data processing and sustainability is just starting. We are embarking on a new and exciting chapter as we forge partnerships with academics where innovation and biodiversity conservation intersect.

We are collaborating to develop an innovative biodiversity monitoring solution. This comprehensive solution encompasses end-to-end signal capture using intermittent microcomputers, machine learning-based species identification, and data transmission to a central platform. There, Punch showcases its value, facilitating the development of data algorithms through its fully-packaged Jupyter component and empowering the creation of insightful biodiversity dashboards.

The challenge we face is minimizing the energy consumption of the deployed sensors while ensuring that they operate efficiently. By leveraging REEF, we aim to maximize their battery life and minimize their environmental impact.

This endeavor poses multifaceted challenges. Fleet management and deployment of sensors, cross-compilation of code for compatibility across diverse hardware, and ensuring the seamless integration of REEF into our solution are just a few aspects we are currently addressing.

Conclusion: Crafting the Future of Data Processing

Our solution’s technical pas de trois of frugality, safety, and performance resonates across the industry, from environmental monitoring to sustainable energy management, embedded systems, and wildlife research. Through our dedication to collaboration and open-source principles, we invite developers, researchers, and enthusiasts to join us in this exciting journey.

Stay tuned for more exciting updates and innovations in the coming months!

As always, thanks to the team!

Author: Reyyan Tekin