NebulaStream: The core of the ELEGANT Orchestrator and Elastic Runtime

NebulaStream is a novel end-to-end data management system for the IoT, and is the core software on which we build the ELEGANT Orchestrator and the ELEGANT Elastic Runtime.

Recently, we released version 0.2.0 of NebulaStream under the Apache License v2.0, and we welcome collaborators.

The success of cloud-based computing and data analytics is in many ways due to the existence of popular data analytics platforms, such as Apache Hadoop, Spark, and Flink, and the software ecosystems that have been built around them. However, these cloud-based data analytics platforms are not suitable for emerging Internet-of-Things (IoT) applications. The characteristics of cloud and IoT environments are fundamentally different, as the table below shows.

Consequently, there are a number of research prototypes for stream processing for IoT devices, e.g., Frontier or CSA, or wireless sensor networks, e.g., TinyDB.

Furthermore, while popular cloud-based data analytics frameworks often exhibit good scale-out behavior, they do not use the hardware efficiently, as the research by Zeuch et al. and McSherry et al. shows.

This is why, at the DIMA Group at Technische Universität Berlin, we are developing NebulaStream, an end-to-end data management system for the IoT. NebulaStream offers many capabilities that are highly desirable in such an environment:

Dynamic Dataflow Execution: NebulaStream is powered by a data processing engine that runs stateful distributed computation while maximizing hardware resource utilization. Overall, NebulaStream attains high per-core data processing throughput.

Hardware-tailored Query Compilation: NebulaStream features a query compiler that lowers user queries into its own Intermediate Representation (IR). The query compiler then lowers the IR to efficient machine code that is tailored to the underlying hardware.

Robust Query Optimization: NebulaStream comes with a robust optimizer designed for optimizing thousands of queries. The optimizer includes extended IoT-specific query rewrite optimizations, a state-of-the-art multi-query optimizer, and several placement algorithms.

Adaptive Sensor Management Layer: NebulaStream treats sensor hardware as a first-class component and uses it to adapt dynamically to changes in the incoming data.

Rich Set of Streaming Operators: NebulaStream provides all traditional streaming ETL operators, including temporal aggregations and joins in event time with two window definitions (tumbling and sliding windows). Furthermore, it provides Complex Event Processing operators, such as the temporal sequence operator, as well as spatial operators.

Customizable Monitoring: NebulaStream provides an extensible monitoring stack that enables the management of performance metrics for large IoT topologies.
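To make the two window definitions mentioned under Rich Set of Streaming Operators more concrete, here is a minimal Python sketch of how event-time timestamps can be assigned to tumbling and sliding windows. It is an illustration only and does not use or mirror the NebulaStream API: the function names (tumbling_window, sliding_windows, windowed_sum) are hypothetical, and timestamps are assumed to be non-negative integers in an arbitrary time unit.

```python
from collections import defaultdict

def tumbling_window(ts, size):
    """Return the single [start, end) window of length `size` that contains ts."""
    start = ts - ts % size
    return (start, start + size)

def sliding_windows(ts, size, slide):
    """Return all [start, end) windows of length `size`, advancing by `slide`,
    that contain ts. Windows near time zero may start at a negative offset."""
    windows = []
    start = ts - ts % slide          # latest window start that is <= ts
    while start > ts - size:         # the window still covers ts
        windows.append((start, start + size))
        start -= slide
    return sorted(windows)

def windowed_sum(events, assign):
    """Sum values per window. `events` is an iterable of (timestamp, value)
    pairs; `assign` maps a timestamp to an iterable of windows."""
    sums = defaultdict(int)
    for ts, value in events:
        for window in assign(ts):
            sums[window] += value
    return dict(sums)

# Hypothetical sensor readings as (event time, value) pairs.
events = [(1, 10), (4, 20), (7, 30), (12, 40)]

tumbling = windowed_sum(events, lambda ts: [tumbling_window(ts, 5)])
sliding = windowed_sum(events, lambda ts: sliding_windows(ts, 10, 5))
```

With a tumbling window, every event belongs to exactly one window, so each value is counted once. With a sliding window of size 10 and slide 5, consecutive windows overlap and each event contributes to two windows. A real engine additionally has to decide when a window is complete and can be emitted, which this sketch leaves out.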

With these features, NebulaStream provides much of the functionality required by the ELEGANT project out of the box. We have therefore selected NebulaStream as the core software on which we build the ELEGANT Orchestrator and the ELEGANT Elastic Runtime.

For more information about NebulaStream, we recommend the keynote presented at the BiDEDE 2022 workshop by Prof. Dr. Volker Markl, who leads the DIMA research group.