How Siemens Replaced Nightly Batch Jobs with Real-Time Streaming Using RisingWave

The Challenge: Batch Processing Holding Back a Global Manufacturer

Siemens is one of the world's largest technology companies, operating across industry, infrastructure, mobility, and healthcare. Across its global operations, thousands of field devices and sensors generate massive volumes of data every second. But until recently, that data was trapped in a batch-processing bottleneck.

The previous system relied on nightly batch jobs to synchronize and clean sensor data. Complex ETL scripts ran on dedicated scheduling clusters, pulling raw data through multi-step pipelines before it was available for business use. The result was a familiar set of problems:

Hours of latency between data generation and data availability
High maintenance overhead from complex script stacks and scheduling logic
Escalating infrastructure costs for batch processing clusters and intermediate storage
Limited business agility, with teams unable to make decisions based on current data

Siemens needed a fundamental shift: from batch to real-time, from complexity to simplicity.

The Solution: A Streaming Medallion Architecture with RisingWave

Working with Hivemind Technologies, a Germany-based data platform consultancy, Siemens adopted a streaming Medallion architecture powered by RisingWave.

The traditional Medallion architecture (Bronze, Silver, Gold layers) is a well-established pattern for organizing data pipelines. But most implementations rely on offline batch processing with tools like Apache Spark, leading to the same latency and complexity problems Siemens was facing.

Hivemind's innovation was making the entire Medallion architecture stream-native. Instead of scheduling batch jobs to move data between layers, every transformation happens continuously in real time.

Why RisingWave?

During the evaluation phase, Hivemind compared several real-time processing solutions, including Apache Flink, Kafka Streams, and Spark Structured Streaming. Many of these tools proved too heavyweight or overly complex, or came with steep learning curves.

RisingWave stood out for several reasons:

PostgreSQL-compatible SQL: Engineers write standard SQL instead of learning new frameworks or programming models
Native materialized views: Continuously updated views that power the Gold layer without batch aggregation
Built-in stream processing and storage: No need to stitch together separate compute and storage systems
Multi-source input and multi-target output: Data flows in from Kafka, MQTT, or CDC sources and delivers to dashboards, data lakes, or downstream systems

Architecture: Three Layers, All Streaming

Bronze Layer: Raw Data Ingestion

Thousands of Siemens field devices and sensors send raw telemetry data in real time. This data arrives in inconsistent formats, with varying field names, units, and schemas across different device manufacturers.

RisingWave ingests this raw data directly as a streaming Bronze layer, with no intermediate landing zone on disk and no separate pre-processing step. Every data point is immediately available for downstream processing while retaining full fidelity for traceability and auditing.

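-- Bronze: Ingest raw sensor data from Kafka (column list inferred from the Silver view below)
CREATE SOURCE sensor_raw (
    device_id VARCHAR, unit VARCHAR, raw_value DOUBLE PRECISION,
    location_name VARCHAR, event_time TIMESTAMPTZ
) WITH (
    connector = 'kafka',
    topic = 'siemens.sensors.raw',
    properties.bootstrap.server = 'kafka:9092'
) FORMAT PLAIN ENCODE JSON;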

Silver Layer: Real-Time Cleaning and Standardization

The Silver layer transforms raw, messy data into clean, standardized information. Previously, this required complex Spark batch jobs triggered on schedules. With RisingWave, the entire cleaning process runs continuously using SQL.

Engineers write declarative SQL to:

Unify field names (e.g., temp_c, temperature_f become a single temperature_celsius)
Convert units to consistent standards
Filter out invalid data in real time
Enrich records with reference data from dimension tables

-- Silver: Clean and standardize sensor readings
CREATE MATERIALIZED VIEW sensor_cleaned AS
SELECT
    device_id,
    CASE
        WHEN unit = 'F' THEN (raw_value - 32) * 5.0 / 9.0
        ELSE raw_value
    END AS temperature_celsius,
    COALESCE(location_name, 'Unknown') AS location,
    event_time
FROM sensor_raw
WHERE raw_value IS NOT NULL
  AND raw_value BETWEEN -50 AND 200;

The key advantage: cleaning rules are readable, maintainable, and can be updated without redeploying pipelines. Engineers focus on business logic, not scheduling logic.
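
The enrichment step from the list above does not appear in the Silver view itself. A minimal sketch of how it could look, assuming a hypothetical dimension table device_dim keyed by device_id (the table, its columns, and the view name sensor_enriched are illustrative and not taken from the Siemens deployment):

```sql
-- Hypothetical dimension table; name and columns are assumptions for illustration.
CREATE TABLE device_dim (
    device_id VARCHAR PRIMARY KEY,
    site_name VARCHAR,
    manufacturer VARCHAR
);

-- Silver (enriched): attach reference attributes to each cleaned reading.
-- LEFT JOIN keeps readings from devices that have no dimension row yet.
CREATE MATERIALIZED VIEW sensor_enriched AS
SELECT
    c.device_id,
    c.temperature_celsius,
    c.location,
    c.event_time,
    COALESCE(d.site_name, 'Unknown') AS site_name,
    d.manufacturer
FROM sensor_cleaned AS c
LEFT JOIN device_dim AS d
    ON c.device_id = d.device_id;
```

Because the join is maintained continuously, a reading from a device with no entry in device_dim still flows through with site_name set to 'Unknown' rather than being dropped.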

Gold Layer: Real-Time Metrics and Insights

The Gold layer generates aggregated metrics and business-ready insights. RisingWave's materialized views compute these continuously, eliminating the need for batch aggregation or intermediate caching.

-- Gold: Real-time device health metrics
CREATE MATERIALIZED VIEW device_health_dashboard AS
SELECT
    location,
    COUNT(DISTINCT device_id) AS active_devices,
    AVG(temperature_celsius) AS avg_temperature,
    MAX(temperature_celsius) AS max_temperature,
    COUNT(*) FILTER (WHERE temperature_celsius > 85) AS overheat_alerts,
    window_start
FROM TUMBLE(sensor_cleaned, event_time, INTERVAL '5 MINUTES')
GROUP BY location, window_start;

These materialized views serve multiple downstream consumers simultaneously:

Dashboards and BI tools query the views directly for real-time visualization
Iceberg tables receive continuous syncs for historical analysis
Kafka topics deliver results to downstream systems
Alerting systems monitor threshold breaches in real time
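
As one illustration of the Kafka delivery path, a sink could be declared on the Gold view roughly as follows. The sink name, topic, and broker address are placeholders, and the connector options should be checked against the RisingWave version in use. Because the view is an aggregation whose rows change as new readings arrive, the sketch assumes upsert output keyed on the grouping columns.

```sql
-- Sketch only: sink name, topic, and broker address are placeholders.
CREATE SINK device_health_to_kafka
FROM device_health_dashboard
WITH (
    connector = 'kafka',
    properties.bootstrap.server = 'kafka:9092',
    topic = 'siemens.device_health',
    -- The Gold view groups by these columns, so together they identify a row.
    primary_key = 'location,window_start'
) FORMAT UPSERT ENCODE JSON;
```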

The Gold layer is a result of streaming, not a product of batching.

Results: Immediate, Measurable Impact

The migration from batch to streaming delivered significant improvements across every dimension:

Metric	Before (Batch)	After (RisingWave)	Improvement
Data latency	Hours	Seconds	~1000x faster
Cleaning logic	Complex script stacks	SQL rules	Dramatically simpler
Infrastructure costs	Dedicated scheduling clusters + intermediate storage	Single streaming platform	>50% reduction
Data availability	Next-day reports	Real-time views	Immediate decisions
Maintenance effort	High (scripts, schedulers, landing zones)	Low (SQL, no scheduling)	Major reduction

Key outcomes in detail:

Data latency dropped from hours to seconds. Business teams no longer wait until the next morning to see yesterday's data. Operational decisions happen in real time based on live sensor readings.

Cleaning logic went from complex scripts to SQL. What previously required dedicated engineering teams to maintain now lives in readable, version-controlled SQL statements. New cleaning rules can be deployed in minutes.

Infrastructure costs fell by over 50%. The dedicated scheduling clusters, intermediate data landing layers, and batch processing infrastructure were eliminated entirely. RisingWave handles ingestion, processing, and serving in a single platform.

Business departments gained direct access to real-time data. Instead of requesting reports from the data team, business users query materialized views directly, enabling faster and more informed decision-making.
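
For example, a business user connected through a PostgreSQL-compatible client might run an ad hoc query like the one below against the device_health_dashboard view defined earlier. The filter, ordering, and row limit are illustrative choices.

```sql
-- Most recent five-minute windows in which at least one reading exceeded 85 °C.
SELECT
    location,
    window_start,
    active_devices,
    avg_temperature,
    overheat_alerts
FROM device_health_dashboard
WHERE overheat_alerts > 0
ORDER BY window_start DESC, overheat_alerts DESC
LIMIT 20;
```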

Why This Matters for Manufacturing

Siemens' transformation illustrates a broader trend in manufacturing and industrial IoT. As the volume of sensor and device data grows, batch processing architectures become increasingly untenable. The gap between "data collected" and "data utilized" widens.

A streaming Medallion architecture closes that gap. By processing data continuously rather than in scheduled batches, organizations can:

Detect equipment anomalies before they cause downtime
Monitor production quality as it happens, not after the fact
Optimize energy consumption and resource allocation in real time
Feed ML models with fresh data for predictive maintenance
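
The first item in this list could be approached with another continuously maintained view. The sketch below reuses the 85 degree threshold from the Gold layer example. The view name and the one-minute window are assumptions, and a production anomaly rule would likely be more nuanced than a fixed threshold.

```sql
-- Sketch: per-device, per-minute windows whose peak temperature crossed the threshold.
-- View name and window size are illustrative assumptions.
CREATE MATERIALIZED VIEW device_overheat_candidates AS
SELECT
    device_id,
    window_start,
    COUNT(*) AS readings,
    MAX(temperature_celsius) AS peak_temperature
FROM TUMBLE(sensor_cleaned, event_time, INTERVAL '1 MINUTE')
GROUP BY device_id, window_start
HAVING MAX(temperature_celsius) > 85;
```

An alerting system could then poll this view or subscribe to a sink built on it, instead of scanning the raw readings.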

The architecture also offers high flexibility and portability. It runs in the cloud or in on-premises data centers, fully aligning with enterprise requirements for security, compliance, and operational control.

Getting Started

If your organization is dealing with similar challenges, here's how to get started with RisingWave:

Try RisingWave Cloud: Get started with a fully managed streaming platform in minutes
Read the documentation: Explore how to build streaming pipelines, materialized views, and data integrations
Download the open-source version: Deploy RisingWave on your own infrastructure
Join the community: Connect with other engineers building real-time data systems

The shift from batch to streaming is not just a technology upgrade. It is a fundamental change in how enterprises think about data: from something you collect and process later to something that flows, transforms, and delivers value continuously.