The Challenge: Batch Processing Holding Back a Global Manufacturer
Siemens is one of the world's largest technology companies, operating across industry, infrastructure, mobility, and healthcare. Across its global operations, thousands of field devices and sensors generate massive volumes of data every second. But until recently, that data was trapped in a batch-processing bottleneck.
The previous system relied on nightly batch jobs to synchronize and clean sensor data. Complex ETL scripts ran on dedicated scheduling clusters, pulling raw data through multi-step pipelines before it was available for business use. The result was a familiar set of problems:
- Hours of latency between data generation and data availability
- High maintenance overhead from complex script stacks and scheduling logic
- Escalating infrastructure costs for batch processing clusters and intermediate storage
- Limited business agility, with teams unable to make decisions based on current data
Siemens needed a fundamental shift: from batch to real-time, from complexity to simplicity.
The Solution: A Streaming Medallion Architecture with RisingWave
Working with Hivemind Technologies, a Germany-based data platform consultancy, Siemens adopted a streaming Medallion architecture powered by RisingWave.
The traditional Medallion architecture (Bronze, Silver, Gold layers) is a well-established pattern for organizing data pipelines. But most implementations rely on offline batch processing with tools like Apache Spark, leading to the same latency and complexity problems Siemens was facing.
Hivemind's innovation was making the entire Medallion architecture stream-native. Instead of scheduling batch jobs to move data between layers, every transformation happens continuously in real time.
Why RisingWave?
During the evaluation phase, Hivemind compared several real-time processing solutions, including Apache Flink, Kafka Streams, and Spark Structured Streaming. Many of these tools proved too heavyweight or overly complex, or came with steep learning curves.
RisingWave stood out for several reasons:
- PostgreSQL-compatible SQL: Engineers write standard SQL instead of learning new frameworks or programming models
- Native materialized views: Continuously updated views that power the Gold layer without batch aggregation
- Built-in stream processing and storage: No need to stitch together separate compute and storage systems
- Multi-source input and multi-target output: Data flows in from Kafka, MQTT, or CDC sources, and results are delivered to dashboards, data lakes, or downstream systems
Architecture: Three Layers, All Streaming
Bronze Layer: Raw Data Ingestion
Thousands of Siemens field devices and sensors send raw telemetry data in real time. This data arrives in inconsistent formats, with varying field names, units, and schemas across different device manufacturers.
RisingWave ingests this raw data directly as a streaming Bronze layer. No disk landing, no pre-processing. Every data point is immediately available for downstream processing while retaining full fidelity for traceability and auditing.
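```sql
-- Bronze: Ingest raw sensor data from Kafka (columns are the fields the Silver layer reads)
CREATE SOURCE sensor_raw (
  device_id VARCHAR, raw_value DOUBLE PRECISION, unit VARCHAR,
  location_name VARCHAR, event_time TIMESTAMPTZ
) WITH (
  connector = 'kafka',
  topic = 'siemens.sensors.raw',
  properties.bootstrap.server = 'kafka:9092'
) FORMAT PLAIN ENCODE JSON;
```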
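The inconsistencies do not disappear at ingestion. A JSON source maps message keys onto declared columns by name. As we understand RisingWave's JSON decoding, a key that is missing simply comes through as NULL, and undeclared keys are ignored; check the documentation for your version. The plain-Python model below illustrates that name-based mapping. It is a simplified sketch, not RisingWave's parser. The column list matches the fields the Silver query later in this section reads, and the function name and sample payloads are hypothetical.

```python
import json
from typing import Any, Optional, Tuple

# Columns assumed on the Bronze source: the fields the Silver query reads.
BRONZE_COLUMNS = ["device_id", "raw_value", "unit", "location_name", "event_time"]


def decode_bronze_row(payload: bytes) -> Tuple[Optional[Any], ...]:
    """Map one raw JSON message onto the declared columns by name.

    Matching keys are copied unchanged, missing keys become None (SQL NULL),
    and keys with no matching column are dropped. Hypothetical helper that
    models name-based JSON decoding; it is not RisingWave's implementation.
    """
    record = json.loads(payload)
    return tuple(record.get(column) for column in BRONZE_COLUMNS)
```

A device that reports `temp_c` instead of `raw_value` therefore lands in Bronze with a NULL reading. The Silver filter `raw_value IS NOT NULL` would then discard that reading unless a field-name unification rule maps it first.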
Silver Layer: Real-Time Cleaning and Standardization
The Silver layer transforms raw, messy data into clean, standardized information. Previously, this required complex Spark batch jobs triggered on schedules. With RisingWave, the entire cleaning process runs continuously using SQL.
Engineers write declarative SQL to:
- Unify field names (e.g., `temp_c` and `temperature_f` become a single `temperature_celsius`)
- Convert units to consistent standards
- Filter out invalid data in real time
- Enrich records with reference data from dimension tables
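```sql
-- Silver: Clean and standardize sensor readings
CREATE MATERIALIZED VIEW sensor_cleaned AS
SELECT device_id, temperature_celsius, location, event_time
FROM (
  SELECT
    device_id,
    -- Normalize Fahrenheit readings to Celsius
    CASE
      WHEN unit = 'F' THEN (raw_value - 32) * 5.0 / 9.0
      ELSE raw_value
    END AS temperature_celsius,
    COALESCE(location_name, 'Unknown') AS location,
    event_time
  FROM sensor_raw
  WHERE raw_value IS NOT NULL
) AS converted
-- Range check runs after conversion, so every reading is compared in Celsius
WHERE temperature_celsius BETWEEN -50 AND 200;
```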
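The Silver query converts units from a `unit` column. It does not show the field-name unification from the list above, where `temp_c` and `temperature_f` collapse into one `temperature_celsius`. The following is a minimal plain-Python sketch of that rule combined with the same -50 to 200 range check (applied in Celsius) and a dimension-table lookup. The alias table, the `LOCATIONS` dictionary, and the function names are all hypothetical. In RisingWave the rule would normally stay in SQL, for example as a `COALESCE` over the alias columns.

```python
from typing import Optional

# Hypothetical alias table: field name a device might use -> unit of its value.
# Order matters: the first alias present in a record wins.
TEMPERATURE_ALIASES = {
    "temperature_celsius": "C",
    "temp_c": "C",
    "temperature_f": "F",
}

# Hypothetical dimension table: device_id -> location name.
LOCATIONS = {"d1": "Plant A", "d2": "Plant B"}


def unify_temperature(record: dict) -> Optional[float]:
    """Return the reading in degrees Celsius, whichever alias the device used.

    Returns None if the record carries no known temperature field.
    """
    for field, unit in TEMPERATURE_ALIASES.items():
        value = record.get(field)
        if value is None:
            continue
        return (value - 32) * 5.0 / 9.0 if unit == "F" else float(value)
    return None


def clean(record: dict) -> Optional[dict]:
    """Unify, convert, range-check (-50 to 200 C) and enrich one raw record.

    Returns None for records the Silver layer would drop.
    """
    celsius = unify_temperature(record)
    if celsius is None or not -50 <= celsius <= 200:
        return None
    return {
        "device_id": record["device_id"],
        "temperature_celsius": celsius,
        # Same default as COALESCE(location_name, 'Unknown') in the SQL
        "location": LOCATIONS.get(record["device_id"], "Unknown"),
    }
```

For example, `clean({"device_id": "d2", "temperature_f": 212})` returns `{"device_id": "d2", "temperature_celsius": 100.0, "location": "Plant B"}`. Keeping the aliases in one table means that supporting a new device vendor is a data change rather than a code change.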
The key advantage: cleaning rules are readable, maintainable, and can be updated without redeploying pipelines. Engineers focus on business logic, not scheduling logic.
Gold Layer: Real-Time Metrics and Insights
The Gold layer generates aggregated metrics and business-ready insights. RisingWave's materialized views compute these continuously, eliminating the need for batch aggregation or intermediate caching.
```sql
-- Gold: Real-time device health metrics per location, in 5-minute tumbling windows
CREATE MATERIALIZED VIEW device_health_dashboard AS
SELECT
  location,
  COUNT(DISTINCT device_id) AS active_devices,
  AVG(temperature_celsius) AS avg_temperature,
  MAX(temperature_celsius) AS max_temperature,
  -- Number of readings above 85 degrees Celsius in the window
  COUNT(*) FILTER (WHERE temperature_celsius > 85) AS overheat_alerts,
  window_start
FROM TUMBLE(sensor_cleaned, event_time, INTERVAL '5 MINUTES')
GROUP BY location, window_start;
```
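The plain-Python model below makes the Gold query's semantics concrete. It makes three assumptions:

- Input is insert-only.
- Windows are 5-minute tumbling windows aligned to the Unix epoch. To our understanding, this is TUMBLE's behavior when no offset is given.
- Input rows are the cleaned `(device_id, location, temperature_celsius, event_time)` tuples.

Each reading updates a small running state for its (location, window) group instead of triggering a rescan. That is the basic idea behind incrementally maintained views. RisingWave's real implementation also has to handle updates and deletes, which this sketch ignores. The function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(minutes=5)
OVERHEAT_C = 85.0
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def window_start(ts: datetime, size: timedelta = WINDOW) -> datetime:
    """Start of the epoch-aligned tumbling window containing ts (ts must be tz-aware)."""
    return ts - (ts - EPOCH) % size


def device_health(readings):
    """Aggregate (device_id, location, temperature_celsius, event_time) tuples.

    Returns {(location, window_start): metrics} with the Gold query's columns.
    Each reading only touches its own group's running state; nothing is rescanned.
    """
    state = defaultdict(lambda: {"devices": set(), "count": 0, "sum": 0.0,
                                 "max": float("-inf"), "overheat": 0})
    for device_id, location, temp, ts in readings:
        group = state[(location, window_start(ts))]
        group["devices"].add(device_id)
        group["count"] += 1
        group["sum"] += temp
        group["max"] = max(group["max"], temp)
        if temp > OVERHEAT_C:
            group["overheat"] += 1
    # Derive the output columns from the running state.
    return {
        key: {
            "active_devices": len(g["devices"]),
            "avg_temperature": g["sum"] / g["count"],
            "max_temperature": g["max"],
            "overheat_alerts": g["overheat"],
        }
        for key, g in state.items()
    }
```

Consider readings at 12:01 and 12:04 from two devices. Both fall into the 12:00 window, so that group reports two active devices. A reading at 12:06 opens a new 12:05 group.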
These materialized views serve multiple downstream consumers simultaneously:
- Dashboards and BI tools query the views directly for real-time visualization
- Iceberg tables receive continuous syncs for historical analysis
- Kafka topics deliver results to downstream systems
- Alerting systems monitor threshold breaches in real time
The Gold layer is a result of streaming, not a product of batching.
Results: Immediate, Measurable Impact
The migration from batch to streaming delivered significant improvements across every dimension:
| Metric | Before (Batch) | After (RisingWave) | Improvement |
|---|---|---|---|
| Data latency | Hours | Seconds | ~1000x faster |
| Cleaning logic | Complex script stacks | SQL rules | Dramatically simpler |
| Infrastructure costs | Dedicated scheduling clusters + intermediate storage | Single streaming platform | >50% reduction |
| Data availability | Next-day reports | Real-time views | Immediate decisions |
| Maintenance effort | High (scripts, schedulers, landing zones) | Low (SQL, no scheduling) | Major reduction |
Key outcomes in detail:
Data latency dropped from hours to seconds. Business teams no longer wait until the next morning to see yesterday's data. Operational decisions happen in real time based on live sensor readings.
Cleaning logic went from complex scripts to SQL. What previously required dedicated engineering teams to maintain now lives in readable, version-controlled SQL statements. New cleaning rules can be deployed in minutes.
Infrastructure costs fell by over 50%. The dedicated scheduling clusters, intermediate data landing layers, and batch processing infrastructure were eliminated entirely. RisingWave handles ingestion, processing, and serving in a single platform.
Business departments gained direct access to real-time data. Instead of requesting reports from the data team, business users query materialized views directly, enabling faster and more informed decision-making.
Why This Matters for Manufacturing
Siemens' transformation illustrates a broader trend in manufacturing and industrial IoT. As the volume of sensor and device data grows, batch processing architectures become increasingly untenable. The gap between "data collected" and "data utilized" widens.
A streaming Medallion architecture closes that gap. By processing data continuously rather than in scheduled batches, organizations can:
- Detect equipment anomalies before they cause downtime
- Monitor production quality as it happens, not after the fact
- Optimize energy consumption and resource allocation in real time
- Feed ML models with fresh data for predictive maintenance
The architecture is also portable. It runs in the cloud or in on-premises data centers, in line with enterprise requirements for security, compliance, and operational control.
Getting Started
If your organization is dealing with similar challenges, here's how to get started with RisingWave:
- Try RisingWave Cloud: Get started with a fully managed streaming platform in minutes
- Read the documentation: Explore how to build streaming pipelines, materialized views, and data integrations
- Download the open-source version: Deploy RisingWave on your own infrastructure
- Join the community: Connect with other engineers building real-time data systems
The shift from batch to streaming is not just a technology upgrade. It is a fundamental change in how enterprises think about data: from something you collect and process later to something that flows, transforms, and delivers value continuously.

