Batch Processing is a method of data processing in which a large volume of data is collected over a period of time, stored, and then processed all at once as a single group, or "batch." This contrasts with Stream Processing, which processes data continuously as it arrives. Batch processing is typically used for tasks that can tolerate some latency, involve large datasets, and require comprehensive, often complex, computations.
| Feature | Batch Processing | Stream Processing |
|---|---|---|
| Data Scope | Large, bounded datasets | Unbounded, continuous data streams |
| Latency | High (minutes, hours, days) | Low (milliseconds, seconds) |
| Data Model | Data at rest | Data in motion |
| Computation | On entire dataset or large chunks | On individual events or micro-batches |
| Result Update | Periodic (after batch completion) | Continuous or near real-time |
| Primary Goal | Throughput, comprehensive analysis | Low latency, immediate insights |
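To make the contrast concrete, the following Python sketch runs the same word-count task in both styles. The dataset, event source, and function names are illustrative placeholders, not part of any particular framework: the batch version produces one result after consuming the whole bounded input, while the streaming version updates its result after every event.

```python
from collections import Counter
from typing import Iterable, Iterator

# Batch processing: the full (bounded) dataset is read and processed in one
# pass; the result is available only after the entire batch has been consumed.
def batch_word_count(lines: Iterable[str]) -> Counter:
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts  # one final result per batch run

# Stream processing: each event is handled as it arrives and the running
# result is updated incrementally, giving low-latency, continuous output.
def stream_word_count(events: Iterator[str]) -> Iterator[Counter]:
    counts = Counter()
    for event in events:          # unbounded: this loop may never terminate
        counts.update(event.split())
        yield counts              # continuously updated result

if __name__ == "__main__":
    data = ["the quick brown fox", "the lazy dog"]   # stand-in dataset
    print(batch_word_count(data))                    # result after the batch
    for snapshot in stream_word_count(iter(data)):
        print(snapshot)                              # result after each event
```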
While stream processing has gained prominence for real-time needs, batch processing remains essential for many use cases.
RisingWave, while primarily a Streaming Database, can interact with systems that are populated or managed by batch processes (e.g., reading from tables in a data warehouse that are updated nightly) or can sink data that might be consumed by downstream batch jobs. However, its core strength lies in incremental, low-latency processing of streaming data.
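As a loose illustration of that interaction, the sketch below shows a downstream batch job that periodically pulls the current state of a RisingWave materialized view over RisingWave's PostgreSQL-compatible interface and writes it out for other batch tools. The connection settings, view name, and output path are assumptions for the example; an actual deployment might instead use a configured sink connector.

```python
import csv
import psycopg2  # RisingWave speaks the PostgreSQL wire protocol

# Hypothetical connection settings; adjust to your RisingWave deployment.
DSN = "host=localhost port=4566 dbname=dev user=root"

def nightly_export(view: str = "hourly_sales_mv",          # assumed view name
                   out_path: str = "sales_snapshot.csv") -> None:
    """Batch job: dump the current contents of a materialized view to CSV.

    RisingWave keeps the view incrementally up to date; this job simply
    reads the latest state on a schedule (e.g., via cron) so downstream
    batch tooling can consume it.
    """
    with psycopg2.connect(DSN) as conn, conn.cursor() as cur:
        cur.execute(f"SELECT * FROM {view}")
        columns = [desc[0] for desc in cur.description]
        with open(out_path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(columns)
            writer.writerows(cur.fetchall())

if __name__ == "__main__":
    nightly_export()
```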