Separation of Storage and Compute
Separation of Storage and Compute is an architectural design principle in data systems, particularly prevalent in modern cloud-native architectures. It refers to the decoupling of the resources and components responsible for storing data from those responsible for processing and querying that data. Instead of a monolithic system where storage and compute are tightly integrated and scaled together, they are treated as independent layers that can be scaled, managed, and optimized separately.
RisingWave is designed with this principle at its core, especially concerning its state management for stream processing.
Core Concepts
- Independent Scaling: Compute resources (CPUs, memory for processing tasks) can be scaled up or down based on the computational workload (e.g., query complexity, data ingestion rate) without affecting storage capacity. Conversely, storage resources (disk space, I/O bandwidth for data persistence) can be scaled independently based on data volume and retention needs.
- Resource Optimization: Allows for choosing the most cost-effective and performant resources for each layer. For example, using elastic compute instances for processing and durable, low-cost object storage (like AWS S3, Google Cloud Storage) for data.
- Elasticity: Enables rapid scaling of compute resources to handle peak loads, and then scaling down to save costs during off-peak times, without the need to re-provision or migrate storage.
- Fault Isolation: Failures in the compute layer are less likely to affect the durability and availability of data stored in the separated storage layer, and vice-versa.
- Flexibility: Different compute engines or versions can potentially access the same shared storage layer, offering flexibility in choosing processing tools.
Benefits in Streaming Systems like RisingWave
- Cost Efficiency: Pay only for the compute needed for the current processing load, and for the storage needed for the state. Cloud object storage is significantly cheaper for large volumes of data compared to local disk storage that scales with compute nodes.
- Scalability for State: Stream processing can be highly stateful (e.g., for complex joins or long-window aggregations). Separating state storage allows RisingWave to manage vast amounts of state on cost-effective cloud storage (via its Hummock state store) without requiring equally massive compute clusters.
- Enhanced Durability & Availability: Leveraging durable cloud object storage for state means that even if compute nodes fail, the critical state is preserved and can be recovered.
- Faster Recovery: Compute nodes can be stateless or have minimal local state, making it faster to replace failed nodes and resume processing from checkpoints stored in the remote state store.
- Operational Simplicity: Simplifies cluster management and upgrades, as compute and storage layers can be managed somewhat independently.
How RisingWave Implements This
- Compute Layer: Consists of Compute Nodes responsible for executing the stream processing dataflows (operators like joins, filters, aggregations).
- Storage Layer (State Store): RisingWave's Hummock state store is specifically designed for this separation. It uses cloud object storage (e.g., AWS S3) as its primary durable persistence medium for streaming state. Meta-information and recent state changes might be cached or managed locally on Meta Nodes or Compute Nodes for performance, but the source of truth for persistent state resides in object storage.
- Checkpointing: State is periodically checkpointed from the compute layer to the Hummock state store on object storage, ensuring durability.
This architecture is fundamental to RisingWave's ability to offer a scalable, resilient, and cost-effective solution for stateful stream processing in the cloud.
Related Glossary Terms