Imagine you’re running a coffee shop, and there’s a camera at the entrance, continuously capturing every customer who walks in. These snapshots are your event stream — one after another, playing endlessly.
But here’s the challenge. If you want to answer questions like:
How many times has this customer visited today?
How much have they spent this week?
What drink did they order five minutes after being seated?
You can’t just look at the current snapshot; you need to remember what happened before. This ability to retain history is what we call a stateful operation.
What is State?
In a streaming system, state refers to the historical information that the system needs to keep when processing new events.
Stateful processing relies on this historical information to compute results.
Stateless vs Stateful Processing:
Stateless:
Event1 ─→ [Process] ─→ Result1
Event2 ─→ [Process] ─→ Result2 (processed independently, no memory of Event1)
Event3 ─→ [Process] ─→ Result3
Stateful:
Event1 ─→ [Process + State] ─→ Result1
                 ↓
Event2 ─→ [Process + State] ─→ Result2 (remembers Event1)
                 ↓
Event3 ─→ [Process + State] ─→ Result3 (remembers Event1 and Event2)
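To make the contrast concrete, here is a minimal Python sketch (purely illustrative, not tied to any particular streaming framework; event fields are made up): the stateless handler looks at each event in isolation, while the stateful handler keeps a running count per customer across events.

```python
# Minimal illustration of stateless vs. stateful processing.
# Illustrative sketch only; event fields are assumptions.

events = [
    {"customer": "alice", "amount": 5.0},
    {"customer": "bob",   "amount": 4.5},
    {"customer": "alice", "amount": 3.0},
]

# Stateless: each event is processed on its own, nothing is remembered.
def stateless_process(event):
    return f"{event['customer']} spent {event['amount']}"

# Stateful: the handler keeps state (visit counts) that persists across events.
visit_counts = {}  # this dict *is* the state

def stateful_process(event):
    customer = event["customer"]
    visit_counts[customer] = visit_counts.get(customer, 0) + 1
    return f"{customer} has visited {visit_counts[customer]} time(s) so far"

for e in events:
    print(stateless_process(e))   # each result is independent
    print(stateful_process(e))    # each result depends on history
```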
Examples of Stateful Operations
Common stateful operations include:
1. JOIN
Buffers unmatched events from Stream A and Stream B in memory until a match is found, then emits the joined result.
Stream JOIN Example:
Orders Stream:   [Order1] ──┐
                            ├── [JOIN State] ── [Complete Order]
Payments Stream: [Payment1]─┘
                      ↑
                Need to store:
                • Unmatched orders
                • Unmatched payments
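A rough sketch of this idea in Python (field names such as order_id, amount, and method are illustrative assumptions): each side buffers its unmatched events in a dictionary keyed by the join key, and emits a result as soon as the matching event from the other side arrives.

```python
# Sketch of a streaming join: unmatched events from each side are buffered
# in memory (the join state) until the matching event from the other side arrives.
# Field names are illustrative assumptions.

pending_orders = {}    # order_id -> order event waiting for its payment
pending_payments = {}  # order_id -> payment event waiting for its order

def on_order(order):
    oid = order["order_id"]
    if oid in pending_payments:
        # Match found: emit the complete order and drop the buffered payment.
        payment = pending_payments.pop(oid)
        print("complete:", {**order, **payment})
    else:
        pending_orders[oid] = order   # keep in state until the payment shows up

def on_payment(payment):
    oid = payment["order_id"]
    if oid in pending_orders:
        order = pending_orders.pop(oid)
        print("complete:", {**order, **payment})
    else:
        pending_payments[oid] = payment

on_order({"order_id": 1, "amount": 9.5})
on_payment({"order_id": 1, "method": "card"})   # -> emits the joined result
```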
2. GROUP BY / Aggregation
Maintains cumulative statistics for each group.
Aggregation Example:
User Events ─→ [Aggregation State] ─→ User Statistics
                       ↑
               Store per user:
               • Visit count
               • Total spending
               • Last activity
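As a minimal sketch (user_id, amount, and timestamp are assumed field names), the aggregation state is simply one record of running statistics per group, updated as each event arrives:

```python
# Sketch of per-group aggregation state: one record of running statistics
# per user, updated on every incoming event. Field names are assumptions.

from collections import defaultdict

user_stats = defaultdict(lambda: {"visits": 0, "total_spent": 0.0, "last_seen": None})

def on_user_event(event):
    stats = user_stats[event["user_id"]]
    stats["visits"] += 1
    stats["total_spent"] += event.get("amount", 0.0)
    stats["last_seen"] = event["timestamp"]
    return stats

on_user_event({"user_id": "alice", "amount": 5.0, "timestamp": 1})
print(on_user_event({"user_id": "alice", "amount": 3.0, "timestamp": 2}))
# {'visits': 2, 'total_spent': 8.0, 'last_seen': 2}
```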
3. Window Aggregation
Computes metrics like average, max, or sum over a time window.
Window Aggregation:
Events ─→ [Window State] ─→ Window Results
                ↑
        Store in window:
        • All events in the timeframe
        • Intermediate calculations
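Here is a toy tumbling-window sum along those lines (the 60-second window size and event fields are illustrative choices): events are bucketed by window start time, and each window's partial aggregate lives in state until the window is closed.

```python
# Sketch of a tumbling-window sum: events are bucketed by window start time,
# and each window's partial aggregate is kept as state until the window closes.
# Window size and field names are illustrative assumptions.

WINDOW_SIZE = 60  # seconds

window_sums = {}  # window_start -> running sum for that window

def on_event(event):
    window_start = (event["timestamp"] // WINDOW_SIZE) * WINDOW_SIZE
    window_sums[window_start] = window_sums.get(window_start, 0.0) + event["amount"]

def close_window(window_start):
    # Emit the final aggregate and free the window's state.
    return window_sums.pop(window_start, 0.0)

for ts, amt in [(5, 2.0), (30, 3.0), (70, 4.0)]:
    on_event({"timestamp": ts, "amount": amt})

print(close_window(0))   # 5.0  (events at t=5 and t=30)
print(close_window(60))  # 4.0  (event at t=70)
```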
Ways to Store State
In a streaming system, state is usually stored in one of two ways:
1. Local State Store
Local State Store:
Stream Processor ──┬── Processing Logic
                   └── Local RocksDB
                            ↓
                   Checkpoint to HDFS
Characteristics:
Example: RocksDB, co-located with the processing node
Low latency, ideal for high-frequency queries
Requires an additional checkpointing mechanism for fault tolerance
2. External State Store
External State Store:
Stream Processor ── Network ── Redis/Cassandra/TiKV
                                       ↑
                              Shared across nodes
Characteristics:
Examples: Redis, Cassandra, TiKV
Easier to scale horizontally and share state across nodes
Higher latency; network cost must be taken into account
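To make the trade-off tangible, here is a rough sketch of the earlier per-user aggregation kept in an external store such as Redis, using the redis-py client (the host/port, key names, and field names are assumptions). Every update now crosses the network, which is exactly the latency cost noted above, but any node can read and write the same state.

```python
# Sketch of keeping aggregation state in an external store (Redis) instead of
# locally. Requires a running Redis instance and the redis-py client;
# host/port, key names, and field names are illustrative assumptions.

import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def on_user_event(event):
    key = f"user_stats:{event['user_id']}"
    # Each update is a network round trip: shared across nodes, but slower
    # than a local lookup.
    r.hincrby(key, "visits", 1)
    r.hincrbyfloat(key, "total_spent", event.get("amount", 0.0))
    r.hset(key, "last_seen", event["timestamp"])

on_user_event({"user_id": "alice", "amount": 5.0, "timestamp": 1})
print(r.hgetall("user_stats:alice"))
```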
Industry Status
Mainstream streaming systems like Flink primarily use Local State Stores:
| Approach | Latency | Scalability | Shareability | Fault-Tolerance Complexity | Mainstream Use |
| --- | --- | --- | --- | --- | --- |
| Local Store | Low | Medium | Difficult | High | Flink |
| External Store | Medium | High | Easy | Low | Special cases |
Why Local Store dominates:
Performance first: Streaming is extremely latency-sensitive; microsecond-level access is a huge advantage
Mature technology: RocksDB + Checkpointing is very stable
Complete ecosystem: Frameworks like Flink have built-in support, ready to use out of the box
Challenges with Local State Store
Simply storing data locally isn’t enough; you also need to address:
Persistence: State must survive node restarts
Consistency: State must remain correct in a distributed environment
Low-latency access: State must be readable/writable immediately when processing new events
How are these three requirements achieved?
Persistence:
RocksDB (Memory + Disk) ─── Checkpoint (periodic snapshots) ──→ HDFS/S3
        ↑                                                          │
        └────────────── Recovery (on node restart) ←───────────────┘
Periodic checkpoints store RocksDB snapshots in a distributed file system
RocksDB keeps state in both memory and disk; on node failure, the latest checkpoint is loaded to recover state
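A toy illustration of this checkpoint-and-recover cycle is sketched below, with an in-memory dict standing in for RocksDB and a local JSON file standing in for HDFS/S3 (the file path and checkpoint interval are made-up choices):

```python
# Toy illustration of checkpoint-based persistence: in-memory state (standing in
# for RocksDB) is periodically snapshotted to durable storage (a local JSON file
# standing in for HDFS/S3) and reloaded after a restart.
# File path and checkpoint interval are illustrative assumptions.

import json, os

CHECKPOINT_PATH = "state_checkpoint.json"
CHECKPOINT_EVERY = 100  # events between snapshots

state = {}              # stand-in for the local RocksDB instance
events_since_ckpt = 0

def checkpoint():
    # Write an atomic snapshot of the current state.
    tmp = CHECKPOINT_PATH + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, CHECKPOINT_PATH)

def recover():
    # On restart, reload the latest snapshot (if any) before processing resumes.
    global state
    if os.path.exists(CHECKPOINT_PATH):
        with open(CHECKPOINT_PATH) as f:
            state = json.load(f)

def on_event(key, amount):
    global events_since_ckpt
    state[key] = state.get(key, 0.0) + amount
    events_since_ckpt += 1
    if events_since_ckpt >= CHECKPOINT_EVERY:
        checkpoint()
        events_since_ckpt = 0

recover()                 # resume from the last checkpoint, if one exists
on_event("alice", 5.0)
```

Events that arrived after the last snapshot are not in the checkpoint; in systems like Flink, the source offsets are checkpointed alongside the state so those events can be replayed after recovery.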
Consistency:
Event Processing ──→ State Update ──→ Checkpoint Barrier
        │                                      │
        └────── Ensures event processing order ┘
Checkpoint barriers ensure all nodes create snapshots at the same point in time
Guarantees a consistent view of state across distributed nodes
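The sketch below gives a deliberately simplified picture of barrier alignment (real Flink-style alignment also buffers events on channels whose barrier has already arrived): an operator with several input channels only snapshots its state once the barrier for a given checkpoint has been seen on every channel.

```python
# Simplified sketch of checkpoint-barrier alignment: an operator snapshots its
# state only once the barrier for a checkpoint id has arrived on *all* input
# channels, so every snapshot reflects the same point in the stream.
# This is a simplification of the Flink-style mechanism.

class Operator:
    def __init__(self, num_channels):
        self.num_channels = num_channels
        self.state = {"count": 0}
        self.barriers_seen = {}   # checkpoint_id -> set of channels seen

    def on_event(self, channel, event):
        self.state["count"] += 1  # normal processing updates state

    def on_barrier(self, channel, checkpoint_id):
        seen = self.barriers_seen.setdefault(checkpoint_id, set())
        seen.add(channel)
        if len(seen) == self.num_channels:
            # Barrier received on every channel: take a consistent snapshot.
            print(f"checkpoint {checkpoint_id}: snapshot {dict(self.state)}")

op = Operator(num_channels=2)
op.on_event(0, "a"); op.on_event(1, "b")
op.on_barrier(0, checkpoint_id=1)   # no snapshot yet: channel 1 not seen
op.on_barrier(1, checkpoint_id=1)   # snapshot taken now
```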
Low-latency access:
Hot Data (Memory) ←── RocksDB ──→ Cold Data (SSD)
        ↑                               ↑
 microsecond access             millisecond access
RocksDB caches hot data in memory and keeps cold data on high-speed SSDs
Local access avoids network latency
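A toy two-tier lookup along the same lines is sketched below (this is not RocksDB's actual architecture; the capacity, file name, and use of shelve as the "disk" tier are assumptions): a bounded in-memory cache serves hot keys, and misses fall back to a slower on-disk store.

```python
# Toy illustration of tiered state access: a bounded in-memory LRU cache holds
# hot keys, while everything else lives in a slower on-disk store (a shelve
# file standing in for RocksDB's on-SSD data). Sizes and names are assumptions.

import shelve
from collections import OrderedDict

HOT_CAPACITY = 1000

hot = OrderedDict()                       # in-memory cache (microsecond access)
cold = shelve.open("cold_state.db")       # on-disk store (millisecond access)

def put(key, value):
    hot[key] = value
    hot.move_to_end(key)                  # mark as most recently used
    cold[key] = value                     # keep the durable copy up to date
    if len(hot) > HOT_CAPACITY:
        hot.popitem(last=False)           # evict the least recently used key

def get(key):
    if key in hot:
        hot.move_to_end(key)              # hot hit: no disk access needed
        return hot[key]
    value = cold.get(key)                 # cache miss: fall back to disk
    if value is not None:
        put(key, value)                   # promote back into the hot tier
    return value

put("alice", {"visits": 3})
print(get("alice"))
cold.close()
```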
Summary
State storage is the key technology that lets streaming evolve from “dumb forwarding” to intelligent computation.
Choices:
Local State Store (RocksDB + Checkpoint) is now the industry standard
Trades away some scalability and shareability, and accepts higher fault-tolerance complexity, in exchange for microsecond-level performance
Mechanisms:
RocksDB: layered storage (memory + disk)
Checkpoint: ensures distributed consistency and fault recovery
Local access: avoids network latency, meets real-time processing needs
With state storage, streaming systems can finally support complex operations like JOINs, aggregations, and windowing.
Day 13 Preview: Moving into Streaming JOINs
On Day 13, we’ll stop treating JOINs as a “query the database once” patch and dive into Streaming JOINs. By implementing Join, GroupBy, and Window ourselves, we’ll see how state actually works in practice and gain a deeper understanding of stateful operations.
We’ll focus on in-memory state management to show how streaming can “remember” past events for stateful computation, without relying on RocksDB for the example implementation.

