Remember last time we asked: “Why not just move all the logic to streaming? Wouldn’t that be faster?”
The Dual-Track Problem of Lambda Architecture
In a Lambda architecture, our data pipelines follow a dual-track approach:
Lambda Architecture:
Raw Data ─┬─ Batch Layer ── Batch Views ──────┐
          │                                   ├── Serving Layer ── Query Results
          └─ Speed Layer ── Real-time Views ──┘
Responsibilities of each layer
Batch Layer: Handles batch processing, ensures final accuracy, and recomputes full datasets overnight
Speed Layer: Handles real-time processing, provides up-to-date results
Serving Layer: Merges outputs from both layers for queries
The cost
Logic must be maintained twice (once for batch, once for streaming)
Two sets of infrastructure are required (Hadoop/Spark + Flink/Kafka)
Any change in requirements forces engineers to update both layers
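To make the maintenance cost concrete, here is a minimal sketch (all names are hypothetical, not tied to any real framework) of the same "revenue per product" logic written twice: once as a batch recomputation and once as an incremental streaming update, which is exactly what Lambda forces us to keep in sync.

```python
# Sketch: the same "revenue per product" logic maintained twice in Lambda.
from collections import defaultdict

def batch_revenue(orders):
    """Batch layer: recompute revenue per product over the full dataset."""
    totals = defaultdict(float)
    for order in orders:
        totals[order["product"]] += order["amount"]
    return dict(totals)

class StreamingRevenue:
    """Speed layer: the same logic, rewritten incrementally per event."""
    def __init__(self):
        self.totals = defaultdict(float)

    def on_event(self, order):
        self.totals[order["product"]] += order["amount"]
        return dict(self.totals)

orders = [{"product": "latte", "amount": 5.0},
          {"product": "mocha", "amount": 6.0},
          {"product": "latte", "amount": 5.0}]

batch = batch_revenue(orders)        # overnight recomputation
speed = StreamingRevenue()
for o in orders:                     # real-time path
    live = speed.on_event(o)

# Both layers must agree; any requirement change means editing both.
assert batch == live == {"latte": 10.0, "mocha": 6.0}
```

Two implementations of one business rule: the moment the revenue definition changes, both code paths must be updated and re-verified against each other.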
The Simplified Idea of Kappa Architecture
The idea behind Kappa architecture is straightforward: “Remove the batch layer and handle everything with streaming.”
Kappa Architecture:
Raw Data ── Event Log (Kafka) ── Stream Processing ── Results
Benefits: Only one set of logic, one system, drastically reducing maintenance overhead.
An Upgraded Coffee Shop Example
Imagine moving all coffee shop data processing logic to streaming:
Kappa Architecture applied to a Coffee Shop:
Order Events ── Kafka ── Stream Processing
                               │
                               ▼
                        ┌────────────┐
                        │ • JOIN     │
                        │ • GROUP BY │
                        │ • ORDER BY │
                        │ • TopN     │
                        └────────────┘
                               │
                               ▼
                         Result Tables
                  (Pre-computed Results)
                               │
                               ▼
                           Dashboard
Processing flow:
Kafka ingests order events
Stream processing handles them in real time
Results are written directly to Result Tables
Dashboards query the pre-computed results: no JOIN, GROUP BY, or ORDER BY at read time, so latency stays under 50 ms
Impact: Even at peak hours with hundreds of queries per second, the database is never overloaded because everything reads from a single result table.
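The read path above can be sketched in a few lines (an in-memory stand-in; the names and the dict-backed "result table" are illustrative assumptions): the stream processor does all the heavy lifting on write, so the dashboard query is a plain key lookup.

```python
# Sketch of the Kappa read path: results are pre-computed on the write path,
# so the dashboard query is a single lookup with no JOIN/GROUP BY/ORDER BY.
from collections import defaultdict

result_table = defaultdict(int)  # product -> orders served today

def process_event(event):
    """Stream processing: update the pre-computed result on every event."""
    result_table[event["product"]] += 1

def dashboard_query(product):
    """Dashboard: a single key lookup, cheap even at hundreds of QPS."""
    return result_table[product]

for e in [{"product": "latte"}, {"product": "latte"}, {"product": "mocha"}]:
    process_event(e)

assert dashboard_query("latte") == 2
assert dashboard_query("mocha") == 1
```

Because every dashboard request touches only the result table, query load scales with the cost of a lookup rather than the cost of the aggregation.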
The Core Challenge of Kappa: Stateful Operations
Think about it: “How do we handle JOINs, GROUP BYs, or sliding window computations?”
Exactly — the key step to adopting Kappa is enabling stateful operations.
Why is state needed?
JOINs: Need to store recent events to match with arriving events
GROUP BY / Aggregations: Must maintain intermediate aggregates
Window Functions: Need to track all events within the window
Stateful Operations Challenge:
Event1 ──┐
Event2 ──┼── [State Storage] ── Computed Results
Event3 ──┘        ↑
                  │
          Must remember:
          • Past events
          • Intermediate results
          • Window boundaries
The key challenge: Without state management, we cannot fully move multi-table JOINs or aggregation logic from databases into stream processing.
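A streaming JOIN makes the need for state obvious. The sketch below (hypothetical, fully in-memory) joins order events with payment events: whichever side arrives first must be buffered until its counterpart shows up, and that buffer is exactly the state a stream processor has to manage.

```python
# Sketch: a streaming JOIN must buffer events from each side until the
# matching event on the other side arrives. The buffers are the "state".
orders_by_id = {}    # state: orders waiting for a payment
payments_by_id = {}  # state: payments waiting for an order

def join(event):
    """Emit an (order, payment) pair once both sides of an id have arrived."""
    order_id = event["id"]
    if event["type"] == "order":
        if order_id in payments_by_id:
            return (event, payments_by_id.pop(order_id))
        orders_by_id[order_id] = event      # no match yet: remember it
    else:  # payment
        if order_id in orders_by_id:
            return (orders_by_id.pop(order_id), event)
        payments_by_id[order_id] = event    # no match yet: remember it
    return None

assert join({"type": "order", "id": 1}) is None      # buffered in state
pair = join({"type": "payment", "id": 1})            # match found
assert pair == ({"type": "order", "id": 1},
                {"type": "payment", "id": 1})
```

A production system would additionally bound these buffers (e.g. by time), since state that grows forever is its own failure mode.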
Giving Stream Processing “Memory”
From Day 12 onwards, we’ll dive into what stateful operations are and introduce a simple State Store in the Simple Streaming framework. This allows the system not just to blindly process events, but to remember what happened — just like a database.
This is the crucial step that brings our streaming pipeline fully into the world of Kappa architecture.
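As a preview of where Day 12 is headed, here is one possible shape for such a State Store (the interface below is an assumption for illustration, not the Simple Streaming framework's actual API): keyed state that survives across events, letting an aggregation behave like a GROUP BY.

```python
# A minimal State Store sketch. The get/put interface is a hypothetical
# stand-in for whatever the framework will actually expose.
class StateStore:
    """Keyed state the stream processor can read and update per event."""
    def __init__(self):
        self._state = {}

    def get(self, key, default=None):
        return self._state.get(key, default)

    def put(self, key, value):
        self._state[key] = value

def count_orders(events, store):
    """A stateful GROUP BY: counts persist across calls via the store."""
    for e in events:
        store.put(e["product"], store.get(e["product"], 0) + 1)

store = StateStore()
count_orders([{"product": "latte"}, {"product": "latte"}], store)
count_orders([{"product": "mocha"}], store)  # later events: state remembered

assert store.get("latte") == 2
assert store.get("mocha") == 1
```

The point is the separation: the processing function stays stateless in shape, while everything it must "remember" lives behind the store, where it can later be persisted or recovered.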

