From Lambda to Kappa: The Evolution of Stream Processing Systems

Remember last time we asked: “Why not just move all the logic to streaming? Wouldn’t that be faster?”

The Dual-Track Problem of Lambda Architecture

In a Lambda architecture, our data pipelines follow a dual-track approach:

Lambda Architecture:

Raw Data ┬ Batch Layer ─ Batch Views ────┐
         │                               ├ Serving Layer ─ Query Results
         └ Speed Layer ─ Real-time Views ┘

Responsibilities of each layer

  • Batch Layer: Handles batch processing, ensures final accuracy, and recomputes full datasets overnight

  • Speed Layer: Handles real-time processing, provides up-to-date results

  • Serving Layer: Merges outputs from both layers for queries

The cost

  • Logic must be maintained twice (once for batch, once for streaming)

  • Two sets of infrastructure are required (Hadoop/Spark + Flink/Kafka)

  • Any change in requirements forces engineers to update both layers
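To make the "logic maintained twice" cost concrete, here is a minimal sketch (hypothetical event shapes, not from the series' codebase) of the same revenue-per-product aggregation written twice: once as a batch job over the full dataset, once incrementally for the speed layer. Any change in requirements must be applied to both:

```python
from collections import defaultdict

# Hypothetical order event: (product, amount)

# Batch layer: recompute totals over the full dataset (e.g. an overnight job)
def batch_revenue(all_orders):
    totals = defaultdict(float)
    for product, amount in all_orders:
        totals[product] += amount
    return dict(totals)

# Speed layer: the SAME business logic, rewritten incrementally for streaming
class StreamingRevenue:
    def __init__(self):
        self.totals = defaultdict(float)

    def on_event(self, product, amount):
        self.totals[product] += amount

orders = [("latte", 5.0), ("espresso", 3.0), ("latte", 5.0)]
print(batch_revenue(orders))

speed = StreamingRevenue()
for product, amount in orders:
    speed.on_event(product, amount)  # same numbers, second codebase to maintain
```

Both code paths must stay in sync forever, which is exactly the maintenance burden Kappa removes.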

The Simplified Idea of Kappa Architecture

The idea behind Kappa architecture is straightforward: “Remove the batch layer and handle everything with streaming.”

Kappa Architecture:

Raw Data ── Event Log (Kafka) ── Stream Processing ── Results

Benefits: Only one set of logic, one system, drastically reducing maintenance overhead.

An Upgraded Coffee Shop Example

Imagine moving all coffee shop data processing logic to streaming:

Kappa Architecture applied to a Coffee Shop:

Order Events ── Kafka ── Stream Processing
                              │
                              ▼
                        ┌────────────┐
                        │ • JOIN     │
                        │ • GROUP BY │
                        │ • ORDER BY │
                        │ • TopN     │
                        └────────────┘
                              │
                              ▼
                        Result Tables
                      (Pre-computed Results)
                              │
                              ▼
                          Dashboard

Processing flow:

  1. Kafka ingests order events

  2. Stream processing handles them in real time

  3. Results are written directly to Result Tables

  4. Dashboards query pre-computed results — no need to JOIN, GROUP BY, or ORDER BY, latency < 50ms

Impact: Even at peak hours with hundreds of queries per second, the database is never overloaded because everything reads from a single result table.
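The flow above can be sketched in a few lines (hypothetical names and event shape; a real deployment would read from Kafka and write to a database table). The key point is that the GROUP BY happens once per event on the write path, so the dashboard's read path is a plain key lookup:

```python
from collections import defaultdict

# Hypothetical in-memory "result table": product -> order count
result_table = defaultdict(int)

def process_event(event):
    """Stream-processing step: pre-aggregate on ingestion.
    The GROUP BY work happens here, once per event, not at query time."""
    result_table[event["product"]] += 1

def dashboard_query(product):
    """Dashboard read path: a plain key lookup, no JOIN/GROUP BY/ORDER BY."""
    return result_table[product]

for e in [{"product": "latte"}, {"product": "latte"}, {"product": "mocha"}]:
    process_event(e)

print(dashboard_query("latte"))  # 2
```

Hundreds of concurrent dashboard queries now cost a dictionary lookup each, which is why the database never becomes the bottleneck.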

The Core Challenge of Kappa: Stateful Operations

Think about it: “How do we handle JOINs, GROUP BYs, or sliding window computations?”

Exactly — the key step to adopting Kappa is enabling stateful operations.

Why is state needed?

  • JOINs: Need to buffer recent events from one stream so they can be matched against newly arriving events from the other

  • GROUP BY / Aggregations: Must maintain intermediate aggregates

  • Window Functions: Need to track all events within the window
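The JOIN case shows why state is unavoidable. A minimal sketch (hypothetical event shapes) of a stream-stream join on `order_id`: events that arrive before their counterpart must be buffered, and those buffers are state:

```python
# Hypothetical stream-stream JOIN: match order events with payment events
# by order_id. Unmatched events are buffered -- the buffers ARE the state.
orders_buffer = {}    # state: order_id -> order event awaiting a payment
payments_buffer = {}  # state: order_id -> payment event awaiting an order

def on_order(order):
    oid = order["order_id"]
    if oid in payments_buffer:
        return {**order, **payments_buffer.pop(oid)}  # join hit
    orders_buffer[oid] = order                        # buffer until the match arrives
    return None

def on_payment(payment):
    oid = payment["order_id"]
    if oid in orders_buffer:
        return {**orders_buffer.pop(oid), **payment}  # join hit
    payments_buffer[oid] = payment
    return None

print(on_order({"order_id": 1, "item": "latte"}))  # None -- still waiting
print(on_payment({"order_id": 1, "amount": 5.0}))  # joined record
```

If the process restarts and the buffers are lost, in-flight joins are silently dropped, which is why production systems persist this state rather than keeping it only in memory.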

Stateful Operations Challenge:

Event1 ──┐
Event2 ──┼── [State Storage] ── Computed Results
Event3 ──┘        ↑
                  │
            Must remember:
            • Past events
            • Intermediate results
            • Window boundaries

The key challenge: Without state management, we cannot fully move multi-table JOINs or aggregation logic from databases into stream processing.
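A sliding-window count makes the "must remember" list above tangible. In this sketch (a hypothetical operator, not the series' framework API), the deque of past timestamps is the state; lose it and the count is wrong:

```python
from collections import deque

class SlidingWindowCount:
    """Stateful operator: count events in the last `window_seconds` seconds
    (window boundary inclusive). The deque of timestamps IS the state."""
    def __init__(self, window_seconds):
        self.window = window_seconds
        self.timestamps = deque()  # state: timestamps of events still in the window

    def on_event(self, ts):
        self.timestamps.append(ts)
        # Evict events that have fallen outside the window boundary
        while self.timestamps and self.timestamps[0] < ts - self.window:
            self.timestamps.popleft()
        return len(self.timestamps)

op = SlidingWindowCount(window_seconds=60)
print(op.on_event(0))    # 1
print(op.on_event(30))   # 2
print(op.on_event(90))   # 2  (the event at t=0 has been evicted)
```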

Giving Stream Processing “Memory”

From Day 12 onwards, we’ll dive into what stateful operations are and introduce a simple State Store in the Simple Streaming framework. This allows the system not just to blindly process events, but to remember what happened — just like a database.

This is the crucial step that brings our streaming pipeline fully into the world of Kappa architecture.
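As a preview of where Day 12 is headed, a State Store can be as simple as a key-value map with `get`/`put` (a hypothetical sketch; the actual interface in the Simple Streaming framework may differ). Even this much is enough to express a stateful GROUP BY COUNT:

```python
class StateStore:
    """Minimal in-memory key-value state store (hypothetical API sketch).
    This is what gives a streaming operator its 'memory'."""
    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def put(self, key, value):
        self._data[key] = value

# A stateful GROUP BY COUNT built on top of the store:
def count_by_key(store, event_key):
    store.put(event_key, store.get(event_key, 0) + 1)
    return store.get(event_key)

store = StateStore()
count_by_key(store, "latte")
print(count_by_key(store, "latte"))  # 2
```

Production systems add persistence and fault tolerance on top of this idea, but the core abstraction is just keyed reads and writes between events.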
