Our real-time dashboard was once like a simple coffee shop counter—it showed only how many cups were sold today (orders). Each morning, the owner would glance at the numbers and feel confident that business was going well.
Then the focus shifted to what was actually being sold—lattes or Americanos, large or small.
That’s the challenge we face: moving from basic order counts to deeper analysis based on detailed order data (orders_detail).
Note: All code in this article is pseudo-code for educational and conceptual illustration. Focus on understanding the architecture and logic, not on running the exact code. To keep today’s focus clear, we’ll simplify some implementation details from previous days and concentrate on the checkpoint mechanism itself.
The Three Layers of the Lambda Architecture
You may remember the Lambda architecture we discussed earlier:
Batch Layer: Processes historical data in full to ensure accuracy
Speed Layer: Handles real-time data streams to ensure low latency
Serving Layer: Merges results from both layers and serves queries
So far, our focus has been on how the Speed Layer handles real-time data efficiently. Now the question is: when the owner asks for more complex analytics, where should JOINs happen?
A Design Decision: Keeping the Speed Layer Pure
Think of a highway toll booth. If every car had to stop for a detailed inspection and paperwork, traffic would grind to a halt. The Speed Layer is like that toll booth—its job must stay simple.
Key principle: no JOINs in the Speed Layer.
The Speed Layer only does three things:
simple cleansing → fast writes → done.
When the owner opens the dashboard and clicks “View order details,” that’s when the magic happens—in the Serving Layer:
SELECT
o.order_id,
o.customer_id,
o.order_time,
o.total_amount,
od.product_name,
od.quantity,
od.unit_price,
SUM(od.quantity * od.unit_price) AS line_total
FROM
orders AS o
JOIN
orders_detail AS od ON o.order_id = od.order_id -- JOIN happens here
GROUP BY
o.order_id, od.product_name
ORDER BY
o.order_time DESC;
The Real Challenge: Complex OLAP Logic
This architecture looks simple on the surface, but the real test is this: how strong are your SQL skills?
As the owner’s questions grow more sophisticated, the road quickly becomes rocky.
Scenario Upgrade: Advanced Business Questions
The owner says:
“I want to see sales trends for each product at different times of day—and segment them by customer type.”
Breaking Down a Complex JOIN Query
-- Complex multi-table JOIN + analytical query
WITH hourly_product_sales AS (
SELECT
od.product_name,
DATE_FORMAT(o.order_time, '%H') AS hour_of_day,
c.customer_type,
SUM(od.quantity) AS qty_sold,
SUM(od.quantity * od.unit_price) AS revenue,
COUNT(DISTINCT o.customer_id) AS unique_customers
FROM
orders o
JOIN
orders_detail od ON o.order_id = od.order_id
JOIN
customers c ON o.customer_id = c.customer_id
WHERE
o.order_time >= DATE_SUB(NOW(), INTERVAL 30 DAY)
GROUP BY
od.product_name, hour_of_day, c.customer_type
),
product_rankings AS (
SELECT
product_name,
hour_of_day,
customer_type,
qty_sold,
revenue,
unique_customers,
ROW_NUMBER() OVER (
PARTITION BY hour_of_day, customer_type
ORDER BY revenue DESC
) AS sales_rank
FROM hourly_product_sales
)
SELECT
pr.product_name,
pr.hour_of_day,
pr.customer_type,
pr.revenue,
pr.sales_rank,
ROUND(
pr.revenue / SUM(pr.revenue) OVER (
PARTITION BY pr.hour_of_day, pr.customer_type
) * 100,
2
) AS revenue_percentage
FROM
product_rankings pr
WHERE
pr.sales_rank <= 5
ORDER BY
pr.hour_of_day,
pr.customer_type,
pr.sales_rank;
And this is just the appetizer. Real-world OLAP reports often involve:
Deeply nested subqueries
Complex window functions
Crosstab and pivot analysis
Period-over-period comparisons and trend analysis
If you choose this architecture, you’ll need to master:
Performance Tuning
Table partitioning strategies
Composite indexing
Query execution plan analysis
Summary: A Common Choice for Early-Stage Teams
This story reflects a common approach many teams take when implementing a Lambda architecture: keep the Speed Layer simple, and push complex logic down to the database layer.
This “store first, JOIN later” strategy is a double-edged sword.
Pros
A minimal Speed Layer with fast processing
Lower development overhead and faster time to market
Independent scaling and operation of each layer
Real Challenges
Extremely high demands on SQL expertise
Difficulty building complex OLAP reports
Deep performance-tuning knowledge required
The owner now has access to richer analytical insights, while engineers handle the complexity behind the scenes with increasingly sophisticated SQL.
Day 10 Preview: Moving JOINs into Streaming
As you tune performance, you’ll quickly discover that JOINs are among the most expensive operations in a database.
In the data warehousing world, a common solution is the wide table: instead of joining at query time, you pre-flatten all required fields into a single table.
This idea isn’t limited to offline warehouses—it also applies to real-time processing. If we can join multiple tables during streaming and produce a real-time wide table, downstream queries become faster and lighter, significantly reducing OLAP pressure.
In the next article, we’ll explore how to design real-time wide tables in streaming systems, how they reduce JOIN overhead in the Serving Layer, and the trade-offs involved.

