Streaming Joins Explained: Temporal, Interval, Window, and ASOF Joins in SQL

Streaming Joins Explained: Temporal, Interval, Window, and ASOF Joins in SQL

·

14 min read

Joining two tables in a batch database is straightforward. Both tables sit on disk, the query planner picks a join strategy, and you get your result. But what happens when both sides of the join are continuously changing streams?

Streaming joins introduce challenges that do not exist in the batch world. State grows without bound unless you constrain it. Late-arriving data can invalidate previously emitted results. And the relationship between time on the left side and time on the right side determines whether a join even makes sense. Pick the wrong join type and you get either missing results, exploding state, or both.

This post walks through the four streaming join types available in RisingWave: regular joins, temporal joins, interval joins, window joins, and the recently added ASOF joins. Every SQL example was tested against RisingWave 2.8.0 and includes actual output. By the end, you will know exactly which join to reach for in each scenario.

Why Are Streaming Joins Different from Batch Joins?

A streaming join is a join that operates continuously over unbounded data. Instead of running once against a fixed dataset, it produces new output every time a row arrives on either side.

In a batch database, the engine scans both tables, matches rows, and returns a complete result. In a streaming system, the engine must maintain internal state for previously seen rows because a matching row from the other side might arrive seconds, minutes, or hours later. This creates three problems that batch joins never face:

  • State growth: A regular inner join between two unbounded streams must remember every row from both sides indefinitely. In production, this means state grows forever.
  • Late data: If a row arrives after its matching window has closed, the join either drops it or produces incorrect late results.
  • Ordering: Events from different streams arrive in unpredictable order. The join engine must handle out-of-order data without corrupting results.

Each streaming join type solves these problems differently by making trade-offs between completeness, state size, and latency.

How Do Regular Joins Work on Streams?

A regular join in RisingWave uses the same syntax as PostgreSQL. It maintains full state for both sides of the join and produces correct, continuously updated results as data arrives or changes on either side.

This is the right choice when you are joining a stream against a relatively stable dimension table, or when both sides are bounded enough that state growth is acceptable.

Example: enriching orders with product details

CREATE TABLE orders (
    order_id INT,
    product_id INT,
    customer_id INT,
    amount DECIMAL,
    order_time TIMESTAMPTZ
);

CREATE TABLE products (
    product_id INT PRIMARY KEY,
    product_name VARCHAR,
    category VARCHAR,
    price DECIMAL
);

INSERT INTO products VALUES
    (1, 'Wireless Mouse', 'Electronics', 29.99),
    (2, 'USB-C Hub', 'Electronics', 49.99),
    (3, 'Desk Lamp', 'Home Office', 39.99),
    (4, 'Keyboard', 'Electronics', 79.99),
    (5, 'Monitor Stand', 'Home Office', 59.99);

INSERT INTO orders VALUES
    (101, 1, 1001, 29.99, '2025-03-28 10:00:00+00'),
    (102, 2, 1002, 49.99, '2025-03-28 10:05:00+00'),
    (103, 3, 1001, 39.99, '2025-03-28 10:10:00+00'),
    (104, 1, 1003, 29.99, '2025-03-28 10:15:00+00'),
    (105, 4, 1002, 79.99, '2025-03-28 10:20:00+00');

CREATE MATERIALIZED VIEW enriched_orders AS
SELECT
    o.order_id,
    o.customer_id,
    p.product_name,
    p.category,
    o.amount,
    o.order_time
FROM orders o
JOIN products p ON o.product_id = p.product_id;

Result:

 order_id | customer_id |  product_name  |  category   | amount |        order_time
----------+-------------+----------------+-------------+--------+---------------------------
      101 |        1001 | Wireless Mouse | Electronics |  29.99 | 2025-03-28 10:00:00+00:00
      102 |        1002 | USB-C Hub      | Electronics |  49.99 | 2025-03-28 10:05:00+00:00
      103 |        1001 | Desk Lamp      | Home Office |  39.99 | 2025-03-28 10:10:00+00:00
      104 |        1003 | Wireless Mouse | Electronics |  29.99 | 2025-03-28 10:15:00+00:00
      105 |        1002 | Keyboard       | Electronics |  79.99 | 2025-03-28 10:20:00+00:00

This materialized view updates automatically. If you insert a new order or update a product name, the view reflects the change within seconds.

When to use regular joins

Regular joins are ideal when at least one side of the join is a slowly changing dimension (like a products table or a users table). They are also the only option when you need full outer join semantics across streams. The trade-off is that RisingWave must maintain state for both sides, so they work best when the dimension side is bounded.

What Is a Temporal Join and When Should You Use One?

A temporal join (also called a lookup join) joins a stream against the current snapshot of a reference table. Instead of maintaining full state for both sides, it looks up the right-side table at processing time using the FOR SYSTEM_TIME AS OF PROCTIME() syntax.

This is the most state-efficient join type. RisingWave only needs to maintain state for the stream side. The lookup table can be updated independently, and each incoming stream record gets matched against whatever value is current at that moment.

Example: currency conversion with live exchange rates

CREATE TABLE exchange_rates (
    currency VARCHAR PRIMARY KEY,
    rate DECIMAL,
    updated_at TIMESTAMPTZ
);

CREATE TABLE transactions (
    txn_id INT,
    amount_usd DECIMAL,
    currency VARCHAR,
    txn_time TIMESTAMPTZ
) APPEND ONLY;

INSERT INTO exchange_rates VALUES
    ('EUR', 0.92, '2025-03-28 08:00:00+00'),
    ('GBP', 0.79, '2025-03-28 08:00:00+00'),
    ('JPY', 151.25, '2025-03-28 08:00:00+00');

INSERT INTO transactions VALUES
    (1, 100.00, 'EUR', '2025-03-28 10:00:00+00'),
    (2, 250.00, 'GBP', '2025-03-28 10:05:00+00'),
    (3, 50.00, 'JPY', '2025-03-28 10:10:00+00'),
    (4, 175.00, 'EUR', '2025-03-28 10:15:00+00'),
    (5, 300.00, 'GBP', '2025-03-28 10:20:00+00');

CREATE MATERIALIZED VIEW converted_transactions AS
SELECT
    t.txn_id,
    t.amount_usd,
    t.currency,
    e.rate,
    t.amount_usd * e.rate AS converted_amount,
    t.txn_time
FROM transactions t
INNER JOIN exchange_rates FOR SYSTEM_TIME AS OF PROCTIME() e
ON t.currency = e.currency;

Result:

 txn_id | amount_usd | currency |  rate  | converted_amount |         txn_time
--------+------------+----------+--------+------------------+---------------------------
      1 |     100.00 | EUR      |   0.92 |          92.0000 | 2025-03-28 10:00:00+00:00
      2 |     250.00 | GBP      |   0.79 |         197.5000 | 2025-03-28 10:05:00+00:00
      3 |      50.00 | JPY      | 151.25 |        7562.5000 | 2025-03-28 10:10:00+00:00
      4 |     175.00 | EUR      |   0.92 |         161.0000 | 2025-03-28 10:15:00+00:00
      5 |     300.00 | GBP      |   0.79 |         237.0000 | 2025-03-28 10:20:00+00:00

Key behavior to understand

Temporal joins are one-directional. They only produce output when data arrives on the left (stream) side. If you update the exchange rate for EUR from 0.92 to 0.95, existing rows in the materialized view will not change. Only new transactions arriving after the update will use the new rate. This is by design: it captures the rate that was active at the time each transaction was processed.

Requirements for temporal joins

  • The left side must be an append-only table or source.
  • The right side must have a primary key.
  • The join condition must include the primary key of the right side.
  • The syntax uses FOR SYSTEM_TIME AS OF PROCTIME() after the right-side table name.

How Do Interval Joins Handle Time-Bounded Stream Correlation?

An interval join matches rows from two streams where the timestamps fall within a specified range of each other. This is the solution for correlating events across streams when you expect them to occur within a known time window.

The critical advantage over regular joins: RisingWave can garbage-collect state for rows that fall outside the interval boundary. If your interval is 10 minutes, the engine only needs to buffer at most 10 minutes of data from each side.

Example: ad impression to click attribution

A common ad-tech problem is attributing clicks to the impression that caused them. A user sees an ad (impression), and if they click within 10 minutes, you want to pair those events.

CREATE TABLE ad_impressions (
    impression_id INT,
    user_id INT,
    campaign_id INT,
    impression_time TIMESTAMPTZ
);

CREATE TABLE ad_clicks (
    click_id INT,
    user_id INT,
    campaign_id INT,
    click_time TIMESTAMPTZ
);

INSERT INTO ad_impressions VALUES
    (1, 501, 10, '2025-03-28 09:00:00+00'),
    (2, 502, 10, '2025-03-28 09:05:00+00'),
    (3, 503, 20, '2025-03-28 09:10:00+00'),
    (4, 501, 20, '2025-03-28 09:30:00+00'),
    (5, 504, 10, '2025-03-28 10:00:00+00');

INSERT INTO ad_clicks VALUES
    (1, 501, 10, '2025-03-28 09:02:00+00'),
    (2, 502, 10, '2025-03-28 09:20:00+00'),
    (3, 503, 20, '2025-03-28 09:12:00+00'),
    (4, 501, 20, '2025-03-28 09:35:00+00'),
    (5, 505, 10, '2025-03-28 10:05:00+00');

CREATE MATERIALIZED VIEW ad_click_attribution AS
SELECT
    i.impression_id,
    c.click_id,
    i.user_id,
    i.campaign_id,
    i.impression_time,
    c.click_time,
    c.click_time - i.impression_time AS time_to_click
FROM ad_impressions i
JOIN ad_clicks c
    ON i.user_id = c.user_id
    AND i.campaign_id = c.campaign_id
    AND c.click_time BETWEEN i.impression_time
        AND i.impression_time + INTERVAL '10 minutes';

Result:

 impression_id | click_id | user_id | campaign_id |      impression_time      |        click_time         | time_to_click
---------------+----------+---------+-------------+---------------------------+---------------------------+---------------
             1 |        1 |     501 |          10 | 2025-03-28 09:00:00+00:00 | 2025-03-28 09:02:00+00:00 | 00:02:00
             3 |        3 |     503 |          20 | 2025-03-28 09:10:00+00:00 | 2025-03-28 09:12:00+00:00 | 00:02:00
             4 |        4 |     501 |          20 | 2025-03-28 09:30:00+00:00 | 2025-03-28 09:35:00+00:00 | 00:05:00

Notice what happened: user 502 clicked campaign 10 at 09:20, which is 15 minutes after the impression at 09:05. That exceeds our 10-minute window, so it was excluded. User 505 clicked campaign 10 but never saw an impression (no matching user_id in impressions), so that click was also excluded.

How interval joins control state

The BETWEEN clause tells RisingWave exactly how long it needs to buffer data. With a 10-minute interval, any impression older than 10 minutes before the latest click can be safely discarded. This makes interval joins practical for high-throughput production workloads where unbounded state would be a problem.

How Do Window Joins Align Streams into Fixed Time Buckets?

A window join partitions both streams into aligned time windows (tumbling, hopping, or session) and only matches rows that fall into the same window. After a window closes, its state is released entirely.

This is useful when you want to correlate events that happened during the same time period, not within a relative offset of each other. Think of it as "group both streams by time period, then join within each group."

Example: correlating page views with purchases

CREATE TABLE page_views (
    user_id INT,
    page VARCHAR,
    view_time TIMESTAMPTZ,
    WATERMARK FOR view_time AS view_time - INTERVAL '5 seconds'
) APPEND ONLY;

CREATE TABLE purchases (
    user_id INT,
    product VARCHAR,
    amount DECIMAL,
    purchase_time TIMESTAMPTZ,
    WATERMARK FOR purchase_time AS purchase_time - INTERVAL '5 seconds'
) APPEND ONLY;

INSERT INTO page_views VALUES
    (1, '/products/laptop', '2025-03-28 10:00:30+00'),
    (2, '/products/phone', '2025-03-28 10:00:45+00'),
    (1, '/checkout', '2025-03-28 10:01:15+00'),
    (3, '/products/tablet', '2025-03-28 10:01:30+00'),
    (2, '/checkout', '2025-03-28 10:02:00+00');

INSERT INTO purchases VALUES
    (1, 'Laptop', 999.99, '2025-03-28 10:01:20+00'),
    (2, 'Phone', 699.99, '2025-03-28 10:02:10+00'),
    (3, 'Tablet', 449.99, '2025-03-28 10:03:30+00');

CREATE MATERIALIZED VIEW browsing_to_purchase AS
SELECT
    pv.user_id,
    pv.page,
    p.product,
    p.amount,
    pv.view_time,
    p.purchase_time,
    pv.window_start,
    pv.window_end
FROM TUMBLE(page_views, view_time, INTERVAL '5 minutes') pv
JOIN TUMBLE(purchases, purchase_time, INTERVAL '5 minutes') p
ON pv.user_id = p.user_id
AND pv.window_start = p.window_start;

Result:

 user_id |       page       | product | amount |         view_time         |       purchase_time       |       window_start        |        window_end
---------+------------------+---------+--------+---------------------------+---------------------------+---------------------------+---------------------------
       1 | /products/laptop | Laptop  | 999.99 | 2025-03-28 10:00:30+00:00 | 2025-03-28 10:01:20+00:00 | 2025-03-28 10:00:00+00:00 | 2025-03-28 10:05:00+00:00
       1 | /checkout        | Laptop  | 999.99 | 2025-03-28 10:01:15+00:00 | 2025-03-28 10:01:20+00:00 | 2025-03-28 10:00:00+00:00 | 2025-03-28 10:05:00+00:00
       2 | /products/phone  | Phone   | 699.99 | 2025-03-28 10:00:45+00:00 | 2025-03-28 10:02:10+00:00 | 2025-03-28 10:00:00+00:00 | 2025-03-28 10:05:00+00:00
       2 | /checkout        | Phone   | 699.99 | 2025-03-28 10:02:00+00:00 | 2025-03-28 10:02:10+00:00 | 2025-03-28 10:00:00+00:00 | 2025-03-28 10:05:00+00:00
       3 | /products/tablet | Tablet  | 449.99 | 2025-03-28 10:01:30+00:00 | 2025-03-28 10:03:30+00:00 | 2025-03-28 10:00:00+00:00 | 2025-03-28 10:05:00+00:00

All events fall within the same 5-minute tumble window (10:00 to 10:05), so every user's page views are joined with their purchases. If a purchase fell into the next window (10:05 to 10:10), it would not match page views from the previous window.

Window join requirements

  • Both sides must use the same window function (TUMBLE, HOP, or SESSION).
  • Both sides must use the same window size.
  • The join condition must include window_start = window_start (and optionally window_end = window_end).
  • Both sides need watermark definitions for the time column.

What Is an ASOF Join and How Does It Find the Nearest Match?

An ASOF join matches each row from the left stream with the most recent row from the right stream that satisfies an inequality condition. It is designed for scenarios where you need the "latest known value" at a given point in time.

This is particularly useful in financial data, IoT sensor correlation, and any domain where readings arrive at irregular intervals and you need to pair each event with the closest preceding reference value.

Example: matching trade orders with the latest stock price

CREATE TABLE stock_prices (
    symbol VARCHAR,
    price DECIMAL,
    price_time TIMESTAMPTZ,
    WATERMARK FOR price_time AS price_time - INTERVAL '1 second'
) APPEND ONLY;

CREATE TABLE trade_orders (
    order_id INT,
    symbol VARCHAR,
    quantity INT,
    order_time TIMESTAMPTZ,
    WATERMARK FOR order_time AS order_time - INTERVAL '1 second'
) APPEND ONLY;

INSERT INTO stock_prices VALUES
    ('AAPL', 178.50, '2025-03-28 09:30:00+00'),
    ('AAPL', 179.25, '2025-03-28 09:31:00+00'),
    ('GOOG', 141.80, '2025-03-28 09:30:00+00'),
    ('GOOG', 142.10, '2025-03-28 09:31:00+00'),
    ('AAPL', 179.80, '2025-03-28 09:32:00+00');

INSERT INTO trade_orders VALUES
    (1, 'AAPL', 100, '2025-03-28 09:30:30+00'),
    (2, 'GOOG', 50, '2025-03-28 09:30:45+00'),
    (3, 'AAPL', 200, '2025-03-28 09:31:30+00'),
    (4, 'GOOG', 75, '2025-03-28 09:32:00+00');

CREATE MATERIALIZED VIEW trade_with_price AS
SELECT
    t.order_id,
    t.symbol,
    t.quantity,
    s.price AS execution_price,
    t.quantity * s.price AS total_value,
    t.order_time,
    s.price_time AS price_at
FROM trade_orders t
ASOF JOIN stock_prices s
ON t.symbol = s.symbol
AND s.price_time <= t.order_time;

Result:

 order_id | symbol | quantity | execution_price | total_value |        order_time         |         price_at
----------+--------+----------+-----------------+-------------+---------------------------+---------------------------
        1 | AAPL   |      100 |          178.50 |    17850.00 | 2025-03-28 09:30:30+00:00 | 2025-03-28 09:30:00+00:00
        2 | GOOG   |       50 |          141.80 |     7090.00 | 2025-03-28 09:30:45+00:00 | 2025-03-28 09:30:00+00:00
        3 | AAPL   |      200 |          179.25 |    35850.00 | 2025-03-28 09:31:30+00:00 | 2025-03-28 09:31:00+00:00
        4 | GOOG   |       75 |          142.10 |    10657.50 | 2025-03-28 09:32:00+00:00 | 2025-03-28 09:31:00+00:00

Look at order 1 (AAPL at 09:30:30). Two AAPL prices exist: 178.50 at 09:30:00 and 179.25 at 09:31:00. The ASOF join correctly selects 178.50 because it is the most recent price at or before the order time. Order 3 (AAPL at 09:31:30) gets 179.25, which is the latest price at that point.

ASOF join vs interval join

Both operate on time ranges, but they solve different problems. An interval join returns all matches within a time range (one-to-many). An ASOF join returns exactly one match: the nearest record (one-to-one). Use an interval join when you want all correlated events. Use an ASOF join when you want the single best match.

Which Streaming Join Should You Choose?

Here is a decision framework based on your data characteristics:

Join TypeState SizeUse WhenExample
RegularUnbounded (both sides retained)One side is a dimension table, or both sides are boundedOrders + Products
TemporalMinimal (stream side only)Looking up current reference data for each eventTransactions + Exchange Rates
IntervalBounded by interval rangeCorrelating events within a known time offsetImpressions + Clicks
WindowBounded by window sizeGrouping events into aligned time buckets before joiningPage Views + Purchases
ASOFBounded by watermarkFinding the nearest preceding value from another streamTrade Orders + Stock Prices

Three rules of thumb

  1. If one side is a lookup table that changes slowly, use a temporal join. It has the lowest state cost and the simplest semantics.

  2. If you know the maximum time gap between correlated events, use an interval join. A 10-minute click attribution window, a 30-second sensor correlation, a 1-hour order-to-shipment match. The bounded interval keeps state manageable.

  3. If you need to align both streams to the same clock (same 1-minute bucket, same 1-hour window), use a window join. This is the most aggressive at releasing state because entire windows are discarded after closing.

FAQ

What is a streaming join?

A streaming join is a continuously executing join operation over unbounded data streams. Unlike batch joins that run once, streaming joins produce new output whenever matching data arrives on either side. They require the engine to maintain internal state for previously seen rows because matching data from the other stream may arrive at any time.

How does RisingWave handle state for streaming joins?

RisingWave stores join state in a shared storage layer backed by object storage (S3 or compatible). For regular joins, state includes all rows from both sides. For interval and window joins, RisingWave automatically garbage-collects rows that fall outside the time boundary. For temporal joins, only the stream side is stored as state because the lookup side is read directly from the table.

What is the difference between an ASOF join and a temporal join?

An ASOF join correlates two streams based on the nearest timestamp match and requires watermarks on both sides. A temporal join looks up the current value of a reference table at processing time and uses FOR SYSTEM_TIME AS OF PROCTIME(). Use ASOF when both sides are event streams with timestamps. Use temporal when one side is a reference table you want to look up at the moment each event arrives.

Can I use LEFT JOIN with all streaming join types?

Yes. RisingWave supports LEFT JOIN variants for regular joins, temporal joins, and ASOF joins. For interval and window joins, LEFT JOIN is also supported. A LEFT JOIN preserves all rows from the left stream even when no match exists on the right side, filling unmatched columns with NULL.

Conclusion

Streaming joins are the backbone of any real-time data pipeline that combines multiple event sources. The key takeaways:

  • Regular joins give you complete, continuously updated results but maintain unbounded state. Use them when one side is a bounded dimension table.
  • Temporal joins are the most efficient for enriching a stream with reference data. They require the lookup table to have a primary key and use FOR SYSTEM_TIME AS OF PROCTIME().
  • Interval joins bound state by constraining the time range between matching events. They are ideal for attribution, correlation, and any use case with a known time gap.
  • Window joins partition both streams into aligned time buckets and discard state after each window closes. They require watermarks and matching window definitions on both sides.
  • ASOF joins find the single nearest match from a reference stream, which is particularly useful in financial and IoT applications.

All examples in this post were tested on RisingWave 2.8.0 and run as standard SQL. No custom operators, no external frameworks, no JVM tuning.


Ready to try these joins yourself? Get started with RisingWave in 5 minutes. Quickstart

Join our Slack community to ask questions and connect with other stream processing developers.

Best-in-Class Event Streaming
for Agents, Apps, and Analytics
GitHubXLinkedInSlackYouTube
Sign up for our to stay updated.