Tumbling Window
A Tumbling Window is a specific type of time window used in stream processing that divides a data stream into a series of fixed-size, non-overlapping, and contiguous time intervals. Each event in the stream belongs to exactly one tumbling window.
Think of tumbling windows as a series of adjacent, fixed-duration "buckets" in which events are collected for processing (e.g., aggregation).
Key Characteristics
- Fixed Size (Duration): All tumbling windows have the same, predefined duration (e.g., 5 minutes, 1 hour, 1 day).
- Non-overlapping: Windows do not share any time. The end of one window is immediately followed by the start of the next.
- If a window is defined as [T, T + duration), the next window will be [T + duration, T + 2*duration), and so on.
- Contiguous: There are no gaps between consecutive windows. The entire timeline is covered.
- Assignment: Each event from the stream is assigned to a single window based on its timestamp (event time or processing time).
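The characteristics above reduce to a single piece of arithmetic: snap the timestamp down to the nearest multiple of the window size. The following is a minimal Python sketch of that idea, not RisingWave's implementation. The function name `tumbling_window` and the `origin` alignment parameter are hypothetical names chosen for illustration.

```python
from typing import Tuple


def tumbling_window(timestamp: int, size: int, origin: int = 0) -> Tuple[int, int]:
    """Return the [start, end) tumbling window that contains `timestamp`.

    All values share one integer unit (e.g. epoch seconds).
    `origin` is a hypothetical alignment point for window boundaries.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    # Floor division snaps the timestamp down to the nearest boundary.
    start = origin + ((timestamp - origin) // size) * size
    return start, start + size


# 5-minute (300 s) windows
print(tumbling_window(0, 300))    # (0, 300)
print(tumbling_window(299, 300))  # (0, 300)
print(tumbling_window(300, 300))  # (300, 600)
```

Because every timestamp maps to exactly one `start`, the windows cannot overlap, and because consecutive starts differ by exactly `size`, they leave no gaps.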
Example
If you define a 10-minute tumbling window:
- Events timestamped from 10:00:00 up to, but not including, 10:10:00 fall into the window [10:00:00, 10:10:00).
- Events timestamped from 10:10:00 up to, but not including, 10:20:00 fall into the window [10:10:00, 10:20:00).
- And so on.
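The 10-minute example can be reproduced with wall-clock timestamps. This sketch assumes naive `datetime` values and aligns windows to midnight of the event's day, which works when the window size divides a day evenly. The helper name `window_of` and the date 2024-01-01 are arbitrary choices for illustration.

```python
from datetime import datetime, timedelta


def window_of(ts: datetime, size: timedelta):
    """Return (start, end) of the tumbling window containing `ts`."""
    # Assumption: windows are aligned to midnight of the event's day.
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    index = (ts - midnight) // size  # timedelta // timedelta yields an int
    start = midnight + index * size
    return start, start + size


size = timedelta(minutes=10)
last_in_first = datetime(2024, 1, 1, 10, 9, 59, 999000)
first_in_second = datetime(2024, 1, 1, 10, 10, 0)

for ts in (last_in_first, first_in_second):
    start, end = window_of(ts, size)
    print(f"{ts.time()} -> [{start.time()}, {end.time()})")
```

The second timestamp sits exactly on a boundary and lands in the window that starts there, matching the start-inclusive, end-exclusive convention.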
Use Cases
Tumbling windows are commonly used for:
- Periodic Reporting: Generating reports at regular intervals (e.g., total sales per minute, number of errors per hour, average sensor reading per day).
- Fixed-Interval Analysis: Analyzing data within distinct, non-overlapping time segments.
- Simple Aggregations: When you need to aggregate data over fixed chunks of time without overlap. For example, counting the number of tweets every 5 minutes.
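As a rough illustration of the "count every 5 minutes" use case, the sketch below groups timestamps by window start and counts them. It is a batch approximation in plain Python, not how a streaming engine maintains state. The function name `count_per_window` and the sample timestamps are hypothetical.

```python
from collections import Counter

WINDOW_SECONDS = 300  # 5 minutes


def count_per_window(event_times, size=WINDOW_SECONDS):
    """Count events per tumbling window; keys are window start times."""
    counts = Counter()
    for ts in event_times:
        # Each event increments exactly one window's counter.
        counts[(ts // size) * size] += 1
    return dict(sorted(counts.items()))


# Hypothetical tweet timestamps, as seconds since some reference point
tweet_times = [12, 250, 299, 300, 301, 899, 900]
print(count_per_window(tweet_times))
# {0: 3, 300: 2, 600: 1, 900: 1}
```

Note that the counts sum to the number of input events, which is a direct consequence of each event belonging to exactly one window.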
RisingWave SQL Implementation
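In RisingWave, tumbling windows are typically implemented with the TUMBLE() time window function, a table-valued function (TVF) used in the FROM clause. It adds window_start and window_end columns to each row, which you can then reference in a GROUP BY clause.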
-- Total transaction amount per 15-minute tumbling window
SELECT
    window_start,
    window_end,
    SUM(amount) AS total_amount
FROM TUMBLE(transactions_stream, event_timestamp, INTERVAL '15' MINUTE)
GROUP BY window_start, window_end;
-- Number of logins per 1-hour tumbling window, labeled by window start
SELECT
    window_start AS hour_bucket,
    COUNT(user_id) AS login_count
FROM TUMBLE(user_logins, login_time, INTERVAL '1' HOUR)
GROUP BY window_start;
Advantages
- Simplicity: Easy to understand and implement.
- Clear Boundaries: Each event belongs to one and only one window, avoiding ambiguity.
- Efficiency: Because each event is assigned to exactly one window, it contributes to a single aggregate, with no work duplicated across windows.
Considerations
- Boundary Effects: Window bounds are typically start-inclusive and end-exclusive, so an event stamped exactly on a boundary (e.g., 10:10:00 with 10-minute windows) belongs to the window that starts at that instant, not the one that ends there. Confirm the boundary convention of your system before relying on this.
- Event Time vs. Processing Time: The choice of time characteristic (event time or processing time) significantly impacts how events are assigned to windows and the accuracy of results, especially with out-of-order data or processing delays. Using event time with watermarks is generally recommended for accuracy.
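To make the event-time versus processing-time distinction concrete, the sketch below assigns the same events to 1-minute tumbling windows twice, once by each timestamp. The event data is invented for illustration, and the sketch does not model watermarks. It only shows that late or delayed events shift between windows when processing time is used.

```python
from collections import Counter

SIZE = 60  # 1-minute windows, in seconds

# Hypothetical (event_time, processing_time) pairs, in seconds.
# The last event happened at t=50 but arrived late, at t=125.
events = [(10, 12), (55, 58), (59, 65), (70, 71), (50, 125)]


def counts(events, key_index):
    """Count events per window using the timestamp at `key_index`."""
    c = Counter((e[key_index] // SIZE) * SIZE for e in events)
    return dict(sorted(c.items()))


by_event_time = counts(events, 0)
by_processing_time = counts(events, 1)

print(by_event_time)       # {0: 4, 60: 1}
print(by_processing_time)  # {0: 2, 60: 2, 120: 1}
```

Grouping by event time reflects when things actually happened. Grouping by processing time spreads the same events over three windows because of arrival delays.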
Tumbling windows are a fundamental building block for many stream processing analytics and are well-supported in systems like RisingWave.