
Hopping Window

A Hopping Window is a time-based window used in stream processing that groups elements into fixed-length, potentially overlapping time intervals. Like a Tumbling Window, a Hopping Window has a fixed duration (the window size), but a new window starts every hop size (or slide interval) rather than when the previous window ends. Hopping Windows therefore overlap when the hop size is smaller than the window size. If the hop size is equal to the window size, a Hopping Window behaves like a Tumbling Window.

Hopping Windows are useful for analyzing data over moving time periods, often to calculate moving averages or other statistics that need to be updated more frequently than the total window duration.

Key Characteristics

  • Window Size (Duration): The length of the time interval each window covers (e.g., 1 hour).
  • Hop Size (Slide): The interval at which windows are created (e.g., 10 minutes).
  • Overlapping Windows: If the hop size is less than the window size, an element can belong to multiple windows. For example, with a 1-hour window size and a 10-minute hop, a new window starts every 10 minutes and covers the following hour. An event occurring at 10:05 AM belongs to the six windows starting at 10:00 AM, 9:50 AM, 9:40 AM, 9:30 AM, 9:20 AM, and 9:10 AM.
  • Fixed Start Times: Windows are typically aligned with the epoch, meaning their start times are multiples of the hop size, adjusted for any potential offset.
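
The assignment rule described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not RisingWave's implementation: the function name window_starts is hypothetical, and the sketch assumes half-open windows [start, start + size) aligned to the Unix epoch with no offset.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def window_starts(event_time, size, hop):
    """Return the start times of every hopping window that contains event_time.

    Assumption: windows are half-open intervals [start, start + size),
    aligned to the Unix epoch with no offset.
    """
    # Latest window start at or before the event: round down to a multiple of hop.
    last_start = EPOCH + ((event_time - EPOCH) // hop) * hop
    starts = []
    start = last_start
    # Walk backwards one hop at a time while the window still covers the event.
    while start + size > event_time:
        starts.append(start)
        start -= hop
    return starts


# The 1-hour window / 10-minute hop example: an event at 10:05 lands in 6 windows,
# starting at 10:00, 9:50, 9:40, 9:30, 9:20 and 9:10.
event = datetime(2024, 1, 1, 10, 5, tzinfo=timezone.utc)
starts = window_starts(event, timedelta(hours=1), timedelta(minutes=10))
```

When the window size is a whole multiple of the hop size, every event falls into exactly size / hop windows.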

Example Scenario

Imagine you want to monitor the average transaction value over the last 5 minutes, with updates every 1 minute.

  • Window Size: 5 minutes
  • Hop Size: 1 minute

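This configuration produces the following windows:

  • Window 1: [00:00 - 00:05)
  • Window 2: [00:01 - 00:06)
  • Window 3: [00:02 - 00:07)
  • Window 4: [00:03 - 00:08)
  • Window 5: [00:04 - 00:09)
  • ... and so on.

Each window includes its start time and excludes its end time. An event occurring at 00:04:30 would therefore be included in Window 1, Window 2, Window 3, Window 4, and Window 5, and would no longer be included in the window starting at 00:05.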
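
To make the scenario concrete, here is a minimal Python sketch that computes the per-window average for a handful of made-up transactions. The sample events, the function name hopping_averages, and the choice to ignore windows that would start before 00:00 are all assumptions for illustration; timestamps are seconds since 00:00.

```python
from collections import defaultdict

WINDOW = 5 * 60  # window size: 5 minutes, in seconds
HOP = 60         # hop size: 1 minute, in seconds


def hopping_averages(events):
    """Average value per hopping window.

    events: iterable of (timestamp_seconds, value) pairs.
    Returns a dict mapping (window_start, window_end) to the average value.
    Assumption: windows starting before 00:00 are ignored, to match the
    numbering used in the scenario.
    """
    totals = defaultdict(lambda: [0.0, 0])  # window -> [sum, count]
    for ts, value in events:
        start = (ts // HOP) * HOP  # latest window start at or before ts
        while start + WINDOW > ts and start >= 0:
            bucket = totals[(start, start + WINDOW)]
            bucket[0] += value
            bucket[1] += 1
            start -= HOP
    return {window: s / n for window, (s, n) in totals.items()}


# Hypothetical transactions at 00:00:30, 00:01:30 and 00:04:30.
sample = [(30, 10.0), (90, 20.0), (270, 30.0)]
averages = hopping_averages(sample)
# Window 1, [0, 300), sees all three events; Window 2, [60, 360), sees the last two.
```

A streaming engine would maintain these sums incrementally as events arrive instead of recomputing from a list, but the window membership logic is the same.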

Use Cases

  • Moving Averages: Calculating averages (e.g., average sensor reading, average stock price) over a recent period, updated frequently.
  • Anomaly Detection: Identifying unusual patterns by observing statistics over sliding intervals. If a metric suddenly spikes within a recent window, it might indicate an anomaly.
  • Trend Analysis: Observing how a metric changes over time with more frequent updates than a tumbling window would allow.
  • Real-time Dashboards: Populating dashboards with metrics that reflect recent activity, updated frequently.

Hopping Windows in RisingWave

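RisingWave supports Hopping Windows through its SQL interface using the HOP function. HOP is a table function used in the FROM clause: it assigns each input row to every window that contains it and adds window_start and window_end columns. You can then group by these columns to perform aggregations over the sliding time intervals.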

SELECT
    window_start,
    window_end,
    source_id,
    AVG(value) as average_value
FROM HOP(
    your_table,          -- The table or stream to process
    event_timestamp_col, -- The column representing event time
    INTERVAL '1 minute', -- Hop Size (slide)
    INTERVAL '5 minutes' -- Window Size (duration)
)
GROUP BY window_start, window_end, source_id;

In this example:

  • your_table: The input stream or table containing the data.
  • event_timestamp_col: The column indicating the event time of each record.
  • INTERVAL '1 minute': This is the hop size (how often a new window starts). Note that it comes before the window size in the argument list.
  • INTERVAL '5 minutes': This is the window size (the length of each window).

The query would output the average value for each source_id within 5-minute windows that advance every 1 minute.

Considerations

  • State Management: Since elements can belong to multiple windows, the system needs to manage state for each active window. This can lead to higher state storage and processing overhead compared to Tumbling Windows, especially with small hop sizes and large window sizes.
  • Duplicate Output (for non-idempotent operations): Because an element contributes to every window that contains it, an operation that runs once per window (for example, sending an alert) can fire several times for the same element when windows overlap. How results are emitted also matters: only when a window closes, or continuously as new data arrives. RisingWave's materialized views handle this by incrementally updating results.
  • Resource Consumption: Smaller hop sizes mean more frequent computations and potentially more overlapping windows, which can increase resource usage.
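
As a rough way to reason about this overhead, the number of windows each element contributes to is bounded by the window size divided by the hop size, rounded up. The sketch below computes that factor; the function name windows_per_event is hypothetical, and the actual state and compute cost in any engine depends on the aggregation and on how the engine shares work between windows.

```python
import math
from datetime import timedelta


def windows_per_event(size, hop):
    """Upper bound on how many hopping windows contain a single event."""
    return math.ceil(size / hop)


# 1-hour windows hopping every 10 minutes: each event is counted in up to 6 windows.
factor_hourly = windows_per_event(timedelta(hours=1), timedelta(minutes=10))
# Hop equal to the window size (a tumbling window): each event is counted once.
factor_tumbling = windows_per_event(timedelta(minutes=5), timedelta(minutes=5))
```

Shrinking the hop from 10 minutes to 1 minute with the same 1-hour window raises this factor from 6 to 60, which is why small hops over large windows deserve attention.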
