Emit Strategy / Output Mode

An Emit Strategy (or Output Mode, Trigger Strategy) in stateful stream processing defines when and how results are produced and sent downstream from operators, particularly those involving Windowing or stateful aggregations. It controls the timing and frequency of output updates based on the arrival of new data or the progression of time.

Different emit strategies offer trade-offs between result freshness (latency), completeness, and the volume of downstream traffic generated.

The Problem: When to Output Streaming Results?

Consider a continuous query calculating the count of events within a 5-minute Tumbling Window. When should the system output the result for a specific window (e.g., 10:00 AM - 10:05 AM)?

  • Option 1: Only when the window closes? Output the final count for 10:00-10:05 only after 10:05 AM has passed (and potentially after a Watermark indicates completeness). This gives a single, final result per window but introduces latency.
  • Option 2: Continuously update as each event arrives? Output the current count every time a new event belonging to the 10:00-10:05 window arrives. This provides low-latency, up-to-the-minute results but generates many intermediate updates.
  • Option 3: Periodically update? Output the current count for the window every, say, 30 seconds while the window is active. This is a balance between latency and update frequency.

The emit strategy dictates which of these (or similar) behaviors is chosen.

Common Emit Strategies

Different stream processing systems use varying terminology, but common strategies include:

  1. Emit on Watermark / On Window Close:

    • Behavior: Results for a window are emitted only once, when the system's Watermark passes the end timestamp of the window. This signifies that, according to the watermark, no more late events are expected for that window.
    • Pros: Produces a single, final, and typically complete result per window. Simpler downstream processing.
    • Cons: Higher latency, as output is delayed until the watermark progresses. May discard very late events arriving after the watermark.
    • Use Case: Classic batch-like window processing, reporting, dashboards where final results are sufficient.
  2. Emit Immediately / Continuous Mode:

    • Behavior: Results are updated and emitted every time a new input element is processed that affects the result (e.g., a new event arrives for an aggregation, a change occurs in a Materialized View).
    • Pros: Lowest latency. Downstream systems always see the most up-to-date result based on processed data.
    • Cons: Can generate a high volume of updates, potentially overwhelming downstream systems. Results are often intermediate or partial until a window closes or the stream ends.
    • Use Case: Real-time monitoring, alerting, applications needing immediate feedback. This is the default behavior for Materialized Views in RisingWave.
  3. Periodic Emit / Processing Time Trigger:

    • Behavior: Results are emitted periodically based on processing time intervals (e.g., every 1 minute), regardless of event arrival or watermarks. Outputs the current state of the result at the time of the trigger.
    • Pros: Bounds the output rate, predictable update frequency.
    • Cons: Latency depends on the trigger interval. Results might not align perfectly with event-time windows.
    • Use Case: Dashboards requiring regular updates but not necessarily per-event updates.
  4. Accumulating Mode: (Similar to Emit Immediately for windows)

    • The output includes all intermediate results. For a window, it would emit count=1, then count=2, etc., as events arrive.
  5. Accumulating & Retracting Mode:

    • When an update occurs, the system first emits a "retraction" message for the previous result, followed by the new result. For example, to update a count from 5 to 6, it might emit (-1, count=5) and then (+1, count=6). This allows downstream systems to correctly update aggregates that cannot easily overwrite previous values. This is related to the concept of Changelog Streams.

Emit Strategies in RisingWave

RisingWave primarily uses a continuous, "Emit Immediately" strategy by default for its Materialized Views.

  • Materialized Views: When the result of a query underlying a materialized view changes due to new upstream data, RisingWave updates the view and makes the new result available immediately for querying. It maintains the current, up-to-date result.
  • Sinks: When data is sent to external systems via Sinks, the behavior often resembles Accumulating & Retracting Mode through Changelog Streams. RisingWave sends insert (+I), delete (-D), and update (-U, +U) events, allowing the downstream system to correctly maintain the state based on these changes. This is essential for replicating the state of a materialized view into another system (e.g., a Kafka topic, database, data warehouse).
  • Windowing Functions (Tumbling, Hopping, Sliding): While RisingWave calculates window results internally, the output behavior through sinks or when querying materialized views follows the continuous/changelog model. Queries against a materialized view containing window results will show the current state of those windows based on processed data. Sinks will receive change events reflecting updates to those window results. RisingWave does not typically operate in a purely "Emit on Watermark" mode where results are held back; it favors low-latency continuous updates.

Understanding the emit strategy is crucial for interpreting the output of a streaming job and designing downstream consumers correctly.

Related Glossary Terms

  • Windowing (Tumbling, Hopping, Sliding, Session)
  • Watermark
  • Trigger (Conceptual)
  • Materialized View
  • Sink
  • Changelog Stream
  • Latency
  • Stateful Stream Processing
The Modern Backbone for Your
Event-Driven Infrastructure
GitHubXLinkedInSlackYouTube
Sign up for our to stay updated.