Filtering, in the context of stream processing, is a fundamental operation that selectively passes or discards events from a Data Stream based on whether they meet certain criteria or conditions. It's a stateless transformation applied independently to each incoming event.
The purpose of filtering is often to:
A filtering operation typically involves evaluating a boolean condition (a predicate) for each event in the stream. Typical predicates compare field values (status = 'ERROR', temperature > 100, country = 'USA'), check that a field is present (user_id IS NOT NULL), or match a pattern (url LIKE '%/login%').
If the predicate evaluates to true, the event is passed downstream. If it evaluates to false, the event is discarded and not processed further by that path.
Because filtering operates on each event independently without needing to remember information from previous events, it is considered a Stateless Stream Processing operation. This generally makes it computationally lightweight and easy to scale.
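The evaluate-then-pass-or-discard behavior described above can be sketched outside SQL in a few lines of Python. This is only an illustration, not how any particular engine implements filtering: the filter_stream helper, the dict-shaped events, and the three predicate functions are all hypothetical names chosen to mirror the example predicates.

```python
from typing import Callable, Iterable, Iterator

Event = dict  # assumed event shape: a flat mapping of field name -> value


def filter_stream(
    events: Iterable[Event], predicate: Callable[[Event], bool]
) -> Iterator[Event]:
    """Yield only the events for which the predicate is true.

    No state is carried from one event to the next, which is what
    makes the operation stateless.
    """
    for event in events:
        if predicate(event):
            yield event  # passed downstream
        # otherwise the event is dropped on this path


# Hypothetical predicates mirroring the examples in the text.
def is_error(event: Event) -> bool:
    return event.get("status") == "ERROR"  # status = 'ERROR'


def has_user(event: Event) -> bool:
    return event.get("user_id") is not None  # user_id IS NOT NULL


def is_login(event: Event) -> bool:
    return "/login" in (event.get("url") or "")  # url LIKE '%/login%'


# Made-up sample events for illustration.
events = [
    {"status": "ERROR", "user_id": 7, "url": "/login"},
    {"status": "OK", "user_id": None, "url": "/home"},
    {"status": "ERROR", "user_id": None, "url": "/account/login?next=/"},
]

errors = list(filter_stream(events, is_error))
```

Because each call to the predicate looks at a single event, the same function could be applied by any number of parallel workers without coordination, which is the property that makes filtering easy to scale.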
In RisingWave, filtering is primarily achieved using the standard SQL WHERE clause within continuous queries, typically when defining Materialized Views or selecting data to be sent to a Sink.
Example:
Consider a source stream sensor_readings with columns sensor_id, timestamp, temperature, and humidity.
To create a materialized view containing only high-temperature readings from a specific sensor:
CREATE MATERIALIZED VIEW high_temp_alerts AS
SELECT
    sensor_id,
    timestamp,
    temperature
FROM sensor_readings
WHERE
    temperature > 100 AND sensor_id = 'sensor-A1';
In this continuous query:
- Events are read from the sensor_readings source.
- The WHERE temperature > 100 AND sensor_id = 'sensor-A1' clause acts as the filter predicate.
- Only events that satisfy the predicate are included in the high_temp_alerts materialized view. Events not matching the WHERE clause are discarded and do not affect this view.
The WHERE clause can be used in various parts of RisingWave SQL queries, including subqueries, joins, and window functions, to filter data at different stages of the processing pipeline defined by the Dataflow Graph.
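To see which rows the predicate from the high_temp_alerts example keeps, the same WHERE clause can be run against a small static table. The sketch below uses Python's built-in sqlite3 module purely as a stand-in: SQLite is a batch engine, not RisingWave, so this checks only the row-level behavior of the predicate on a snapshot, not continuous maintenance of a materialized view. The sample rows and column types are made up for illustration.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for the source stream.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sensor_readings ("
    "sensor_id TEXT, timestamp TEXT, temperature REAL, humidity REAL)"
)

# Made-up sample rows (assumption: values chosen to hit each branch).
rows = [
    ("sensor-A1", "2024-01-01T00:00:00", 120.5, 30.0),  # satisfies both conditions
    ("sensor-A1", "2024-01-01T00:01:00", 95.0, 31.0),   # temperature too low
    ("sensor-B2", "2024-01-01T00:02:00", 130.0, 29.0),  # different sensor
    ("sensor-A1", "2024-01-01T00:03:00", 100.0, 28.0),  # exactly 100 fails "> 100"
]
conn.executemany("INSERT INTO sensor_readings VALUES (?, ?, ?, ?)", rows)

# Same projection and predicate as the materialized view definition.
high_temp_alerts = conn.execute(
    """
    SELECT sensor_id, timestamp, temperature
    FROM sensor_readings
    WHERE temperature > 100 AND sensor_id = 'sensor-A1'
    """
).fetchall()

conn.close()
```

Only the first sample row survives the filter. The row with a temperature of exactly 100 is dropped because the comparison is strict, which is worth keeping in mind when choosing thresholds.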
(... WHERE clauses on the same source).