Continuous Query

A Continuous Query is a query designed to operate perpetually over one or more unbounded, continuously arriving data streams. Unlike traditional database queries that execute once against a static, finite dataset and produce a fixed result set, a continuous query runs indefinitely, processing new data as it arrives and producing results that dynamically update over time.

Continuous queries are the core abstraction used in many stream processing systems, including Streaming Databases like RisingWave, to define ongoing computations on data in motion.

Traditional Queries vs. Continuous Queries

Feature	Traditional Query (e.g., standard SQL on a database)	Continuous Query (e.g., Streaming SQL)
Input Data	Finite, static dataset (at query time)	Unbounded, dynamic data stream(s)
Execution	Runs once, terminates	Runs perpetually until explicitly stopped
Output	Single, fixed result set	Dynamically updating result set over time
State	Typically none kept between executions (operates on data at rest)	Often stateful (must remember past data across records)
Time Semantics	Implicitly 'now' (snapshot of data)	Explicit time handling (Event/Processing Time)
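
The contrast in the table can be illustrated with a minimal Python sketch. It is not tied to any particular system: the function names and the integer records are assumptions made for illustration only. The first function reads a finite snapshot once and returns; the second keeps a running total as state and emits an updated result for every record that arrives.

```python
from typing import Iterable, Iterator, List


def one_shot_total(rows: List[int]) -> int:
    """Traditional style: read a finite snapshot once, return one result."""
    return sum(rows)


def continuous_total(stream: Iterable[int]) -> Iterator[int]:
    """Continuous style: emit an updated result for every arriving record."""
    total = 0  # state carried from one record to the next
    for value in stream:
        total += value
        yield total  # the result set changes over time


snapshot = [3, 4, 5]
print(one_shot_total(snapshot))          # a single, fixed answer
for updated in continuous_total(snapshot):
    print(updated)                       # one updated answer per record
```

With a truly unbounded input the second loop would never finish on its own, which corresponds to the "runs perpetually until explicitly stopped" row of the table.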

How Continuous Queries Work

Executing a continuous query typically involves the following steps:

Definition: The user defines the query logic, often using an extension of SQL (Streaming SQL) or a programmatic API (like Flink's DataStream API). This definition specifies sources, transformations (filters, projections), stateful operations (joins, aggregations, windowing), and potentially sinks.
Dataflow Graph Planning: The stream processing system translates the query definition into a logical and then a physical execution plan, typically represented as a Dataflow Graph. This graph outlines the operators and the data flow between them.
Deployment & Execution: The dataflow graph is deployed across the system's distributed processing nodes. Operators start consuming data from sources or upstream operators.
Continuous Processing: As new records arrive, they flow through the graph; each operator processes the records it receives and forwards its output to the next operator downstream.
State Management: Stateful operators (like aggregations or joins) continuously update their internal state based on incoming records.
Result Emission: The system produces output results based on the query logic and potentially an Emit Strategy. Results might be:
- A continuous stream of changes (inserts, updates, deletes) to the result set.
- Periodically emitted snapshots (e.g., at the end of a time window).
- Maintained internally as a continuously updated table or Materialized View.
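
The steps above can be sketched as a toy, single-process dataflow in Python. This is only an illustration under stated assumptions: the (user, amount) schema, the operator names, and the ("insert" / "delete", key, value) change format are hypothetical, and a real engine would distribute the operators across nodes and persist their state. The chain is source, then a stateless filter, then a stateful aggregation that emits a stream of changes to the result set.

```python
from typing import Dict, Iterable, Iterator, Tuple

Event = Tuple[str, int]        # (user, amount): hypothetical schema
Change = Tuple[str, str, int]  # (op, key, value): hypothetical change format


def source(events: Iterable[Event]) -> Iterator[Event]:
    """Source operator: hands records to downstream operators one by one."""
    yield from events


def filter_positive(upstream: Iterable[Event]) -> Iterator[Event]:
    """Stateless operator: drops records with a non-positive amount."""
    for user, amount in upstream:
        if amount > 0:
            yield user, amount


def sum_by_user(upstream: Iterable[Event]) -> Iterator[Change]:
    """Stateful operator: keeps a running sum per user and emits changes."""
    totals: Dict[str, int] = {}  # internal state, updated on every record
    for user, amount in upstream:
        if user in totals:
            # Retract the old result row before emitting the new one.
            yield ("delete", user, totals[user])
            totals[user] += amount
        else:
            totals[user] = amount
        yield ("insert", user, totals[user])


events = [("alice", 5), ("bob", -1), ("alice", 2), ("bob", 3)]
changes = list(sum_by_user(filter_positive(source(events))))
for change in changes:
    print(change)
```

Chaining generators mirrors the dataflow graph: each operator pulls from its upstream and pushes results downstream, and only sum_by_user holds state. The output here corresponds to the first emission style listed above, a stream of inserts and deletes against the result set.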

Key Concepts in Continuous Queries

Unbounded Data: Handling potentially infinite streams without defined ends.
State Management: The need to store and manage state for operations like joins and aggregations.
Time Semantics: Explicitly dealing with time (Event Time vs. Processing Time) is crucial, especially for windowing.
Windowing: Grouping stream data into finite buckets (often based on time) for operations like aggregation.
Incremental Computation: Efficiently updating results based only on new data, rather than recomputing everything (a key feature of systems like RisingWave).
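
Several of these concepts can be seen together in a small Python sketch of an event-time tumbling window count. The 60-second window size, the integer timestamps in seconds, and the class name are assumptions for illustration. The sketch deliberately omits watermarks and state cleanup, so its state would grow without bound on a real stream.

```python
from collections import defaultdict
from typing import DefaultDict, Tuple

WINDOW_SIZE = 60  # seconds; hypothetical window length


def window_start(event_time: int, size: int = WINDOW_SIZE) -> int:
    """Map an event timestamp to the start of its tumbling window."""
    return event_time - event_time % size


class TumblingCount:
    """Counts events per event-time window, one record at a time."""

    def __init__(self, size: int = WINDOW_SIZE) -> None:
        self.size = size
        self.counts: DefaultDict[int, int] = defaultdict(int)  # window -> count

    def on_event(self, event_time: int) -> Tuple[int, int]:
        """Apply one record incrementally and return (window, new count)."""
        start = window_start(event_time, self.size)
        self.counts[start] += 1  # touch only the affected window
        return start, self.counts[start]


counter = TumblingCount()
# Arrival order differs from event-time order: 59 arrives after 61.
results = [counter.on_event(t) for t in (5, 61, 59, 130)]
print(results)
```

Because windows are assigned by the timestamp carried in the record (event time) rather than by arrival order (processing time), the late record at second 59 still updates the first window. Each call adjusts a single counter instead of recounting the whole stream, which is the idea behind incremental computation.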

Continuous Queries in RisingWave

RisingWave uses a PostgreSQL-compatible dialect of Streaming SQL as the primary way to define continuous queries. Key constructs include:

'CREATE SOURCE': Defines the connection to input data streams (e.g., from Kafka, Pulsar).
'CREATE TABLE': Defines tables, which can also serve as sources (e.g., via CDC) or internal state.
'CREATE MATERIALIZED VIEW': This is the core construct for executing a continuous query and persistently storing its results. The query defined within the 'AS SELECT ...' clause runs continuously. RisingWave automatically updates the materialized view incrementally and efficiently as source data changes. For instance, this command allows users to define aggregations over time windows, join multiple streams, or perform complex transformations, with the results being stored and kept fresh.
'CREATE SINK': Defines where to send the output stream of changes from a Table or Materialized View.

By using 'CREATE MATERIALIZED VIEW', users define continuous queries whose results are immediately queryable with low latency, without needing to re-execute the query logic on demand. The underlying continuous query runs in the background, managed by RisingWave's stream processing engine.
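
The idea that results stay queryable without re-running the query can be modeled with a toy Python class. This is a conceptual sketch only, not RisingWave's implementation or API: the class name, the apply and read methods, and the "insert" / "delete" operation strings are all hypothetical. The class maintains an average per key as source rows are inserted or deleted, so a read is a lookup rather than a recomputation.

```python
from typing import Dict, Optional


class AvgByKeyView:
    """Toy model of an incrementally maintained 'average per key' view."""

    def __init__(self) -> None:
        self._sum: Dict[str, float] = {}
        self._count: Dict[str, int] = {}

    def apply(self, op: str, key: str, value: float) -> None:
        """Fold one source change ('insert' or 'delete') into the view."""
        if op not in ("insert", "delete"):
            raise ValueError(f"unknown operation: {op}")
        sign = 1 if op == "insert" else -1
        count = self._count.get(key, 0) + sign
        total = self._sum.get(key, 0.0) + sign * value
        if count <= 0:
            # No source rows remain for this key: remove it from the view.
            self._count.pop(key, None)
            self._sum.pop(key, None)
        else:
            self._count[key] = count
            self._sum[key] = total

    def read(self, key: str) -> Optional[float]:
        """Serve a query from stored results without rescanning the source."""
        if key not in self._count:
            return None
        return self._sum[key] / self._count[key]


view = AvgByKeyView()
view.apply("insert", "eu", 10.0)
view.apply("insert", "eu", 20.0)
print(view.read("eu"))  # average over the two inserted rows
view.apply("delete", "eu", 10.0)
print(view.read("eu"))  # average after one row was removed
```

Keeping a sum and a count, rather than the average itself, is what lets deletes be applied incrementally: the average alone does not carry enough information to undo a row.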

Benefits