EXPLAIN ANALYZE in RisingWave: Profiling Real‑Time Query Performance

You’ve meticulously crafted a real-time streaming pipeline in RisingWave. You’ve created a materialized view to power a critical dashboard, but something is wrong. The data on the dashboard is lagging, alerts are delayed, and you suspect a performance bottleneck somewhere in your query.

You reach for a familiar tool: EXPLAIN. You check the query plan, and it looks perfectly optimal. The join order is correct, filters are being pushed down, and the distribution strategy makes sense. Yet, the problem persists.

This is a common frustration in stream processing. A static execution plan only tells you what the system intends to do, not how well it's actually doing it under a live workload. Is one of your joins struggling to keep up? Is a specific operator getting overwhelmed and buffering too much data? EXPLAIN can't answer these questions.

To bridge this gap between plan and reality, we introduced EXPLAIN ANALYZE, a diagnostic tool that enriches the execution plan with live runtime statistics.

What Is EXPLAIN ANALYZE?

While EXPLAIN shows the logical, physical, and distributed plan for a query, EXPLAIN ANALYZE goes a step further. It executes the query (or, for a materialized view, attaches to the already-running job) and collects live metrics from each operator in the dataflow graph. With this information, you can stop guessing and start diagnosing.

Metrics Collection and Reporting

When you run:

EXPLAIN ANALYZE MATERIALIZED VIEW my_view;

RisingWave performs the following steps:

  1. The serving node sends an RPC to the meta service to fetch fragment assignments and streaming worker locations.

  2. It sends two RPC calls to each streaming node:

    • The first collects a baseline metric snapshot.

    • The second (after a brief delay) captures a follow-up snapshot.

  3. The frontend computes deltas (e.g., records/sec) from these two samples.

  4. It maps the collected metrics to a reconstructed streaming execution graph, using fragment metadata.
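The delta step above can be sketched in a few lines. This is an illustrative model, not RisingWave's internal code: each snapshot is treated as a map from operator identity to a cumulative output-row counter, and throughput is the difference between the two samples divided by the sampling interval.

```python
def rps_delta(snapshot_a, snapshot_b, interval_secs):
    """Compute per-operator records/sec from two cumulative counters.

    snapshot_a and snapshot_b map operator identities to cumulative
    output-row counts taken interval_secs apart. Names and shapes here
    are illustrative stand-ins for RisingWave's internal bookkeeping.
    """
    return {
        op: (snapshot_b[op] - snapshot_a[op]) / interval_secs
        for op in snapshot_b
        if op in snapshot_a  # skip operators missing from the baseline
    }

before = {"StreamMaterialize": 1_000, "StreamHashJoin": 5_000}
after_ = {"StreamMaterialize": 1_500, "StreamHashJoin": 9_000}
rps_delta(before, after_, 1.0)
# → {'StreamMaterialize': 500.0, 'StreamHashJoin': 4000.0}
```

Sampling twice and diffing is what makes the numbers reflect current throughput rather than lifetime totals.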

The result is a per-operator breakdown of live performance, including:

  • Records per second (RPS): Throughput per operator.

  • Average output pending ratio: How full the output buffer is; indicates potential backpressure or downstream delays.

  • Actor IDs: Actors contributing to each operator.
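To make the pending-ratio metric concrete, here is a hedged sketch of the idea behind it (the function name, sample shape, and capacity parameter are illustrative, not RisingWave internals): average how full an operator's output buffer was across the sampling window. A value near 1.0 means the buffer stayed full, i.e. downstream operators are not draining it fast enough.

```python
def avg_output_pending_ratio(pending_samples, buffer_capacity):
    """Mean fullness of an operator's output buffer over a window.

    pending_samples are queued-row counts observed during the window;
    buffer_capacity is the buffer size. Both are illustrative stand-ins.
    """
    return sum(s / buffer_capacity for s in pending_samples) / len(pending_samples)

# A buffer of capacity 100 that held 90, 95, then 100 queued rows:
avg_output_pending_ratio([90, 95, 100], 100)  # → 0.95
```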

Example

-- Create a table and a materialized view
CREATE TABLE t (v1 int);
CREATE MATERIALIZED VIEW m1 AS
  SELECT *
  FROM t x
    JOIN t y USING (v1)
    JOIN t z USING (v1)
    JOIN t c USING (v1);

-- Analyze the materialized view
EXPLAIN ANALYZE MATERIALIZED VIEW m1;

Example output:

identity                                        |  actor_ids  | output_rps | avg_output_pending_ratio
------------------------------------------------+-------------+------------+--------------------------
 StreamMaterialize                              | 12,11,10,9  | 41589.1    | 0.00
 └─ MergeExecutor                               | 12,11,10,9  | 41589.1    | 0.81
    └─ Dispatcher                               |             | 0          | NaN
       └─ StreamHashJoin                        | 16,15,14,13 | 41560.1    | 0.85
          ├─ StreamHashJoin                     | 16,15,14,13 | 152.6      | 0.93
          │  ├─ MergeExecutor                   | 16,15,14,13 | 25.6       | 1.63
          │  │  └─ Dispatcher                   |             | 0          | NaN
          │  │     └─ StreamFilter              | 20,19,18,17 | 0          | 0
          │  │        └─ StreamTableScan        | 20,19,18,17 | 0          | 0
          │  │           ├─ BatchPlanNode       | 20,19,18,17 | 0          | 0
          │  │           └─ Upstream            | 20,19,18,17 | 0          | 0
          │  └─ MergeExecutor                   | 16,15,14,13 | 0          | 0
          │     └─ Dispatcher                   |             | 0          | NaN
          │        └─ StreamFilter              | 28,27,26,25 | 0          | 0
          │           └─ StreamTableScan        | 28,27,26,25 | 0          | 0
          │              ├─ BatchPlanNode       | 28,27,26,25 | 0          | 0
          │              └─ Upstream            | 28,27,26,25 | 0          | 0
          └─ MergeExecutor                      | 16,15,14,13 | 51.2       | 0.71
             └─ Dispatcher                      |             | 0          | NaN
                └─ StreamHashJoin               | 24,23,22,21 | 51.2       | 0.71
                   ├─ MergeExecutor             | 24,23,22,21 | 0          | 0
                   │  └─ Dispatcher             |             | 0          | NaN
                   │     └─ StreamFilter        | 36,35,34,33 | 0          | 0
                   │        └─ StreamTableScan  | 36,35,34,33 | 0          | 0
                   │           ├─ BatchPlanNode | 36,35,34,33 | 0          | 0
                   │           └─ Upstream      | 36,35,34,33 | 0          | 0
                   └─ MergeExecutor             | 24,23,22,21 | 0          | 0
                      └─ Dispatcher             |             | 0          | NaN
                         └─ StreamFilter        | 32,31,30,29 | 0          | 0
                            └─ StreamTableScan  | 32,31,30,29 | 0          | 0
                               ├─ BatchPlanNode | 32,31,30,29 | 0          | 0
                               └─ Upstream      | 32,31,30,29 | 0          | 0
(32 rows)

Practical Use Cases

With EXPLAIN ANALYZE, stream processing engineers can:

  • Pinpoint bottlenecks: Use low output_rps or high avg_output_pending_ratio to identify slow or blocked operators.

  • Tune query plans: Compare runtime performance across strategies (e.g., local vs. global aggregation, filter pushdown, partitioning schemes).

  • Diagnose backpressure: A consistently high pending ratio may indicate downstream congestion or under-provisioned sinks.

  • Test at scale: Run synthetic load tests on new materialized views and verify they can scale before rollout.
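The first two use cases lend themselves to simple scripting. As a sketch (the threshold and row shape are assumptions for illustration, not RisingWave defaults), you could feed EXPLAIN ANALYZE rows into a filter that flags operators whose output buffers stay mostly full; Dispatcher rows report NaN and are skipped:

```python
import math

def flag_bottlenecks(rows, pending_threshold=0.8):
    """Return operators whose output buffers stay mostly full.

    rows mimics EXPLAIN ANALYZE output as dicts; the 0.8 threshold is
    an illustrative heuristic. NaN ratios (Dispatcher rows) are skipped.
    """
    return [
        r["identity"]
        for r in rows
        if not math.isnan(r["avg_output_pending_ratio"])
        and r["avg_output_pending_ratio"] >= pending_threshold
    ]

rows = [
    {"identity": "StreamMaterialize", "avg_output_pending_ratio": 0.00},
    {"identity": "MergeExecutor", "avg_output_pending_ratio": 0.81},
    {"identity": "Dispatcher", "avg_output_pending_ratio": float("nan")},
    {"identity": "StreamHashJoin", "avg_output_pending_ratio": 0.85},
]
flag_bottlenecks(rows)  # → ['MergeExecutor', 'StreamHashJoin']
```

Comparing the flagged set before and after a plan change is a quick way to verify that a tuning attempt actually relieved the pressure.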

Try It Today

If you're running RisingWave 2.4 or later, you can start using EXPLAIN ANALYZE immediately on materialized views, sinks, and indexes. It's a low-overhead, high-impact tool for keeping your streaming jobs efficient and transparent.

Want to explore further?

  • Check out our EXPLAIN ANALYZE documentation for detailed syntax and parameters.

  • Join the conversation in our Slack community to ask questions and connect with others.
