Introducing Write Modes for Apache Iceberg in RisingWave

Every data engineer knows the struggle: optimizing for high-throughput data ingestion often means sacrificing query performance. Do you tune your system for absorbing massive volumes of real-time events, or for powering snappy, responsive dashboards? For a long time, the answer has been, "it depends on the compromise you're willing to make."

That's why we're excited to announce that with our latest v2.6 release, RisingWave now supports both Copy-on-Write (CoW) and Merge-on-Read (MoR) write modes for Apache Iceberg. This powerful feature is available for two key scenarios:

- Iceberg Sinks: For delivering data from RisingWave to an externally managed Iceberg table.
- Internal Iceberg Tables: For creating and managing Iceberg tables directly within RisingWave.

This gives you the power to fine-tune the behavior of your Iceberg tables directly within your stream processing workflow, ensuring that your data architecture is perfectly optimized for its intended use case.

Understanding the Two Write Modes: CoW vs. MoR

To understand the power of this new feature, let's demystify the two write modes.

Merge-on-Read (MoR): The "Write-Optimized" Approach

Think of Merge-on-Read like a notebook where you quickly jot down new information and edits on sticky notes. The main text isn't changed immediately. When you want to read the most up-to-date version, you have to read the original text and then apply the changes from the sticky notes.

This is how MoR, the default mode in RisingWave, works. When data is updated or deleted, RisingWave writes the changes to separate, smaller "delta" files instead of rewriting the existing data files. Because each write only appends a small file, write latency stays low, which makes MoR a good fit for write-heavy workloads. The trade-off is that at query time the read engine must merge the base files with the delta files on the fly, which can slow queries down.
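To make the sticky-note analogy concrete, here is a minimal Python sketch of the Merge-on-Read idea. It is a toy model only: the class name `MergeOnReadTable`, the in-memory dictionaries, and the `("upsert" | "delete", key, value)` record layout are all assumptions made for illustration, not RisingWave's or Iceberg's actual file format.

```python
# Toy model of Merge-on-Read (illustrative only, not a real file format).
# Writes append small change records; reads merge them with the base data.

class MergeOnReadTable:
    def __init__(self, base_rows):
        # base_rows: dict of primary key -> value, standing in for the base files
        self.base = dict(base_rows)
        # Append-only list of change records, standing in for the delta files
        self.deltas = []

    def upsert(self, key, value):
        # Cheap write: append one record and leave the base untouched
        self.deltas.append(("upsert", key, value))

    def delete(self, key):
        # Deletes are also just appended records
        self.deltas.append(("delete", key, None))

    def read(self):
        # Costlier read: replay every change record on top of a copy of the base
        merged = dict(self.base)
        for op, key, value in self.deltas:
            if op == "upsert":
                merged[key] = value
            else:
                merged.pop(key, None)
        return merged


table = MergeOnReadTable({1: "login", 2: "browse"})
table.upsert(2, "checkout")
table.delete(1)
table.upsert(3, "login")

print(table.base)    # {1: 'login', 2: 'browse'}  (unchanged by the writes)
print(table.read())  # {2: 'checkout', 3: 'login'}  (merged at query time)
```

Each write costs the same no matter how large the base is, while the cost of `read()` grows with the number of accumulated change records.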

Copy-on-Write (CoW): The "Read-Optimized" Approach

Now, imagine editing a document by rewriting the entire page to incorporate your changes. There are no sticky notes. Every time you read the document, you are guaranteed to see the final, clean version without any extra work.

This is the essence of Copy-on-Write. When a record is updated or deleted, RisingWave rewrites the entire data file (or files) containing the changed rows. Writes take longer because whole files are rewritten even when only a few rows change, but queries are significantly faster because no on-the-fly merging is required.
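The rewrite-the-page analogy can be sketched in a few lines of Python. This is a toy model under stated assumptions: the class name `CopyOnWriteTable`, the use of one dictionary per "data file", and the `rows_rewritten` counter are hypothetical and exist only to show where the write cost comes from. They do not reflect RisingWave's internal implementation.

```python
# Toy model of Copy-on-Write (illustrative only, not a real file format).
# An update replaces the whole file that holds the row; reads need no merging.

class CopyOnWriteTable:
    def __init__(self, files):
        # files: list of dicts, each standing in for one immutable data file
        self.files = [dict(f) for f in files]
        # Counts rows written by updates, to make the write cost visible
        self.rows_rewritten = 0

    def upsert(self, key, value):
        for i, data_file in enumerate(self.files):
            if key in data_file:
                # Copy the entire file, apply the change, swap in the new copy
                new_file = dict(data_file)
                new_file[key] = value
                self.files[i] = new_file
                self.rows_rewritten += len(new_file)
                return
        # Key not found in any file: write it as a new single-row file
        self.files.append({key: value})
        self.rows_rewritten += 1

    def read(self):
        # Reads simply scan the current files; there is nothing to merge
        result = {}
        for data_file in self.files:
            result.update(data_file)
        return result


table = CopyOnWriteTable([
    {1: "login", 2: "browse"},
    {3: "search", 4: "browse"},
])
old_file = table.files[0]      # keep a handle on the file about to be replaced
table.upsert(2, "checkout")    # changes one row, rewrites a two-row file

print(table.rows_rewritten)    # 2
print(old_file[2])             # browse  (the old file was never modified)
print(table.read())            # {1: 'login', 2: 'checkout', 3: 'search', 4: 'browse'}
```

Changing a single row rewrote the whole two-row file, which is the extra write cost. In exchange, `read()` has no change records to replay.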

Real-World Scenarios: When to Use CoW vs. MoR

To make this concrete, let's explore two common real-world scenarios.

Use Case for Merge-on-Read (MoR): High-Volume, Real-Time Ingestion

Scenario: You are streaming change data capture (CDC) events from a production PostgreSQL database that powers a busy e-commerce platform. The database handles thousands of transactions and inventory updates every minute.
Why MoR is a Fit: Your top priority is to ingest this high volume of changes into your Iceberg table without falling behind. MoR excels here because its low write latency ensures that RisingWave can keep pace, guaranteeing your data lakehouse is always up-to-date. The occasional analytical query might be a bit slower, but that's an acceptable trade-off for real-time data freshness.

Use Case for Copy-on-Write (CoW): Powering Analytical Dashboards

Scenario: You are building a materialized view in RisingWave to power a series of business intelligence (BI) dashboards for your executive team. The dashboards are refreshed every 10 minutes and are queried constantly throughout the day.
Why CoW is a Fit: Here, read performance is paramount. By using CoW for your Iceberg table, you ensure that every query against the dashboard is as fast as possible. The slightly higher write latency that occurs during the 10-minute refresh cycle is negligible compared to the benefit of consistently fast dashboards that drive business decisions.

Head-to-Head Comparison

Here is a quick-reference table to help you decide which mode is right for you.

Feature	Copy-on-Write (CoW)	Merge-on-Read (MoR)
Optimized For	Read Performance	Write Performance
Write Latency	Higher (rewrites files)	Lower (appends to delta files)
Query Speed	Faster (no on-the-fly merging)	Slower (requires merging data and delta files)
Storage Cost	Potentially higher (due to file rewriting)	Generally lower
Best For	BI Dashboards, Analytics, Ad-hoc Queries	Real-time Ingestion, CDC, High-Volume Streaming

How to Configure the Write Mode in RisingWave

Getting started is simple. You specify the write mode with the write_mode option in the WITH clause of your CREATE SINK or CREATE TABLE statement.

Important Note: Both write modes require RisingWave’s Iceberg compaction service and a dedicated compactor node. For Iceberg sinks, you must also enable compaction in your CREATE SINK statement. This is not necessary for internal Iceberg tables, as compaction is enabled by default.

For an Iceberg Sink

Use CREATE SINK to deliver data to an external Iceberg table, specifying the write_mode.

CREATE SINK my_iceberg_dashboard_sink
FROM my_dashboard_mv
WITH (
    connector = 'iceberg',
    type = 'upsert',
    primary_key = 'user_id',
    warehouse.path = 's3a://my-bucket/my-warehouse',
    s3.region = 'us-east-1',
    database.name = 'analytics',
    table.name = 'user_activity',
    write_mode = 'copy-on-write', -- Specify CoW for fast reads
    enable_compaction = true,
    compaction_interval_sec = 600
);

For an Internal Iceberg Table

Use CREATE TABLE to have RisingWave manage the Iceberg table for you.

CREATE TABLE my_internal_iceberg_table (
    user_id INT,
    activity VARCHAR,
    event_timestamp TIMESTAMP
) WITH (
    connector = 'iceberg',
    warehouse.path = 's3a://my-bucket/rw-managed-warehouse',
    s3.region = 'us-east-1',
    database.name = 'internal_db',
    write_mode = 'copy-on-write' -- Or 'merge-on-read'
);

Get Started with RisingWave

With the addition of configurable write modes in RisingWave v2.6, you now have another powerful tool to build a data lakehouse that truly fits your needs. Whether you are delivering data to an existing data lakehouse with an Iceberg sink or building a new one with RisingWave's internal Iceberg tables, you can make a deliberate choice to optimize for write-heavy pipelines, read-heavy workloads, or a mix of both.

For more detailed information, please see the official documentation.
Try RisingWave Today:
- Download the open-source version of RisingWave to deploy on your own infrastructure.
- Get started quickly with RisingWave Cloud for a fully managed experience.
Talk to Our Experts: Have a complex use case or want to see a personalized demo? Contact us to discuss how RisingWave can address your specific challenges.
Join Our Community: Connect with fellow developers, ask questions, and share your experiences in our vibrant Slack community.
