Introducing Write Modes for Apache Iceberg in RisingWave

Every data engineer knows the struggle: optimizing for high-throughput data ingestion often means sacrificing query performance. Do you tune your system for absorbing massive volumes of real-time events, or for powering snappy, responsive dashboards? For a long time, the answer has been, "it depends on the compromise you're willing to make."

That's why we're excited to announce that with our latest v2.6 release, RisingWave now supports both Copy-on-Write (CoW) and Merge-on-Read (MoR) write modes for Apache Iceberg. This powerful feature is available for two key scenarios:

  1. Iceberg Sinks: For delivering data from RisingWave to an externally managed Iceberg table.

  2. Internal Iceberg Tables: For creating and managing Iceberg tables directly within RisingWave.

This gives you the power to fine-tune the behavior of your Iceberg tables directly within your stream processing workflow, ensuring that your data architecture is perfectly optimized for its intended use case.

Understanding the Two Write Modes: CoW vs. MoR

To understand the power of this new feature, let's demystify the two write modes.

Merge-on-Read (MoR): The "Write-Optimized" Approach

Think of Merge-on-Read like a notebook where you quickly jot down new information and edits on sticky notes. The main text isn't changed immediately. When you want to read the most up-to-date version, you have to read the original text and then apply the changes from the sticky notes.

This is how MoR, the default mode in RisingWave, works. When data is updated or deleted, RisingWave writes the changes to separate, smaller "delta" files instead of rewriting existing data files. Writing these small files is fast, which makes MoR well suited to write-heavy workloads. The trade-off is that at query time, the read engine must merge the base files with the delta files on the fly, which can slow queries down.

Copy-on-Write (CoW): The "Read-Optimized" Approach

Now, imagine editing a document by rewriting the entire page to incorporate your changes. There are no sticky notes. Every time you read the document, you are guaranteed to see the final, clean version without any extra work.

This is the essence of Copy-on-Write. When a record is updated, RisingWave rewrites the entire data file (or files) containing the changed rows. This results in slightly higher write latency, but the benefit is immense: queries are significantly faster because there is no on-the-fly merging required.

Real-World Scenarios: When to Use CoW vs. MoR

To make this concrete, let's explore two common real-world scenarios.

Use Case for Merge-on-Read (MoR): High-Volume, Real-Time Ingestion

  • Scenario: You are streaming change data capture (CDC) events from a production PostgreSQL database that powers a busy e-commerce platform. The database handles thousands of transactions and inventory updates every minute.

  • Why MoR is a Fit: Your top priority is to ingest this high volume of changes into your Iceberg table without falling behind. MoR excels here because its low write latency ensures that RisingWave can keep pace, guaranteeing your data lakehouse is always up-to-date. The occasional analytical query might be a bit slower, but that's an acceptable trade-off for real-time data freshness.

Use Case for Copy-on-Write (CoW): Powering Analytical Dashboards

  • Scenario: You are building a materialized view in RisingWave to power a series of business intelligence (BI) dashboards for your executive team. The dashboards are refreshed every 10 minutes and are queried constantly throughout the day.

  • Why CoW is a Fit: Here, read performance is paramount. By using CoW for your Iceberg table, you ensure that every query against the dashboard is as fast as possible. The slightly higher write latency that occurs during the 10-minute refresh cycle is negligible compared to the benefit of consistently fast dashboards that drive business decisions.

Head-to-Head Comparison

Here is a quick-reference table to help you decide which mode is right for you.

| Feature       | Copy-on-Write (CoW)                        | Merge-on-Read (MoR)                            |
|---------------|--------------------------------------------|------------------------------------------------|
| Optimized For | Read Performance                           | Write Performance                              |
| Write Latency | Higher (rewrites files)                    | Lower (appends to delta files)                 |
| Query Speed   | Faster (no on-the-fly merging)             | Slower (requires merging data and delta files) |
| Storage Cost  | Potentially higher (due to file rewriting) | Generally lower                                |
| Best For      | BI Dashboards, Analytics, Ad-hoc Queries   | Real-time Ingestion, CDC, High-Volume Streaming |

How to Configure the Write Mode in RisingWave

Getting started is simple. You can specify the write mode directly in your WITH clause.

Important Note: Both write modes require RisingWave’s Iceberg compaction service and a dedicated compactor node. For Iceberg sinks, you must also enable compaction in your CREATE SINK statement. This is not necessary for internal Iceberg tables, as compaction is enabled by default.

For an Iceberg Sink

Use CREATE SINK to deliver data to an external Iceberg table, specifying the write_mode.

CREATE SINK my_iceberg_dashboard_sink
FROM my_dashboard_mv
WITH (
    connector = 'iceberg',
    type = 'upsert',
    primary_key = 'user_id',
    warehouse.path = 's3a://my-bucket/my-warehouse',
    s3.region = 'us-east-1',
    database.name = 'analytics',
    table.name = 'user_activity',
    write_mode = 'copy-on-write', -- Specify CoW for fast reads
    enable_compaction = true,
    compaction_interval_sec = 600
);
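
For a write-heavy pipeline such as the CDC scenario described earlier, the same kind of statement can use Merge-on-Read instead. The following is a minimal sketch rather than a tested configuration. It reuses only the options from the sink statement above. The sink name `orders_cdc_sink`, the upstream table `orders_cdc`, the key `order_id`, and the target table `orders` are hypothetical placeholders.

```sql
-- Hypothetical names: orders_cdc_sink, orders_cdc, order_id, and the
-- 'orders' target table are placeholders for illustration only.
CREATE SINK orders_cdc_sink
FROM orders_cdc
WITH (
    connector = 'iceberg',
    type = 'upsert',
    primary_key = 'order_id',
    warehouse.path = 's3a://my-bucket/my-warehouse',
    s3.region = 'us-east-1',
    database.name = 'analytics',
    table.name = 'orders',
    write_mode = 'merge-on-read',  -- The default mode; stated explicitly here
    enable_compaction = true,      -- Sinks must enable compaction (see the note above)
    compaction_interval_sec = 600  -- Same interval as the CoW example; tune for your workload
);
```

Because MoR is the default, leaving out `write_mode` should give the same behavior. Setting it explicitly simply makes the choice visible to whoever reads the statement later.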

For an Internal Iceberg Table

Use CREATE TABLE to have RisingWave manage the Iceberg table for you.

CREATE TABLE my_internal_iceberg_table (
    user_id INT,
    activity VARCHAR,
    event_timestamp TIMESTAMP
) WITH (
    connector = 'iceberg',
    warehouse.path = 's3a://my-bucket/rw-managed-warehouse',
    s3.region = 'us-east-1',
    database.name = 'internal_db',
    write_mode = 'copy-on-write' -- Or 'merge-on-read'
);

Get Started with RisingWave

With the addition of configurable write modes in RisingWave v2.6, you now have another powerful tool to build a data lakehouse that truly fits your needs. Whether you are delivering data to an existing data lakehouse with an Iceberg sink or building a new one with RisingWave's internal Iceberg tables, you can make a deliberate choice to optimize for write-heavy pipelines, read-heavy workloads, or a mix of both.

If you’d like to see a personalized demo or discuss how this could work for your use case, please contact our sales team.
