Apache Iceberg is a leading open table format for data lakehouses, valued for its reliability and features like time-travel. However, for many teams, the initial setup of an Iceberg Catalog can be a point of friction.
Traditionally, this requires provisioning and managing a separate component—a PostgreSQL database for a JDBC catalog, an AWS Glue Catalog, or a REST service like Nessie. This setup adds operational overhead and can delay the actual work of building data pipelines.
To address this, RisingWave now includes a Hosted Iceberg Catalog. This feature is designed to reduce the setup complexity to a single configuration parameter.
This guide will show you how to create a streaming-enabled Iceberg table in three steps.
Prerequisites
You will need two things to get started:
A running RisingWave instance. (You can get started easily with RisingWave Cloud or our Docker Compose setup).
Access to an object store (like S3, GCS, or Azure) and the necessary credentials.
Step 1: Create a Connection with the Hosted Catalog
First, we need to tell RisingWave where to store the Iceberg data and that we want it to manage the catalog for us. We do this by creating an Iceberg connection.
Notice the key parameter here: hosted_catalog = true. This single line instructs RisingWave to use its own internal metastore as a fully functional Iceberg catalog. You don't need to provide any external database URIs or credentials.
CREATE CONNECTION my_iceberg_connection
WITH (
    type = 'iceberg',
    warehouse.path = 's3://your-bucket/iceberg-warehouse',
    s3.access.key = 'your-access-key',
    s3.secret.key = 'your-secret-key',
    s3.endpoint = 'your-s3-endpoint',
    s3.region = 'your-region',        -- e.g., 'us-east-1'
    s3.path.style.access = 'true',
    hosted_catalog = true             -- The magic happens here!
);
With this one command, you've just provisioned a fully standard, JDBC-compatible Iceberg catalog without leaving RisingWave.
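If you'd like to double-check the result, you can list the connections in your database; the SHOW CONNECTIONS command below is assumed to be available in your RisingWave version.
-- Confirm that the new connection is registered
SHOW CONNECTIONS;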
Note that this example uses S3 as the object store for Iceberg. You can also use GCS, Azure Blob Storage, or another S3-compatible object store. For details, see Object storage configuration for Apache Iceberg in RisingWave.
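As a rough sketch, a GCS-backed connection could look like the following. The gcs.credential parameter name and the gs:// warehouse path are assumptions here; consult the linked guide for the exact options your deployment expects.
CREATE CONNECTION my_iceberg_connection_gcs
WITH (
    type = 'iceberg',
    warehouse.path = 'gs://your-bucket/iceberg-warehouse', -- assumed GCS warehouse path
    gcs.credential = 'your-service-account-credential',    -- assumed parameter name; see the linked guide
    hosted_catalog = true
);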
Step 2: Create Your Iceberg Table
Now that we have our connection, let’s create a table that writes data directly to the Iceberg format.
First, we set our session to use the connection we just created. Then, we use a standard CREATE TABLE command, adding ENGINE = iceberg to specify that this table's data should be stored in the Iceberg format.
-- Set the connection for your session
SET iceberg_engine_connection = 'public.my_iceberg_connection';
-- Create the table
CREATE TABLE machine_sensors (
    sensor_id INT PRIMARY KEY,
    temperature DOUBLE PRECISION,
    reading_ts TIMESTAMP
)
WITH (commit_checkpoint_interval = 1)  -- commit data to Iceberg at every checkpoint
ENGINE = iceberg;
That’s it! You now have an Iceberg table ready to receive streaming data.
Step 3: Stream Data and Query It
You can insert data into this table just like any other table in RisingWave, whether through a direct INSERT or a CREATE SINK fed by a Kafka topic (a sketch of the Kafka path follows the INSERT example below).
INSERT INTO machine_sensors
VALUES
(101, 25.5, NOW()),
(102, 70.2, NOW());
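If your readings arrive on a Kafka topic rather than through ad-hoc inserts, the streaming path could look roughly like this sketch; the source name, topic, broker address, and JSON encoding are assumptions for illustration.
-- Declare a Kafka source with the same schema as the table
CREATE SOURCE sensor_events (
    sensor_id INT,
    temperature DOUBLE PRECISION,
    reading_ts TIMESTAMP
)
WITH (
    connector = 'kafka',
    topic = 'machine-sensors',
    properties.bootstrap.server = 'your-broker:9092',
    scan.startup.mode = 'earliest'
)
FORMAT PLAIN ENCODE JSON;

-- Continuously route the Kafka stream into the Iceberg-backed table
CREATE SINK sensors_to_iceberg INTO machine_sensors
FROM sensor_events;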
Now, let's query the table to see our data.
SELECT * FROM machine_sensors;
-- sensor_id | temperature | reading_ts
-- -----------+-------------+----------------------------
-- 101 | 25.5 | 2024-05-21 10:30:00.123...
-- 102 | 70.2 | 2024-05-21 10:30:01.456...
Your data is now persisted in your S3 bucket in the open Apache Iceberg format.
Why This Matters
In just a few minutes and with three simple commands, you have a fully functional streaming pipeline writing to an open lakehouse table.
Let’s reflect on what you didn't have to do:
You didn't have to provision a separate PostgreSQL or MySQL database.
You didn't have to configure network rules and user credentials for that database.
You didn't have to manage IAM permissions for an AWS Glue Catalog.
You didn't have to deploy and manage a REST catalog service like Nessie.
By collapsing the entire catalog setup into a single hosted_catalog = true flag, we've dramatically lowered the barrier to entry for building on the streaming lakehouse. This allows you to focus on what matters: building your data applications, not managing infrastructure.
Furthermore, because the hosted catalog is a standard JDBC catalog, your data remains accessible to other tools in your stack, like Spark or Trino. This ensures interoperability without vendor lock-in.
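For example, once Spark is configured with Iceberg's JDBC catalog implementation (org.apache.iceberg.jdbc.JdbcCatalog) pointing at the hosted catalog and the same warehouse path, reading the table is an ordinary query. The catalog name rw and the public namespace below are assumptions for illustration.
-- In Spark SQL, with an Iceberg JDBC catalog registered as `rw`
SELECT * FROM rw.public.machine_sensors;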
Ready to try it for yourself? Check out our official Iceberg documentation or join our community on Slack to ask questions and share what you build.