RisingWave's integration with Apache Iceberg took a significant step forward with the introduction of REST catalog support in v2.5. This enhancement allows you to connect RisingWave to any modern Iceberg catalog service through a standardized API. We are especially excited to highlight the seamless integration with Lakekeeper, a new, open-source, self-hosted Iceberg REST catalog.
This combination empowers you to build a truly open and flexible streaming lakehouse, giving you full control over your metadata without vendor lock-in.
Why a REST Catalog Matters
An Iceberg catalog is the central nervous system of your lakehouse, tracking table schemas, snapshots, and data file locations. The Iceberg REST Catalog specification has become the modern standard for interoperability, allowing different processing engines like Spark, Flink, and RisingWave to communicate with a single, shared metadata service. This API-driven approach ensures a consistent and unified view of your data across your entire stack.
Take Control with Lakekeeper: A Self-Hosted Catalog
To help you leverage this new REST capability, we're highlighting Lakekeeper, a fast and lightweight open-source Iceberg REST catalog written in Rust. Instead of relying on a managed (and often costly) cloud service like AWS Glue, you can deploy Lakekeeper in your own environment using Docker or Kubernetes. This gives you the benefits of a modern catalog while maintaining complete control over your infrastructure and costs.
Benefits of Using RisingWave with Lakekeeper
Full Control & No Vendor Lock-in: By self-hosting your catalog with Lakekeeper, you own your metadata. This prevents dependency on proprietary cloud services and gives you the freedom to choose the best tools for your stack.
Open and Interoperable: The integration is built on the official Iceberg REST protocol. This means any tool that speaks this standard can interact with the tables you create and manage with RisingWave.
Simplified, Modern Stack: Lakekeeper is a lightweight, single binary that is easy to deploy and manage, making a production-ready streaming lakehouse more accessible than ever.
Unified Streaming and Management: Use RisingWave to ingest data, perform real-time transformations, and sink results directly into Iceberg tables managed by your Lakekeeper instance, all within a cohesive, SQL-driven workflow.
How to Get Started: A Practical Guide
First, ensure you have a running instance of Lakekeeper and S3-compatible storage (such as MinIO). The easiest way to do this is to use the docker-compose-with-lakekeeper.yml file from the RisingWave repository, which handles the setup for you.
RisingWave offers two distinct ways to interact with an Iceberg REST catalog based on your goal.
Use Case 1: Sinking Data into an Iceberg Table
This approach is for streaming data from a RisingWave source or materialized view into an Iceberg table. All the necessary connection parameters for the catalog and storage are defined directly within the CREATE SINK statement.
CREATE SINK users_sink
FROM user_profiles_stream
WITH (
    connector = 'iceberg',
    type = 'upsert',
    primary_key = 'user_id',
    -- Catalog configuration
    catalog.type = 'rest',
    catalog.uri = 'http://lakekeeper:8181', -- Your Lakekeeper endpoint
    -- Warehouse and table info
    warehouse.path = 's3://warehouse/',
    database.name = 'users',
    table.name = 'profiles',
    -- S3 configuration
    s3.endpoint = 'http://minio:9301',
    s3.access.key = 'minioadmin',
    s3.secret.key = 'minioadmin'
);
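The sink above reads from user_profiles_stream, which this post does not define. As a minimal sketch, assuming a plain RisingWave table with hypothetical user_id and user_name columns, the upstream relation could be created before the sink, given a couple of sample rows, and the sink then checked with SHOW SINKS:

```sql
-- Hypothetical upstream relation; the column names and types are
-- assumptions for illustration, not taken from the post.
-- Run this before the CREATE SINK statement.
CREATE TABLE user_profiles_stream (
    user_id INT PRIMARY KEY,
    user_name VARCHAR
);

-- Sample rows (made up) that the upsert sink would forward to Iceberg
INSERT INTO user_profiles_stream VALUES (1, 'alice'), (2, 'bob');
FLUSH;

-- After creating the sink, confirm that it is registered
SHOW SINKS;
```

The primary key on the hypothetical table matches the primary_key = 'user_id' setting of the upsert sink, so later updates to a row replace the earlier version in the Iceberg table.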
Use Case 2: Creating and Managing Native Iceberg Tables
This approach is for creating Iceberg tables that are natively managed by RisingWave. It allows you to create and interact with Iceberg tables directly using SQL. This requires creating a reusable CONNECTION object first.
Step 1: Create a Reusable Connection
Define a CONNECTION object that stores the details for your Iceberg catalog and warehouse. This makes your configuration clean and reusable.
CREATE CONNECTION lakekeeper_catalog_conn
WITH (
    type = 'iceberg',
    catalog.type = 'rest',
    catalog.uri = 'http://lakekeeper:8181/catalog/', -- URI of your Lakekeeper service
    warehouse.path = 'my-warehouse',
    s3.endpoint = 'http://minio:9301',
    s3.access.key = 'minioadmin',
    s3.secret.key = 'minioadmin',
    s3.region = 'us-east-1'
);
Step 2: Set the Connection as Active
Activate the connection to make it the default for all native Iceberg table operations in your session or for the entire system.
-- Set the connection as the Iceberg engine default for the current session
SET iceberg_engine_connection = 'public.lakekeeper_catalog_conn';
-- Or set it as the default for the entire system
ALTER SYSTEM SET iceberg_engine_connection = 'public.lakekeeper_catalog_conn';
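To confirm the setup, the connection and the session setting can be inspected. This is a sketch that assumes the lakekeeper_catalog_conn connection from Step 1 was created in the public schema:

```sql
-- List connections; lakekeeper_catalog_conn should appear in the output
SHOW CONNECTIONS;

-- Show the value currently in effect for this session
SHOW iceberg_engine_connection;
```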
Step 3: Create a Native Iceberg Table
Now, create your Iceberg table. The statement needs no catalog or storage options because all of those details are inherited from the active CONNECTION.
CREATE TABLE users (
    user_id INT,
    user_name VARCHAR
) ENGINE = iceberg;
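Once the table exists, it can be written to and queried with ordinary SQL. The following is a minimal sketch with made-up sample rows. Depending on how often RisingWave commits to Iceberg, newly written rows may take a short while to become visible:

```sql
-- Sample rows are illustrative only
INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
FLUSH;

-- Read the rows back through RisingWave
SELECT user_id, user_name FROM users ORDER BY user_id;
```

Because the table is registered in Lakekeeper through the Iceberg REST protocol, other engines that speak the same protocol should be able to read it from the same catalog.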
Conclusion
The support for Iceberg REST catalogs in RisingWave, especially when paired with a self-hosted solution like Lakekeeper, marks a significant milestone. It offers a powerful, flexible, and cost-effective path toward building a modern streaming lakehouse. This feature empowers you to take full ownership of your data architecture, break free from vendor lock-in, and embrace the interoperability of the open data ecosystem. We can't wait to see what you build.
Get Started with RisingWave
For more detailed information, please see the official documentation.
Try RisingWave Today:
Download the open-source version of RisingWave to deploy on your own infrastructure.
Get started quickly with RisingWave Cloud for a fully managed experience.
Talk to Our Experts: Have a complex use case or want to see a personalized demo? Contact us to discuss how RisingWave can address your specific challenges.
Join Our Community: Connect with fellow developers, ask questions, and share your experiences in our vibrant Slack community.