Managing data lakes effectively often involves choosing the right catalog service, especially within cloud environments. For AWS users leveraging Apache Iceberg, Amazon S3 Tables offers a compelling, native solution for managing Iceberg table metadata directly within S3. We're excited to announce that RisingWave now fully integrates with Amazon S3 Tables, enabling seamless interaction for reading, writing, and even natively managing Iceberg tables cataloged by this service.
Why Integrate RisingWave with S3 Tables?
Native AWS Experience: Leverage a fully managed Iceberg catalog service tightly integrated with S3 and AWS IAM for simplified permissions and operations.
Simplified Management: Avoid the need to set up and manage separate catalog services like Hive Metastore specifically for Iceberg tables used with RisingWave (though other catalog options remain available).
Enhanced Interoperability: Easily share data processed or created by RisingWave with other AWS services and query engines that support S3 Tables (e.g., Athena, Redshift Spectrum, EMR).
Centralized Metadata: Use S3 Tables as a central hub for discovering and managing Iceberg tables within your AWS environment.
RisingWave's support for S3 Tables covers three key areas: ingesting data, sinking results, and natively managing tables. Let's look at how to configure each.
Reading Data from S3 Tables (Iceberg Source)
You can easily ingest data into RisingWave from existing Iceberg tables cataloged by Amazon S3 Tables.
How it Works: RisingWave uses the Iceberg source connector configured to communicate with the S3 Tables REST API catalog endpoint using SigV4 authentication for metadata discovery. Standard S3 credentials are used for accessing the underlying data files.
Key Configuration: When creating an Iceberg source (connector = 'iceberg'), the crucial settings for S3 Tables integration are:
- catalog.type = 'rest'
- warehouse.path: Use the S3 Tables warehouse ARN (your table bucket's ARN).
- catalog.uri: The S3 Tables REST API endpoint for your region.
- catalog.rest.* parameters (signing_region, sigv4_enabled = true, signing_name = 's3tables'): These are required for SigV4 signing against the S3 Tables API.
- Standard s3.* credentials for data access.
Example:
CREATE SOURCE my_s3_tables_source (
-- Define columns matching your Iceberg table
event_id BIGINT,
event_timestamp TIMESTAMP,
user_id VARCHAR
)
WITH (
connector = 'iceberg',
-- *** S3 Tables Specific Config ***
warehouse.path = 'arn:aws:s3tables:us-east-1:123456789012:bucket/my-bucket', -- S3 Tables Warehouse ARN
catalog.type = 'rest',
catalog.uri = 'https://s3tables.us-east-1.amazonaws.com/iceberg',
catalog.rest.signing_region = 'us-east-1',
catalog.rest.sigv4_enabled = true,
catalog.rest.signing_name = 's3tables',
-- *** Standard Config ***
s3.access.key = 'YOUR_ACCESS_KEY_ID',
s3.secret.key = 'YOUR_SECRET_ACCESS_KEY',
s3.region = 'us-east-1',
database.name = '<your-s3-tables-namespace-name>',
table.name = '<your-s3-tables-table-name>'
);
For additional details, see "Use Amazon S3 Tables with the Iceberg source" in the RisingWave documentation.
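As a quick check that the catalog settings work, you can run an ad-hoc query against the source. This is a sketch, assuming the example columns declared above (event_id, event_timestamp, user_id) match your Iceberg table and that your RisingWave version supports batch queries on Iceberg sources; the timestamp cutoff is an arbitrary placeholder.

```sql
-- Ad-hoc (batch) read: metadata is resolved through the S3 Tables REST catalog,
-- data files are read from S3 with the s3.* credentials.
SELECT user_id, COUNT(*) AS events
FROM my_s3_tables_source
WHERE event_timestamp >= TIMESTAMP '2024-01-01 00:00:00'  -- placeholder cutoff
GROUP BY user_id
ORDER BY events DESC
LIMIT 10;
```

If this query fails, the error usually points to either the catalog side (warehouse.path, catalog.uri, signing parameters) or the data side (s3.* credentials), which helps narrow down misconfiguration.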
Writing Data to S3 Tables (Iceberg Sink)
RisingWave can sink processed data streams directly into Iceberg tables managed by Amazon S3 Tables.
How it Works: The Iceberg sink writes data files to S3 and commits metadata changes to the S3 Tables REST API catalog, again using SigV4 authentication.
Key Configuration: As with the source, use connector = 'iceberg' and catalog.type = 'rest' in your CREATE SINK statement, and provide the same S3 Tables-specific warehouse.path (ARN), catalog.uri, and catalog.rest.* signing parameters. You also need sink-specific options: type ('append-only' or 'upsert') and, for upsert sinks, primary_key.
Example:
CREATE SINK my_s3_tables_sink FROM my_materialized_view
WITH (
connector = 'iceberg',
type = 'upsert',
primary_key = 'user_id',
-- *** S3 Tables Specific Config ***
warehouse.path = 'arn:aws:s3tables:us-east-1:123456789012:bucket/my-bucket', -- S3 Tables Warehouse ARN
catalog.type = 'rest',
catalog.uri = 'https://s3tables.us-east-1.amazonaws.com/iceberg',
catalog.rest.signing_region = 'us-east-1',
catalog.rest.sigv4_enabled = true,
catalog.rest.signing_name = 's3tables',
-- *** Standard Config ***
s3.access.key = 'YOUR_ACCESS_KEY_ID',
s3.secret.key = 'YOUR_SECRET_ACCESS_KEY',
s3.region = 'us-east-1',
database.name = '<your-s3-tables-namespace-name>',
table.name = '<target-s3-tables-table-name>',
create_table_if_not_exists = true -- Optional
);
For additional details, see "Use Amazon S3 Tables with the Iceberg sink" in the RisingWave documentation.
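The sink example reads from my_materialized_view, which is not defined above. A minimal sketch of an upstream pipeline that fits it, assuming a hypothetical user_events table with the same columns as the source example:

```sql
-- Hypothetical upstream table of raw events (any RisingWave table or source would do).
CREATE TABLE user_events (
    event_id BIGINT,
    event_timestamp TIMESTAMP,
    user_id VARCHAR
);

-- Exactly one row per user_id, so user_id is a valid primary key for the upsert sink.
CREATE MATERIALIZED VIEW my_materialized_view AS
SELECT
    user_id,
    COUNT(*) AS event_count,
    MAX(event_timestamp) AS last_seen
FROM user_events
GROUP BY user_id;
```

Because each user's row is updated in place as new events arrive, the view emits updates rather than only inserts, which is why the sink uses type = 'upsert' with primary_key = 'user_id'. An append-only sink is the natural fit for streams that only ever insert rows.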
Natively Managing Tables with S3 Tables (Iceberg Engine)
RisingWave can also directly create and manage Iceberg tables using S3 Tables as the catalog via its native Iceberg table engine.
How it Works: Define a reusable CONNECTION object for your S3 Tables catalog, activate it for the Iceberg engine, and then create tables using ENGINE = iceberg.
Key Steps & Configuration:
- Create Connection: Define a connection (type = 'iceberg') containing the S3 Tables-specific warehouse.path (ARN), catalog.type = 'rest', catalog.uri, and required catalog.rest.* signing parameters, plus S3 credentials.
CREATE CONNECTION my_s3_tables_conn WITH (
type = 'iceberg',
warehouse.path = 'arn:aws:s3tables:us-east-1:123456789012:bucket/my-bucket', -- S3 Tables ARN
catalog.type = 'rest',
catalog.uri = 'https://s3tables.us-east-1.amazonaws.com/iceberg',
catalog.rest.signing_region = 'us-east-1',
catalog.rest.sigv4_enabled = true,
catalog.rest.signing_name = 's3tables',
s3.access.key = 'YOUR_ACCESS_KEY_ID',
s3.secret.key = 'YOUR_SECRET_ACCESS_KEY',
s3.region = 'us-east-1'
);
- Set Active Connection: Tell the engine to use this connection.
-- For current session:
SET iceberg_engine_connection = 'public.my_s3_tables_conn';
-- Or globally (requires admin):
-- ALTER SYSTEM SET iceberg_engine_connection = 'public.my_s3_tables_conn';
- Create Table: Use ENGINE = iceberg.
CREATE TABLE my_native_iceberg_table (
id INT PRIMARY KEY,
data VARCHAR
) ENGINE = iceberg; -- Table metadata managed via my_s3_tables_conn
For additional details, see "Use Amazon S3 Tables with the Iceberg engine" in the RisingWave documentation.
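Once the table exists, you write to and query it with ordinary SQL. A minimal sketch with placeholder values, assuming my_s3_tables_conn is the active Iceberg engine connection as set above:

```sql
-- Write a few placeholder rows; RisingWave persists them as Iceberg data
-- cataloged in S3 Tables via my_s3_tables_conn.
INSERT INTO my_native_iceberg_table VALUES
    (1, 'first'),
    (2, 'second');

-- Ask RisingWave to persist pending changes before reading them back.
FLUSH;

-- Read the rows back through the same table.
SELECT id, data FROM my_native_iceberg_table ORDER BY id;
```

Depending on your RisingWave version and the table's commit settings, new rows may take a little longer to appear in the Iceberg snapshots that other engines such as Athena read.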
Conclusion
RisingWave's integration with Amazon S3 Tables gives AWS users a direct way to use Iceberg in their streaming data pipelines without running a separate catalog service. Whether you need to ingest data from existing S3 Tables, sink processed results back into them, or natively manage Iceberg tables using S3 Tables as the catalog, the same REST catalog settings apply: the warehouse ARN, the regional endpoint, and the SigV4 signing parameters. This keeps the architecture simple, makes results available to other AWS query engines, and lets you focus on building robust, scalable streaming applications.
Try RisingWave Today
Ready to dive in? Choose the option that works best for you:
Start a free trial of RisingWave Cloud: Our fully managed cloud service.
Test the open-source version locally: Get started quickly on your own machine.
Want to stay connected? Follow us on Twitter and LinkedIn for the latest updates, and join our Slack community to chat with our engineers and hundreds of fellow streaming enthusiasts.