Connector

In the context of data integration and stream processing systems like RisingWave, a Connector is a specialized software component designed to bridge the gap between the core system and external data sources or destinations (sinks). Connectors handle the specific protocols, APIs, authentication methods, and data format conversions required to interact with a particular external technology (like a message queue, database, file system, or SaaS application).

Connectors act as adapters that simplify getting data into (Source Connectors) or out of (Sink Connectors) the main processing system, letting users focus on stream processing logic rather than the intricacies of interacting with each external system.

The Problem: Integrating Diverse Systems

Modern data ecosystems involve a wide variety of technologies: different message queues (Kafka, Pulsar, Kinesis, RabbitMQ), databases (PostgreSQL, MySQL, MongoDB), data warehouses (Snowflake, BigQuery), data lakes (S3, GCS, Iceberg tables), and more. Each system has its own way of sending and receiving data.

Without connectors, integrating a stream processor with these systems would require writing custom code for every single interaction, handling:

  • Network connections and protocols (TCP, HTTP, specific binary protocols).
  • Authentication and authorization.
  • Data serialization and deserialization (JSON, Avro, Protobuf, CSV, etc.).
  • Schema management and mapping.
  • Error handling and retry logic specific to the external system.
  • Managing offsets or positions for reliable consumption (for sources).
  • Handling acknowledgments or transaction semantics (for sinks).

This custom integration code would be complex, brittle, time-consuming to develop and maintain, and would need to be replicated for each new external system added.

How Connectors Work: Abstraction and Configuration

Connectors provide a standardized abstraction layer:

  1. Standardized Interface: They typically expose a common interface or configuration pattern within the stream processing framework (like RisingWave's 'CREATE SOURCE' or 'CREATE SINK' DDL).
  2. System-Specific Implementation: Under the hood, each connector contains the specific logic needed to communicate with its target external system (e.g., a Kafka connector uses the Kafka client library, a JDBC connector uses a JDBC driver).
  3. Configuration: Users configure connectors by providing necessary details like connection endpoints (URLs, host/port), credentials, topic/table names, data formats, and other system-specific parameters within the standardized framework syntax (a sketch follows this list).
  4. Lifecycle Management: The stream processing framework manages the lifecycle of the connector instances (starting, stopping, scaling, handling failures) as part of the overall streaming job.
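
For instance, a minimal sketch of this pattern in RisingWave's DDL, assuming a Kafka source (the topic, broker address, column names, and exact parameter spellings here are illustrative and may vary by connector and version):

  -- Hypothetical Kafka source: the standardized CREATE SOURCE interface
  -- carries the Kafka-specific details inside the WITH clause.
  CREATE SOURCE user_events (
      user_id    BIGINT,
      event_type VARCHAR,
      event_time TIMESTAMP
  ) WITH (
      connector = 'kafka',                           -- which connector implementation to use
      topic = 'user-events',                         -- Kafka-specific: topic to consume
      properties.bootstrap.server = 'broker1:9092',  -- Kafka-specific: broker endpoint
      scan.startup.mode = 'earliest'                 -- Kafka-specific: where to start reading
  ) FORMAT PLAIN ENCODE JSON;                        -- data format the connector decodes

The same 'WITH (connector = ..., ...)' shape applies to other connectors; only the system-specific parameters change.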

Types of Connectors:

  • Source Connectors: Read data from an external system and convert it into a stream of records consumable by the stream processing engine. Examples: Kafka Source, Pulsar Source, Kinesis Source, Debezium CDC Source, JDBC Source. A CDC source sketch follows this list.
  • Sink Connectors: Take a stream of processed records from the stream processing engine and write them to an external system. Examples: Kafka Sink, Iceberg Sink, JDBC Sink, Elasticsearch Sink, Print Sink (for debugging).
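
As a source-connector example beyond message queues, here is a hedged sketch of a Debezium-based Postgres CDC source in RisingWave, which is exposed through a table DDL so that upstream updates and deletes can be applied (host, credentials, and table names are hypothetical; exact options vary by version):

  -- Hypothetical CDC source: ingests the change stream of an upstream Postgres table.
  CREATE TABLE users (
      id    INT PRIMARY KEY,  -- primary key lets upstream updates/deletes be applied
      name  VARCHAR,
      email VARCHAR
  ) WITH (
      connector = 'postgres-cdc',
      hostname = 'db.internal',   -- hypothetical upstream host
      port = '5432',
      username = 'rw_repl',       -- hypothetical replication user
      password = '...',           -- placeholder credential
      database.name = 'shop',
      schema.name = 'public',
      table.name = 'users'
  );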

Key Benefits

  • Simplified Integration: Drastically reduces the effort needed to connect to external systems.
  • Reusability: A single connector can be used in multiple pipelines connecting to the same type of external system.
  • Modularity: Decouples the stream processing logic from the specifics of external system interaction.
  • Maintainability: Connectors are often maintained and updated by the framework provider or a community, handling changes in external system APIs or protocols.
  • Focus on Logic: Allows developers to concentrate on the core stream processing and business logic.

Connectors in RisingWave

Connectors are fundamental to getting data into and out of RisingWave:

  • 'CREATE SOURCE': This SQL command defines a connection to an external data source. Users specify the connector type (e.g., 'kafka', 'pulsar', 'kinesis', 'postgres-cdc'), connection parameters, data format ('json', 'avro', 'protobuf', etc.), and schema. RisingWave uses the specified source connector to continuously ingest data.
  • 'CREATE SINK': This SQL command defines a connection to an external data destination. Users specify the connector type (e.g., 'kafka', 'iceberg', 'jdbc', 'redis'), connection parameters, input data source (usually a Materialized View or Table in RisingWave), and data format. RisingWave uses the specified sink connector to push changes from the input source to the external system (an end-to-end sketch follows this list).
  • Built-in & Extensible: RisingWave provides a growing number of built-in connectors for common systems and aims to offer mechanisms for users to develop custom connectors in the future.
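
Putting both halves together, a minimal end-to-end sketch might aggregate the hypothetical 'user_events' Kafka source from earlier and sink the result back out to Kafka (names and options remain illustrative; an upsert sink typically needs a 'primary_key'):

  -- Continuously maintained aggregation over the ingested stream.
  CREATE MATERIALIZED VIEW event_counts AS
  SELECT event_type, COUNT(*) AS cnt
  FROM user_events
  GROUP BY event_type;

  -- Hypothetical Kafka sink: pushes the view's change stream to a downstream topic.
  CREATE SINK event_counts_to_kafka
  FROM event_counts
  WITH (
      connector = 'kafka',
      topic = 'event-counts',
      properties.bootstrap.server = 'broker1:9092',
      primary_key = 'event_type'  -- key for upsert semantics on the topic
  ) FORMAT UPSERT ENCODE JSON;

Because 'event_counts' is an aggregation, its output stream contains updates as well as inserts, which is why this sketch uses an upsert format keyed on 'event_type'.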

Connectors make RisingWave a highly integrated component within a larger data ecosystem, enabling it to participate in complex, end-to-end data pipelines.

Related Glossary Terms

  • Source
  • Sink
  • Apache Kafka / Apache Pulsar / Amazon Kinesis (Examples)
  • Apache Iceberg (Example)
  • Change Data Capture (CDC)
  • Debezium
  • Serialization Format (Avro, Protobuf, JSON)
  • Schema Registry
  • Stream Processing
  • Dataflow Graph