Towards Sub-100ms Latency Stream Processing with an S3-Based Architecture

Stream processing systems have been widely adopted across industries, especially in latency-sensitive scenarios such as ad recommendation and bidding, stock and crypto trading, cybersecurity monitoring, and sports betting. In these domains, result freshness directly translates to business value: the lower the latency, the greater the impact.

However, to make these systems run reliably - especially at scale and in the cloud - the underlying architecture must change. Systems that still rely on local disks and embedded databases (e.g., RocksDB) to manage internal state may not become obsolete immediately, but they will hit a wall eventually.

RisingWave is a high-performance streaming database built in Rust. It’s PostgreSQL-compatible and lets users write sophisticated stream processing logic using standard SQL - no need to learn a new DSL or framework.

From day one, our goal was clear: build a cloud-native streaming system with S3 as primary storage. Not “S3 as a backup,” not “S3 as an option” - but the foundation. No local disks. No RocksDB.

To make that possible, we rebuilt the entire state management layer. RisingWave is designed to support high concurrency, low latency, multi-tenancy, and elastic scaling - all while storing state directly in S3. The challenge was making the system real-time and robust under heavy workloads without relying on local state. We made it work.

Why We Chose S3: Not Because It’s Fast, But Because It’s Reliable

From the beginning, we made a conscious decision to store all persistent state in object storage like S3. S3 is highly available, cross-AZ, and practically infinite in capacity. It's also the default storage layer in most cloud environments.

The main advantage of S3 isn't speed - it's simplicity and reliability. You don’t need to deal with disk attachment, scheduling, failover, or instance-local state. Persistent data lives in S3, which naturally decouples compute from storage. That makes the system elastic and easier to recover.

For S3, 100-200 ms request latency is considered good.

But S3 has trade-offs. Its average latency is fine, but tail latency is unpredictable - individual requests can spike to 200–300ms or more. For stream processing, that’s unacceptable. Even worse, S3 charges per request. Stream processors constantly read and write small chunks of state. Each operation might only touch a few hundred bytes, but high access frequency adds up fast. In some setups we tested, S3 request costs exceeded compute costs.

The takeaway is simple: you can write to S3, but you must avoid reading from it during queries. S3 is the durable layer - not the query path. That’s why we designed a three-level state architecture: in-memory cache, local disk cache, and S3 as persistent storage.
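That three-level read path can be sketched in a few lines. The `TieredStore` class below is a hypothetical illustration, not RisingWave's actual API: reads check memory first, then a local disk cache, and only fall through to object storage on a double miss, backfilling both cache tiers on the way back.

```python
class TieredStore:
    """Illustrative three-tier read path: memory -> disk cache -> S3."""

    def __init__(self, s3):
        self.memory = {}      # tier 1: in-process cache
        self.disk = {}        # tier 2: stand-in for a local disk cache
        self.s3 = s3          # tier 3: durable object storage (a dict here)
        self.s3_reads = 0     # count how often we hit the slow path

    def get(self, key):
        if key in self.memory:
            return self.memory[key]
        if key in self.disk:
            value = self.disk[key]
            self.memory[key] = value      # promote to memory
            return value
        self.s3_reads += 1                # slow path: keep off query path
        value = self.s3.get(key)
        self.disk[key] = value            # backfill both cache tiers
        self.memory[key] = value
        return value

    def put(self, key, value):
        # Real systems flush to S3 asynchronously; write through for brevity.
        self.memory[key] = value
        self.disk[key] = value
        self.s3[key] = value
```

The point of the sketch is the asymmetry: S3 is always consulted last, and every fallback is counted, because each one costs both latency and a per-request fee.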

Why RocksDB-Based Systems Don’t Scale in the Cloud

Many stream processing systems today still rely on local disks and RocksDB to manage state. This model has been around for a while and works fine in simple, single-tenant setups. Apache Flink, for example, uses RocksDB as its default state backend - state is kept on local disks, and periodic checkpoints are written to external storage for recovery.

But this architecture doesn’t hold up at scale. Once you need multi-tenancy, cross-AZ availability, elastic scaling, complex queries, or clean state isolation, local disks become a bottleneck. Each node’s state is tied to its disk. Scaling out requires migrating state. Failover means state rebuilds. If a node crashes, you either remount disks or reload state from scratch. It’s fragile and operationally expensive.

Compaction makes things worse. RocksDB performs compaction on the same node as compute. When compaction kicks in, it contends with user queries for CPU and I/O. The result - latency spikes, jitter, backpressure, and in bad cases, system instability. This is especially painful in stream processing workloads involving joins, aggregations, or windowing.

The core problem isn’t RocksDB’s code. It’s that RocksDB was never built for cloud-native stream processing. It assumes local disks, couples state tightly to the instance lifecycle, and has no native support for remote compaction or object storage. These design choices no longer fit modern cloud infrastructure.

Hummock: A Clean Break from RocksDB

Instead of patching around RocksDB’s limitations, we started over. We built Hummock - a new state engine designed from the ground up for object storage and modern stream processing.

Hummock is an S3-based storage engine for state management in RisingWave.

Hummock follows an LSM-tree model, but every part of its design centers on a multi-tier architecture: in-memory cache, disk cache, and object storage. Writes are append-only to minimize random I/O. Each barrier checkpoint generates a new snapshot, providing natural multi-version isolation. Queries always run on the latest snapshot, so there's no read-write contention.
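The barrier-snapshot idea can be shown with a toy version store. This is a conceptual sketch, not Hummock's real data structures: each checkpoint seals the pending writes into a new immutable epoch, and a reader pinned to an epoch never sees later writes, so readers and writers never contend.

```python
class SnapshotStore:
    """Toy barrier-based MVCC: each checkpoint seals an immutable epoch."""

    def __init__(self):
        self.sealed = []     # list of immutable {key: value} versions
        self.pending = {}    # writes accumulated since the last barrier

    def put(self, key, value):
        self.pending[key] = value

    def checkpoint(self):
        # A barrier arrives: freeze pending writes as a new snapshot.
        self.sealed.append(dict(self.pending))
        self.pending = {}
        return len(self.sealed) - 1   # epoch id for readers to pin

    def get(self, key, epoch):
        # Read at a fixed epoch: newest sealed version up to that epoch
        # wins, and anything written afterwards is invisible.
        for version in reversed(self.sealed[: epoch + 1]):
            if key in version:
                return version[key]
        return None
```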

State is stored in immutable SSTs (Sorted String Tables). Once written, SSTs are uploaded to S3. The write-once nature of SSTs fits well with how object storage works. A metadata layer tracks indexing and compaction, keeping the system organized and easy to maintain.
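A minimal SST can be sketched as a sorted, write-once table with binary-search lookups. Real SSTs carry data blocks, bloom filters, and block indexes; none of that appears in this illustration.

```python
import bisect

class SSTable:
    """Toy immutable Sorted String Table: sorted once at build time,
    never mutated afterwards, looked up by binary search."""

    def __init__(self, entries):
        # entries: iterable of (key, value) pairs, frozen on build
        pairs = sorted(entries)
        self.keys = [k for k, _ in pairs]
        self.values = [v for _, v in pairs]

    def get(self, key):
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return None
```

Because the table is never modified after construction, it maps directly onto an object-storage PUT: upload once, then serve reads from whichever tier holds a copy.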

To ensure performance, Hummock is tightly integrated with the local disk cache. All query-path reads hit the cache - S3 is never used during query execution. Even after SSTs are uploaded to S3, Hummock keeps them around locally to avoid unnecessary cache misses.

Most importantly, Hummock is natively distributed. It supports horizontal scaling without shuffling state, fast recovery without state rebuilds, and decoupled compute and storage via remote compaction. This isn’t a legacy engine patched for the cloud - it’s built for the cloud from day one.

Remote Compaction: Keeping Query Paths Clean

In traditional setups, RocksDB performs compaction on the same node that runs queries. This leads to resource contention - CPU and I/O are shared - and query latency becomes unstable under load.

That might be acceptable in OLTP systems, where occasional spikes are fine. But in stream processing, low and consistent latency is critical. Any jitter creates backpressure and disrupts the pipeline.

RisingWave takes a different approach. We offload compaction to remote nodes. Compute nodes only handle reads and writes. Dedicated compactor nodes pull data from S3, perform compaction in the background, and push results back - all outside the query path.
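Logically, a compactor's job reduces to merging sorted runs so the newest value for each key wins and deletion markers (tombstones) are dropped. The sketch below shows only that merge logic, as a plain function under simplified assumptions; in RisingWave this runs on dedicated compactor nodes against SSTs pulled from S3, entirely off the query path.

```python
TOMBSTONE = object()   # stand-in deletion marker

def compact(runs):
    """Merge sorted runs (oldest first) into one sorted run.

    Later runs are newer, so their values override earlier ones;
    keys whose newest value is a tombstone are dropped entirely.
    """
    merged = {}
    for run in runs:              # oldest to newest
        for key, value in run:
            merged[key] = value   # newer run overrides older value
    return sorted(
        (k, v) for k, v in merged.items() if v is not TOMBSTONE
    )
```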

In RisingWave Cloud, compaction is serverless and elastic. A shared compaction pool handles multiple tenants. Resources scale up or down automatically, and idle workers shut down on their own. This keeps costs low while maintaining performance isolation.

In self-managed setups, you can run dedicated compactor nodes. These are lightweight, easy to manage, and have no impact on query performance.

Why We Default to EBS over NVMe

To avoid frequent reads from S3, Hummock uses a disk cache to store warm state. This layer is critical - it directly affects latency and system stability.

By default, we use EBS to cache internal state.

We support two options: NVMe local disks and block storage like AWS io2 EBS.

NVMe offers great performance. Latency is low, throughput is high. But it comes with trade-offs - not all instance types support NVMe, disk capacity is fixed, and data is wiped when the instance shuts down.

EBS, on the other hand, is slower in raw numbers but more flexible. It supports elastic scaling, survives instance terminations, works across AZs, and is available on nearly all instance types. With io2 volumes, IOPS and tail latency are good enough for most streaming workloads - especially when paired with Hummock’s in-memory and disk-level cache strategies.

That’s why we default to EBS - not for performance, but for operational stability. It scales better, is easier to manage, and fits more cloud environments.

We still support NVMe, and users can opt in if their use case demands ultra-low latency. But by default, we pick EBS for these reasons:

  1. Many instance types don’t support local NVMe disks.

  2. NVMe capacity is fixed and not expandable.

  3. Local disks are ephemeral - data is lost when instances stop.

  4. EBS is easier to standardize across clouds and instance families.

  5. io2 EBS delivers stable performance and acceptable latency for caching state.

Foyer: Smart Disk Cache Management

To make disk caching effective, we built Foyer - a dedicated cache manager that runs on each compute node. It handles all disk I/O, manages eviction, tracks hot and cold data, and monitors hit rates in real time.

Foyer brings much better performance. Source: https://foyer.rs/docs/overview

Its main goal is straightforward: keep S3 out of the query path. Every query first checks in-memory cache. If that misses, it falls back to the disk cache managed by Foyer. S3 is only touched when absolutely necessary - and even then, access is tightly controlled to avoid unnecessary costs.
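A cache manager in the spirit of Foyer can be sketched as a bounded store with LRU eviction and live hit-rate accounting. This is an illustration of the idea only, not Foyer's actual API; a `None` result signals the caller to fall back to S3.

```python
from collections import OrderedDict

class DiskCache:
    """Toy Foyer-style cache: bounded capacity, LRU eviction, hit-rate stats."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)   # mark as recently used
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        return None                         # caller falls back to S3

    def put(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict the coldest entry

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Tracking hit rate in real time is what lets the system tell whether the disk tier is actually absorbing traffic, or whether queries are leaking through to S3.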

Foyer is deeply integrated with Hummock. It understands the query DAG and can automatically identify which parts of the state are hot and which are archival. That insight lets it prioritize what stays on disk and what can be safely dropped.

This separation of responsibilities - compute, disk caching, and object storage - is what makes RisingWave able to hit low latency targets while scaling out reliably.

Final Thoughts

People have spent years trying to make RocksDB work in the cloud. But that’s never going to be clean. RocksDB was built for local storage, tight instance coupling, and single-tenant use. None of that fits how modern cloud systems work.

We took a different route. S3 is the shared base layer of the cloud. So we treated it as the foundation.

We're not optimizing for S3 performance. We're avoiding it on the query path.

  • Hot data stays in memory

  • Warm data sits on disk - EBS or NVMe - managed by Foyer

  • Cold data goes to S3 - only used during compaction or recovery

Writes are append-only. Queries see snapshot-isolated views. Compaction is remote and async. The cache system is smart enough to know what to keep warm and what to push out.

This is what a cloud-native stream processing system should look like: consistent latency, clean separation between compute and storage, and horizontal scalability without hacks.

Sub-100ms latency isn't a stretch goal. It's the default.
