One Step Towards Cloud: The RisingWave Operator

To make RisingWave easier to deploy and maintain on the cloud, we developed the RisingWave operator, a Kubernetes extension for managing RisingWave clusters. It supports out-of-the-box restarting, scaling, and upgrading of a RisingWave cluster. The RisingWave operator is now open source, and it brings RisingWave one step closer to the cloud era.

Background and Motivation

“Cloud-native” is a term that everybody mentions nowadays. Essentially, it means that a system runs on the cloud and is resilient, manageable, and observable. RisingWave is a cloud-native streaming database. It is designed to fully leverage the elasticity provided by the cloud. However, deploying and maintaining RisingWave on the cloud is still a challenging technical problem. In particular, consider the following questions:

How to deploy a streaming database on a cloud vendor, e.g., AWS, Azure, or GCP?
How to scale the database when there’s too much load? Can the scaling process be automatic?
How to perform a recovery quickly when some of the components fail?

Fortunately, there’s Kubernetes. Kubernetes is an open-source system for automating the deployment, scaling, and management of containerized applications over a cluster of machines. It provides a large set of APIs for abstracting the desired resources. With Kubernetes, people worry far less about vendor lock-in when they deploy their applications. Bringing RisingWave onto Kubernetes is therefore a natural choice.

We developed the RisingWave operator, a Kubernetes extension for managing RisingWave clusters. Writing an operator is a challenging task. In this blog, we give an overview of the features of the operator, the challenges we met while implementing it, and our solutions to these challenges.

Kubernetes Operator Basics

Before we get into the operator, let’s briefly introduce some core concepts in Kubernetes.

A Pod is the smallest unit to be scheduled in Kubernetes. A Pod can have multiple containers, which usually share the same isolation namespaces. A Pod is a fundamental component that Kubernetes provides to abstract the computing resources. It's no exaggeration to say that the whole world is built on it.
A Service is an abstraction to expose an application running on a set of Pods as a network service.
Deployment and StatefulSet are both workload resources. Workload resources help run applications on Kubernetes and let them benefit from its advanced features. For example, with Deployment, you can scale the application in or out simply by setting the desired number of replicas. The key difference between Deployment and StatefulSet is that Deployment is only suitable for stateless applications, while StatefulSet will try its best to keep the state alive, e.g., the mounted persistent volumes.

Aside from the built-in resources, Kubernetes introduces custom resources (i.e., CustomResourceDefinition, CRD) to extend its APIs. The CRD API allows users to define their own resources. Usually, there is a controller for each resource type, but the core Kubernetes components include no such controller for a custom resource. Developers therefore usually provide a controller alongside the custom resources.

So what exactly is a Kubernetes operator? In a broad sense, a Kubernetes operator is a set of CRDs and their corresponding controllers. Following the operator pattern, the controllers work to ensure that the extended APIs eventually take effect as expected.
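To make the operator pattern more concrete, here is a minimal Python sketch of a control loop that compares a desired state with an observed state and nudges the latter toward the former. The `State` class and both functions are hypothetical simplifications for illustration only. A real controller reacts to API events and manages far richer objects.

```python
from dataclasses import dataclass


@dataclass
class State:
    """Hypothetical, simplified view of a workload: just a replica count."""
    replicas: int


def reconcile(desired: State, observed: State) -> State:
    """Move the observed state one step toward the desired state."""
    if observed.replicas < desired.replicas:
        return State(observed.replicas + 1)  # start one more replica
    if observed.replicas > desired.replicas:
        return State(observed.replicas - 1)  # stop one replica
    return observed  # already converged, nothing to do


def run_until_converged(desired: State, observed: State, max_rounds: int = 100) -> State:
    """Call reconcile repeatedly, as a controller would on every event."""
    for _ in range(max_rounds):
        if observed == desired:
            break
        observed = reconcile(desired, observed)
    return observed
```

The point of the sketch is that the loop never assumes how the gap appeared. It only looks at the difference between the two states, which is why the desired state "eventually" takes effect.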

With these basics covered, the following sections detail how the RisingWave operator is built on top of these abstractions.

Implementing the RisingWave Operator

The RisingWave operator is a Kubernetes operator that supports the management of RisingWave clusters. It enables easy, efficient, and automatic operations for deploying, upgrading, monitoring, and scaling a RisingWave cluster. The key technical challenge in implementing the RisingWave operator is twofold:

How to represent RisingWave in Kubernetes?
How to build up the workflow for managing different components in the controller?

Abstracting RisingWave in Operator

To abstract RisingWave in Kubernetes, there are three major problems to solve.

Abstracting components with workload resources

RisingWave has four main components: meta server, frontend, compute node, and compactor node. You can refer to the design documentation for more details. Because RisingWave is built on cloud-native shared storage, we can regard all component nodes as stateless, so it is straightforward to use a Deployment for each component. However, because the compute nodes keep local caches, we manage them with StatefulSet instead of Deployment. StatefulSet offers many guarantees about state, e.g., it keeps the order while upgrading replicas, but all we need is its storage guarantee.
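The mapping described above can be summarized in a few lines of Python. This is only an illustrative sketch. The `Component` enum and the `workload_kind` function are hypothetical names, not identifiers from the operator's code base.

```python
from enum import Enum


class Component(Enum):
    """The four main RisingWave components named in the text."""
    META = "meta"
    FRONTEND = "frontend"
    COMPUTE = "compute"
    COMPACTOR = "compactor"


def workload_kind(component: Component) -> str:
    """Pick the Kubernetes workload resource for a component.

    Compute nodes keep local caches, so they get a StatefulSet.
    Every other component is treated as stateless and gets a Deployment.
    """
    if component is Component.COMPUTE:
        return "StatefulSet"
    return "Deployment"
```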

Stable network identifiers for communication

In Kubernetes, each Pod is assigned an internal IP when created. The IP is accessible inside Kubernetes unless further restrictions exist. However, processes cannot rely on those internal IPs to communicate with each other, because Pods are destroyed and re-created in many cases, for example, when a failover or an upgrade happens. In these cases, the IPs of the same component change during the lifetime of the RisingWave instance. Therefore, we launch a separate Service for each component. These Services hold immutable internal IPs that serve as stable identifiers. We leverage the service discovery mechanism to let the nodes find each other.
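The toy Python model below illustrates why a Service makes a better address than a Pod IP. It relies on the standard in-cluster DNS form `<service>.<namespace>.svc.<cluster-domain>`, and assumes the default cluster domain `cluster.local`. The `ToyService` class and the example service name are hypothetical and do not reflect the operator's real naming scheme.

```python
from dataclasses import dataclass, field


def service_dns_name(service: str, namespace: str, cluster_domain: str = "cluster.local") -> str:
    """Build the in-cluster DNS name of a Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


@dataclass
class ToyService:
    """Toy model of a Service: a fixed name in front of changing Pod IPs."""
    name: str
    namespace: str
    pod_ips: list = field(default_factory=list)

    def replace_pod(self, old_ip: str, new_ip: str) -> None:
        """Simulate a Pod being destroyed and re-created with a new IP."""
        self.pod_ips = [new_ip if ip == old_ip else ip for ip in self.pod_ips]

    @property
    def address(self) -> str:
        """What peers should dial: the stable name, never a Pod IP."""
        return service_dns_name(self.name, self.namespace)
```

After `replace_pod` the list of Pod IPs differs, but `address` returns the same string, so peers that only know the Service name are unaffected by the restart.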

Connecting different resources

Are these resources enough for a RisingWave instance? Unfortunately, for a production-ready operator implementation, the answer is no. To support more flexibility in deployment and management, we also manage the following:

A ConfigMap, for storing the configurations. It will be mounted to the running Pods.
Workload resource groups. The workload resources of the same component can be divided into groups and managed individually. For example, we allow the compute nodes to run in two different availability zones.
A ServiceMonitor, for declaring the scrape tasks for Prometheus. This is an optional sub-resource. It will only be synced when the Prometheus operator is installed.
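As a rough illustration of how these pieces fit together, the Python sketch below lists the sub-resources that might be synced for one instance. Everything here is an assumption made for the example: the `plan_sub_resources` function, the `<instance>-<component>` naming scheme, and the choice to group only the compute nodes are hypothetical and not taken from the operator.

```python
def plan_sub_resources(instance, compute_groups, prometheus_operator_installed):
    """List the (kind, name) pairs to sync for one RisingWave instance."""
    # One ConfigMap holds the configuration mounted into the Pods.
    plan = [("ConfigMap", f"{instance}-config")]

    # Stateless components: one Service and one Deployment each.
    for component in ("meta", "frontend", "compactor"):
        plan.append(("Service", f"{instance}-{component}"))
        plan.append(("Deployment", f"{instance}-{component}"))

    # Compute nodes: one Service, plus one StatefulSet per group
    # (for example, one group per availability zone).
    plan.append(("Service", f"{instance}-compute"))
    for group in compute_groups:
        plan.append(("StatefulSet", f"{instance}-compute-{group}"))

    # The ServiceMonitor is optional and only synced when the
    # Prometheus operator is present in the cluster.
    if prometheus_operator_installed:
        plan.append(("ServiceMonitor", instance))
    return plan
```

For example, `plan_sub_resources("rw", ["zone-a", "zone-b"], False)` yields two compute StatefulSets and no ServiceMonitor.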

Here’s an overview of the relationship between different sub-resources in RisingWave.

The relationship between different sub-resources in RisingWave

Implementing the Controller

In Kubernetes, these workflows of managing different resources are defined in a controller. Below we discuss the details of implementing the controller in the RisingWave operator.

Reaction as a basic unit

Implementing a controller is quite involved. Each of the different sub-resources has its own state and logic. We divide the complex procedure into small pieces: each piece only cares about some of the sub-resources and tries to do its best. For example, the frontend Deployment should only be updated when there’s a related spec change, such as modifications on the replicas, resources, or node selectors. In this case, we only have to choose from the following actions:

Create one when there’s no such resource found.
Update it according to RisingWave’s specs.
Delete it if it’s outdated and not needed anymore.

These actions form a reaction to RisingWave’s specs and the real states. Each sub-resource has its own reaction. No matter what happens, a reaction will find a way to ensure that the Deployment is eventually online and is consistent with what the spec desires.
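The choice among these actions can be sketched as a small Python function. This is a simplified illustration, not the operator's code. `decide_action` is a hypothetical name, and plain dictionaries stand in for the desired spec and the observed object.

```python
from typing import Optional


def decide_action(desired: Optional[dict], observed: Optional[dict]) -> str:
    """Pick the action a reaction should take for one sub-resource.

    `desired` is what the spec asks for (None if the resource is no
    longer needed); `observed` is what currently exists (None if absent).
    """
    if desired is None:
        # The resource is outdated: remove it if it still exists.
        return "delete" if observed is not None else "noop"
    if observed is None:
        return "create"
    if observed != desired:
        return "update"
    return "noop"  # already consistent with the spec
```

Because the function depends only on the two inputs, running it again after a failure or a restart gives the same answer, which is what lets a reaction converge regardless of how it got interrupted.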

Reactive workflow

We designed a reactive workflow and provided two primitives to organize the reactions in a sequential procedure.

Join reactions to let the reactions all run but join the results. Essentially Join is for running independent reactions together.
Sequential reactions to let the reactions run sequentially. It ensures that the dependent reactions are never run until conditions are fulfilled.
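A minimal Python sketch of the two primitives follows. It assumes a reaction is a zero-argument callable returning a `Result`. The `Result` type and the `join` and `sequential` functions are hypothetical stand-ins written for this example, and `join` runs the reactions one after another here rather than concurrently.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Result:
    done: bool = True            # False means "conditions not met yet"
    error: Optional[str] = None  # set when the reaction failed


Reaction = Callable[[], Result]


def join(*reactions: Reaction) -> Reaction:
    """Run every reaction, then merge the results into one."""
    def run() -> Result:
        results = [reaction() for reaction in reactions]
        errors = [res.error for res in results if res.error]
        return Result(done=all(res.done for res in results),
                      error="; ".join(errors) or None)
    return run


def sequential(*reactions: Reaction) -> Reaction:
    """Run reactions in order, stopping at the first one that fails
    or reports that its conditions are not fulfilled yet."""
    def run() -> Result:
        for reaction in reactions:
            res = reaction()
            if res.error or not res.done:
                return res
        return Result()
    return run
```

Since both primitives return a reaction themselves, they can be nested, for instance `sequential(a, join(b, c), d)`.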

Alongside the primitives for organizing reactions into workflows, we also designed several decorators for enhancing the reactions but not changing their intents:

Timeout decorates the reaction and tries its best to time out the procedure after a given time.
Retry retries the reaction if an unexpected error occurs until reaching the limit.
Parallel runs the underlying reaction in parallel.
Shared marks a reaction as shared so that different workflow branches can share it but only run it once.
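Two of these decorators, Retry and Shared, can be sketched in Python as plain higher-order functions. The sketch assumes a reaction is a zero-argument callable that raises an exception on error. The `retry` and `shared` names are hypothetical, the code is not thread-safe, and `shared` caches for the lifetime of the wrapper rather than for a single workflow run. Timeout and Parallel are left out for brevity.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(reaction: Callable[[], T], limit: int) -> Callable[[], T]:
    """Re-run the reaction on error, up to `limit` attempts in total."""
    def run() -> T:
        for attempt in range(1, limit + 1):
            try:
                return reaction()
            except Exception:
                if attempt == limit:
                    raise  # out of attempts: surface the last error
        raise ValueError("limit must be at least 1")
    return run


def shared(reaction: Callable[[], T]) -> Callable[[], T]:
    """Run the reaction at most once and reuse its result afterwards."""
    cache = []  # empty until the first successful run

    def run() -> T:
        if not cache:
            cache.append(reaction())
        return cache[0]
    return run
```

Neither wrapper looks inside the reaction it decorates, which mirrors the idea that decorators enhance a reaction without changing its intent.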

Besides, the workflow primitives also support building a structured concurrent workflow, which is the best pattern for parallel programming in my opinion. Below is a diagram demonstrating the current working workflow:

The workflow in the controller of RisingWave operator

In the diagram above, Sync Components and Sub-resources is a shared sub-workflow like the following:

'Sync Components and Sub-resources' elaborated

With reactions and workflows, the developers can benefit from several perspectives:

Focusing on a small unit, which normally only considers the sub-resources of a single facet, makes the code more robust and easier to maintain.
Organizing the workflows with primitives gives the greatest flexibility for adjusting the behaviors at will and decouples the common decorators from the reactions. In the RisingWave operator’s workflow, we pursue extreme efficiency by parallelizing almost every independent reaction.

Miscellaneous

We have also met a lot of other technical issues, e.g., the problems caused by using a client cache, which caches the objects and serves the get operations (so data freshness is always a concern). However, we will defer them to future posts as those topics are beside the main point.

Future Improvements

After all the efforts described above, the RisingWave operator has integrated the basic functionalities for managing a RisingWave cluster on the cloud. Nevertheless, there is much more to be done. At the time of writing, a lot of exciting features are under active design and development:

A kubectl plugin that helps operate RisingWave on Kubernetes more naturally.
Helm charts for deploying the operator, RisingWave, and their dependencies.
Out-of-the-box integration with Grafana and Prometheus. The RisingWave operator will provide production-useful dashboards along with it in the future.

Serverless RisingWave

Being cloud-native brings a paradigm shift in system design, from on-premise deployment to serverless deployment. A serverless database must react to the workload by automatically adjusting its deployment and scaling with workload fluctuations. We are designing a solution for serverless RisingWave. This solution includes an auto-scaler that scales the RisingWave cluster on top of the RisingWave operator. The scaler should be aware of the running status of the cluster and automatically trigger scale-out or scale-in operations on different nodes based on the running workload. Because RisingWave is a cloud-native streaming database built from scratch, the auto-scaler may be able to access the low-level metrics of the streaming dataflow and schedule the actors inside the streaming engine directly. This gives us the design space to explore fine-grained scaling policies that achieve better cost efficiency.
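To give a feel for what a scaling decision might look like, here is a small Python sketch of a proportional rule, the same general shape as the one used by the Kubernetes Horizontal Pod Autoscaler. It is not the design of the RisingWave auto-scaler, which is still being worked out. The `desired_replicas` function, its parameters, and the default bounds are all assumptions made for illustration.

```python
import math


def desired_replicas(current: int, utilization: float, target: float,
                     min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Scale the replica count in proportion to load.

    `utilization` is the observed per-replica load and `target` is the
    load each replica should ideally carry, both in the same unit.
    """
    if target <= 0:
        raise ValueError("target must be positive")
    raw = math.ceil(current * utilization / target)
    # Clamp so that a metric spike or a silent metric cannot scale
    # the cluster beyond the configured bounds.
    return max(min_replicas, min(max_replicas, raw))
```

With 4 replicas at 0.75 utilization and a target of 0.5, the rule asks for 6 replicas. At 0.25 utilization it asks for 2.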

This blog gives an overview of the RisingWave operator, some of its major technical challenges, and our solutions. The RisingWave operator is a Kubernetes operator that helps operate RisingWave on Kubernetes. It supports out-of-the-box restarting, scaling, and upgrading operations on a RisingWave cluster. Furthermore, it will support serverless RisingWave in the future. The RisingWave operator is one step further toward the cloud, and it is the foundation of the upcoming cloud service. The future of stream processing is coming!

If you want to take a sneak peek at RisingWave Cloud, which is backed by the RisingWave operator, visit our website for early access. You can also try to deploy your RisingWave cluster with the RisingWave operator. It is fully open source, and all contributions are extremely welcome! Finally, follow our Twitter and LinkedIn, join our Slack community, and stay tuned with RisingWave!