Kubernetes operator for RisingWave.

Introduction

The RisingWave Kubernetes Operator is a deployment and management tool for RisingWave based on Kubernetes. The risingwave-operator currently supports the following custom resources:

  • risingwave.singularity-data.com
  • risingwave-monitor.singularity-data.com (not implemented yet)

Quick Start

Install cert-manager

You need to install cert-manager in the cluster before installing the risingwave-operator.

The default static configuration of cert-manager can be installed as follows:

kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.8.0/cert-manager.yaml

More information on this cert-manager installation method can be found here.
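
Before installing the operator, you can wait for cert-manager to become ready (the cert-manager namespace below is the one created by the default manifest):

kubectl wait --for=condition=Available deployment --all -n cert-manager --timeout=300s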

Install risingwave-operator

The risingwave-operator can then be installed as follows:

kubectl apply -f https://raw.githubusercontent.com/singularity-data/risingwave-operator/main/config/risingwave-operator.yaml
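
You can verify that the operator is running by listing its pods (risingwave-operator-system is the namespace used by the manifest above):

kubectl get pods -n risingwave-operator-system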

Examples

You can deploy a RisingWave instance that uses MinIO as the object storage on Linux/amd64 nodes as follows:

kubectl create namespace test
kubectl apply -f https://raw.githubusercontent.com/singularity-data/risingwave-operator/main/examples/minio-risingwave-amd.yaml
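
Before connecting, you can watch the pods in the test namespace until they are all running and ready:

kubectl get pods -n test -w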

First Query

Install psql

To connect to the RisingWave server, you will need to install the PostgreSQL shell (psql) in advance.

Query

The frontend is exposed through a Kubernetes NodePort service.

Get the nodePort of the frontend service (this will be the psql port) and the INTERNAL-IP address of any node as follows:

PHOST=`kubectl get node -o=jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}'`
PPORT=`kubectl get service -n test test-risingwave-amd64-frontend -o=jsonpath='{.spec.ports[0].nodePort}'`

Connect to the frontend with psql as follows:

psql -h $PHOST -p $PPORT -d dev
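
As a quick sanity check, you can also run a one-off query directly from the shell:

psql -h $PHOST -p $PPORT -d dev -c 'select 1;'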

Configuration

You can view the risingwave-operator configuration as follows:

kubectl get cm risingwave-operator-controller-manager-config -n risingwave-operator-system -oyaml

If you edit the ConfigMap, restart the risingwave-operator pods so that the new configuration file is loaded.
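
For example, you can delete the controller pods and let their Deployment recreate them; the label selector below follows the common kubebuilder convention and is an assumption, not a value taken from the manifest:

kubectl delete pods -n risingwave-operator-system -l control-plane=controller-manager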

License

The risingwave-operator is under the Apache License 2.0. Please refer to LICENSE for more information.

Contributing

Thanks for your interest in contributing to the project! Please refer to Contribution and Development Guidelines for more information.

Owner
Singularity Data
Building the next-generation streaming database in the cloud.
Comments
  • chore: move the Prometheus stack to the monitoring namespace

    What's changed and what's your intention?

    This PR changes the monitoring namespace from default to monitoring (see https://github.com/risingwavelabs/risingwave-operator/issues/225).

    We should merge this after merging https://github.com/risingwavelabs/risingwave-operator/pull/224. This PR will need updates after the above PR is merged.

    Checklist

    • [x] I have written the necessary docs and comments
    • [x] Test if the stack installs in a different namespace when using -n <namespace>
    • [ ] Test if Prometheus remote write works.
    • [x] Add to the README that users should run install.sh -h to get more info, e.g., about the custom namespace

  • feat: support S3-compatible object storages

    Signed-off-by: arkbriar [email protected]

    What's changed and what's your intention?

    Update: the protocol between the operator and the kernel has been changed.

    After a discussion with @wcy-fdu and @hzxa21, we decided to use the following environment variables to set the configurations for the s3-compatible backend type:

    • S3_COMPATIBLE_BUCKET
    • S3_COMPATIBLE_REGION
    • S3_COMPATIBLE_ACCESS_KEY_ID
    • S3_COMPATIBLE_SECRET_ACCESS_KEY
    • S3_COMPATIBLE_ENDPOINT
      • Path style (normal), e.g., https://cos.ap-guangzhou.myqcloud.com
      • Virtual-hosted style, commonly in a format of https://{bucket}.{some prefix}{region}.{domain suffix}, e.g., for the Aliyun OSS, it's https://{bucket}.oss-{region}.aliyuncs.com.
    1. Add Aliyun OSS as one of the supported object storage types.
    storages:
      object:
        aliyunOSS:
          secret: aliyun-oss-credentials
          region: cn-hangzhou
          bucket: hummock001
          internalEndpoint: true
    
    2. Add fields endpoint and virtualHostedStyle in the s3 spec to broadly support the s3-compatible backend type. Also, add another field region for overriding the Region key in the secret, so that the region cannot be changed simply by updating the secret, which could result in unstable object storage. (A sketch of creating such a credentials secret follows this list.)
    # When the endpoint is specified, the hummock type will be automatically set to `s3-compatible`.
    # Endpoint can contain two variables: ${REGION} and ${BUCKET}, and they will be automatically interpreted by the operator.
    storages:
      object:
        s3:
          secret: cos-credentials
          bucket: hummock001
          region: ap-guangzhou
          endpoint: cos.${REGION}.myqcloud.com
          virtualHostedStyle: true
    
    3. Improve the webhook for validating the newly supported object storage
    4. Add unit tests
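
    As a side note, here is a minimal sketch of creating the credentials Secret referenced above; the key names AccessKeyID, SecretAccessKey, and Region are assumptions for illustration, since only the Region key is explicitly mentioned in this PR:

    kubectl create secret generic cos-credentials \
      --from-literal=AccessKeyID=<access-key-id> \
      --from-literal=SecretAccessKey=<secret-access-key> \
      --from-literal=Region=ap-guangzhou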

    Checklist

    • [x] I have written the necessary docs and comments
    • [x] I have added necessary unit tests and integration tests

    Refer to a related PR or issue link (optional)

    tracking issue: #246 kernel side PR: risingwavelabs/risingwave#6152

  • feat: expose metrics about the operator

    What's changed and what's your intention?

    WORK IN PROGRESS! DO NOT MERGE!

    Implementing feature request from Issue 123

    Checklist

    • [ ] I have written the necessary docs and comments
    • [ ] I have added necessary unit tests and integration tests

  • fix: fix the label selector JSON path of scale view

    Signed-off-by: arkbriar [email protected]

    What's changed and what's your intention?

    • Fix the selectorpath of RisingWaveScaleView required by the scale sub-resource

    Checklist

    • [x] I have written the necessary docs and comments
    • [x] I have added necessary unit tests and integration tests

  • ci: prohibit PR from decreasing code coverage

    What's changed and what's your intention?

    See Issue https://github.com/risingwavelabs/risingwave-operator/issues/269

    Checklist

    • [ ] I have written the necessary docs and comments
    • [ ] I have added necessary unit tests and integration tests

  • feat: create ServiceMonitor for Operator

    What's changed and what's your intention?

    Currently, the custom metrics from the RW operator are not properly scraped. This PR introduces the required K8s objects to ingest the metrics into Prometheus.

    Original comment by @arkbriar here:

    In a deployment where the Prometheus operator is also installed, this should be an optional setting. We can add a doc about how to apply this. We can leave it to another PR.
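
    For reference, a minimal sketch of the kind of ServiceMonitor this PR introduces; the name, namespace, labels, and port name below are assumptions for illustration, not values taken from this PR:

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: risingwave-operator            # hypothetical name
      namespace: monitoring                # assumes the monitoring namespace introduced above
    spec:
      selector:
        matchLabels:
          control-plane: controller-manager  # assumed label on the operator's metrics Service
      namespaceSelector:
        matchNames:
          - risingwave-operator-system
      endpoints:
        - port: https                        # assumed metrics port name
          interval: 30s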

    Checklist

    • [x] I have written the necessary docs and comments
    • [x] I have added necessary unit tests and integration tests

    Refer to a related PR or issue link

    Exposed the metrics created for https://github.com/risingwavelabs/risingwave-operator/issues/123 Follow up for https://github.com/risingwavelabs/risingwave-operator/pull/214

  • feat: exposing metrics about the operator

    As the project grows, we need to expose metrics about the webhooks and controllers to build the observability & maintenance stacks.

    Metrics we'd like to have:

    • Webhooks:
      • webhook_request_count, labels:
        • type, the value should be mutating or validating
        • group, the target resource group of the webhook, e.g., risingwave.risingwavelabs.com
        • version, the target API version, e.g., v1alpha1
        • kind, the target API kind, e.g., risingwave, risingwavepodtemplate
        • namespace, the namespace of the object, e.g., default
        • name, the name of the object
        • verb, the verb (action) on the object which triggers the webhook, the value should be one of "create", "update", and "delete".
      • webhook_request_pass_count, with the same labels as webhook_request_count
      • webhook_request_reject_count, with the same labels as webhook_request_count
      • webhook_request_panic_count, with the same labels as webhook_request_count
    • Controllers:
      • controller_reconcile_count, labels:
        • group, version, kind mentioned above
        • namespace, name
      • controller_reconcile_requeue_count, with the same labels as controller_reconcile_count
      • controller_reconcile_error_count, with the same labels as controller_reconcile_count
      • controller_reconcile_requeue_after (could be a histogram), with additional labels:
        • after, the duration before the next requeue in milliseconds
      • controller_reconcile_duration, the time elapsed during the reconciliation, with the same labels as controller_reconcile_count
      • controller_reconcile_panic_count, with the same labels as controller_reconcile_count

    The collectors of these metrics can all be implemented by using a proxy pattern.

  • docs(rfc): RFC-0004 expose scale subresource with `RisingWaveScaleView`

    Signed-off-by: arkbriar [email protected]

    What's changed and what's your intention?

    • Propose a new CR to provide the scale subresource which is essential to most of the official and third-party auto-scaling services (HPA/KEDA). The details are described in the documentation.

    Rendered at: https://github.com/risingwavelabs/risingwave-operator/blob/rfc/component-groups-scale-view/docs/rfc/0004_component_groups_scale_view.md

    Checklist

    • [x] I have written the necessary docs and comments
    • [x] I have added necessary unit tests and integration tests

    Refer to a related PR or issue link (optional)

    to #195

  • proposal: new CR for tracking each component's replicas

    The background is that we have a RisingWave CR, which supports scaling but has four components, each of which can be scaled individually. Built-in workloads like Deployment have a /scale sub-resource for tracking and controlling scaling behavior, and it's impossible to provide one on our RisingWave, as we said before. Furthermore, we cannot leverage the /scale sub-resource to perform scale in/out of individual components because of the inconsistency it would cause. Imagine that you scaled the replicas of the frontend's Deployment from 2 to 3, but the frontend replicas in the RisingWave object are still 2. The operator will notice that and immediately sync it back to 2. It simply doesn't work.

    However, I think the /scale sub-resource is a really good abstraction. We can design some new CR to control the scale progress and ensure we're syncing the RisingWave and not touching its child workloads. It's impossible at the instance level but possible at the component (and group) level.

    I think the new CR should:

    • target one RisingWave at a time, and the target cannot be changed afterwards (needs some binding)
    • target one component at a time
    • be able to target several groups of that component and create/delete new groups if necessary
    • provide the /scale sub-resource and support scaling with it

    But there are still some things I haven't thought through:

    • What's the relationship between the new CR and RisingWave? Should RisingWave own this CR? Or should there just be some implicit binding, like between a PV and a PVC?
    • How do we define the scaling behavior if there are multiple groups? For example, we prefer scaling in from one group, but what should we do if its replicas are 0?
    • ...
  • Bug: Issue running psql commands on risingwave cluster

    There is an issue running the example psql queries from the RisingWave README against the RisingWave cluster. Running the example queries results in an RPC error. The error is not reproduced when running the same queries against the Docker image: docker run -it --pull=always -p 4566:4566 -p 5691:5691 ghcr.io/singularity-data/risingwave:latest playground

    Expected Behavior

    No errors.

    Current Behavior

    An RPC error occurs and causes the compute node to crash (see the attached screenshot).

    Steps to Reproduce

    1. Set up the operator as per the risingwave-operator README, but use this yaml for the risingwave object. It is similar to the default settings but adds a NodePort service to access the frontend.
    2. Enter the psql shell:
    PHOST=`kubectl get node -o=jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}'`
    PPORT=`kubectl get service -n test test-risingwave-amd64-frontend-nodeport -o=jsonpath='{.spec.ports[0].nodePort}'`
    psql -h $PHOST -p $PPORT -d dev -U root
    
    3. Run the psql queries:
    /* create a table */
    create table t1(v1 int not null);
    
    /* create a materialized view based on the previous table */
    create materialized view mv1 as select sum(v1) as sum_v1 from t1;
    
    /* insert some data into the source table */
    insert into t1 values (1), (2), (3);
    
    /* (optional) ensure the materialized view has been updated */
    flush;
    
    /* the materialized view should reflect the changes in source table */
    select * from mv1;
    
  • refactor: rethink about the MinIO interface

    We shouldn't care about how to deploy and operate the MinIO service. I think we can leverage the operator from the community.

    https://github.com/minio/operator

  • Support the newly added command line arguments when starting the compute nodes

    Two command line arguments were added for the compute nodes in this PR risingwavelabs/risingwave#6767:

    • --parallelism <PARALLELISM>
    • --total-memory-bytes <TOTAL_MEMORY_BYTES>

    These two parameters will default to the current container/system limits once the container-awareness PR is merged. But the operator could be the source of truth if CPU/memory resource limits are set on the Pods.

    Therefore, we should specify these arguments with the values of the resource limits when starting the compute nodes.
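
    For illustration only (the exact CR fields are not quoted in this issue), the idea is that explicit resource limits on the compute container would be translated into the new flags:

    # Hypothetical compute container resource limits ...
    resources:
      limits:
        cpu: "4"
        memory: 8Gi
    # ... which the operator could pass through on start-up, e.g.:
    #   --parallelism 4 --total-memory-bytes 8589934592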

  • ci: provide an action for starting a local k8s with risingwave-operator to use in GitHub Action workflows

    How about providing a GitHub Action that we can use to create a kind cluster with the risingwave-operator? ^_^

    https://docs.github.com/en/actions/creating-actions/about-custom-actions
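
    One possible shape for such a workflow, sketched here with the community helm/kind-action and the install commands from the README above (action versions and step names are assumptions):

    name: kind-with-risingwave-operator
    on: [pull_request]
    jobs:
      e2e:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v3
          - name: Create a kind cluster
            uses: helm/kind-action@v1
          - name: Install cert-manager and the risingwave-operator
            run: |
              kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.8.0/cert-manager.yaml
              kubectl wait --for=condition=Available deployment --all -n cert-manager --timeout=300s
              kubectl apply -f https://raw.githubusercontent.com/singularity-data/risingwave-operator/main/config/risingwave-operator.yaml
              kubectl wait --for=condition=Available deployment --all -n risingwave-operator-system --timeout=300s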

  • Resolve dependencies among environment variables and keep the Pod template consistent

    See #258 for the related method and explanation.

    Kubernetes allows dependencies between environment variables, and one can reference another in its value, like the following:

    - name: ENV_A
      valueFrom:
        secretKeyRef:
          name: secret-name
          key: key
    - name: ENV_B
      value: "$(ENV_A)_suffix"
    

    In the example above, the value of ENV_B depends on the value of ENV_A, which is loaded from a Secret. This makes it easier to dispatch configurations dynamically.

    However, the variables' order must be enforced to resolve the values correctly: a variable must come after its dependencies. So the following ordering is applied:

    When comparing env vars a and b:

    1. If a depends on b directly or transitively, then b < a.
    2. Otherwise, compare the names of a and b in alphabetical order.
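
    A small worked example with hypothetical variables: ENV_APP references ENV_BASE, so rule 1 places ENV_BASE first even though rule 2 alone would sort ENV_APP before it, while the independent ENV_COLOR is placed purely by name:

    - name: ENV_BASE                 # dependency, must precede ENV_APP
      valueFrom:
        secretKeyRef:
          name: secret-name
          key: key
    - name: ENV_APP                  # depends on ENV_BASE, so rule 1 overrides alphabetical order
      value: "$(ENV_BASE)/app"
    - name: ENV_COLOR                # independent, ordered alphabetically by rule 2
      value: "blue"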
  • Test coverage should not decrease

    Currently you can introduce PRs that decrease the overall code coverage. This could happen if you

    • Delete tests
    • Introduce new features without testing them

    IMHO we should introduce a CI test that aborts PRs which decrease code coverage (see the attached screenshot).

  • Warn developers about missing docstrings in PRs

    IMHO we should warn (not block) developers if they do not provide docstrings. Warnings should be given for every function touched in that PR which does not have a valid docstring.

    Also see go docs: comments

An operator which complements grafana-operator for custom features which are not feasible to be merged into core operator

Grafana Complementary Operator An operator which complements grafana-operator for custom features which are not feasible to be merged into the core operator

Aug 16, 2022
The Elastalert Operator is an implementation of a Kubernetes Operator, to easily integrate elastalert with gitops.

Elastalert Operator for Kubernetes The Elastalert Operator is an implementation of a Kubernetes Operator. Getting started Firstly, learn How to use el

Jun 28, 2022
Minecraft-operator - A Kubernetes operator for Minecraft Java Edition servers

Minecraft Operator A Kubernetes operator for dedicated servers of the video game

Dec 15, 2022
K8s-network-config-operator - Kubernetes network config operator to push network config to switches

Kubernetes Network operator Will add more to the readme later :D Operations The

May 16, 2022
Pulumi-k8s-operator-example - OpenGitOps Compliant Pulumi Kubernetes Operator Example

Pulumi GitOps Example OpenGitOps Compliant Pulumi Kubernetes Operator Example Pr

May 6, 2022
Kubernetes Operator Samples using Go, the Operator SDK and OLM

Kubernetes Operator Patterns and Best Practises This project contains Kubernetes operator samples that demonstrate best practices how to develop opera

Nov 24, 2022
Test Operator using operator-sdk 1.15

test-operator Test Operator using operator-sdk 1.15 operator-sdk init --domain rbt.com --repo github.com/ravitri/test-operator Writing kustomize manif

Dec 28, 2021
a k8s operator 、operator-sdk

helloworld-operator a k8s operator 、operator-sdk Operator 参考 https://jicki.cn/kubernetes-operator/ https://learnku.com/articles/60683 https://opensour

Jan 27, 2022
Operator Permissions Advisor is a CLI tool that will take a catalog image and statically parse it to determine what permissions an Operator will request of OLM during an install

Operator Permissions Advisor is a CLI tool that will take a catalog image and statically parse it to determine what permissions an Operator will request of OLM during an install. The permissions are aggregated from the following sources:

Apr 22, 2022
The OCI Service Operator for Kubernetes (OSOK) makes it easy to connect and manage OCI services from a cloud native application running in a Kubernetes environment.

OCI Service Operator for Kubernetes Introduction The OCI Service Operator for Kubernetes (OSOK) makes it easy to create, manage, and connect to Oracle

Sep 27, 2022
PolarDB-X Operator is a Kubernetes extension that aims to create and manage PolarDB-X cluster on Kubernetes.

GalaxyKube -- PolarDB-X Operator PolarDB-X Operator is a Kubernetes extension that aims to create and manage PolarDB-X cluster on Kubernetes. It follo

Dec 19, 2022
Kubernetes Operator to sync secrets between different secret backends and Kubernetes

Vals-Operator Here at Digitalis we love vals, it's a tool we use daily to keep secrets stored securely. We also use secrets-manager on the Kubernetes

Nov 13, 2022
The NiFiKop NiFi Kubernetes operator makes it easy to run Apache NiFi on Kubernetes.

The NiFiKop NiFi Kubernetes operator makes it easy to run Apache NiFi on Kubernetes. Apache NiFI is a free, open-source solution that support powerful and scalable directed graphs of data routing, transformation, and system mediation logic.

Dec 26, 2022
Kubernetes Operator for a Cloud-Native OpenVPN Deployment.

Meerkat is a Kubernetes Operator that facilitates the deployment of OpenVPN in a Kubernetes cluster. By leveraging Hashicorp Vault, Meerkat securely manages the underlying PKI.

Jan 4, 2023
Modular Kubernetes operator to manage the lifecycle of databases

Ensemble Ensemble is a simple and modular Kubernetes Operator to manage the lifecycle of a wide range of databases. Infrastructure as code with Kubern

Aug 12, 2022
Progressive delivery Kubernetes operator (Canary, A/B Testing and Blue/Green deployments)

flagger Flagger is a progressive delivery tool that automates the release process for applications running on Kubernetes. It reduces the risk of intro

Jan 5, 2023
A Kubernetes Operator used for pre-scaling applications in anticipation of load

Pre-Scaling Kubernetes Operator Built out of necessity, the Operator helps pre-scale applications in anticipation of load. At its core, it manages a c

Oct 14, 2021
Kubernetes operator to autoscale Google's Cloud Bigtable clusters

Bigtable Autoscaler Operator Bigtable Autoscaler Operator is a Kubernetes Operator to autoscale the number of nodes of a Google Cloud Bigtable instanc

Nov 5, 2021