Kubernetes operator for RisingWave.

Introduction

The RisingWave Kubernetes Operator is a deployment and management tool for RisingWave based on Kubernetes. The risingwave-operator currently supports the following custom resources:

  • risingwave.singularity-data.com
  • risingwave-monitor.singularity-data.com (not implemented yet)

Quick Start

Install cert-manager

You need to install cert-manager in the cluster before installing the risingwave-operator.

The default static configuration of cert-manager can be installed as follows:

kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.8.0/cert-manager.yaml

More information on this cert-manager installation method can be found here.
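
Before installing the operator, you can wait for cert-manager to become ready (the cert-manager namespace below is the one created by the default manifest):

kubectl wait --for=condition=Available deployment --all -n cert-manager --timeout=300s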

Install risingwave-operator

The risingwave-operator can then be installed as follows:

kubectl apply -f https://raw.githubusercontent.com/singularity-data/risingwave-operator/main/config/risingwave-operator.yaml
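
You can verify that the operator is running by listing its pods (risingwave-operator-system is the namespace used by the manifest above):

kubectl get pods -n risingwave-operator-system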

Examples

You can deploy a RisingWave instance that uses MinIO as the object storage on Linux/amd64 nodes as follows:

kubectl create namespace test
kubectl apply -f https://raw.githubusercontent.com/singularity-data/risingwave-operator/main/examples/minio-risingwave-amd.yaml
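
Before connecting, you can watch the pods in the test namespace until they are all running and ready:

kubectl get pods -n test -w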

First Query

Install psql

To connect to the RisingWave server, you will need to install the PostgreSQL shell (psql) in advance.

Query

The frontend is exposed through a Kubernetes NodePort service.

Get the nodePort of the frontend service (this will be the psql port) and the INTERNAL-IP address of any node as follows:

PHOST=`kubectl get node -o=jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}'`
PPORT=`kubectl get service -n test test-risingwave-amd64-frontend -o=jsonpath='{.spec.ports[0].nodePort}'`

Connect to the frontend with psql as follows:

psql -h $PHOST -p $PPORT -d dev
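
As a quick sanity check, you can also run a one-off query directly from the shell:

psql -h $PHOST -p $PPORT -d dev -c 'select 1;'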

Configuration

You can view the risingwave-operator configuration as follows:

kubectl get cm risingwave-operator-controller-manager-config -n risingwave-operator-system -oyaml

If you edit the ConfigMap, restart the risingwave-operator pods so that the new configuration file is loaded.
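
For example, you can delete the controller pods and let their Deployment recreate them; the label selector below follows the common kubebuilder convention and is an assumption, not a value taken from the manifest:

kubectl delete pods -n risingwave-operator-system -l control-plane=controller-manager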

License

The risingwave-operator is under the Apache License 2.0. Please refer to LICENSE for more information.

Contributing

Thanks for your interest in contributing to the project! Please refer to Contribution and Development Guidelines for more information.

Owner
Singularity Data
Building the next-generation streaming database in the cloud.
Comments
  • chore: move the Prometheus stack to the monitoring namespace

    What's changed and what's your intention?

    This PR changes the monitoring namespace from default to monitoring (see https://github.com/risingwavelabs/risingwave-operator/issues/225).

    We should merge this after merging https://github.com/risingwavelabs/risingwave-operator/pull/224. This PR will need updates after the above PR is merged.

    Checklist

    • [x] I have written the necessary docs and comments
    • [x] Test if the stack installs in a different namespace when using -n <namespace>
    • [ ] Test if Prometheus remote write works.
    • [x] Add to the README that users should run install.sh -h to get more info, e.g., about the custom namespace

  • feat: support S3-compatible object storages

    Signed-off-by: arkbriar [email protected]

    What's changed and what's your intention?

    Update: the protocol between the operator and the kernel has been changed.

    After a discussion with @wcy-fdu and @hzxa21, we decided to use the following environment variables to set the configurations for the s3-compatible backend type:

    • S3_COMPATIBLE_BUCKET
    • S3_COMPATIBLE_REGION
    • S3_COMPATIBLE_ACCESS_KEY_ID
    • S3_COMPATIBLE_SECRET_ACCESS_KEY
    • S3_COMPATIBLE_ENDPOINT
      • Path style (normal), e.g., https://cos.ap-guangzhou.myqcloud.com
      • Virtual-hosted style, commonly in a format of https://{bucket}.{some prefix}{region}.{domain suffix}, e.g., for the Aliyun OSS, it's https://{bucket}.oss-{region}.aliyuncs.com.
    1. Add Aliyun OSS as one of the supported object storage types.
    storages:
      object:
        aliyunOSS:
          secret: aliyun-oss-credentials
          region: cn-hangzhou
          bucket: hummock001
          internalEndpoint: true
    
    2. Add fields endpoint and virtualHostedStyle in the s3 spec to broadly support the s3-compatible backend type. Also, add another field region for overriding the Region key in the secret, so that the region cannot be changed simply by updating the secret, which could result in unstable object storage. (A sketch of creating such a credentials secret follows this list.)
    # When the endpoint is specified, the hummock type will be automatically set to `s3-compatible`.
    # Endpoint can contain two variables: ${REGION} and ${BUCKET}, and they will be automatically interpreted by the operator.
    storages:
      object:
        s3:
          secret: cos-credentials
          bucket: hummock001
          region: ap-guangzhou
          endpoint: cos.${REGION}.myqcloud.com
          virtualHostedStyle: true
    
    3. Improve the webhook for validating the newly supported object storage
    4. Add unit tests
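
    As a side note, here is a minimal sketch of creating the credentials Secret referenced above; the key names AccessKeyID, SecretAccessKey, and Region are assumptions for illustration, since only the Region key is explicitly mentioned in this PR:

    kubectl create secret generic cos-credentials \
      --from-literal=AccessKeyID=<access-key-id> \
      --from-literal=SecretAccessKey=<secret-access-key> \
      --from-literal=Region=ap-guangzhou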

    Checklist

    • [x] I have written the necessary docs and comments
    • [x] I have added necessary unit tests and integration tests

    Refer to a related PR or issue link (optional)

    tracking issue: #246 kernel side PR: risingwavelabs/risingwave#6152

  • feat: expose metrics about the operator

    What's changed and what's your intention?

    WORK IN PROGRESS! DO NOT MERGE!

    Implementing feature request from Issue 123

    Checklist

    • [ ] I have written the necessary docs and comments
    • [ ] I have added necessary unit tests and integration tests

  • fix: fix the label selector JSON path of scale view

    Signed-off-by: arkbriar [email protected]

    What's changed and what's your intention?

    • Fix the selectorpath of RisingWaveScaleView required by the scale sub-resource

    Checklist

    • [x] I have written the necessary docs and comments
    • [x] I have added necessary unit tests and integration tests

  • ci: prohibit PR from decreasing code coverage

    What's changed and what's your intention?

    See Issue https://github.com/risingwavelabs/risingwave-operator/issues/269

    Checklist

    • [ ] I have written the necessary docs and comments
    • [ ] I have added necessary unit tests and integration tests

  • feat: create ServiceMonitor for Operator

    What's changed and what's your intention?

    Currently, the custom metrics from the RW operator are not properly scraped. This PR introduces the required K8s objects to ingest the metrics into Prometheus.

    Original comment by @arkbriar here:

    In a deployment where the Prometheus operator is also installed, this should be an optional setting. We can add a doc about how to apply this. We can leave it to another PR.
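
    For reference, a minimal sketch of the kind of ServiceMonitor this PR introduces; the name, namespace, labels, and port name below are assumptions for illustration, not values taken from this PR:

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: risingwave-operator            # hypothetical name
      namespace: monitoring                # assumes the monitoring namespace introduced above
    spec:
      selector:
        matchLabels:
          control-plane: controller-manager  # assumed label on the operator's metrics Service
      namespaceSelector:
        matchNames:
          - risingwave-operator-system
      endpoints:
        - port: https                        # assumed metrics port name
          interval: 30s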

    Checklist

    • [x] I have written the necessary docs and comments
    • [x] I have added necessary unit tests and integration tests

    Refer to a related PR or issue link

    Exposed the metrics created for https://github.com/risingwavelabs/risingwave-operator/issues/123 Follow up for https://github.com/risingwavelabs/risingwave-operator/pull/214

  • feat: exposing metrics about the operator

    As the project grows, we need to expose metrics about the webhooks and controllers to build the observability & maintenance stacks.

    Metrics we'd like to have:

    • Webhooks:
      • webhook_request_count, labels:
        • type, the value should be mutating or validating
        • group, the target resource group of the webhook, e.g., risingwave.risingwavelabs.com
        • version, the target API version, e.g., v1alpha1
        • kind, the target API kind, e.g., risingwave, risingwavepodtemplate
        • namespace, the namespace of the object, e.g., default
        • name, the name of the object
        • verb, the verb (action) on the object which triggers the webhook, the value should be one of "create", "update", and "delete".
      • webhook_request_pass_count, with the same labels as webhook_request_count
      • webhook_request_reject_count, with the same labels as webhook_request_count
      • webhook_request_panic_count, with the same labels as webhook_request_count
    • Controllers:
      • controller_reconcile_count, labels:
        • group, version, kind mentioned above
        • namespace, name
      • controller_reconcile_requeue_count, with the same labels as controller_reconcile_count
      • controller_reconcile_error_count, with the same labels as controller_reconcile_count
      • controller_reconcile_requeue_after (could be a histogram), with additional labels:
        • after, the duration before the next requeue in milliseconds
      • controller_reconcile_duration, the time elapsed during the reconciliation, with the same labels as controller_reconcile_count
      • controller_reconcile_panic_count, with the same labels as controller_reconcile_count

    The collectors of these metrics can all be implemented by using a proxy pattern.

  • docs(rfc): RFC-0004 expose scale subresource with `RisingWaveScaleView`

    Signed-off-by: arkbriar [email protected]

    What's changed and what's your intention?

    • Propose a new CR to provide the scale subresource which is essential to most of the official and third-party auto-scaling services (HPA/KEDA). The details are described in the documentation.

    Rendered at: https://github.com/risingwavelabs/risingwave-operator/blob/rfc/component-groups-scale-view/docs/rfc/0004_component_groups_scale_view.md

    Checklist

    • [x] I have written the necessary docs and comments
    • [x] I have added necessary unit tests and integration tests

    Refer to a related PR or issue link (optional)

    to #195

  • proposal: new CR for tracking each component's replicas

    The background is that we have a RisingWave CR, which supports scaling but has four components, each of which can be scaled individually. Built-in workloads like Deployment have a /scale sub-resource for tracking and controlling scaling behavior, and it's impossible to provide one on our RisingWave, as we said before. Furthermore, we cannot leverage the /scale sub-resource to perform scale in/out of individual components because of the inconsistency it would cause. Imagine that you scaled the replicas of the frontend's Deployment from 2 to 3, but the frontend replicas in the RisingWave object are still 2. The operator will notice that and immediately sync it back to 2. It simply doesn't work.

    However, I think the /scale sub-resource is a really good abstraction. We can design some new CR to control the scale progress and ensure we're syncing the RisingWave and not touching its child workloads. It's impossible at the instance level but possible at the component (and group) level.

    I think the new CR should:

    • target one RisingWave at a time, and the target cannot be changed afterwards (needs some binding)
    • target one component at a time
    • be able to target several groups of that component and create/delete new groups if necessary
    • provide the /scale sub-resource and support scaling with it

    But there are still some things I haven't thought through:

    • What's the relationship between the new CR and RisingWave? Should RisingWave own this CR? Or should there just be some implicit binding, like between a PV and a PVC?
    • How do we define the scaling behavior if there are multiple groups? For example, we prefer scaling in from one group, but what should we do if its replicas are 0?
    • ...
  • Bug: Issue running psql commands on risingwave cluster

    There is an issue running the example psql queries from the RisingWave README against the RisingWave cluster. Running the example queries results in an RPC error. The error is not reproduced when running the same queries against the Docker image: docker run -it --pull=always -p 4566:4566 -p 5691:5691 ghcr.io/singularity-data/risingwave:latest playground

    Expected Behavior

    No errors.

    Current Behavior

    An RPC error occurs and causes the compute node to crash (see the attached screenshot).

    Steps to Reproduce

    1. Set up the operator as per the risingwave-operator README, but use this yaml for the risingwave object. It is similar to the default settings but adds a NodePort service to access the frontend.
    2. Enter the psql shell:
    PHOST=`kubectl get node -o=jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}'`
    PPORT=`kubectl get service -n test test-risingwave-amd64-frontend-nodeport -o=jsonpath='{.spec.ports[0].nodePort}'`
    psql -h $PHOST -p $PPORT -d dev -U root
    
    3. Run the psql queries:
    /* create a table */
    create table t1(v1 int not null);
    
    /* create a materialized view based on the previous table */
    create materialized view mv1 as select sum(v1) as sum_v1 from t1;
    
    /* insert some data into the source table */
    insert into t1 values (1), (2), (3);
    
    /* (optional) ensure the materialized view has been updated */
    flush;
    
    /* the materialized view should reflect the changes in source table */
    select * from mv1;
    
  • refactor: rethink about the MinIO interface

    We shouldn't care about how to deploy and operate the MinIO service. I think we can leverage the operator from the community.

    https://github.com/minio/operator

  • Support the newly added command line arguments when starting the compute nodes

    Two command line arguments were added for the compute nodes in this PR risingwavelabs/risingwave#6767:

    • --parallelism <PARALLELISM>
    • --total-memory-bytes <TOTAL_MEMORY_BYTES>

    These two parameters will default to the current container/system limits once the container-awareness PR is merged. But the operator could be the source of truth if CPU/memory resource limits are set on the Pods.

    Therefore, we should specify these arguments with the values of the resource limits when starting the compute nodes.
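
    For illustration only (the exact CR fields are not quoted in this issue), the idea is that explicit resource limits on the compute container would be translated into the new flags:

    # Hypothetical compute container resource limits ...
    resources:
      limits:
        cpu: "4"
        memory: 8Gi
    # ... which the operator could pass through on start-up, e.g.:
    #   --parallelism 4 --total-memory-bytes 8589934592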

  • ci: provide an action for starting a local k8s with risingwave-operator to use in GitHub Action workflows

    How about providing a GitHub Action that we can use to create a kind cluster with the risingwave-operator? ^_^

    https://docs.github.com/en/actions/creating-actions/about-custom-actions
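
    One possible shape for such a workflow, sketched here with the community helm/kind-action and the install commands from the README above (action versions and step names are assumptions):

    name: kind-with-risingwave-operator
    on: [pull_request]
    jobs:
      e2e:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v3
          - name: Create a kind cluster
            uses: helm/kind-action@v1
          - name: Install cert-manager and the risingwave-operator
            run: |
              kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.8.0/cert-manager.yaml
              kubectl wait --for=condition=Available deployment --all -n cert-manager --timeout=300s
              kubectl apply -f https://raw.githubusercontent.com/singularity-data/risingwave-operator/main/config/risingwave-operator.yaml
              kubectl wait --for=condition=Available deployment --all -n risingwave-operator-system --timeout=300s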

  • Resolve dependencies among environment variables and keep the Pod template consistent

    See #258 for the related method and explanation.

    Kubernetes allows dependencies between environment variables, and one can reference another in its value, like the following:

    - name: ENV_A
      valueFrom:
        secretKeyRef:
          name: secret-name
          key: key
    - name: ENV_B
      value: "$(ENV_A)_suffix"
    

    In the example above, the value of ENV_B depends on the value of ENV_A, which is loaded from a Secret. This makes it easier to dispatch configurations dynamically.

    However, the variables' order must be enforced to resolve the values correctly: a variable must come after its dependencies. So the following ordering is applied:

    When comparing env vars a and b:

    1. If a depends on b directly or transitively, then b < a.
    2. Otherwise, compare the names of a and b in alphabetical order.
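
    A small worked example with hypothetical variables: ENV_APP references ENV_BASE, so rule 1 places ENV_BASE first even though rule 2 alone would sort ENV_APP before it, while the independent ENV_COLOR is placed purely by name:

    - name: ENV_BASE                 # dependency, must precede ENV_APP
      valueFrom:
        secretKeyRef:
          name: secret-name
          key: key
    - name: ENV_APP                  # depends on ENV_BASE, so rule 1 overrides alphabetical order
      value: "$(ENV_BASE)/app"
    - name: ENV_COLOR                # independent, ordered alphabetically by rule 2
      value: "blue"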
  • Test coverage should not decrease

    Currently you can introduce PRs that decrease the overall code coverage. This could happen if you

    • Delete tests
    • Introduce new features without testing them

    IMHO we should introduce a CI test that aborts PRs which decrease code coverage (see the attached screenshot).

  • Warn developers about missing docstrings in PRs

    IMHO we should warn (not block) developers if they do not provide docstrings. Warnings should be given for every function touched in that PR which does not have a valid docstring.

    Also see go docs: comments

An operator which complements grafana-operator for custom features which are not feasible to be merged into core operator

Grafana Complementary Operator An operator which complements grafana-operator for custom features which are not feasible to be merged into the core operator

Aug 16, 2022
The Elastalert Operator is an implementation of a Kubernetes Operator, to easily integrate elastalert with gitops.

Elastalert Operator for Kubernetes The Elastalert Operator is an implementation of a Kubernetes Operator. Getting started Firstly, learn How to use el

Jun 28, 2022
Minecraft-operator - A Kubernetes operator for Minecraft Java Edition servers

Minecraft Operator A Kubernetes operator for dedicated servers of the video game

Dec 15, 2022
K8s-network-config-operator - Kubernetes network config operator to push network config to switches

Kubernetes Network operator Will add more to the readme later :D Operations The

May 16, 2022
Pulumi-k8s-operator-example - OpenGitOps Compliant Pulumi Kubernetes Operator Example

Pulumi GitOps Example OpenGitOps Compliant Pulumi Kubernetes Operator Example Pr

May 6, 2022
Kubernetes Operator Samples using Go, the Operator SDK and OLM

Kubernetes Operator Patterns and Best Practises This project contains Kubernetes operator samples that demonstrate best practices how to develop opera

Nov 24, 2022
Test Operator using operator-sdk 1.15

test-operator Test Operator using operator-sdk 1.15 operator-sdk init --domain rbt.com --repo github.com/ravitri/test-operator Writing kustomize manif

Dec 28, 2021
a k8s operator 、operator-sdk

helloworld-operator a k8s operator 、operator-sdk Operator 参考 https://jicki.cn/kubernetes-operator/ https://learnku.com/articles/60683 https://opensour

Jan 27, 2022
Operator Permissions Advisor is a CLI tool that will take a catalog image and statically parse it to determine what permissions an Operator will request of OLM during an install

Operator Permissions Advisor is a CLI tool that will take a catalog image and statically parse it to determine what permissions an Operator will request of OLM during an install. The permissions are aggregated from the following sources:

Apr 22, 2022
The OCI Service Operator for Kubernetes (OSOK) makes it easy to connect and manage OCI services from a cloud native application running in a Kubernetes environment.

OCI Service Operator for Kubernetes Introduction The OCI Service Operator for Kubernetes (OSOK) makes it easy to create, manage, and connect to Oracle

Sep 27, 2022
PolarDB-X Operator is a Kubernetes extension that aims to create and manage PolarDB-X cluster on Kubernetes.

GalaxyKube -- PolarDB-X Operator PolarDB-X Operator is a Kubernetes extension that aims to create and manage PolarDB-X cluster on Kubernetes. It follo

Dec 19, 2022
Kubernetes Operator to sync secrets between different secret backends and Kubernetes

Vals-Operator Here at Digitalis we love vals, it's a tool we use daily to keep secrets stored securely. We also use secrets-manager on the Kubernetes

Nov 13, 2022
The NiFiKop NiFi Kubernetes operator makes it easy to run Apache NiFi on Kubernetes.

The NiFiKop NiFi Kubernetes operator makes it easy to run Apache NiFi on Kubernetes. Apache NiFI is a free, open-source solution that support powerful and scalable directed graphs of data routing, transformation, and system mediation logic.

Dec 26, 2022
Kubernetes Operator for a Cloud-Native OpenVPN Deployment.

Meerkat is a Kubernetes Operator that facilitates the deployment of OpenVPN in a Kubernetes cluster. By leveraging Hashicorp Vault, Meerkat securely manages the underlying PKI.

Jan 4, 2023
Modular Kubernetes operator to manage the lifecycle of databases

Ensemble Ensemble is a simple and modular Kubernetes Operator to manage the lifecycle of a wide range of databases. Infrastructure as code with Kubern

Aug 12, 2022
Progressive delivery Kubernetes operator (Canary, A/B Testing and Blue/Green deployments)

flagger Flagger is a progressive delivery tool that automates the release process for applications running on Kubernetes. It reduces the risk of intro

Jan 5, 2023
A Kubernetes Operator used for pre-scaling applications in anticipation of load

Pre-Scaling Kubernetes Operator Built out of necessity, the Operator helps pre-scale applications in anticipation of load. At its core, it manages a c

Oct 14, 2021
Kubernetes operator to autoscale Google's Cloud Bigtable clusters

Bigtable Autoscaler Operator Bigtable Autoscaler Operator is a Kubernetes Operator to autoscale the number of nodes of a Google Cloud Bigtable instanc

Nov 5, 2021