Kubernetes operator to autoscale Google's Cloud Bigtable clusters

Bigtable Autoscaler Operator

Bigtable Autoscaler Operator is a Kubernetes operator that autoscales the number of nodes of a Google Cloud Bigtable instance based on CPU utilization.

Overview

Google Cloud Bigtable is designed to scale horizontally: the number of nodes of an instance can be increased to spread the load and reduce the average CPU utilization. For Bigtable applications with highly variable workloads, automating the cluster scaling allows handling short load bursts while keeping costs as low as possible. This operator automates the scaling by adjusting the number of nodes to keep the CPU utilization below the target specified in the manifest.

The reconciler's responsibility is to keep the CPU utilization of the instance below the target specification while respecting the minimum and maximum number of nodes. When the CPU utilization is above the target, the reconciler increases the number of nodes in steps linearly proportional to how far above the target it is. For example, with 100% CPU utilization and a single node running, a CPU target of 50% scales the cluster up to 2 nodes, while a CPU target of 25% scales it up to 4 nodes.

The downscale also follows a linear rule, but it honours the maxScaleDownNodes specification, which defines the maximum downscale step size in order to avoid aggressive downscaling. Furthermore, the downscale step is calculated from the current number of running nodes and the CPU target. For example, if there are two nodes running and the CPU target is 50%, the CPU utilization must drop below 25% for a downscale to occur. This is important to avoid a downscale that immediately causes an upscale.
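
To make these rules concrete, here is a minimal sketch in Go of how the desired node count could be derived. The function desiredNodes and its signature are illustrative only; they are not the operator's actual internals.

package main

import (
	"fmt"
	"math"
)

// desiredNodes applies the linear rule described above: scale the current
// node count by currentCPU/targetCPU, cap a downscale step at
// maxScaleDownNodes, and clamp the result to [minNodes, maxNodes].
func desiredNodes(currentNodes, currentCPU, targetCPU, minNodes, maxNodes, maxScaleDownNodes int32) int32 {
	needed := int32(math.Ceil(float64(currentNodes) * float64(currentCPU) / float64(targetCPU)))

	// Never remove more than maxScaleDownNodes nodes in a single step.
	if needed < currentNodes && currentNodes-needed > maxScaleDownNodes {
		needed = currentNodes - maxScaleDownNodes
	}

	// Always stay within the configured bounds.
	if needed < minNodes {
		needed = minNodes
	}
	if needed > maxNodes {
		needed = maxNodes
	}
	return needed
}

func main() {
	fmt.Println(desiredNodes(1, 100, 50, 1, 10, 2)) // 2: upscale example from above
	fmt.Println(desiredNodes(1, 100, 25, 1, 10, 2)) // 4: upscale example from above
	fmt.Println(desiredNodes(2, 24, 50, 1, 10, 2))  // 1: downscale example (CPU below 25%)
}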

All scaling operations respect a reaction time window, which at the moment is not part of the manifest specification.

The image below shows how peaks above the 50% CPU target are shortened by the automatic increase of nodes (chart: Bigtable CPU utilization and node count).

Usage

Create a k8s secret with your service account:

$ kubectl create secret generic bigtable-autoscaler-service-account --from-file=service-account=./your_service_account.json

Create an autoscaling manifest:

# my-autoscaler.yml
apiVersion: bigtable.bigtable-autoscaler.com/v1
kind: BigtableAutoscaler
metadata:
  name: my-autoscaler
spec:
  bigtableClusterRef:
    projectId: cool-project
    instanceId: my-instance-id
    clusterId: my-cluster-id
  serviceAccountSecretRef:
    name: example-service-account
    key: service-account
  minNodes: 1
  maxNodes: 10
  targetCPUUtilization: 50
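  # The Overview also mentions a maxScaleDownNodes spec that caps how many nodes
  # a single downscale step may remove; its exact placement here under spec is
  # assumed, e.g.:
  # maxScaleDownNodes: 2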

Then you can install it on your k8s cluster:

$ kubectl apply -f my-autoscaler.yml

You can check that your autoscaler is running:

$ kubectl get bigtableautoscalers

Prerequisites

  1. Enable the Bigtable and Monitoring APIs on your GCP project.
  2. Generate a service account key with the Bigtable Administrator role (see the example below).
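
One way to do this with the gcloud CLI is sketched below; the service-account name bigtable-autoscaler is a placeholder and cool-project matches the sample manifest, so adapt both to your project:

$ gcloud services enable bigtable.googleapis.com bigtableadmin.googleapis.com monitoring.googleapis.com
$ gcloud iam service-accounts create bigtable-autoscaler --project=cool-project
$ gcloud projects add-iam-policy-binding cool-project \
    --member="serviceAccount:bigtable-autoscaler@cool-project.iam.gserviceaccount.com" \
    --role="roles/bigtable.admin"
$ gcloud iam service-accounts keys create your_service_account.json \
    --iam-account=bigtable-autoscaler@cool-project.iam.gserviceaccount.com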

Installation

  1. Visit the releases page, download the all-in-one.yml of the version of your choice, and apply it:
    kubectl apply -f all-in-one.yml

Development environment

These are the steps for setting up the development environment.

This project uses Go version 1.13 and the tool versions listed below; we don't guarantee successful builds with other versions.

  1. Install kubebuilder version 2.3.2.

    1. Also make sure that you have its dependencies installed: controller-gen version 0.5.0 and kustomize version 3.10.0
  2. Follow the Option 1 or Option 2 section.

Option 1: Run with Tilt (recommended)

Tilt is a tool that automates the development cycle and provides features like hot deploys.

  1. Install tilt version 0.19.0 (follow the official instructions).

    1. Install its dependencies: ctlptl and kind (or another tool to create local k8s clusters) as instructed.
  2. If it doesn't exist, create your k8s cluster using ctlptl

    ctlptl create cluster kind --registry=ctlptl-registry
  3. Provide the secret with the service account credentials and role as described in section Secret setup.

  4. Run tilt up

Option 2: Manual run

Running manually requires some extra steps!

  1. If it doesn't exist, create your local k8s cluster. Here we will use kind to create it:

    kind create cluster
  2. Provide the secret with the service account credentials and role as described in section Secret setup.

  3. Check that your cluster is running correctly

    kubectl cluster-info
  4. Apply the Custom Resource Definition

    make install
  5. Build the docker image with the manager binary

    make docker-build
  6. Load this image into the cluster

    kind load docker-image controller:latest
  7. Deploy the operator to the local cluster

    make deploy
  8. Apply the autoscaler sample

    kubectl apply -f config/samples/bigtable_v1_bigtableautoscaler.yaml
  9. Check pods and logs

    kubectl -n bigtable-autoscaler-system logs $(kubectl -n bigtable-autoscaler-system get pods | tail -n1 | cut -d ' ' -f1) --all-containers

Secret setup

  1. Use the service account from the Prerequisites section to create the k8s secret

    kubectl create secret generic bigtable-autoscaler-service-account --from-file=service-account=./your_service_account.json
  2. Create the role and rolebinding to read the secret

    kubectl apply -f config/rbac/secret-role.yml

Running tests

go test ./... -v

or

gotestsum

Comments
  • Multi instance scaling

    Hello friends,

    This PR allows us to finally autoscale multiple Bigtable instances simultaneously! We're now able to add and remove as many autoscalers as we want without breaking the operator.

    Other relevant changes:

    1. Defined some default values on our API to help our users know the default behavior.
    2. Simplified our main controller's implementation.

  • Add job to build and push docker image

  • Add cluster ref type

    Hello friends,

    This PR introduces a new type meant to represent the cluster reference we need to support multi-instance scaling. It also simplifies some implementations to make it easier to change them in the future.

    Now it is also possible to have multiple Syncers running simultaneously, as long as they are from different instances.

  • Interfaces refactor

    Hi,

    in this PR we do the following:

    1. Move all interfaces needed by mocks to their respective packages.
    2. Re-create all generated mock files and use snake naming (underscore naming) for the files.
    3. Update CRD yaml (it was annoying to see this file change all the time locally)
  • Status syncer

    Hi,

    in this PR we do the following:

    1. Create a status syncer prototype that syncs both nodes count and CPU metrics statuses
    2. Fix a bug in the desired-nodes calculation method when it should round up.
    3. Create a wrapper for the writer status so we can use an interface

    Testing

    Use staging to see if the autoscaler scales.

  • Clients wrappers

    Hi,

    in this PR we do the following:

    1. Create Google Cloud client to manage GCP resources
    2. Wrap Metrics client so it's possible to mock it
    3. Wrap Bigtable client so it's possible to mock it
    4. Add interfaces package in order to avoid package cyclic dependency when generating mocks
    5. Introduce tests for Google Cloud client methods.

    Testing

    Run the operator and generate some trash on your Bigtable instance and see if it still works.

  • Add calculator abstraction

    Hello!!

    In this PR we introduce an abstraction meant to calculate the number of nodes we need from a current state and a spec.

    We also make some formatting adjustments.

  • fix current cpu filtering by instance

    The request that retrieves the CPU metrics might return time series from other instances, and the code just used the first one. I've added a filter on the instance ID.

  • fix RoleBinding serviceaccount name

    The RoleBinding should be applied to the default serviceaccount in the namespace:

    secrets "bigtable-autoscaler-service-account" is forbidden: User "system:serviceaccount:bigtable-autoscaler-system:default" cannot get resource "secrets" in API group "" in the namespace "bigtable-autoscaler-system"
    