Kubernetes operator to autoscale Google's Cloud Bigtable clusters

Bigtable Autoscaler Operator

Bigtable Autoscaler Operator is a Kubernetes operator that autoscales the number of nodes of a Google Cloud Bigtable instance based on CPU utilization.

Overview

Google Cloud Bigtable is designed to scale horizontally: the number of nodes of an instance can be increased to spread the load and reduce the average CPU utilization. For Bigtable applications with highly variable workloads, automating the cluster scaling allows handling short load bursts while keeping costs as low as possible. This operator automates the scaling by adjusting the number of nodes to keep the CPU utilization below the target specified in the manifest.

The reconciler's responsibility is to keep the CPU utilization of the instance below the target specification while respecting the minimum and maximum number of nodes. When the CPU utilization is above the target, the reconciler increases the number of nodes in steps linearly proportional to how far above the target it is. For example, with 100% CPU utilization and a single node running, a CPU target of 50% scales the cluster up to 2 nodes, while a CPU target of 25% scales it up to 4 nodes.

The downscale also follows a linear rule, but it honours the maxScaleDownNodes specification, which defines the maximum downscale step size in order to avoid aggressive downscaling. Furthermore, the downscale step is calculated from the current number of running nodes and the CPU target. For example, if there are two nodes running and the CPU target is 50%, the CPU utilization must drop below 25% for a downscale to occur. This is important to avoid a downscale that immediately causes an upscale.
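
To make these rules concrete, here is a minimal sketch in Go of how the desired node count could be derived. The function desiredNodes and its signature are illustrative only; they are not the operator's actual internals.

package main

import (
	"fmt"
	"math"
)

// desiredNodes applies the linear rule described above: scale the current
// node count by currentCPU/targetCPU, cap a downscale step at
// maxScaleDownNodes, and clamp the result to [minNodes, maxNodes].
func desiredNodes(currentNodes, currentCPU, targetCPU, minNodes, maxNodes, maxScaleDownNodes int32) int32 {
	needed := int32(math.Ceil(float64(currentNodes) * float64(currentCPU) / float64(targetCPU)))

	// Never remove more than maxScaleDownNodes nodes in a single step.
	if needed < currentNodes && currentNodes-needed > maxScaleDownNodes {
		needed = currentNodes - maxScaleDownNodes
	}

	// Always stay within the configured bounds.
	if needed < minNodes {
		needed = minNodes
	}
	if needed > maxNodes {
		needed = maxNodes
	}
	return needed
}

func main() {
	fmt.Println(desiredNodes(1, 100, 50, 1, 10, 2)) // 2: upscale example from above
	fmt.Println(desiredNodes(1, 100, 25, 1, 10, 2)) // 4: upscale example from above
	fmt.Println(desiredNodes(2, 24, 50, 1, 10, 2))  // 1: downscale example (CPU below 25%)
}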

All scaling operations respect a reaction time window, which at the moment is not part of the manifest specification.

The image below shows how peaks above the 50% CPU target are shortened by the automatic increase of nodes (chart: Bigtable CPU utilization and node count).

Usage

Create a k8s secret with your service account:

$ kubectl create secret generic bigtable-autoscaler-service-account --from-file=service-account=./your_service_account.json

Create an autoscaling manifest:

# my-autoscaler.yml
apiVersion: bigtable.bigtable-autoscaler.com/v1
kind: BigtableAutoscaler
metadata:
  name: my-autoscaler
spec:
  bigtableClusterRef:
    projectId: cool-project
    instanceId: my-instance-id
    clusterId: my-cluster-id
  serviceAccountSecretRef:
    name: example-service-account
    key: service-account
  minNodes: 1
  maxNodes: 10
  targetCPUUtilization: 50
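  # The Overview also mentions a maxScaleDownNodes spec that caps how many nodes
  # a single downscale step may remove; its exact placement here under spec is
  # assumed, e.g.:
  # maxScaleDownNodes: 2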

Then you can install it on your k8s cluster:

$ kubectl apply -f my-autoscaler.yml

You can check that your autoscaler is running:

$ kubectl get bigtableautoscalers

Prerequisites

  1. Enable the Bigtable and Monitoring APIs on your GCP project.
  2. Generate a service account key with the Bigtable Administrator role (see the example below).
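
One way to do this with the gcloud CLI is sketched below; the service-account name bigtable-autoscaler is a placeholder and cool-project matches the sample manifest, so adapt both to your project:

$ gcloud services enable bigtable.googleapis.com bigtableadmin.googleapis.com monitoring.googleapis.com
$ gcloud iam service-accounts create bigtable-autoscaler --project=cool-project
$ gcloud projects add-iam-policy-binding cool-project \
    --member="serviceAccount:bigtable-autoscaler@cool-project.iam.gserviceaccount.com" \
    --role="roles/bigtable.admin"
$ gcloud iam service-accounts keys create your_service_account.json \
    --iam-account=bigtable-autoscaler@cool-project.iam.gserviceaccount.com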

Installation

  1. Visit the releases page, download the all-in-one.yml of the version of your choice, and apply it:
    kubectl apply -f all-in-one.yml

Development environment

These are the steps for setting up the development environment.

This project uses Go version 1.13 and the tool versions listed below; we don't guarantee successful builds with other versions.

  1. Install kubebuilder version 2.3.2.

    1. Also make sure that you have its dependencies installed: controller-gen version 0.5.0 and kustomize version 3.10.0
  2. Follow the Option 1 or Option 2 section.

Option 1: Run with Tilt (recommended)

Tilt is a tool that automates the development cycle and provides features like hot deploys.

  1. Install tilt version 0.19.0 (follow the official instructions).

    1. Install its dependencies: ctlptl and kind (or another tool to create local k8s clusters) as instructed.
  2. If it doesn't exist, create your k8s cluster using ctlptl

    ctlptl create cluster kind --registry=ctlptl-registry
  3. Provide the secret with the service account credentials and role as described in section Secret setup.

  4. Run tilt up

Option 2: Manual run

Running manually requires some extra steps!

  1. If it doesn't exist, create your local k8s cluster. Here we will use kind to create it:

    kind create cluster
  2. Provide the secret with the service account credentials and role as described in section Secret setup.

  3. Check that your cluster is running correctly

    kubectl cluster-info
  4. Apply the Custom Resource Definition

    make install
  5. Build the docker image with the manager binary

    make docker-build
  6. Load this image into the cluster

    kind load docker-image controller:latest
  7. Deploy the operator to the local cluster

    make deploy
  8. Apply the autoscaler sample

    kubectl apply -f config/samples/bigtable_v1_bigtableautoscaler.yaml
  9. Check pods and logs

    kubectl -n bigtable-autoscaler-system logs $(kubectl -n bigtable-autoscaler-system get pods | tail -n1 | cut -d ' ' -f1) --all-containers

Secret setup

  1. Use the service account from the Prerequisites section to create the k8s secret

    kubectl create secret generic bigtable-autoscaler-service-account --from-file=service-account=./your_service_account.json
  2. Create the role and rolebinding to read the secret

    kubectl apply -f config/rbac/secret-role.yml

Running tests

go test ./... -v

or

gotestsum

Comments
  • Multi instance scaling

    Hello friends,

    This PR allows us to finally autoscale multiple Bigtable instances simultaneously! We're now able to add and remove as many autoscalers as we want without breaking the operator.

    Other relevant changes:

    1. Defined some default values on our API to help our users know the default behavior.
    2. Simplified our main controller's implementation.

  • Add job to build and push docker image

  • Add cluster ref type

    Hello friends,

    This PR introduces a new type meant to represent the cluster reference we need to support multi-instance scaling. It also simplifies some implementations to make it easier to change them in the future.

    Now it is also possible to have multiple Syncers running simultaneously, as long as they are from different instances.

  • Interfaces refactor

    Hi,

    in this PR we do the following:

    1. Move all interfaces needed by mocks to their respective packages.
    2. Re-create all generated mock files and use snake naming (underscore naming) for the files.
    3. Update CRD yaml (it was annoying to see this file change all the time locally)
  • Status syncer

    Hi,

    in this PR we do the following:

    1. Create a status syncer prototype that syncs both nodes count and CPU metrics statuses
    2. Fix a bug in the desired-nodes calculation method when it should round up.
    3. Create a wrapper for the writer status so we can use an interface

    Testing

    Use staging to see if the autoscaler scales.

  • Clients wrappers

    Hi,

    in this PR we do the following:

    1. Create Google Cloud client to manage GCP resources
    2. Wrap Metrics client so it's possible to mock it
    3. Wrap Bigtable client so it's possible to mock it
    4. Add interfaces package in order to avoid package cyclic dependency when generating mocks
    5. Introduce tests for Google Cloud client methods.

    Testing

    Run the operator and generate some trash on your Bigtable instance and see if it still works.

  • Add calculator abstraction

    Hello!!

    In this PR we introduce an abstraction meant to calculate the number of nodes we need from a current state and a spec.

    We also make some formatting adjustments.

  • fix current cpu filtering by instance

    The request that retrieves the CPU metrics might return time series from other instances, and the code just used the first one. I've added a filter on the instance ID.

  • fix RoleBinding serviceaccount name

    The RoleBinding should be applied to the default serviceaccount in the namespace:

    secrets "bigtable-autoscaler-service-account" is forbidden: User "system:serviceaccount:bigtable-autoscaler-system:default" cannot get resource "secrets" in API group "" in the namespace "bigtable-autoscaler-system"
    