Cloud Spanner load generator to load test your application and pre-warm the database before launch

GCSB

It's like YCSB but with more Google. A simple tool meant to generate load against Google Cloud Spanner databases. The primary goals of the project are:

  • Write randomized data to tables
  • Generate read/write load against user-provided schemas

Quickstart

To initiate a simple load test against your Spanner instance using one of our test schemas:

Create a test table

You can use your own schema if you'd prefer, but we provide a few test schemas to help you get started. Create a table named SingleSingers:

gcloud spanner databases ddl update YOUR_DATABASE_ID --instance=YOUR_INSTANCE_ID --ddl-file=schemas/single_table.sql

Load data into table

Load some data into the table to seed the upcoming load test. In the example below, we load 10,000 rows of random data into the table SingleSingers:

gcsb load -p YOUR_GCP_PROJECT_ID -i YOUR_INSTANCE_ID -d YOUR_DATABASE_ID -t SingleSingers -o 10000

Run a load test

Now you can perform a load test using the run sub-command. The command below generates 10,000 operations: 75% READ operations and 25% WRITE operations, performed across 50 threads.

gcsb run -p YOUR_GCP_PROJECT_ID -i YOUR_INSTANCE_ID -d YOUR_DATABASE_ID -t SingleSingers -o 10000 --reads 75 --writes 25 --threads 50
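
At that split, roughly 7,500 of the 10,000 operations are reads and 2,500 are writes, with each of the 50 worker threads handling about 200 operations.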

Operations

Tool usage is broken down into two categories: load and run operations.

Load

GCSB provides batched loading functionality to facilitate load testing, as well as to assist with performing database splits on your tables.

At runtime, GCSB will detect the schema of the table you're loading data into and create data generators appropriate for the column types in your database. Each type of generator has configurable functionality that allows you to refine the type, length, or range of the data the tool generates. For in-depth information on the various configuration values, please read the comments in example_gcsb.yaml.

Single table load

By default, GCSB will detect the table schema and create default random data generators based on the columns it finds. In order to tune the values the generator creates, you must create override configurations in the gcsb.yaml file. Please see that file's documentation for more information.

gcsb load -t TABLE_NAME -o NUM_ROWS

See gcsb load --help for additional configuration options.
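
For illustration, a minimal gcsb.yaml override might look like the following sketch. The column name here is hypothetical; UUID_V4 is one generator type shown elsewhere in this document, and example_gcsb.yaml documents the rest.

tables:
  - name: SingleSingers
    columns:
      - name: SingerId # hypothetical column name
        generator:
          type: UUID_V4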

Multiple table load

Similar to the single table load above, you may specify multiple tables by repeating the -t TABLE_NAME argument. By default, the number of operations applies to each table; for example, specifying 2 tables with 1,000 operations yields 2,000 total operations, 1,000 per table.

gcsb load -t TABLE1 -t TABLE2 -o NUM_ROWS

Operations per table can be configured in the yaml configuration. For example:

tables:
  - name: TABLE1
    operations:
      total: 500
  - name: TABLE2
    operations:
      total: 500

Loading into interleaved tables

Loading data into interleaved tables is supported, but it has some behavioral side effects you should be aware of. When loading data into an interleaved table, GCSB detects all tables in the hierarchy and begins loading data at the familial apex. The configured number of operations applies to this apex table. By default, the operation count for each child table is its parent's count multiplied by 5. For example:

Using our test INTERLEAVE schema, we see an INTERLEAVE relationship between the Singers, Albums, and Songs tables.

If we execute a load operation against these tables with total operations set to 10, we will see the following occur:

gcsb load -t Songs -o 10

+---------+------------+------+-------+---------+
|  TABLE  | OPERATIONS | READ | WRITE | CONTEXT |
+---------+------------+------+-------+---------+
| Singers |         10 | N/A  | N/A   | LOAD    |
| Albums  |         50 | N/A  | N/A   | LOAD    |
| Songs   |        250 | N/A  | N/A   | LOAD    |
+---------+------------+------+-------+---------+

In this case, for each child table we take the number of operations for the parent and multiply it by the default value of 5.

To change this multiplier, set operations.total for the desired table in the yaml configuration file. In an interleaved load, that value acts as a multiplier rather than an absolute count.

tables:
  - name: Albums
    operations:
      total: 10
  - name: Songs
    operations:
      total: 20
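
For instance, with the apex loaded via -o 10, this configuration would produce 10 Singers rows, 10 × 10 = 100 Albums rows, and 100 × 20 = 2,000 Songs rows, assuming the multiplier compounds down the hierarchy as in the default 5× example above.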

At present, GCSB sorts its operations from the apex down, meaning it populates the Singers table first, then its child, then the next child. Operations for multiple tables are not mixed within the same transaction.

Run

Single table run

By default, GCSB will detect the table schema and create default random data generators based on the columns it finds. In order to tune the values the generator creates, you must create override configurations in the gcsb.yaml file. Please see that file's documentation for more information.

gcsb run -p YOUR_GCP_PROJECT_ID -i YOUR_INSTANCE_ID -d YOUR_DATABASE_ID -t SingleSingers -o 10000 --reads 75 --writes 25 --threads 50

See gcsb run --help for additional configuration options.

Multiple table run

Similar to the single table run above, you may specify multiple tables by repeating the -t TABLE_NAME argument. By default, the number of operations applies to each table; for example, specifying 2 tables with 1,000 operations yields 2,000 total operations, 1,000 per table.

gcsb run -t TABLE1 -t TABLE2 -o NUM_ROWS

Operations per table can be configured in the yaml configuration. For example:

tables:
  - name: TABLE1
    operations:
      total: 500
  - name: TABLE2
    operations:
      total: 500

Running against interleaved tables

Run operations against interleaved tables are supported only on the apex table.

Using our test INTERLEAVE schema, we see an INTERLEAVE relationship between the Singers, Albums, and Songs tables.

If we try to run against any child table, an error occurs:

gcsb run -t Songs -o 10

unable to execute run operation: can only execute run against apex table (try 'Singers')

Distributed testing

GCSB is intended to run in a stateless manner. This design choice allows massive horizontal scaling of gcsb to stress your database to its absolute limits. During development we identified Kubernetes as the preferred tool for the job. We've provided two separate tutorials for running gcsb inside Kubernetes:

  • GKE - For running GCSB inside GKE using a service account key. This can be used for non-GKE clusters as well, since it contains instructions for mounting a service account key into the container.
  • GKE with Workload Identity - For running GCSB inside a GKE cluster that has workload identity turned on. This is most useful in organizations that have security policies preventing you from generating or downloading a service account key.
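
As a rough illustration, a Kubernetes Job can fan the run phase out across many pods. This sketch assumes a gcsb container image and credentials configured per one of the tutorials above; the image name, counts, and parallelism are placeholders.

apiVersion: batch/v1
kind: Job
metadata:
  name: gcsb-run
spec:
  parallelism: 10   # number of concurrent gcsb pods
  completions: 10
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: gcsb
          image: gcr.io/YOUR_PROJECT/gcsb:latest # placeholder image
          args: ["run",
                 "-p", "YOUR_GCP_PROJECT_ID",
                 "-i", "YOUR_INSTANCE_ID",
                 "-d", "YOUR_DATABASE_ID",
                 "-t", "SingleSingers",
                 "-o", "10000",
                 "--threads", "50"]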

Configuration

The tool can receive configuration input in several ways. It loads the file gcsb.yaml if it detects one in the current working directory; alternatively, you can use the global flag -c to specify a path to the configuration file. Each sub-command has a number of configuration flags relevant to that operation. These values are bound to their counterparts in the yaml configuration file and take precedence over the config file; think of them as overrides. The same is true for environment variables.
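
For example, assuming a gcsb.yaml that sets an operation total, a flag value still wins (the path and values here are illustrative):

gcsb run -c /path/to/gcsb.yaml -t SingleSingers -o 5000

Here -o 5000 takes precedence over any operation count specified in the file.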

Please note that, at present, the yaml configuration file is the only way to specify generator overrides for data loading and write operations. Without this file, the tool uses a random data generator appropriate for the table schema it detects at runtime.

For in-depth information on the various configuration values, please read the comments in example_gcsb.yaml.

Roadmap

Not Supported (yet)

  • Generating read operations utilizing ReadByIndex
  • Generating NULL values for load operations (if a column is NULLable, gcsb will still generate a value for it)
  • JSON column types
  • STRUCT objects
  • VIEWS
  • Inserting data across multiple tables in the same transaction
  • SCAN and DELETE operations
  • Tables with foreign key relationships
  • Testing multiple tables at once

Development

Build

make build

Test

make test
Comments
  • Fix child tables are included when only parent table is specified

    Currently, even if we specify only a parent table, child tables are included in the planning phase.

    Example:

    $ gcsb load -p ${PROJECT} -i ${INSTANCE} -d ${DATABASE} -t Singers -o 10
    2022/03/16 19:45:18 Loading configuration
    ...
    2022/03/16 19:45:26 Executing load phase
    2022/03/16 19:45:26 +---------+------------+------+-------+---------+
    2022/03/16 19:45:26 |  TABLE  | OPERATIONS | READ | WRITE | CONTEXT |
    2022/03/16 19:45:26 +---------+------------+------+-------+---------+
    2022/03/16 19:45:26 | Singers |         10 | N/A  | N/A   | LOAD    |
    2022/03/16 19:45:26 | Albums  |         50 | N/A  | N/A   | LOAD    |
    2022/03/16 19:45:26 | Songs   |        250 | N/A  | N/A   | LOAD    |
    2022/03/16 19:45:26 +---------+------------+------+-------+---------+
    ...
    

    Note that even though only the Singers table is provided, the Albums and Songs tables are included in the generated plan. (Singers is the parent table; Albums and Songs are interleaved child tables.)

    This PR fixes the issue so that only the specified parent table is executed.

    After fix:

    $ gcsb load -p ${PROJECT} -i ${INSTANCE} -d ${DATABASE} -t Singers -o 10
    2022/03/16 19:47:24 Loading configuration
    ...
    2022/03/16 19:47:32 Executing load phase
    2022/03/16 19:47:32 +---------+------------+------+-------+---------+
    2022/03/16 19:47:32 |  TABLE  | OPERATIONS | READ | WRITE | CONTEXT |
    2022/03/16 19:47:32 +---------+------------+------+-------+---------+
    2022/03/16 19:47:32 | Singers |         10 | N/A  | N/A   | LOAD    |
    2022/03/16 19:47:32 +---------+------------+------+-------+---------+
    ...
    
  • UUID v4

    Overview

    This PR has the implementation for UUID v4 support.

    The design is written in https://github.com/cloudspannerecosystem/gcsb/issues/1.

    Test result (without config)

    Schema:

    CREATE TABLE t6 (
      id STRING(36) NOT NULL,
      c01 STRING(32) NOT NULL,
      c02 BYTES(16),
      c03 INT64,
    ) PRIMARY KEY(id)
    

    gcsb command:

    gcsb load -p $PROJECT_ID -i gcsb -d d01 -t t6 -o 5
    

    Result: Note that a UUID is automatically populated in the id column.

    spanner> select * from t6;
    +--------------------------------------+----------------------------------+------------------------+---------------------+
    | id                                   | c01                              | c02                    | c03                 |
    +--------------------------------------+----------------------------------+------------------------+---------------------+
    | 257b2976-2662-457e-95ef-383758bcb362 | iXQtLKJbOebReagfmLEcTDUIDpiFuwRv | ucoAyUe3DldqLJKJoo9RBQ | 7923410221155820266 |
    | 29fd6eb2-c89c-42ee-84cf-f1f8227aa673 | JsxFzsPxTUARuIcUljxCwEohglAnQcJd | DELExeADf0zUSl3TuPJvkw | 1297510459945003338 |
    | 4b8b15f4-ed71-434e-a1fe-74a4c33a4e5f | QpAaHXHdqGVpnMmffdQMjdyKhDlcmglm | RqgGV6RVGO/0OJlEy8i0EA | 7864384495677479186 |
    | 4e8abe20-ee97-4c97-b663-1fda513121bd | JfXGVywdUDZpGfYcSRsSYfZsKqjHpxIh | Wooc+wOZqtEQoM6dUBeo2g | 6597062775239665051 |
    | d7f0ab8c-bb05-450f-ab79-90b7780ad043 | rmbMpoNYavnfvAvAesKUJUMfkaNFnvIB | M+wNUnVAz7umOZ4msr7t2g | 4529811403292764668 |
    +--------------------------------------+----------------------------------+------------------------+---------------------+
    5 rows in set (4.28 msecs)
    

    Test result (with config)

    Schema:

    CREATE TABLE t6 (
      id STRING(36) NOT NULL,
      c01 STRING(32) NOT NULL,
      c02 BYTES(16),
      c03 INT64,
    ) PRIMARY KEY(id)
    

    Config:

    tables:
    - name: t6
      columns:
      - name: id
        generator:
          type: UUID_V4
      - name: c01
        generator:
          type: UUID_V4
      - name: c02
        generator:
          type: UUID_V4
    

    gcsb command:

    gcsb load -p $PROJECT_ID -i gcsb -d d01 -t t6 -o 5 --config config.yaml
    

    Result:

    spanner> select * from t6;
    +--------------------------------------+----------------------------------+------------------------+---------------------+
    | id                                   | c01                              | c02                    | c03                 |
    +--------------------------------------+----------------------------------+------------------------+---------------------+
    | 37fcceac-74a0-4148-bb0f-d3d7c368aba9 | b52808b7cb9b4a7e9a4b549bd389ab4c | GmeTgrUlSRWGujFCApPuMQ | 3407031175830144234 |
    | 3ce1bf51-49d9-49d4-a890-cc7962e11722 | 26f79e351b71410fb9d1cc021382cf96 | 9nfD6bBHR5iGS+RNV4lR8g | 1631416344178272893 |
    | 5885f3f0-75ce-4ec1-9f24-79f13f949ada | f1f0b88778e0493ba6c23094b4006c13 | NizZ5UH/RZO+en1RUkzm2A | 7778643024011734724 |
    | aec21540-70d9-49e1-8bfd-d63a7f4e5dda | 6bab4f29e0a94987a2b430ff486c912b | H5v2fFN4RrWkL3ehWGzuUA | 3821404214491236873 |
    | f773c772-b430-4c26-bf6c-a8d7016e1a41 | 308dcb5cb75c48a18e205a49937ff9d6 | 2tZT5+d9QZSOutyDAhwyoQ | 6987019101635680182 |
    +--------------------------------------+----------------------------------+------------------------+---------------------+
    5 rows in set (2.98 msecs)
    
  • UUID v4 Support

    Basic Idea

    Supports UUID v4 data generation.

    Design

    Infer UUID v4 by default

    Google recommends using UUID v4 for primary keys in Cloud Spanner and it has been widely adopted as a best practice.

    Hence, it might make sense to infer whether a given column is supposed to hold a UUID v4 value and, if it looks so, automatically generate a UUID v4 value for that column. This inference would make it easy to load data without forcing users to write configuration.

    How to infer UUID v4

    There are several ways for users to store a UUID v4 value. For instance:

    • In a STRING(36) column (e.g. "f13c1af5-07cd-4db6-8891-ffa4acbd4991").
    • In a STRING(32) column without - (e.g. "f13c1af507cd4db68891ffa4acbd4991")
    • In a BYTES(16) column (e.g. <BYTE_ARRAY>).

    Among them, I guess most users would choose STRING(36) for UUID v4, so we can limit the inference to columns whose type is exactly STRING(36).

    GCSB Configuration

    We can provide an explicit way to use the UUID v4 generation through the config. This is an example of the config for UUID v4 support.

    # gcsb.yaml
    tables:
    - name: User
      columns:
      - name: UserId
        generator:
          type: UUID_V4
    

    Currently there is a field called type inside the generator field. For this purpose we can add a new value for the type field: UUID_V4.

    (Question: Can we use that field to specify the actual data type? Or should we treat it as a Spanner data type like STRING and define a new field like sub_type for UUID_V4?)

    This UUID_V4 is only valid if the column type is one of the following.

    • STRING(36)
    • STRING(32)
    • BYTES(16)

    The actual data generated by this tool will vary depending on the column type. For example, if the column type is STRING(32), we generate f13c1af507cd4db68891ffa4acbd4991 for the value. If the column type is not in the above list, we should report an error explicitly.
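
    To make the three representations concrete, here is a small Go sketch using github.com/google/uuid (illustrative only, not necessarily the generator's actual implementation):

    package main

    import (
        "encoding/hex"
        "fmt"

        "github.com/google/uuid"
    )

    func main() {
        u := uuid.New() // a random (version 4) UUID, 16 raw bytes

        fmt.Println(u.String())               // STRING(36): canonical form with dashes
        fmt.Println(hex.EncodeToString(u[:])) // STRING(32): hex digits without dashes
        fmt.Println(u[:])                     // BYTES(16): the raw 16-byte value
    }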

    Other Details

    • This design only focuses on UUID v4. In other words, we will treat each UUID version differently. If users would like to use UUID v1, we can think of adding UUID_V1 type afterward.
  • Fix invalid bucket calculation for operations

    Description

    Fixed a bug that caused invalid bucket calculation for operations, specifically when operations cannot be evenly distributed across the buckets.

    For example, if a user specifies --operations=5 --threads=10, the current implementation produced the buckets [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]. Note that the total is only 4 operations; 1 operation was dropped from the buckets.

    This caused missing database rows when using the load operation (i.e. the user expects 5 rows inserted, but only 4 rows are actually inserted).
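
    A distribution consistent with the expected values in the tests below gives the first total % buckets buckets one extra operation each. A minimal sketch (the actual function in pkg/workload may differ in signature):

    package main

    import "fmt"

    // bucketOps spreads total operations across buckets as evenly as
    // possible; the first total%buckets buckets each take one extra op.
    func bucketOps(total, buckets int) []int {
        out := make([]int, buckets)
        base, extra := total/buckets, total%buckets
        for i := range out {
            out[i] = base
            if i < extra {
                out[i]++
            }
        }
        return out
    }

    func main() {
        fmt.Println(bucketOps(10, 4)) // [3 3 2 2]
        fmt.Println(bucketOps(5, 10)) // [1 1 1 1 1 0 0 0 0 0]
    }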

    Test result

    Before fix:

    === RUN   TestBucketOps
    === RUN   TestBucketOps/operations_are_evenly_distributed
    === RUN   TestBucketOps/operations_are_distributed_unequally
        core_test.go:35: bucketOps(10, 4) = [3 2 2 2], but want = [3 3 2 2]
    === RUN   TestBucketOps/some_buckets_have_empty_operations
        core_test.go:35: bucketOps(5, 10) = [1 1 1 1 0 0 0 0 0 0], but want = [1 1 1 1 1 0 0 0 0 0]
    --- FAIL: TestBucketOps (0.00s)
        --- PASS: TestBucketOps/operations_are_evenly_distributed (0.00s)
        --- FAIL: TestBucketOps/operations_are_distributed_unequally (0.00s)
        --- FAIL: TestBucketOps/some_buckets_have_empty_operations (0.00s)
    FAIL
    FAIL	github.com/cloudspannerecosystem/gcsb/pkg/workload	0.203s
    FAIL
    

    After fix:

    === RUN   TestBucketOps
    === RUN   TestBucketOps/operations_are_evenly_distributed
    === RUN   TestBucketOps/operations_are_distributed_unequally
    === RUN   TestBucketOps/some_buckets_have_empty_operations
    --- PASS: TestBucketOps (0.00s)
        --- PASS: TestBucketOps/operations_are_evenly_distributed (0.00s)
        --- PASS: TestBucketOps/operations_are_distributed_unequally (0.00s)
        --- PASS: TestBucketOps/some_buckets_have_empty_operations (0.00s)
    PASS
    ok  	github.com/cloudspannerecosystem/gcsb/pkg/workload	0.928s
    
  • Doesn't work multiple run

    Multiple tables cannot be passed as described in the readme; the flag actually accepts a single string instead of []string. The subsequent processing also does not appear to support multiple tables.

    https://github.com/cloudspannerecosystem/gcsb/blob/6e0d2da74ab986632044b8a457ec111a590e92d1/pkg/cmd/run.go#L33
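
    For reference, spf13/pflag (which cobra uses) needs a slice-typed flag for repeatable arguments; a plain StringP keeps only a single value. A minimal sketch, separate from gcsb's actual code:

    package main

    import (
        "fmt"

        "github.com/spf13/pflag"
    )

    func main() {
        // A slice-typed flag accumulates repeated occurrences,
        // e.g. -t TABLE1 -t TABLE2 parses to [TABLE1 TABLE2].
        tables := pflag.StringSliceP("table", "t", nil, "table name (repeatable)")
        pflag.Parse()
        fmt.Println(*tables)
    }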

  • Doesn't work table insert sample with comments

    When I tried to create a table as per the readme, I could not do so due to an error parsing the comment block in the DDL file.

    When I removed the comments and ran it again, it worked fine. Is there something wrong with my settings, or does the comment or the readme need to be fixed?

    $ gcloud spanner databases ddl update {db-name} --instance={instance-name} --ddl-file=schemas/single_table.sql
    ERROR: (gcloud.spanner.databases.ddl.update) INVALID_ARGUMENT: Error parsing Spanner DDL statement: /*\nCopyright 2022 Google LLC\n\nLicensed under the Apache License, Version 2.0 (the \"License\") : Syntax error on line 1, column 1: Encountered \'/\' while parsing: ddl_statement
    - '@type': type.googleapis.com/google.rpc.LocalizedMessage
      locale: en-US
      message: |-
        Error parsing Spanner DDL statement: /*
        Copyright 2022 Google LLC
    
        Licensed under the Apache License, Version 2.0 (the "License") : Syntax error on line 1, column 1: Encountered '/' while parsing: ddl_statement
    
  • Ability to provide custom read queries during run phase

    Customer feedback:

    Looks like it doesn’t support providing your own queries to the load test. The solution seems to just read from a table. Would be neat if you could also do that as well to test out a difficult query involving joins of tables etc and how it performs - would this be possible in the future?
