sqlbench

sqlbench measures and compares the execution time of one or more SQL queries.

[screen recording]

The main use case is benchmarking simple CPU-bound query variants against each other during local development.

Only PostgreSQL is supported at this point, but pull requests for MySQL or other databases are welcome.

Install

You can download a binary from the release page.

If you have Go 1.13 or later installed, you can install or update sqlbench from source:

$ go get -u github.com/felixge/sqlbench

Windows users: you may also install the Chocolatey package maintained by picolino:

$ choco install sqlbench

Examples

Below are a few one-liners to get you started. They assume you're running sqlbench from the directory of a clone of this repo.

# Benchmark a few queries until ctrl+c is hit. Output results in realtime.
sqlbench examples/sum/*.sql

# Benchmark using client wallclock time (instead of explain) until ctrl+c.
sqlbench -m client examples/sum/*.sql

# Run for 3 seconds and only print results once at the end.
sqlbench -t 3 -s examples/sum/*.sql

# Run for 1000 iterations and only print verbose results once at the end
sqlbench -n 1000 -s -v examples/sum/*.sql

# Record the results for 1000 iterations into a csv file.
sqlbench -n 1000 -o baseline.csv examples/sum/*.sql

# Compare 1000 iterations to a baseline recording.
sqlbench -n 1000 -i baseline.csv examples/sum/*.sql

Usage

Usage of sqlbench:
  -c string
    	Connection URL or DSN for connecting to PostgreSQL as understood by pgx [1].
    	E.g.: postgres://user:secret@localhost:5432/my_db?sslmode=disable
    	
    	Alternatively you can use standard PostgreSQL environment variables [2] such as
    	PGHOST, PGPORT, PGPASSWORD, ... .
    	
    	[1] https://pkg.go.dev/github.com/jackc/pgx/v4/stdlib?tab=doc
    	[2] https://www.postgresql.org/docs/current/libpq-envars.html
    	(default "postgres://")
  -i string
    	Input path for CSV file with baseline measurements.
  -m string
    	Method for measuring the query time. One of: "client", "explain" (default "explain")
  -n int
    	Terminate after the given number of iterations. (default -1)
  -o string
    	Output path for writing individual measurements in CSV format.
  -p	Include the query planning time. For -m explain this is accomplished by adding
    	the "Planning Time" to the measurement. For -m client this is done by not using
    	prepared statements.
  -s	Silent mode for non-interactive use, only prints stats once after terminating.
  -t float
    	Terminate after the given number of seconds. (default -1)
  -v	Verbose output. Print the content of all SQL queries, as well as the
    	PostgreSQL version.
  -version
    	Print version and exit.

How It Works

sqlbench takes a list of SQL files and keeps executing them sequentially, measuring their execution times. By default, the execution time is measured by prefixing each query with EXPLAIN (ANALYZE, TIMING OFF) and capturing the total Execution Time reported by PostgreSQL.
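
For example, with the default explain method, the gauss query from this repo is effectively executed like this (a sketch; the exact EXPLAIN options may differ, e.g. sqlbench reads the timing from the machine-readable FORMAT JSON output rather than the text plan):

EXPLAIN (ANALYZE, TIMING OFF)
SELECT i, (i * (i + 1)) / 2 AS sum
FROM generate_series(1, 1000) g(i);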

The query columns are ordered by mean execution time in ascending order, and the relative difference compared to the fastest query is shown in parentheses. If you provide a baseline CSV via -i, the relative differences compare each query against its own baseline counterpart instead of against the fastest current query. For example, if the baseline recorded a mean of 1.00 for window and the current run measures 1.10, the window column shows (1.10x) regardless of how the other queries perform.

If the -m client flag is given, the time is measured using the wallclock time of sqlbench, which includes network overhead.

Planning time is excluded by default, but can be included using the -p flag.
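
For reference, these are the two timing lines at the end of PostgreSQL's text EXPLAIN ANALYZE output (illustrative values); without -p, only the Execution Time counts towards the measurement:

Planning Time: 0.110 ms
Execution Time: 0.503 ms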

The filenames init.sql and destroy.sql are special and are executed once before and after the benchmark, respectively. They can be used to set up or tear down tables, indexes, etc.
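
For example, a benchmark that needs a helper table might ship these two files alongside the queries (hypothetical contents; any setup and teardown SQL works):

-- init.sql
CREATE TABLE numbers AS SELECT generate_series(1, 1000) AS i;
CREATE INDEX numbers_i_idx ON numbers (i);

-- destroy.sql
DROP TABLE numbers;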

Tutorial

Let's say you want to compare three different queries for computing the running total of all numbers from 1 to 1000. Your first idea is to use a window function:

SELECT i, sum(i) OVER (ORDER BY i) AS sum
FROM generate_series(1, 1000) g(i);

Then you decide to get fancy and implement it as a recursive CTE:

WITH RECURSIVE sums AS (
	SELECT 1 AS i, 1 AS sum
	UNION
	SELECT i+1, sum+i+1 FROM sums WHERE i < 1000
)

SELECT * FROM sums;

And finally you become wise and remember that nine-year-old Gauss could probably beat both approaches, since the running total has the closed form i * (i + 1) / 2 (for i = 1000 that's 500500):

SELECT i, (i * (i + 1)) / 2 AS sum
FROM generate_series(1, 1000) g(i);

Now that you have your queries in window.sql, recursive.sql, and gauss.sql, you want to summarize the performance differences for your colleagues. However, you know they're a pedantic bunch and will ask you annoying questions such as:

  • How many times did you run each query?
  • Were you running other stuff on your laptop in the background?
  • How can I reproduce this on my local machine?
  • What version of PostgreSQL were you running on your local machine?
  • Are you sure you're not just measuring the overhead of EXPLAIN ANALYZE?

Dealing with this could normally be quite annoying, but luckily there is sqlbench. The command below runs your three queries 1000 times each with EXPLAIN ANALYZE and reports the statistics, the PostgreSQL version, and even the SQL of your queries:

$ sqlbench -v -s -n 1000 examples/sum/*.sql | tee explain-bench.txt
         | gauss |    window     |   recursive    
---------+-------+---------------+----------------
  n      |  1000 |          1000 |          1000  
  min    |  0.35 | 1.31 (3.79x)  | 1.80 (5.22x)   
  max    |  4.18 | 23.76 (5.68x) | 11.41 (2.73x)  
  mean   |  0.50 | 1.94 (3.85x)  | 2.67 (5.30x)   
  stddev |  0.16 | 0.81 (4.93x)  | 0.63 (3.87x)   
  median |  0.53 | 2.02 (3.80x)  | 2.91 (5.49x)   
  p90    |  0.67 | 2.53 (3.80x)  | 3.41 (5.12x)   
  p95    |  0.68 | 2.57 (3.81x)  | 3.50 (5.18x)   

Stopping after 1000 iterations as requested.

postgres version: PostgreSQL 11.6 on x86_64-apple-darwin16.7.0, compiled by Apple LLVM version 8.1.0 (clang-802.0.42), 64-bit
sqlbench -v -s -n 1000 examples/sum/gauss.sql examples/sum/recursive.sql examples/sum/window.sql

==> examples/sum/gauss.sql <==
SELECT i, (i * (i + 1)) / 2 AS sum
FROM generate_series(1, 1000) g(i);

==> examples/sum/window.sql <==
SELECT i, sum(i) OVER (ORDER BY i) AS sum
FROM generate_series(1, 1000) g(i);

==> examples/sum/recursive.sql <==
WITH RECURSIVE sums AS (
	SELECT 1 AS i, 1 AS sum
	UNION
	SELECT i+1, sum+i+1 FROM sums WHERE i < 1000
)

SELECT * FROM sums;

And finally, you can use the -m client flag to measure the query times without EXPLAIN ANALYZE, to see whether it adds significant overhead:

$ sqlbench -s -n 1000 -m client examples/sum/*.sql | tee client-bench.txt
         | gauss |    window    |  recursive    
---------+-------+--------------+---------------
  n      |  1000 |         1000 |         1000  
  min    |  0.66 | 1.44 (2.18x) | 2.03 (3.08x)  
  max    |  5.66 | 7.31 (1.29x) | 4.34 (0.77x)  
  mean   |  0.83 | 1.72 (2.08x) | 2.35 (2.83x)  
  stddev |  0.23 | 0.33 (1.41x) | 0.27 (1.18x)  
  median |  0.78 | 1.65 (2.11x) | 2.26 (2.89x)  
  p90    |  0.98 | 1.98 (2.03x) | 2.68 (2.75x)  
  p95    |  1.05 | 2.13 (2.03x) | 2.89 (2.76x)  

Stopping after 1000 iterations as requested.

Indeed, it appears that from the client's perspective the gauss query is a bit slower, while the others are a bit faster when measured without EXPLAIN ANALYZE. Whether that's a rabbit hole worth exploring is up to you, but either way you now have a much better sense of the error your measurements might contain.

Todos

Below are a few ideas for todos that I might implement at some point or would welcome as pull requests.

  • Dynamically adjust unit between ms, s, etc.
  • Support specifying benchmarks using a single YAML file.
  • Support for other databases, e.g. MySQL.
  • Capture query plans for each query, ideally one close to the median execution time.
  • Provide an easy way to capture all inputs and outputs in a single tar.gz file or GitHub gist.
  • Plot query times as a histogram (made a proof of concept for this, but didn't like it enough yet to release)
  • Maybe add db name to verbose output, see request.
  • Compare benchmark results between PG versions.
  • One-liner examples for the README.
  • Warmup phase (can be done via init.sql and pg_prewarm()).
  • Use TIMING OFF to reduce EXPLAIN overhead.
  • A flag to include planning time in -m explain mode.
  • A flag to use prepared queries in -m client mode.

License

sqlbench is licensed under the MIT license.

Comments
  • Support installation via Chocolatey

    Chocolatey is a nice Windows package manager, so it would be cool if this software could be distributed through it.

    I have experience distributing packages through Chocolatey, so if you don't mind, I'm willing to create and maintain that package in the future.

  • Some chores: CI and test fixes

    Hi Felix,

    Big fan of your work. I was about to raise an issue asking about adding support for another SQL engine, but I noticed that there wasn't yet any CI to validate that I wasn't breaking any tests - so I thought I'd add that. I also added goreleaser to publish binaries whenever you tag a release - I've included a screenshot of what it generates. Feel free to do with this PR whatever you want 😄

    [screenshot of goreleaser-generated release notes]

  • type reflect.Value has no field or method IsZero

    Hi, when I try to install sqlbench it fails with the following message. I'm on Debian in WSL:

    $ uname -a
    Linux krilin 4.4.0-18362-Microsoft #1049-Microsoft Thu Aug 14 12:01:00 PST 2020 x86_64 GNU/Linux

    and with this version of Go:

    $ go version
    go version go1.11.6 linux/amd64

    $ go get -u github.com/felixge/sqlbench
    # github.com/jackc/pgtype
    /home/julien/go/src/github.com/jackc/pgtype/aclitem_array.go:90:49: reflectedValue.IsZero undefined (type reflect.Value has no field or method IsZero)
    /home/julien/go/src/github.com/jackc/pgtype/bool_array.go:92:49: reflectedValue.IsZero undefined (type reflect.Value has no field or method IsZero)
    /home/julien/go/src/github.com/jackc/pgtype/bpchar_array.go:92:49: reflectedValue.IsZero undefined (type reflect.Value has no field or method IsZero)
    /home/julien/go/src/github.com/jackc/pgtype/bytea_array.go:73:49: reflectedValue.IsZero undefined (type reflect.Value has no field or method IsZero)
    /home/julien/go/src/github.com/jackc/pgtype/cidr_array.go:112:49: reflectedValue.IsZero undefined (type reflect.Value has no field or method IsZero)
    /home/julien/go/src/github.com/jackc/pgtype/date_array.go:93:49: reflectedValue.IsZero undefined (type reflect.Value has no field or method IsZero)
    /home/julien/go/src/github.com/jackc/pgtype/enum_array.go:90:49: reflectedValue.IsZero undefined (type reflect.Value has no field or method IsZero)
    /home/julien/go/src/github.com/jackc/pgtype/float4_array.go:92:49: reflectedValue.IsZero undefined (type reflect.Value has no field or method IsZero)
    /home/julien/go/src/github.com/jackc/pgtype/float8_array.go:92:49: reflectedValue.IsZero undefined (type reflect.Value has no field or method IsZero)
    /home/julien/go/src/github.com/jackc/pgtype/hstore_array.go:73:49: reflectedValue.IsZero undefined (type reflect.Value has no field or method IsZero)
    /home/julien/go/src/github.com/jackc/pgtype/hstore_array.go:73:49: too many errors
    
  • Baseline

    Support benchmarking against a recorded baseline. Useful when comparing different PostgreSQL versions, server configurations, indexes, etc.

    Hopefully this will help with the use cases suggested by @anarazel on twitter: https://twitter.com/AndresFreundTec/status/1308146736917962752

    # realtime output, and record the results for 1000 iterations into a csv file
    sqlbench -n 1000 -o baseline.csv examples/sum/*.sql
    # realtime output, and compare 1000 iterations to a baseline recording
    sqlbench -n 1000 -i baseline.csv examples/sum/*.sql
    

    Output from a second run against the baseline (same PG version and settings, so no big diffs expected):

             |    gauss     |    window    |  recursive    
    ---------+--------------+--------------+---------------
      n      | 1000 (1.00x) | 1000 (1.00x) | 1000 (1.00x)  
      min    | 0.35 (0.99x) | 1.24 (0.98x) | 1.81 (0.98x)  
      max    | 0.80 (1.00x) | 2.20 (0.91x) | 3.26 (0.99x)  
      mean   | 0.39 (1.00x) | 1.36 (0.99x) | 2.00 (0.99x)  
      stddev | 0.04 (0.76x) | 0.09 (0.76x) | 0.14 (0.86x)  
      median | 0.37 (1.01x) | 1.32 (0.99x) | 1.95 (0.99x)  
      p90    | 0.42 (0.96x) | 1.47 (0.98x) | 2.17 (0.98x)  
      p95    | 0.45 (0.95x) | 1.54 (0.96x) | 2.28 (0.97x) 
    
  • add p-value

    Similar stats tools include a p-value in their output. It helps us understand whether a difference is real or whether we just got lucky.

    Look at https://godoc.org/golang.org/x/perf/cmd/benchstat for example.

  • How to approach supporting alternative drivers

    In the README (and I think I remember on Twitter) you mentioned being open to PRs for alternative SQL implementations. At $dayjob we use MS SQL Server and so I thought I might try my hand at a PR to add support for that. Before getting started, I thought it would be useful to see if you had any thoughts / opinions on how this should be done.

    Some current code to think about

    There are at least a few things worth thinking about. The first is how we would specify the driver name, e.g. currently "pgx" here: https://github.com/felixge/sqlbench/blob/a7422d928ac6fd10ee4e91e52619da3be738ca0e/main.go#L77

    And how it relates to the connection string here: https://github.com/felixge/sqlbench/blob/a7422d928ac6fd10ee4e91e52619da3be738ca0e/main.go#L35

    There's also the EXPLAIN .. FORMAT JSON duration thing, which I assume has equivalents in SQL Server and others. But to be honest, right now I don't care so much about that; I think it would be fine to have client-measurement-only support to begin with!

    https://github.com/felixge/sqlbench/blob/a7422d928ac6fd10ee4e91e52619da3be738ca0e/query_duration.go#L76

    Some options

    I guess the "easy" option is to add another command-line flag where the user can specify a driver name if they don't want pgx. This is probably good enough for MVP - anyone using this tool at this stage is probably happy to get their hands dirty!

    Another option could be to deduce the driver name from the scheme in the connection string URL. I'm used to this from my days of Rails, but come to think of it, I've never seen it at $dayjob for MS SQL Server.. so maybe it wouldn't be obvious, and we'd have to document what schemes are available 🤔

    Any thoughts? I'm inclined to just go with the easy option and not over-engineer this thing and lock this project into unhelpful complexity on day 2 😂
