Dumpling is a fast, easy-to-use tool written in Go for dumping data from a database (MySQL, TiDB, ...) to local or cloud storage (S3, GCS, ...) in various formats (SQL, CSV, ...).

🥟 Dumpling


Dumpling is a tool and a Go library for creating SQL dump from a MySQL-compatible database. It is intended to replace mysqldump and mydumper when targeting TiDB.

You may read the design document, English user guide and 中文使用手册 for details.

Features

Dumpling is currently at an early stage of development, and many features are incomplete. Contributions are welcome!

  • SQL dump is split into multiple files (like mydumper) for easy management.
  • Export multiple tables in parallel to speed up execution.
  • Multiple output formats: SQL, CSV, ...
  • Write to cloud storage (S3, GCS) natively
  • Advanced table filtering
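
As a quick illustration of the features above, a typical invocation might look like the following. This is an illustrative command fragment, not from the original README; it assumes a reachable database, and the exact flags available may vary by version (consult the user guide):

```shell
# Dump tables matching the filter from a local TiDB/MySQL server to ./dump
# as CSV, using 8 threads and splitting output files at 256 MiB.
dumpling -h 127.0.0.1 -P 4000 -u root \
  --filetype csv \
  --filter 'test.*' \
  -t 8 -F 256MiB \
  -o ./dump
```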

Any questions? Let's discuss in #sig-migrate in Slack!

Building

  1. Install Go 1.13 or above
  2. Run make build to compile. The output is in bin/dumpling.
  3. Run make test to run the unit tests.
  4. Run make integration_test to run integration tests.
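
The steps above can be run as the following commands (assuming the repository is cloned from github.com/pingcap/dumpling):

```shell
# Requires Go 1.13 or above on PATH.
git clone https://github.com/pingcap/dumpling.git
cd dumpling
make build            # output is in bin/dumpling
make test             # unit tests
make integration_test # integration tests
```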

License

Dumpling is under the Apache 2.0 license. See the LICENSE file for details.


Owner
PingCAP
The team behind TiDB and TiKV, an open source, MySQL-compatible NewSQL HTAP database
Comments
  • Support clustered index

    Feature Request

    Is your feature request related to a problem? Please describe:

    Currently, when clustered index is enabled, the _tidb_rowid column appears to be missing. Dumpling uses _tidb_rowid to dump tables in parallel.

    Describe the feature you'd like:

    Support clustered index.

    Reference: https://github.com/pingcap/tidb/pull/17044/files

    Describe alternatives you've considered:

    No alternatives.

    Teachability, Documentation, Adoption, Optimization:

  • dump TiDB INFORMATION_SCHEMA failed

    [2020/03/01 21:07:14.710 +08:00] [WARN] [session.go:1123] ["compile SQL failed"] [conn=51] [error="[planner:1054]Unknown column '_tidb_rowid' in 'field list'"] [SQL="SELECT _tidb_rowid from INFORMATION_SCHEMA.INSPECTION_RESULT LIMIT 0"]
    

    It seems the table INSPECTION_RESULT has no _tidb_rowid column.

  • *: migrate test-infra to testify

    What problem does this PR solve?

    pingcap/check is no longer maintained, so migrate to testify.

    This PR can be somewhat regarded as a subtask of https://github.com/pingcap/tidb/issues/26022

    This closes #346.

    Release note

    • No release note
  • export: ignore unused parameters

    What problem does this PR solve?

    None

    What is changed and how it works?

    None

    Check List

    Tests

    • No code

    Side effects

    None

    Related changes

    None

    Release note

    • No release note
  • Trying to get dumpling to use GCS only

    Question

    I have read the docs and know the GCS support is experimental, but I wanted to see if I could replace mysqldump. I set the --gcs.endpoint URL, and:

    A) It never wrote to GCS (it did not error either). B) Is it possible to write straight to GCS? I don't have enough space on this host to write to disk.

    Thanks, Jeff

  • Write directly to S3

    What problem does this PR solve?

    #8 Support S3

    What is changed and how it works?

    Switch to using VFS, which supports local files, S3, and GCS. For now this only supports S3, but it should be easy to add support for e.g. GCS.

    You simply prefix the output directory with "s3://bucket" e.g.:

    dumpling -o s3://mybucket/dir
    

    For now all config is passed in using the standard AWS environment variables.
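
For reference, "standard AWS environment variables" means credentials and region settings like the following (the values shown are placeholders, not real credentials):

```shell
export AWS_ACCESS_KEY_ID=AKIA...      # your access key ID
export AWS_SECRET_ACCESS_KEY=...      # your secret access key
export AWS_REGION=us-west-2           # region of the target bucket
dumpling -o s3://mybucket/dir
```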

    Check List

    Tests

    • Unit test
    • Integration test
    • Manual test (add detailed scripts or steps below): will test by running dumpling against our staging databases.

    Side effects

    • Introduces a new dependency on github.com/c2fo/vfs

    Related changes

    • Need to update the documentation

    Release note

    • support writing directly to S3
  • Let Dumpling set GC safepoint ttl with new pd interface

    When Dumpling dumps a large amount of data, it needs to set a TiDB snapshot at a given ts. When dumping takes too long, TiKV may GC this TiDB snapshot ts, and Dumpling will fail to dump and exit with an error.

    To avoid this problem, users need to set tikv_gc_life_time to a longer value, which is explained at the end of https://pingcap.com/docs-cn/stable/export-or-backup-using-dumpling. However, this is not convenient: users need to run

    update mysql.tidb set VARIABLE_VALUE = '720h' where VARIABLE_NAME = 'tikv_gc_life_time';
    

    before dumping and run

    update mysql.tidb set VARIABLE_VALUE = '10m' where VARIABLE_NAME = 'tikv_gc_life_time';
    

    after dumping.

    However, Dumpling shouldn't change this variable itself, because the dumping process may fail halfway.

    In v4.0.0, PD has a new interface, UpdateServiceGCSafePoint, which allows a service to set a TiKV GC safe point with a TTL; this is very suitable for Dumpling's scenario. What's more, TiCDC already uses this interface in their service.

    So, in summary, I think Dumpling can also use this feature to make sure the dump ts is not GCed.

  • export: don't +1 TSO when set GC safepoint

    What problem does this PR solve?

    This error is hard to reproduce, so I added a 10-minute sleep before any dump actually occurs. The error may be slightly different, but I think the root cause is the same. Here's the log:

    [2021/06/03 19:20:30.262 +08:00] [INFO] [gc_worker.go:1531] ["[gc worker] sent safe point to PD"] [uuid=5e7451a0ff80001] ["safe point"=425385803120640001]
    ...
    [2021/06/03 19:22:21.578 +08:00] [INFO] [set.go:312] ["load snapshot info schema"] [conn=5813] [SnapshotTS=425385803120640000]
    ...
    [2021/06/03 19:22:21.579 +08:00] [INFO] [conn.go:812] ["command dispatched failed"] [conn=5813] [connInfo="id:5813, addr:127.0.0.1:62687 status:10, collation:utf8mb4_general_ci, user:root"] [command=Query] [status="inTxn:0, autocommit:1"] [sql="SET tidb_snapshot='2021-06-03 19:01:00'"] [txn_mode=PESSIMISTIC] [err="[tikv:9006]GC life time is shorter than transaction duration, transaction starts at 2021-06-03 19:01:00 +0800 CST, GC safe point is 2021-06-03 19:01:00 +0800 CST\ngithub.com/pingcap/errors.AddStack\n\t/Users/pingcap/gopkg/pkg/mod/github.com/pingcap/[email protected]/errors.go:174\ngithub.com/pingcap/errors.(*Error).GenWithStackByArgs\n\t/Users/pingcap/gopkg/pkg/mod/github.com/pingcap/[email protected]/normalize.go:156\ngithub.com/pingcap/tidb/store/tikv.(*KVStore).CheckVisibility\n\t/Users/pingcap/workspace/optimization-build-tidb-darwin-amd/go/src/github.com/pingcap/tidb/store/tikv/kv.go:104\ngithub.com/pingcap/tidb/store/tikv.(*tikvSnapshot).Get\n\t/Users/pingcap/workspace/optimization-build-tidb-darwin-amd/go/src/github.com/pingcap/tidb/store/tikv/snapshot.go:387\ngithub.com/pingcap/tidb/structure.(*TxStructure).Get\n\t/Users/pingcap/workspace/optimization-build-tidb-darwin-amd/go/src/github.com/pingcap/tidb/structure/string.go:36\ngithub.com/pingcap/tidb/structure.(*TxStructure).GetInt64\n\t/Users/pingcap/workspace/optimization-build-tidb-darwin-amd/go/src/github.com/pingcap/tidb/structure/string.go:45\ngithub.com/pingcap/tidb/meta.(*Meta).GetSchemaVersion\n\t/Users/pingcap/workspace/optimization-build-tidb-darwin-amd/go/src/github.com/pingcap/tidb/meta/meta.go:260\ngithub.com/pingcap/tidb/domain.(*Domain).loadInfoSchema\n\t/Users/pingcap/workspace/optimization-build-tidb-darwin-amd/go/src/github.com/pingcap/tidb/domain/domain.go:102\ngithub.com/pingcap/tidb/domain.(*Domain).GetSnapshotInfoSchema\n\t/Users/pingcap/workspace/optimization-build-tidb-darwin-amd/go/src/github.com/pingcap/tidb/domain/domain.go:316\ngithub.com/pingcap/tidb/executor.(*Set
Executor).loadSnapshotInfoSchemaIfNeeded\n\t/Users/pingcap/workspace/optimization-build-tidb-darwin-amd/go/src/github.com/pingcap/tidb/executor/set.go:314\ngithub.com/pingcap/tidb/executor.(*SetExecutor).setSysVariable\n\t/Users/pingcap/workspace/optimization-build-tidb-darwin-amd/go/src/github.com/pingcap/tidb/executor/set.go:195\ngithub.com/pingcap/tidb/executor.(*SetExecutor).Next\n\t/Users/pingcap/workspace/optimization-build-tidb-darwin-amd/go/src/github.com/pingcap/tidb/executor/set.go:108\ngithub.com/pingcap/tidb/executor.Next\n\t/Users/pingcap/workspace/optimization-build-tidb-darwin-amd/go/src/github.com/pingcap/tidb/executor/executor.go:277\ngithub.com/pingcap/tidb/executor.(*ExecStmt).handleNoDelayExecutor\n\t/Users/pingcap/workspace/optimization-build-tidb-darwin-amd/go/src/github.com/pingcap/tidb/executor/adapter.go:537\ngithub.com/pingcap/tidb/executor.(*ExecStmt).handleNoDelay\n\t/Users/pingcap/workspace/optimization-build-tidb-darwin-amd/go/src/github.com/pingcap/tidb/executor/adapter.go:418\ngithub.com/pingcap/tidb/executor.(*ExecStmt).Exec\n\t/Users/pingcap/workspace/optimization-build-tidb-darwin-amd/go/src/github.com/pingcap/tidb/executor/adapter.go:368\ngithub.com/pingcap/tidb/session.runStmt\n\t/Users/pingcap/workspace/optimization-build-tidb-darwin-amd/go/src/github.com/pingcap/tidb/session/session.go:1531\ngithub.com/pingcap/tidb/session.(*session).ExecuteStmt\n\t/Users/pingcap/workspace/optimization-build-tidb-darwin-amd/go/src/github.com/pingcap/tidb/session/session.go:1426\ngithub.com/pingcap/tidb/server.(*TiDBContext).ExecuteStmt\n\t/Users/pingcap/workspace/optimization-build-tidb-darwin-amd/go/src/github.com/pingcap/tidb/server/driver_tidb.go:218\ngithub.com/pingcap/tidb/server.(*clientConn).handleStmt\n\t/Users/pingcap/workspace/optimization-build-tidb-darwin-amd/go/src/github.com/pingcap/tidb/server/conn.go:1630\ngithub.com/pingcap/tidb/server.(*clientConn).handleQuery\n\t/Users/pingcap/workspace/optimization-build-tidb-darwin-amd/go/s
rc/github.com/pingcap/tidb/server/conn.go:1503\ngithub.com/pingcap/tidb/server.(*clientConn).dispatch\n\t/Users/pingcap/workspace/optimization-build-tidb-darwin-amd/go/src/github.com/pingcap/tidb/server/conn.go:1037\ngithub.com/pingcap/tidb/server.(*clientConn).Run\n\t/Users/pingcap/workspace/optimization-build-tidb-darwin-amd/go/src/github.com/pingcap/tidb/server/conn.go:795\ngithub.com/pingcap/tidb/server.(*Server).onConn\n\t/Users/pingcap/workspace/optimization-build-tidb-darwin-amd/go/src/github.com/pingcap/tidb/server/server.go:477\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1357"]
    

    What is changed and how it works?

    as title

    Check List

    Tests

    • Manual test (TODO)

    Side effects

    Related changes

    • Need to cherry-pick to the release branch

    Release note

  • Add the dump phase network speed limit feature

    What problem does this PR solve?

    To solve #274.

    What is changed and how it works?

    STEP1:

    Add the dump phase network speed limit feature.
    Mainly modify the `writer` related code.
    

    STEP2:

    Modify the config code of DM later.
    

    STEP3:

    Update the document.
    

    This PR is STEP1.

    Check List

    Tests

    • Unit test
    • Integration test
    • Manual test (add detailed scripts or steps below)

    Observe whether the network traffic in Grafana is stable.

    Release note

    No release note

  • make version check more compatible

    What problem does this PR solve?

    cherry-pick pingcap/tidb#29500 to release-5.2

    What is changed and how it works?

    Check List

    Tests

    • Unit test
    • Integration test
    • Manual test (add detailed scripts or steps below)
    • No code

    Side effects

    • Possible performance regression
    • Increased code complexity
    • Breaking backward compatibility

    Related changes

    • Need to cherry-pick to the release branch
    • Need to update the documentation

    Release note

  • *: use `show full tables` in ListAllDatabasesTables

    What problem does this PR solve?

    close #322

    What is changed and how it works?

    When a buggy version of MySQL in the range [8.0.3, 8.0.23) is detected, use the SQL statement `show full tables` to get all table names from the upstream database.

    Since the average data size per row cannot be taken from the `show full tables` results, Dumpling sets this value to 200000. This may cause the user to export too many or too few files, but I find that more acceptable than getting stuck in the dump stage.
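
For reference, `SHOW FULL TABLES` adds a Table_type column that can be filtered on, roughly like this (illustrative statement; `mydb` is a placeholder database name):

```sql
SHOW FULL TABLES IN `mydb` WHERE Table_type = 'BASE TABLE';
```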

    Check List

    Tests

    • Unit test
    • Integration test

    Side effects

    • Possible performance regression
    • Increased code complexity

    Related changes

    • Need to cherry-pick to the release branch
    • Need to update the documentation

    Release note

    • fix hanging on `show table status` in some MySQL versions
  • Optimize the logic of exporting data from aurora using dumpling, so as not to affect the performance of aurora

    Aurora does not support FTWRL. When exporting data from Aurora using dumpling with --consistency=none, it starts a big transaction, and this affects Aurora performance.

    Hope that when exporting data from Aurora with --consistency=none, it will not start transactions that affect Aurora performance.

  • Dumpling cannot export tables on tidb-cloud better

    What did you do?

    Dumped part of the tables on tidb-cloud (DBaaS). gc_life_time does not support being set automatically when Dumpling can't ping pd-server.

    What did you expect to see?

    gc_life_time can be set adaptively

    What did you see instead?

    As GC progresses, Dumpling reports an error

    Versions of the cluster

    v5.4.0

  • dumpling database with -B flag failed

    What did you do?

    Dumped the database using the following command: tiup dumpling --host -u -p -P -B -o

    What did you expect to see?

    database being exported

    What did you see instead?

    The following error message: Error: use tiup install <database> to install component <database> first: component not installed.

    If we use --database instead of -B, the dumpling command executes successfully.

    Versions of the cluster

    Release version: v5.1.2
    Git commit hash: 0d8ccce9feb254494e17b16d56bce8a2fc3e2469
    Git branch: heads/refs/tags/v5.1.2
    Build timestamp: 2021-09-27 02:47:55Z
    Go version: go version go1.16.4 linux/amd64

  • add err message for dumpling

    What problem does this PR solve?

    Close #370

    What is changed and how it works?

    Add error message when --consistency is lock and --snapshot parameter is provided.

    Check List

    Tests

    • Unit test
    • Manual test (add detailed scripts or steps below)

    Release note

    No release note

  • unix_socket support

    MariaDB supports the Unix Socket Authentication Plugin (https://mariadb.com/kb/en/authentication-plugin-unix-socket/). With it, it is possible to log in without a password, for example:

    [root@localhost ~]# mysql
    Welcome to the MariaDB monitor.  Commands end with ; or \g.
    Your MariaDB connection id is 5672
    Server version: 10.4.19-MariaDB-log MariaDB Server
    
    Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
    
    Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
    

    dumpling seems not to support it:

    [root@localhost ~]# ./dumpling -P3306
    Release version: 
    Git commit hash: 
    Git branch:      
    Build timestamp: 2021-06-14 04:26:05Z
    Go version:      go version go1.16.5 linux/amd64
    
    [2021/06/21 15:55:51.754 +00:00] [INFO] [versions.go:55] ["Welcome to dumpling"] ["Release Version"=] ["Git Commit Hash"=] ["Git Branch"=] ["Build timestamp"="2021-06-14 04:26:05"] ["Go Version"="go version go1.16.5 linux/amd64"]
    
    create dumper failed: sql: SELECT version(): Error 1045: Access denied for user 'root'@'127.0.0.1' (using password: NO)
    
    [root@localhost ~]# ./dumpling -hlocalhost -P3306
    Release version: 
    Git commit hash: 
    Git branch:      
    Build timestamp: 2021-06-14 04:26:05Z
    Go version:      go version go1.16.5 linux/amd64
    
    [2021/06/21 15:56:05.093 +00:00] [INFO] [versions.go:55] ["Welcome to dumpling"] ["Release Version"=] ["Git Commit Hash"=] ["Git Branch"=] ["Build timestamp"="2021-06-14 04:26:05"] ["Go Version"="go version go1.16.5 linux/amd64"]
    
    create dumper failed: sql: SELECT version(): Error 1045: Access denied for user 'root'@'::1' (using password: NO)
    

    It would be nice to support it.
