A high-speed data import tool for TiDB

TiDB Lightning


TiDB Lightning is a tool for fast full import of large amounts of data into a TiDB cluster. Currently it supports reading SQL dumps exported via mydumper.

TiDB Lightning architecture (diagram)

Contributing

Contributions are welcomed and greatly appreciated. See CONTRIBUTING.md for details on submitting patches and the contribution workflow.

License

TiDB Lightning is under the Apache 2.0 license. See the LICENSE file for details.

Comments
  • restore: Try to create tables in parallel

    restore: Try to create tables in parallel

    What problem does this PR solve?

    Issue Number: close #434

    What is changed and how it works?

    • add schemaStmt, which holds one statement (create db|table|view)
    • add schemaJob, which holds all statements of one restore-schema job
    • add restoreSchemaWorker, which spawns an async goroutine to create restore-schema jobs (producer)
    • set a hardcoded concurrency of 16 goroutines when restoreSchema#doJob is called (consumer); see the sketch below
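
    A minimal sketch of the producer/consumer shape described above, assuming a hypothetical execSQL helper; the names mirror the PR description but the signatures are illustrative only:

    package restore

    import (
        "context"

        "golang.org/x/sync/errgroup"
    )

    // schemaStmt holds one CREATE DATABASE/TABLE/VIEW statement.
    type schemaStmt struct{ sql string }

    // restoreSchemaConcurrently drains stmts with a fixed pool of 16 consumer
    // goroutines; the producer side closes the channel once all jobs are queued.
    func restoreSchemaConcurrently(ctx context.Context, stmts <-chan schemaStmt,
        execSQL func(context.Context, string) error) error {
        g, ctx := errgroup.WithContext(ctx)
        for i := 0; i < 16; i++ { // hardcoded consumer concurrency
            g.Go(func() error {
                for stmt := range stmts {
                    if err := execSQL(ctx, stmt.sql); err != nil {
                        return err
                    }
                }
                return nil
            })
        }
        return g.Wait()
    }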

    Benchmark

    Time costs from tests/restore/run.sh with $TABLE_COUNT=300 are reported below:

    Before

    
    ________________________________________________________
    Executed in  211.51 secs   fish           external 
       usr time   76.28 secs  187.00 micros   76.28 secs 
       sys time   44.62 secs  617.00 micros   44.62 secs 
    
    [2020/12/08 17:06:33.389 +08:00] [INFO] [restore.go:964] ["restore all tables data completed"] [takeTime=1m9.093660687s] []
    [2020/12/08 17:06:33.389 +08:00] [INFO] [restore.go:745] ["everything imported, stopping periodic actions"]
    [2020/12/08 17:06:33.389 +08:00] [INFO] [restore.go:1409] ["skip full compaction"]
    [2020/12/08 17:06:33.411 +08:00] [INFO] [restore.go:294] ["the whole procedure completed"] [takeTime=1m45.325956477s] []
    [2020/12/08 17:06:33.411 +08:00] [INFO] [main.go:95] ["tidb lightning exit"]
    [2020/12/08 17:06:33.411 +08:00] [INFO] [checksum.go:425] ["service safe point keeper exited"]
    

    After

    
    ________________________________________________________
    Executed in  213.24 secs   fish           external 
       usr time   78.08 secs  140.00 micros   78.08 secs 
       sys time   44.92 secs  475.00 micros   44.92 secs 
    
    [2020/12/08 16:55:15.821 +08:00] [INFO] [restore.go:820] ["restore all tables data completed"] [takeTime=1m9.754043571s] []
    [2020/12/08 16:55:15.821 +08:00] [INFO] [restore.go:601] ["everything imported, stopping periodic actions"]
    [2020/12/08 16:55:15.821 +08:00] [INFO] [restore.go:1265] ["skip full compaction"]
    [2020/12/08 16:55:15.840 +08:00] [INFO] [restore.go:293] ["the whole procedure completed"] [takeTime=1m42.140242288s] []
    [2020/12/08 16:55:15.840 +08:00] [INFO] [main.go:95] ["tidb lightning exit"]
    [2020/12/08 16:55:15.840 +08:00] [INFO] [checksum.go:425] ["service safe point keeper exited"]
    

    PS: this benchmark was run against a single-node TiDB rather than a cluster, so the single TiDB node acting as both DDL owner and executor may have bottlenecked the whole run. We should benchmark again on a TiDB cluster with multiple DDL nodes.

    -------- Update ---------

    Benchmark on a cluster of 1 PD, 3 TiDB, and 4 TiKV nodes (single machine)

    preset:

    mysql> set @@global.tidb_scatter_region = "1";
    

    Concurrency

    [2020/12/29 14:55:09.523 +08:00] [INFO] [restore.go:503] ["restore schema completed"] [takeTime=2m15.052150251s] []

    Serial

    [2020/12/29 15:04:52.746 +08:00] [INFO] [restore.go:357] ["restore schema completed"] [takeTime=2m47.520433308s] []

    Check List

    Tests

    • Unit test
    • Integration test

    Side effects

    • Increased code complexity

    Related changes

    • Need to cherry-pick to the release branch
    • Need to be included in the release note
  • Try to create tables in parallel

    Try to create tables in parallel

    Feature Request

    Is your feature request related to a problem? Please describe:

    Currently we execute CREATE TABLE statements (tidbMgr.InitSchema) sequentially, but experience in BR shows that running them in parallel is faster (pingcap/br#377).

    Describe the feature you'd like:

    Execute the CREATE TABLE statements in restoreSchema in parallel over 16 connections.

    Benchmark that by importing 300 small tables.

    Describe alternatives you've considered:

    Don't do it.

    Teachability, Documentation, Adoption, Optimization:

    N/A

    Score

    600

    SIG slack channel

    sig-migrate

    Mentor

    @glorv @lance6716

  • Update dependencies and remove juju/errors

    Update dependencies and remove juju/errors

    1. Replaced juju/errors with pingcap/errors (exported as pkg/errors due to how pingcap/tidb imports it) (LGPL-v3 → BSD-2-clause); see the sketch after this list.

    2. Updated pingcap/tidb to v2.1.0-rc.4 to entirely remove juju/errors from the vendor.

      • Updated pingcap/pd to v2.1.0-rc.4
      • Updated pingcap/kvproto to a pinned master commit
      • Updated pingcap/tipb to a pinned master commit
      • Replaced golang/protobuf by gogo/protobuf (BSD-3-clause)
      • Added opentracing/basictracer-go (Apache-2.0)
    3. Removed the golang.org/x/net dependency as we can use the built-in context package (the two are interchangeable after Go 1.7 anyway)

    4. Removed the explicit dependency on pingcap/tidb-tools and siddontang/go, since we're not using glide anymore

    5. Updated some direct dependencies:

      • Updated BurntSushi/toml from v0.3.0 to v0.3.1 (WTFPL → MIT)
      • Updated prometheus/client_golang from v0.8.0 to v0.9.0
      • Updated sirupsen/logrus from v0.11.6 to v1.1.1
      • Updated golang.org/x/sys to a pinned master commit
      • Updated google.golang.org/grpc from v1.12.0 to v1.15.0
    6. Added the commercial license
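
    For context, a minimal sketch of the error-wrapping style that stays the same across the swap; only the import path changes from github.com/juju/errors to github.com/pingcap/errors (the Trace/Annotate-style helpers are assumed to be the ones in use):

    package main

    import (
        "fmt"
        "os"

        "github.com/pingcap/errors" // previously: github.com/juju/errors
    )

    func readConfig(path string) error {
        if _, err := os.Stat(path); err != nil {
            // Annotatef keeps the same call sites as with juju/errors.
            return errors.Annotatef(err, "cannot read config %s", path)
        }
        return nil
    }

    func main() {
        if err := readConfig("missing.toml"); err != nil {
            fmt.Println(errors.ErrorStack(err))
        }
    }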

  • restore: update and restore GCLifeTime once when parallel

    restore: update and restore GCLifeTime once when parallel

    What problem does this PR solve?

    DoChecksum is not prepared for the parallel case.

    What is changed and how it works?

    There may be multiple DoChecksum calls running at the same time; the GC life time should not be set or reset while there are unfinished DoChecksum tasks.

    Lightning runs restoreTables concurrently, so we use the following logic:

    // restoreTables starts several table restores concurrently; they all share
    // one helper holding the lock, the running-jobs counter, and the original
    // GC life time value.
    func restoreTables() {
        // init helper (lock, counter, original GC life time)
        for i := 0; i < concurrency; i++ {
            go restoreTable()
        }
    }
    
    func restoreTable() {
        // calls postProcess() -> ... -> DoChecksum()
    }
    
    func DoChecksum() {
        // uses the helper's pointers: lock, running-jobs counter, original value
    }
    

    A counter variable tracks the number of running checksum jobs. Before a remote checksum call starts: lock, increment the counter, and if it just rose from zero, back up the original value and set the GC life time; then unlock. After a remote checksum finishes: lock, decrement the counter, and if it just dropped to zero, restore the original GC life time; then unlock.

    The lock, the counter, and the original value all point to the same shared location within one restoreTables run (see the sketch below).
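
    A minimal sketch of the shared counter described above; getGCLifeTime/setGCLifeTime are hypothetical stand-ins for the real SQL statements against mysql.tidb, and the enlarged "100h" value is illustrative:

    package restore

    import (
        "context"
        "sync"
    )

    // gcLifeTimeManager is shared by pointer across all table restores so the
    // GC life time is adjusted once, not once per checksum job.
    type gcLifeTimeManager struct {
        mu       sync.Mutex
        running  int
        original string // value to restore when the last job finishes
    }

    func (m *gcLifeTimeManager) addOneJob(ctx context.Context) error {
        m.mu.Lock()
        defer m.mu.Unlock()
        m.running++
        if m.running == 1 { // first job: back up and enlarge the GC life time
            orig, err := getGCLifeTime(ctx)
            if err != nil {
                m.running--
                return err
            }
            m.original = orig
            return setGCLifeTime(ctx, "100h")
        }
        return nil
    }

    func (m *gcLifeTimeManager) removeOneJob(ctx context.Context) {
        m.mu.Lock()
        defer m.mu.Unlock()
        m.running--
        if m.running == 0 { // last job: put the original value back
            _ = setGCLifeTime(ctx, m.original)
        }
    }

    // Hypothetical helpers; the real code queries and updates mysql.tidb.
    func getGCLifeTime(ctx context.Context) (string, error) { return "10m", nil }
    func setGCLifeTime(ctx context.Context, v string) error { return nil }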

    Check List

    Tests

    • Unit test
    • Integration test

    Side effects

    Related changes

    • Need to cherry-pick to the release branch
  • restore: check row value count to avoid unexpected encode result

    restore: check row value count to avoid unexpected encode result

    What problem does this PR solve?

    ~Check the row field count before encoding; if the row value count is bigger than the table field count, directly return an error.~

    • Check the column count in the tidb encoder and return an error if it doesn't match the table's column count (see the sketch below).
    • Be compatible with the special _tidb_rowid field in getColumnNames and the tidb encoder.

    What is changed and how it works?
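
    A rough sketch of the kind of check described above, independent of the real encoder types; a trailing _tidb_rowid value is permitted and any other surplus column is rejected:

    package restore

    import "fmt"

    // checkColumnCount is a simplified illustration: a row may carry at most
    // the table's own columns plus the special _tidb_rowid column.
    func checkColumnCount(rowValues, tableColumns int, hasRowID bool) error {
        allowed := tableColumns
        if hasRowID {
            allowed++ // the extra _tidb_rowid value is allowed
        }
        if rowValues > allowed {
            return fmt.Errorf("column count mismatch: row has %d values, table has %d columns",
                rowValues, tableColumns)
        }
        return nil
    }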

    Check List

    Tests

    • Unit test
    • Integration test
    • Manual test (add detailed scripts or steps below)
    • No code

    Side effects

    Related changes

    Release Note

    • Fix the bug that the tidb backend panics if the source file has more columns than the target table
  • restore: optimize SQL processing speed

    restore: optimize SQL processing speed

    DNM: Based on #109 for simplicity of development.

    What problem does this PR solve?

    Optimizing SQL processing speed

    Result:

    • PR 109: TableConcurrency = 20, RegionConcurrency = 40; the metrics data has been lost because the cluster was cleaned up.

      Data size: 340G
      Rate: ~45MB/s
      Total time: ~= 2h30m (import time: 40m)
      
    • PR 110: TableConcurrency = 20, RegionConcurrency = 20, IOConcurrency = 5, Test1 metrics snapshot, Test2 metrics snapshot

      CANNOT REPRODUCE [IO delay unstable]

      Test1: 
      Data size: 146G
      Rate: 90~160MB/s
      Total time: ~= 48m (import time: 27m)
      
      Test2:
      Data size: 146G
      Rate: 130~190MB/s
      Total time: ~= 46m (import time: 28m)
      
      Test3
      coming ...
      
    • PR 110: TableConcurrency = 40, RegionConcurrency = 40, IOConcurrency = 10 160G, Metrics

      	2018/12/30 01:16:32.871 restore.go:477: [info] restore all tables data takes 52m59.59960385s
      	2018/12/30 01:16:32.871 restore.go:366: [info] Everything imported, stopping periodic actions
      	2018/12/30 01:16:32.871 restore.go:208: [error] run cause error : [types:1292]invalid time format: '{2038 1 19 4 4 36 0}'
      	2018/12/30 01:16:32.871 restore.go:214: [info] the whole procedure takes 53m7.986292573s
      

    Early conclusion:

    Concurrent IO increases IO delays, which in turn lengthens SQL processing time.

    What is changed and how it works?

    Limit IO concurrency (see the sketch below).
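
    A minimal sketch of limiting IO concurrency with a semaphore built from a buffered channel; the readChunk wrapper is hypothetical, only the limiting pattern matters:

    package restore

    // ioSem limits how many data-file reads may run at once (e.g. IOConcurrency = 5).
    type ioSem chan struct{}

    func newIOSem(n int) ioSem { return make(ioSem, n) }

    func (s ioSem) acquire() { s <- struct{}{} }
    func (s ioSem) release() { <-s }

    // readChunk is a stand-in for the real chunk reader; the semaphore ensures
    // at most cap(s) reads hit the disk concurrently, keeping IO delay stable.
    func readChunk(s ioSem, read func() ([]byte, error)) ([]byte, error) {
        s.acquire()
        defer s.release()
        return read()
    }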

    Check List

    Tests

    • Unit test

    Code changes

    Side effects

    Related changes

  • Support table routing rules (merging sharded tables)

    Support table routing rules (merging sharded tables)

    What problem does this PR solve?

    TOOL-142

    (Note: still won't handle UNIQUE/PRIMARY key conflict. This needs column-mapping)

    What is changed and how it works?

    Rename the tables while loading the files. Since we don't care about the table name when parsing the data files, supporting merging becomes very simple: just associate all those data files with the target table (see the sketch below).

    This PR supersedes #54. Note that #54 is very large because it attempts to do some refactoring at the same time.
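
    A rough sketch of the idea (not the actual rule syntax): data files from sharded source tables are simply grouped under one target table before restoration begins:

    package restore

    import "regexp"

    // routeRule maps any source table whose name matches pattern to one target
    // table; the real routing rules use richer schema/table patterns.
    type routeRule struct {
        pattern *regexp.Regexp
        target  string // "targetdb.targettable"
    }

    // groupDataFiles associates every data file with its target table, so
    // merging shards just means all their files end up in the same bucket.
    func groupDataFiles(files map[string]string, rules []routeRule) map[string][]string {
        grouped := make(map[string][]string)
        for path, sourceTable := range files {
            target := sourceTable // default: no renaming
            for _, r := range rules {
                if r.pattern.MatchString(sourceTable) {
                    target = r.target
                    break
                }
            }
            grouped[target] = append(grouped[target], path)
        }
        return grouped
    }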

    Check List

    Tests

    • Integration test

    Code changes

    Side effects

    Related changes

    • Need to update the documentation
    • Need to be included in the release note
  • backend: add local kv storage backend to get rid of importer

    backend: add local kv storage backend to get rid of importer

    What problem does this PR solve?

    Use local key-value storage as a new backend to get rid of the dependency on tikv-importer, thus making Lightning easier to use. In our benchmark, the import speed in local mode is as good as in importer mode, so this change won't bring any performance loss; it is also a much better choice compared with the tidb backend.

    What is changed and how it works?

    The logic of local mode is as follows:

    1. Write the data read from CSV/mydumper files into the local key-value store pebble (see the sketch below).
    2. Batch-write the sorted KV pairs and generate SST files on each TiKV instance. https://github.com/tikv/tikv/pull/7459
    3. Ingest the SST files into the TiKV cluster.
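
    A minimal sketch of step 1, assuming cockroachdb/pebble; the KV encoding and options are simplified:

    package local

    import "github.com/cockroachdb/pebble"

    // writeSortedKVs appends encoded KV pairs into the local pebble store under
    // sorted-kv-dir; pebble keeps them sorted by key so they can later be turned
    // into SST files and ingested into the TiKV cluster.
    func writeSortedKVs(dir string, kvs map[string][]byte) error {
        db, err := pebble.Open(dir, &pebble.Options{})
        if err != nil {
            return err
        }
        defer db.Close()
        for k, v := range kvs {
            // NoSync: durability is handled by the explicit Flush below.
            if err := db.Set([]byte(k), v, pebble.NoSync); err != nil {
                return err
            }
        }
        return db.Flush()
    }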

    In local mode, the sorted KV data is managed by the Lightning process, so there are some checkpoint changes:

    • In the restore phase with checkpoint enabled, we save all the chunk checkpoints after the engine is closed. This is because if Lightning exits before the engine is closed, there may be data written to the KV store but not yet flushed, and that data would be lost. Another approach is to flush after each processed chunk, but that is much slower.
    • Before we update an engine checkpoint to CheckpointStatusClosed, we flush the related index engine to make sure the related index KVs are saved.
    • Skip the CheckpointStatusAllWritten stage in local mode, because at that point the data/index KVs are not yet flushed; if Lightning exits at that point we cannot guarantee the local KV store contains all the key-values for this engine, so we have to restore the engine from the start.
    • Add a new meta file for each engine, stored alongside the engine DB files. This meta contains some data used in the import phase and is generated when the engine is closed.

    Changes for tidb-lightning-ctl:

    • the import-engine command is not supported in local mode. The import phase is now done by Lightning itself, so this command is no longer meaningful.

    NOTE:

    • We recommend separating the sorted-kv-dir from the data disk if possible: the data disk is read-heavy while the local storage dir is both read- and write-heavy, so using a separate disk for this temporary store gives at least a 10% performance gain.

    TODO:

    • Maybe we should save chunk checkpoints more frequently. One possible approach is to trigger a flush of both the data and index engines after a specific number of bytes has been written to pebble, so the current chunk checkpoints can be saved safely.

    Check List

    Tests

    • Unit test
    • Integration test
    • Manual test (add detailed scripts or steps below)
    • No code

    Side effects

    • Reusing a checkpoint may take more time than with the importer backend, which makes #303 even worse.

    Related changes

    • Need to cherry-pick to the release branch
    • Need to update the documentation
    • Need to update the tidb-ansible repository
    • Need to be included in the release note
  • Rewrite data file parser; Insert _tidb_rowid when needed; Update checkpoint structure

    Rewrite data file parser; Insert _tidb_rowid when needed; Update checkpoint structure

    What problem does this PR solve?

    Completely fix TOOL-462 by recording the _tidb_rowid on non-PkIsHandle tables, to ensure idempotence when importing the same chunk twice.

    What is changed and how it works?

    1. Assign a Row ID to every row of a table before the import starts, so importing two chunks from the same table is no longer order-dependent (see the sketch after this list).
    2. To properly assign a Row ID, we need to know exactly how many rows each chunk has, so we replaced splitFuzzyRegion with an exact version again.
    3. Since we need to read the whole file before importing, we want to make this step as fast as possible. Therefore, I replaced the MDDataReader by a ragel-based parser, which is about 8x faster on my machine.
    4. We also need to record the RowIDs into the checkpoints. The checkpoint tables are modified to accommodate this change. Additionally, the checksums are stored as properties of a chunk instead of the whole table.
    5. To ensure the only global property, the allocator, won't interfere with the data output in future updates, I've created a custom allocator which will panic on any unsupported operation.
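
    A small illustration of points 1-2, under the assumption that each chunk's exact row count is known up front: chunks receive disjoint, deterministic _tidb_rowid ranges, so re-importing a chunk reproduces the same IDs.

    package restore

    // chunk records the _tidb_rowid range [rowIDBase, rowIDBase+rows) assigned
    // to it, which is also what gets persisted into the chunk checkpoint.
    type chunk struct {
        rows      int64
        rowIDBase int64
    }

    // assignRowIDs gives each chunk of a table a deterministic, disjoint ID
    // range before the import starts, so the result no longer depends on the
    // order in which chunks are restored.
    func assignRowIDs(chunks []chunk) {
        next := int64(1) // _tidb_rowid starts from 1
        for i := range chunks {
            chunks[i].rowIDBase = next
            next += chunks[i].rows
        }
    }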

    Check List

    Tests

    • [x] Unit test
    • [x] Integration test
    • [ ] Manual test (add detailed scripts or steps below)
    • [ ] No code

    Code changes

    • [ ] Has exported function/method change
    • [ ] Has exported variable/fields change
    • [ ] Has interface methods change
    • [x] Has persistent data change

    Side effects

    • [ ] Possible performance regression
    • [x] Increased code complexity
    • [x] Breaking backward compatibility (only if you update Lightning after it saved a checkpoint)

    Related changes

    • [x] Need to cherry-pick to the release branch (2.1)
    • [ ] Need to update the tidb-ansible repository
    • [ ] Need to update the documentation
    • [ ] Need to be included in the release note
  • Restore from S3 compatible API?

    Restore from S3 compatible API?

    A feature request for your roadmap:

    Could it be made possible to restore directly from a mydumper backup stored in S3? In most cloud deployments this is where user backups are stored (the S3 API is implemented by many other object stores).


    Value

    Value description

    Support restore to TiDB via S3.

    Value score

    • (TBD) / 5

    Workload estimation

    • (TBD)

    Time

    GanttStart: 2020-07-27 GanttDue: 2020-09-04 GanttProgress: 100%

  • restore: ensure the importer engine is closed before recycling the table worker

    restore: ensure the importer engine is closed before recycling the table worker

    The close-engine operation is extracted out of Flush() (which now only does ImportEngine). The engine count should now be strictly limited by table-concurrency.

  • Add a progress bar and the final result (pass or failed) to the command output

    Add a progress bar and the final result (pass or failed) to the command output

    Feature Request

    Is your feature request related to a problem? Please describe:

    When using the lightning command to import data, users currently cannot see the progress status or the final result in the command output. The Lightning log and monitoring display this information, but the CLI does not. For users, the most direct way to see the import progress and the final result is the CLI output, not a log file or a monitoring dashboard.

    Describe the feature you'd like:

    Add a progress bar and the final result (pass or failed) to the command output (see the sketch below).
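
    A minimal sketch of what the CLI output could look like, assuming the progress fraction is already available from Lightning's existing internal metrics; the rendering is just a plain carriage-return progress line with a final PASS/FAILED verdict:

    package cli

    import (
        "fmt"
        "time"
    )

    // printProgress periodically renders a one-line progress indicator on stdout
    // and finally reports the result; progress() is a stand-in for the metric
    // Lightning already tracks internally.
    func printProgress(progress func() float64, done <-chan error) {
        ticker := time.NewTicker(time.Second)
        defer ticker.Stop()
        for {
            select {
            case <-ticker.C:
                fmt.Printf("\rimporting: %3.0f%%", progress()*100)
            case err := <-done:
                if err != nil {
                    fmt.Println("\nresult: FAILED:", err)
                } else {
                    fmt.Println("\nresult: PASS")
                }
                return
            }
        }
    }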

    Describe alternatives you've considered:

    User friendly.

    Teachability, Documentation, Adoption, Optimization:

  • use system_time_zone to encode kv if tidb set it

    use system_time_zone to encode kv if tidb set it

    What problem does this PR solve?

    Resolve #562

    What is changed and how it works?

    If TiDB's time_zone is "SYSTEM", try to use the value of system_time_zone to set the time_zone for the Lightning session (see the sketch below).
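
    A minimal sketch of the described behavior using database/sql; the exact session setup in Lightning differs, but time_zone and system_time_zone are the real TiDB/MySQL variable names:

    package encode

    import (
        "database/sql"
        "fmt"
    )

    // sessionTimeZone picks the zone to encode timestamps with: if time_zone is
    // "SYSTEM", fall back to the server's system_time_zone.
    func sessionTimeZone(db *sql.DB) (string, error) {
        var tz, sysTZ string
        if err := db.QueryRow("SELECT @@time_zone, @@system_time_zone").Scan(&tz, &sysTZ); err != nil {
            return "", err
        }
        if tz == "SYSTEM" && sysTZ != "" {
            tz = sysTZ
        }
        return tz, nil
    }

    // applyTimeZone sets the session time zone; plain string formatting is fine
    // for this sketch since tz comes from the server itself.
    func applyTimeZone(db *sql.DB, tz string) error {
        _, err := db.Exec(fmt.Sprintf("SET SESSION time_zone = '%s'", tz))
        return err
    }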

    Check List

    Tests

    • Manual test (add detailed scripts or steps below)

    Related changes

    • Need to cherry-pick to the release branch
    • Need to update the documentation
    • Need to be included in the release note

    Release note

    • Fix the issue that Lightning didn't use TiDB's time zone to encode timestamp data.
  • local backend oom

    local backend oom

    Bug Report

    Please answer these questions before submitting your issue. Thanks!

    1. What did you do? If possible, provide a recipe for reproducing the error. I used lightning to restore csv files (tpcc 5000 warehouses).

    2. What did you expect to see? Restored successfully.

    3. What did you see instead? Lightning OOMed. In January, the memory usage of Lightning was about 20~30 GB, but now it takes at least 60 GB.

    4. Versions of the cluster

      • TiDB-Lightning version (run tidb-lightning -V):

        Release Version: v5.0.0-rc-21-g230eef2
        Git Commit Hash: 230eef2a6e16648a49a4c74910dca693781012c4
        Git Branch: master
        UTC Build Time: 2021-02-04 03:10:38
        Go Version: go version go1.15.6 linux/amd64
        
      • TiKV-Importer version (run tikv-importer -V):

        none
        
      • TiKV version (run tikv-server -V):

        TiKV
        Release Version:   5.0.0-rc.x
        Edition:           Community
        Git Commit Hash:   81c4de98a9a21e4dcf3cce6d7783793b1238044e
        Git Commit Branch: limit-write-batch-ingest
        UTC Build Time:    2021-02-04 11:43:07
        Rust Version:      rustc 1.51.0-nightly (1d0d76f8d 2021-01-24)
        Enable Features:   jemalloc mem-profiling portable sse protobuf-codec test-engines-rocksdb
        Profile:           release
        
      • TiDB cluster version (execute SELECT tidb_version(); in a MySQL client):

        Release Version: v4.0.0-beta.2-2067-g415d14b6a
        Edition: Community
        Git Commit Hash: 415d14b6ac65e3c73529d07b4331c2f4917b2701
        Git Branch: master
        UTC Build Time: 2021-01-27 15:27:10
        GoVersion: go1.13
        Race Enabled: false
        TiKV Min Version: v3.0.0-60965b006877ca7234adaced7890d7b029ed1306
        Check Table Before Drop: false
        
      • Other interesting information (system version, hardware config, etc):

    5. Operation logs

      • Please upload tidb-lightning.log for TiDB-Lightning if possible
      • Please upload tikv-importer.log from TiKV-Importer if possible
      • Other interesting logs
    6. Configuration of the cluster and the task

      • tidb-lightning.toml for TiDB-Lightning if possible
      • tikv-importer.toml for TiKV-Importer if possible
      • inventory.ini if deployed by Ansible
    7. Screenshot/exported-PDF of Grafana dashboard or metrics' graph in Prometheus for TiDB-Lightning if possible

  • tidb-lightning alters the values of timestamp columns

    tidb-lightning alters the values of timestamp columns

    Bug Report

    1. What did you do? If possible, provide a recipe for reproducing the error. I used the tidb-lightning tool to restore a full backup.
    • In the full backup data, there is a table that has timestamp columns like this:

      CREATE TABLE `users` (
        `id` bigint(20) NOT NULL,
        `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
        `updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
        ...
      )
      
    • Set the timezone of all servers (TiDB, PD, TiKV, Lightning, ...) to UTC

    • In the script deploy/scripts/start_lightning.sh, set timezone to Asia/Tokyo

      #!/bin/bash
      set -e
      ulimit -n 1000000
      cd "/home/ec2-user/deploy" || exit 1
      mkdir -p status
      
      export RUST_BACKTRACE=1
      
      export TZ=Asia/Tokyo
      
      echo -n 'sync ... '
      stat=$(time sync)
      echo ok
      echo $stat
      
      nohup ./bin/tidb-lightning -config ./conf/tidb-lightning.toml &> log/tidb_lightning_stderr.log &
      
      echo $! > "status/tidb-lightning.pid"
      
    • Start the tidb-lightning tool

      $ cd deploy/
      $ scripts/start_lightning.sh
      
    2. What did you expect to see? The tidb-lightning tool should respect the original data (in the full backup) and import it as it is.

    3. What did you see instead? The tidb-lightning tool altered the values of the timestamp columns. For example,

      Original data (from the full backup):

      $ head -2 ./xxxxx.users.000000001.sql
      INSERT INTO `users` VALUES
      (123456789123456789,'2019-09-03 12:31:02','2019-09-03 12:35:18',...)
      

      Imported data:

      > select id, created_at, updated_at from users where id = 123456789123456789;
      +--------------------+---------------------+---------------------+
      | id                 | created_at          | updated_at          |
      +--------------------+---------------------+---------------------+
      | 123456789123456789 | 2019-09-03 03:31:02 | 2019-09-03 03:35:18 |
      +--------------------+---------------------+---------------------+
      

      So the tidb-lightning tool altered the values of the created_at and updated_at columns: the original values were shifted back by 9 hours.

    4. Versions of the cluster

      • TiDB-Lightning version (run tidb-lightning -V):

        Release Version: v4.0.9
        Git Commit Hash: 56bc32daad19b9dff10104c55300292de959fde3
        Git Branch: heads/refs/tags/v4.0.9
        UTC Build Time: 2020-12-19 04:48:01
        Go Version: go version go1.13 linux/amd64
        
      • TiKV-Importer version (run tikv-importer -V)

        Didn't use
        
      • TiKV version (run tikv-server -V):

        TiKV
        Release Version:   4.0.10
        Edition:           Community
        Git Commit Hash:   2ea4e608509150f8110b16d6e8af39284ca6c93a
        Git Commit Branch: heads/refs/tags/v4.0.10
        UTC Build Time:    2021-01-15 03:16:35
        Rust Version:      rustc 1.42.0-nightly (0de96d37f 2019-12-19)
        Enable Features:   jemalloc mem-profiling portable sse protobuf-codec
        Profile:           dist_release
        
      • TiDB cluster version (execute SELECT tidb_version(); in a MySQL client):

        +---------------------+
        | version()           |
        +---------------------+
        | 5.7.25-TiDB-v4.0.10 |
        +---------------------+
        
      • Other interesting information (system version, hardware config, etc):

        > show variables like '%time_zone%';
        +------------------+--------+
        | Variable_name    | Value  |
        +------------------+--------+
        | system_time_zone | UTC    |
        | time_zone        | SYSTEM |
        +------------------+--------+
        
        $ cat /etc/os-release
        NAME="Amazon Linux"
        VERSION="2"
        ID="amzn"
        ID_LIKE="centos rhel fedora"
        VERSION_ID="2"
        PRETTY_NAME="Amazon Linux 2"
        ANSI_COLOR="0;33"
        CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2"
        HOME_URL="https://amazonlinux.com/"
        
    5. Operation logs

      • Please upload tidb-lightning.log for TiDB-Lightning if possible
      • Please upload tikv-importer.log from TiKV-Importer if possible
      • Other interesting logs
    6. Configuration of the cluster and the task

      • tidb-lightning.toml for TiDB-Lightning if possible
       # lightning Configuration
      
       [lightning]
       file = "/home/tidb/deploy/log/tidb_lightning.log"
       index-concurrency = 2
       io-concurrency = 5
       level = "info"
       max-backups = 14
       max-days = 28
       max-size = 128
       pprof-port = 8289
       table-concurrency = 6
      
       [checkpoint]
       enable = true
       schema = "tidb_lightning_checkpoint"
       driver = "file"
      
       [tikv-importer]
       backend = "local"
       sorted-kv-dir = "/home/tidb/deploy/sorted-kv-dir"
      
       [mydumper]
       data-source-dir = "/home/tidb/deploy/mydumper/scheduled-backup-20210120-044816"
       no-schema = false
       read-block-size = 65536
      
       [tidb]
       build-stats-concurrency = 20
       checksum-table-concurrency = 16
       distsql-scan-concurrency = 100
       host = "TIDB_HOST"
       index-serial-scan-concurrency = 20
       log-level = "error"
       password = "xxxxx"
       port = 4000
       status-port = 10080
       user = "root"
       pd-addr = "PD_HOST:2379"
      
       [post-restore]
       analyze = true
       checksum = true
      
       [cron]
       log-progress = "5m"
       switch-mode = "5m"
      
      • inventory.ini if deployed by Ansible
    7. Screenshot/exported-PDF of Grafana dashboard or metrics' graph in Prometheus for TiDB-Lightning if possible

  • In the TPC-C test, the following error occurred when using local-backend Lightning

    In the TPC-C test, the following error occurred when using local-backend Lightning

    Question

    CSV data: 110 GB. In the TPC-C test, the data was converted to CSV files and then imported into TiDB using local-backend Lightning, and the following error occurred:

    read metadata from /data/tikv-20160/import/4ff45145-3a21-4e78-81da-2bca45104be3_10513_5_350_default.sst "read metadata from /data/tikv-20160/import/4ff45145-3a21-4e78-81da-2bca45104be3_10513_5_350_default.sst: Os { code: 2, kind: NotFound, message: \"No such file or directory\" }")"]

  • panic in tidb backend in strict sql-mode

    panic in tidb backend in strict sql-mode

    Bug Report

    Please answer these questions before submitting your issue. Thanks!

    1. What did you do? If possible, provide a recipe for reproducing the error. Use lightning tidb backend to import data with config:
    [tidb]
    sql-mode = "STRICT_ALL_TABLES"
    

    panic backtrace:

    goroutine 578 [running]:
    github.com/pingcap/tidb/types.(*Datum).ConvertTo(0xc00a0fd220, 0xc0001dcdc0, 0xc0004a0be8, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
      /go/pkg/mod/github.com/pingcap/[email protected]/types/datum.go:843 +0xcff
    github.com/pingcap/tidb/table.CastValue(0x2b48760, 0xc000c6e000, 0x5, 0x0, 0x25442e2, 0xb, 0xc00cb05371, 0x8, 0x8, 0x0, ...)
      /go/pkg/mod/github.com/pingcap/[email protected]/table/column.go:244 +0xf2
    github.com/pingcap/tidb-lightning/lightning/backend.(*tidbEncoder).appendSQL(0xc00810e040, 0xc000c22120, 0xc00a0fd660, 0xc008097770, 0x0, 0x0)
      /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb-lightning/lightning/backend/tidb.go:181 +0x594
    github.com/pingcap/tidb-lightning/lightning/backend.(*tidbEncoder).Encode(0xc00810e040, 0xc019f0a3c0, 0xc00062a480, 0xa, 0x10, 0x1, 0xc00a0ec060, 0xa, 0xb, 0x1f37685, ...)
      /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb-lightning/lightning/backend/tidb.go:251 +0x32d
    github.com/pingcap/tidb-lightning/lightning/restore.(*chunkRestore).encodeLoop(0xc000c46040, 0x2b02ec0, 0xc008097680, 0xc019f0a360, 0xc008063aa0, 0xc019f0a3c0, 0x2adba00, 0xc00810e040, 0xc00a0ec000, 0xc0192fc000, ...)
      /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb-lightning/lightning/restore/restore.go:1902 +0x350
    github.com/pingcap/tidb-lightning/lightning/restore.(*chunkRestore).restore(0xc000c46040, 0x2b02ec0, 0xc008097680, 0xc008063aa0, 0x0, 0xc000170e00, 0xc000bb2180, 0xc0192fc000, 0x0, 0x0)
      /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb-lightning/lightning/restore/restore.go:1976 +0x7a4
    github.com/pingcap/tidb-lightning/lightning/restore.(*TableRestore).restoreEngine.func1(0xc00c3f8fe0, 0xc0192fc000, 0x2b02ec0, 0xc008097680, 0xc008063aa0, 0xc000000000, 0xc000170e00, 0xc000bb2180, 0xc000c46020, 0xc014f65060, ...)
      /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb-lightning/lightning/restore/restore.go:1086 +0x175
    created by github.com/pingcap/tidb-lightning/lightning/restore.(*TableRestore).restoreEngine
      /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb-lightning/lightning/restore/restore.go:1078 +0x64c
    panic: should never happen
    

    Root cause: the tidb backend's FetchRemoteTableModels implementation is not accurate; it only sets Flag in the FieldType and ignores the other fields. So when the tidb backend runs with strict sql-mode, table.CastValue panics because FieldType.Tp is 0 (undefined).
