A high-speed data import tool for TiDB

TiDB Lightning


TiDB Lightning is a tool for fast full import of large amounts of data into a TiDB cluster. Currently it supports reading SQL dumps exported via mydumper.

TiDB Lightning architecture (diagram)

Contributing

Contributions are welcomed and greatly appreciated. See CONTRIBUTING.md for details on submitting patches and the contribution workflow.

License

TiDB Lightning is under the Apache 2.0 license. See the LICENSE file for details.

Comments
  • restore: Try to create tables in parallel

    restore: Try to create tables in parallel

    What problem does this PR solve?

    Issue Number: close #434

    What is changed and how it works?

    • add schemaStmt, which holds one statement (create db|table|view)
    • add schemaJob, which holds all statements of one restore-schema job
    • add restoreSchemaWorker, which spawns an async goroutine to create restore-schema jobs (producer)
    • set a hardcoded concurrency of 16 goroutines when restoreSchema#doJob is called (consumer); see the sketch below
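
    A minimal sketch of the producer/consumer shape described above, assuming a hypothetical execSQL helper; the names mirror the PR description but the signatures are illustrative only:

    package restore

    import (
        "context"

        "golang.org/x/sync/errgroup"
    )

    // schemaStmt holds one CREATE DATABASE/TABLE/VIEW statement.
    type schemaStmt struct{ sql string }

    // restoreSchemaConcurrently drains stmts with a fixed pool of 16 consumer
    // goroutines; the producer side closes the channel once all jobs are queued.
    func restoreSchemaConcurrently(ctx context.Context, stmts <-chan schemaStmt,
        execSQL func(context.Context, string) error) error {
        g, ctx := errgroup.WithContext(ctx)
        for i := 0; i < 16; i++ { // hardcoded consumer concurrency
            g.Go(func() error {
                for stmt := range stmts {
                    if err := execSQL(ctx, stmt.sql); err != nil {
                        return err
                    }
                }
                return nil
            })
        }
        return g.Wait()
    }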

    Benchmark

    Time costs from tests/restore/run.sh with $TABLE_COUNT=300 are reported below:

    Before

    
    ________________________________________________________
    Executed in  211.51 secs   fish           external 
       usr time   76.28 secs  187.00 micros   76.28 secs 
       sys time   44.62 secs  617.00 micros   44.62 secs 
    
    [2020/12/08 17:06:33.389 +08:00] [INFO] [restore.go:964] ["restore all tables data completed"] [takeTime=1m9.093660687s] []
    [2020/12/08 17:06:33.389 +08:00] [INFO] [restore.go:745] ["everything imported, stopping periodic actions"]
    [2020/12/08 17:06:33.389 +08:00] [INFO] [restore.go:1409] ["skip full compaction"]
    [2020/12/08 17:06:33.411 +08:00] [INFO] [restore.go:294] ["the whole procedure completed"] [takeTime=1m45.325956477s] []
    [2020/12/08 17:06:33.411 +08:00] [INFO] [main.go:95] ["tidb lightning exit"]
    [2020/12/08 17:06:33.411 +08:00] [INFO] [checksum.go:425] ["service safe point keeper exited"]
    

    After

    
    ________________________________________________________
    Executed in  213.24 secs   fish           external 
       usr time   78.08 secs  140.00 micros   78.08 secs 
       sys time   44.92 secs  475.00 micros   44.92 secs 
    
    [2020/12/08 16:55:15.821 +08:00] [INFO] [restore.go:820] ["restore all tables data completed"] [takeTime=1m9.754043571s] []
    [2020/12/08 16:55:15.821 +08:00] [INFO] [restore.go:601] ["everything imported, stopping periodic actions"]
    [2020/12/08 16:55:15.821 +08:00] [INFO] [restore.go:1265] ["skip full compaction"]
    [2020/12/08 16:55:15.840 +08:00] [INFO] [restore.go:293] ["the whole procedure completed"] [takeTime=1m42.140242288s] []
    [2020/12/08 16:55:15.840 +08:00] [INFO] [main.go:95] ["tidb lightning exit"]
    [2020/12/08 16:55:15.840 +08:00] [INFO] [checksum.go:425] ["service safe point keeper exited"]
    

    PS: this benchmark was run against a single-node TiDB rather than a cluster, so the single TiDB node acting as both DDL owner and executor may have bottlenecked the whole run. We should benchmark again on a TiDB cluster with multiple DDL nodes.

    -------- Update ---------

    Benchmark on a cluster of 1 PD, 3 TiDB, and 4 TiKV nodes (single machine)

    preset:

    mysql> set @@global.tidb_scatter_region = "1";
    

    Concurrency

    [2020/12/29 14:55:09.523 +08:00] [INFO] [restore.go:503] ["restore schema completed"] [takeTime=2m15.052150251s] []

    Serial

    [2020/12/29 15:04:52.746 +08:00] [INFO] [restore.go:357] ["restore schema completed"] [takeTime=2m47.520433308s] []

    Check List

    Tests

    • Unit test
    • Integration test

    Side effects

    • Increased code complexity

    Related changes

    • Need to cherry-pick to the release branch
    • Need to be included in the release note
  • Try to create tables in parallel

    Try to create tables in parallel

    Feature Request

    Is your feature request related to a problem? Please describe:

    Currently we execute CREATE TABLE statements (tidbMgr.InitSchema) sequentially, but experience in BR shows that running them in parallel is faster (pingcap/br#377).

    Describe the feature you'd like:

    Execute the CREATE TABLE statements in restoreSchema in parallel over 16 connections.

    Benchmark that by importing 300 small tables.

    Describe alternatives you've considered:

    Don't do it.

    Teachability, Documentation, Adoption, Optimization:

    N/A

    Score

    600

    SIG slack channel

    sig-migrate

    Mentor

    @glorv @lance6716

  • Update dependencies and remove juju/errors

    Update dependencies and remove juju/errors

    1. Replaced juju/errors with pingcap/errors (exported as pkg/errors due to how pingcap/tidb imports it) (LGPL-v3 → BSD-2-clause); see the sketch after this list.

    2. Updated pingcap/tidb to v2.1.0-rc.4 to entirely remove juju/errors from the vendor.

      • Updated pingcap/pd to v2.1.0-rc.4
      • Updated pingcap/kvproto to a pinned master commit
      • Updated pingcap/tipb to a pinned master commit
      • Replaced golang/protobuf by gogo/protobuf (BSD-3-clause)
      • Added opentracing/basictracer-go (Apache-2.0)
    3. Removed the golang.org/x/net dependency as we can use the built-in context package (the two are interchangeable after Go 1.7 anyway)

    4. Removed the explicit dependency on pingcap/tidb-tools and siddontang/go, since we're not using glide anymore

    5. Updated some direct dependencies:

      • Updated BurntSushi/toml from v0.3.0 to v0.3.1 (WTFPL → MIT)
      • Updated prometheus/client_golang from v0.8.0 to v0.9.0
      • Updated sirupsen/logrus from v0.11.6 to v1.1.1
      • Updated golang.org/x/sys to a pinned master commit
      • Updated google.golang.org/grpc from v1.12.0 to v1.15.0
    6. Added the commercial license
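
    For context, a minimal sketch of the error-wrapping style that stays the same across the swap; only the import path changes from github.com/juju/errors to github.com/pingcap/errors (the Trace/Annotate-style helpers are assumed to be the ones in use):

    package main

    import (
        "fmt"
        "os"

        "github.com/pingcap/errors" // previously: github.com/juju/errors
    )

    func readConfig(path string) error {
        if _, err := os.Stat(path); err != nil {
            // Annotatef keeps the same call sites as with juju/errors.
            return errors.Annotatef(err, "cannot read config %s", path)
        }
        return nil
    }

    func main() {
        if err := readConfig("missing.toml"); err != nil {
            fmt.Println(errors.ErrorStack(err))
        }
    }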

  • restore: update and restore GCLifeTime once when parallel

    restore: update and restore GCLifeTime once when parallel

    What problem does this PR solve?

    DoChecksum is not prepared for the parallel case.

    What is changed and how it works?

    There may be multiple DoChecksum calls running at the same time; the GC life time should not be set or reset while there are unfinished DoChecksum tasks.

    Lightning runs restoreTables concurrently, so we use the following logic:

    // restoreTables starts several table restores concurrently; they all share
    // one helper holding the lock, the running-jobs counter, and the original
    // GC life time value.
    func restoreTables() {
        // init helper (lock, counter, original GC life time)
        for i := 0; i < concurrency; i++ {
            go restoreTable()
        }
    }
    
    func restoreTable() {
        // calls postProcess() -> ... -> DoChecksum()
    }
    
    func DoChecksum() {
        // uses the helper's pointers: lock, running-jobs counter, original value
    }
    

    A counter variable tracks the number of running checksum jobs. Before a remote checksum call starts: lock, increment the counter, and if it just rose from zero, back up the original value and set the GC life time; then unlock. After a remote checksum finishes: lock, decrement the counter, and if it just dropped to zero, restore the original GC life time; then unlock.

    The lock, the counter, and the original value all point to the same shared location within one restoreTables run (see the sketch below).
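
    A minimal sketch of the shared counter described above; getGCLifeTime/setGCLifeTime are hypothetical stand-ins for the real SQL statements against mysql.tidb, and the enlarged "100h" value is illustrative:

    package restore

    import (
        "context"
        "sync"
    )

    // gcLifeTimeManager is shared by pointer across all table restores so the
    // GC life time is adjusted once, not once per checksum job.
    type gcLifeTimeManager struct {
        mu       sync.Mutex
        running  int
        original string // value to restore when the last job finishes
    }

    func (m *gcLifeTimeManager) addOneJob(ctx context.Context) error {
        m.mu.Lock()
        defer m.mu.Unlock()
        m.running++
        if m.running == 1 { // first job: back up and enlarge the GC life time
            orig, err := getGCLifeTime(ctx)
            if err != nil {
                m.running--
                return err
            }
            m.original = orig
            return setGCLifeTime(ctx, "100h")
        }
        return nil
    }

    func (m *gcLifeTimeManager) removeOneJob(ctx context.Context) {
        m.mu.Lock()
        defer m.mu.Unlock()
        m.running--
        if m.running == 0 { // last job: put the original value back
            _ = setGCLifeTime(ctx, m.original)
        }
    }

    // Hypothetical helpers; the real code queries and updates mysql.tidb.
    func getGCLifeTime(ctx context.Context) (string, error) { return "10m", nil }
    func setGCLifeTime(ctx context.Context, v string) error { return nil }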

    Check List

    Tests

    • Unit test
    • Integration test

    Side effects

    Related changes

    • Need to cherry-pick to the release branch
  • restore: check row value count to avoid unexpected encode result

    restore: check row value count to avoid unexpected encode result

    What problem does this PR solve?

    ~Check the row field count before encoding; if the row value count is bigger than the table field count, directly return an error.~

    • Check the column count in the tidb encoder and return an error if it doesn't match the table's column count (see the sketch below).
    • Be compatible with the special _tidb_rowid field in getColumnNames and the tidb encoder.

    What is changed and how it works?
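
    A rough sketch of the kind of check described above, independent of the real encoder types; a trailing _tidb_rowid value is permitted and any other surplus column is rejected:

    package restore

    import "fmt"

    // checkColumnCount is a simplified illustration: a row may carry at most
    // the table's own columns plus the special _tidb_rowid column.
    func checkColumnCount(rowValues, tableColumns int, hasRowID bool) error {
        allowed := tableColumns
        if hasRowID {
            allowed++ // the extra _tidb_rowid value is allowed
        }
        if rowValues > allowed {
            return fmt.Errorf("column count mismatch: row has %d values, table has %d columns",
                rowValues, tableColumns)
        }
        return nil
    }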

    Check List

    Tests

    • Unit test
    • Integration test
    • Manual test (add detailed scripts or steps below)
    • No code

    Side effects

    Related changes

    Release Note

    • Fix the bug that the tidb backend panics if the source file has more columns than the target table
  • restore: optimize SQL processing speed

    restore: optimize SQL processing speed

    DNM: Based on #109 for simplicity of development.

    What problem does this PR solve?

    Optimizing SQL processing speed

    Result:

    • PR 109: TableConcurrency = 20, RegionConcurrency = 40; the metrics data has been lost because the cluster was cleaned up.

      Data size: 340G
      Rate: ~45MB/s
      Total time: ~= 2h30m (import time: 40m)
      
    • PR 110: TableConcurrency = 20, RegionConcurrency = 20, IOConcurrency = 5, Test1 metrics snapshot, Test2 metrics snapshot

      CANNOT REPRODUCE [IO delay unstable]

      Test1: 
      Data size: 146G
      Rate: 90~160MB/s
      Total time: ~= 48m (import time: 27m)
      
      Test2:
      Data size: 146G
      Rate: 130~190MB/s
      Total time: ~= 46m (import time: 28m)
      
      Test3
      coming ...
      
    • PR 110: TableConcurrency = 40, RegionConcurrency = 40, IOConcurrency = 10 160G, Metrics

      	2018/12/30 01:16:32.871 restore.go:477: [info] restore all tables data takes 52m59.59960385s
      	2018/12/30 01:16:32.871 restore.go:366: [info] Everything imported, stopping periodic actions
      	2018/12/30 01:16:32.871 restore.go:208: [error] run cause error : [types:1292]invalid time format: '{2038 1 19 4 4 36 0}'
      	2018/12/30 01:16:32.871 restore.go:214: [info] the whole procedure takes 53m7.986292573s
      

    Early conclusion:

    Concurrent IO increases IO delays, which in turn lengthens SQL processing time.

    What is changed and how it works?

    Limit IO concurrency (see the sketch below).
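
    A minimal sketch of limiting IO concurrency with a semaphore built from a buffered channel; the readChunk wrapper is hypothetical, only the limiting pattern matters:

    package restore

    // ioSem limits how many data-file reads may run at once (e.g. IOConcurrency = 5).
    type ioSem chan struct{}

    func newIOSem(n int) ioSem { return make(ioSem, n) }

    func (s ioSem) acquire() { s <- struct{}{} }
    func (s ioSem) release() { <-s }

    // readChunk is a stand-in for the real chunk reader; the semaphore ensures
    // at most cap(s) reads hit the disk concurrently, keeping IO delay stable.
    func readChunk(s ioSem, read func() ([]byte, error)) ([]byte, error) {
        s.acquire()
        defer s.release()
        return read()
    }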

    Check List

    Tests

    • Unit test

    Code changes

    Side effects

    Related changes

  • Support table routing rules (merging sharded tables)

    Support table routing rules (merging sharded tables)

    What problem does this PR solve?

    TOOL-142

    (Note: still won't handle UNIQUE/PRIMARY key conflict. This needs column-mapping)

    What is changed and how it works?

    Rename the tables while loading the files. Since we don't care about the table name when parsing the data files, supporting merging becomes very simple: just associate all those data files with the target table (see the sketch below).

    This PR supersedes #54. Note that #54 is very large because it attempts to do some refactoring at the same time.
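
    A rough sketch of the idea (not the actual rule syntax): data files from sharded source tables are simply grouped under one target table before restoration begins:

    package restore

    import "regexp"

    // routeRule maps any source table whose name matches pattern to one target
    // table; the real routing rules use richer schema/table patterns.
    type routeRule struct {
        pattern *regexp.Regexp
        target  string // "targetdb.targettable"
    }

    // groupDataFiles associates every data file with its target table, so
    // merging shards just means all their files end up in the same bucket.
    func groupDataFiles(files map[string]string, rules []routeRule) map[string][]string {
        grouped := make(map[string][]string)
        for path, sourceTable := range files {
            target := sourceTable // default: no renaming
            for _, r := range rules {
                if r.pattern.MatchString(sourceTable) {
                    target = r.target
                    break
                }
            }
            grouped[target] = append(grouped[target], path)
        }
        return grouped
    }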

    Check List

    Tests

    • Integration test

    Code changes

    Side effects

    Related changes

    • Need to update the documentation
    • Need to be included in the release note
  • backend: add local kv storage backend to get rid of importer

    backend: add local kv storage backend to get rid of importer

    What problem does this PR solve?

    Use local key-value storage as a new backend to get rid of the dependency on tikv-importer, thus making Lightning easier to use. In our benchmark, the import speed in local mode is as good as in importer mode, so this change won't bring any performance loss; it is also a much better choice compared with the tidb backend.

    What is changed and how it works?

    The logic of local mode is as follows:

    1. Write the data read from CSV/mydumper files into the local key-value store pebble (see the sketch below).
    2. Batch-write the sorted KV pairs and generate SST files on each TiKV instance. https://github.com/tikv/tikv/pull/7459
    3. Ingest the SST files into the TiKV cluster.
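
    A minimal sketch of step 1, assuming cockroachdb/pebble; the KV encoding and options are simplified:

    package local

    import "github.com/cockroachdb/pebble"

    // writeSortedKVs appends encoded KV pairs into the local pebble store under
    // sorted-kv-dir; pebble keeps them sorted by key so they can later be turned
    // into SST files and ingested into the TiKV cluster.
    func writeSortedKVs(dir string, kvs map[string][]byte) error {
        db, err := pebble.Open(dir, &pebble.Options{})
        if err != nil {
            return err
        }
        defer db.Close()
        for k, v := range kvs {
            // NoSync: durability is handled by the explicit Flush below.
            if err := db.Set([]byte(k), v, pebble.NoSync); err != nil {
                return err
            }
        }
        return db.Flush()
    }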

    In local mode, the sorted KV data is managed by the Lightning process, so there are some checkpoint changes:

    • In the restore phase with checkpoint enabled, we save all the chunk checkpoints after the engine is closed. This is because if Lightning exits before the engine is closed, there may be data written to the KV store but not yet flushed, and that data would be lost. Another approach is to flush after each processed chunk, but that is much slower.
    • Before we update an engine checkpoint to CheckpointStatusClosed, we flush the related index engine to make sure the related index KVs are saved.
    • Skip the CheckpointStatusAllWritten stage in local mode, because at that point the data/index KVs are not yet flushed; if Lightning exits at that point we cannot guarantee the local KV store contains all the key-values for this engine, so we have to restore the engine from the start.
    • Add a new meta file for each engine, stored alongside the engine DB files. This meta contains some data used in the import phase and is generated when the engine is closed.

    Changes for tidb-lightning-ctl:

    • the import-engine command is not supported in local mode. The import phase is now done by Lightning itself, so this command is no longer meaningful.

    NOTE:

    • We recommend separating the sorted-kv-dir from the data disk if possible: the data disk is read-heavy while the local storage dir is both read- and write-heavy, so using a separate disk for this temporary store gives at least a 10% performance gain.

    TODO:

    • Maybe we should save chunk checkpoints more frequently. One possible approach is to trigger a flush of both the data and index engines after a specific number of bytes has been written to pebble, so the current chunk checkpoints can be saved safely.

    Check List

    Tests

    • Unit test
    • Integration test
    • Manual test (add detailed scripts or steps below)
    • No code

    Side effects

    • Reusing a checkpoint may take more time than with the importer backend, which makes #303 even worse.

    Related changes

    • Need to cherry-pick to the release branch
    • Need to update the documentation
    • Need to update the tidb-ansible repository
    • Need to be included in the release note
  • Rewrite data file parser; Insert _tidb_rowid when needed; Update checkpoint structure

    Rewrite data file parser; Insert _tidb_rowid when needed; Update checkpoint structure

    What problem does this PR solve?

    Completely fix TOOL-462 by recording the _tidb_rowid on non-PkIsHandle tables, to ensure idempotence when importing the same chunk twice.

    What is changed and how it works?

    1. Assign a Row ID to every row of a table before the import starts, so importing two chunks from the same table is no longer order-dependent (see the sketch after this list).
    2. To properly assign a Row ID, we need to know exactly how many rows each chunk has, so we replaced splitFuzzyRegion with an exact version again.
    3. Since we need to read the whole file before importing, we want to make this step as fast as possible. Therefore, I replaced the MDDataReader by a ragel-based parser, which is about 8x faster on my machine.
    4. We also need to record the RowIDs into the checkpoints. The checkpoint tables are modified to accommodate this change. Additionally, the checksums are stored as properties of a chunk instead of the whole table.
    5. To ensure the only global property, the allocator, won't interfere with the data output in future updates, I've created a custom allocator which will panic on any unsupported operation.
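
    A small illustration of points 1-2, under the assumption that each chunk's exact row count is known up front: chunks receive disjoint, deterministic _tidb_rowid ranges, so re-importing a chunk reproduces the same IDs.

    package restore

    // chunk records the _tidb_rowid range [rowIDBase, rowIDBase+rows) assigned
    // to it, which is also what gets persisted into the chunk checkpoint.
    type chunk struct {
        rows      int64
        rowIDBase int64
    }

    // assignRowIDs gives each chunk of a table a deterministic, disjoint ID
    // range before the import starts, so the result no longer depends on the
    // order in which chunks are restored.
    func assignRowIDs(chunks []chunk) {
        next := int64(1) // _tidb_rowid starts from 1
        for i := range chunks {
            chunks[i].rowIDBase = next
            next += chunks[i].rows
        }
    }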

    Check List

    Tests

    • [x] Unit test
    • [x] Integration test
    • [ ] Manual test (add detailed scripts or steps below)
    • [ ] No code

    Code changes

    • [ ] Has exported function/method change
    • [ ] Has exported variable/fields change
    • [ ] Has interface methods change
    • [x] Has persistent data change

    Side effects

    • [ ] Possible performance regression
    • [x] Increased code complexity
    • [x] Breaking backward compatibility (only if you update Lightning after it saved a checkpoint)

    Related changes

    • [x] Need to cherry-pick to the release branch (2.1)
    • [ ] Need to update the tidb-ansible repository
    • [ ] Need to update the documentation
    • [ ] Need to be included in the release note
  • Restore from S3 compatible API?

    Restore from S3 compatible API?

    A feature request for your roadmap:

    Could it be made possible to restore directly from a mydumper backup stored in S3? In most cloud deployments this is where user backups are stored (the S3 API is implemented by many other object stores).


    Value

    Value description

    Support restore to TiDB via S3.

    Value score

    • (TBD) / 5

    Workload estimation

    • (TBD)

    Time

    GanttStart: 2020-07-27 GanttDue: 2020-09-04 GanttProgress: 100%

  • restore: ensure the importer engine is closed before recycling the table worker

    restore: ensure the importer engine is closed before recycling the table worker

    The close-engine operation is extracted out of Flush() (which now only does ImportEngine). The engine count should now be strictly limited by table-concurrency.

  • Add a progress bar and the final result (pass or failed) to the command output

    Add a progress bar and the final result (pass or failed) to the command output

    Feature Request

    Is your feature request related to a problem? Please describe:

    When using the lightning command to import data, users currently cannot see the progress status or the final result in the command output. The Lightning log and monitoring display this information, but the CLI does not. For users, the most direct way to see the import progress and the final result is the CLI output, not a log file or a monitoring dashboard.

    Describe the feature you'd like:

    Add a progress bar and the final result (pass or failed) to the command output (see the sketch below).
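
    A minimal sketch of what the CLI output could look like, assuming the progress fraction is already available from Lightning's existing internal metrics; the rendering is just a plain carriage-return progress line with a final PASS/FAILED verdict:

    package cli

    import (
        "fmt"
        "time"
    )

    // printProgress periodically renders a one-line progress indicator on stdout
    // and finally reports the result; progress() is a stand-in for the metric
    // Lightning already tracks internally.
    func printProgress(progress func() float64, done <-chan error) {
        ticker := time.NewTicker(time.Second)
        defer ticker.Stop()
        for {
            select {
            case <-ticker.C:
                fmt.Printf("\rimporting: %3.0f%%", progress()*100)
            case err := <-done:
                if err != nil {
                    fmt.Println("\nresult: FAILED:", err)
                } else {
                    fmt.Println("\nresult: PASS")
                }
                return
            }
        }
    }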

    Describe alternatives you've considered:

    User friendly.

    Teachability, Documentation, Adoption, Optimization:

  • use system_time_zone to encode kv if tidb set it

    use system_time_zone to encode kv if tidb set it

    What problem does this PR solve?

    Resolve #562

    What is changed and how it works?

    If TiDB's time_zone is "SYSTEM", try to use the value of system_time_zone to set the time_zone for the Lightning session (see the sketch below).
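
    A minimal sketch of the described behavior using database/sql; the exact session setup in Lightning differs, but time_zone and system_time_zone are the real TiDB/MySQL variable names:

    package encode

    import (
        "database/sql"
        "fmt"
    )

    // sessionTimeZone picks the zone to encode timestamps with: if time_zone is
    // "SYSTEM", fall back to the server's system_time_zone.
    func sessionTimeZone(db *sql.DB) (string, error) {
        var tz, sysTZ string
        if err := db.QueryRow("SELECT @@time_zone, @@system_time_zone").Scan(&tz, &sysTZ); err != nil {
            return "", err
        }
        if tz == "SYSTEM" && sysTZ != "" {
            tz = sysTZ
        }
        return tz, nil
    }

    // applyTimeZone sets the session time zone; plain string formatting is fine
    // for this sketch since tz comes from the server itself.
    func applyTimeZone(db *sql.DB, tz string) error {
        _, err := db.Exec(fmt.Sprintf("SET SESSION time_zone = '%s'", tz))
        return err
    }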

    Check List

    Tests

    • Manual test (add detailed scripts or steps below)

    Related changes

    • Need to cherry-pick to the release branch
    • Need to update the documentation
    • Need to be included in the release note

    Release note

    • Fix the issue that Lightning didn't use TiDB's time zone to encode timestamp data.
  • local backend oom

    local backend oom

    Bug Report

    Please answer these questions before submitting your issue. Thanks!

    1. What did you do? If possible, provide a recipe for reproducing the error. I used lightning to restore csv files (tpcc 5000 warehouses).

    2. What did you expect to see? Restored successfully.

    3. What did you see instead? Lightning OOMed. In January, the memory usage of Lightning was about 20~30 GB, but now it takes at least 60 GB.

    4. Versions of the cluster

      • TiDB-Lightning version (run tidb-lightning -V):

        Release Version: v5.0.0-rc-21-g230eef2
        Git Commit Hash: 230eef2a6e16648a49a4c74910dca693781012c4
        Git Branch: master
        UTC Build Time: 2021-02-04 03:10:38
        Go Version: go version go1.15.6 linux/amd64
        
      • TiKV-Importer version (run tikv-importer -V):

        none
        
      • TiKV version (run tikv-server -V):

        TiKV
        Release Version:   5.0.0-rc.x
        Edition:           Community
        Git Commit Hash:   81c4de98a9a21e4dcf3cce6d7783793b1238044e
        Git Commit Branch: limit-write-batch-ingest
        UTC Build Time:    2021-02-04 11:43:07
        Rust Version:      rustc 1.51.0-nightly (1d0d76f8d 2021-01-24)
        Enable Features:   jemalloc mem-profiling portable sse protobuf-codec test-engines-rocksdb
        Profile:           release
        
      • TiDB cluster version (execute SELECT tidb_version(); in a MySQL client):

        Release Version: v4.0.0-beta.2-2067-g415d14b6a
        Edition: Community
        Git Commit Hash: 415d14b6ac65e3c73529d07b4331c2f4917b2701
        Git Branch: master
        UTC Build Time: 2021-01-27 15:27:10
        GoVersion: go1.13
        Race Enabled: false
        TiKV Min Version: v3.0.0-60965b006877ca7234adaced7890d7b029ed1306
        Check Table Before Drop: false
        
      • Other interesting information (system version, hardware config, etc):

    5. Operation logs

      • Please upload tidb-lightning.log for TiDB-Lightning if possible
      • Please upload tikv-importer.log from TiKV-Importer if possible
      • Other interesting logs
    6. Configuration of the cluster and the task

      • tidb-lightning.toml for TiDB-Lightning if possible
      • tikv-importer.toml for TiKV-Importer if possible
      • inventory.ini if deployed by Ansible
    7. Screenshot/exported-PDF of Grafana dashboard or metrics' graph in Prometheus for TiDB-Lightning if possible

  • tidb-lightning alters the values of timestamp columns

    tidb-lightning alters the values of timestamp columns

    Bug Report

    1. What did you do? If possible, provide a recipe for reproducing the error. I used the tidb-lightning tool to restore a full backup.
    • In the full backup data, there is a table that has timestamp columns like this:

      CREATE TABLE `users` (
        `id` bigint(20) NOT NULL,
        `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
        `updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
        ...
      )
      
    • Set the timezone of all servers (TiDB, PD, TiKV, Lightning, ...) to UTC

    • In the script deploy/scripts/start_lightning.sh, set timezone to Asia/Tokyo

      #!/bin/bash
      set -e
      ulimit -n 1000000
      cd "/home/ec2-user/deploy" || exit 1
      mkdir -p status
      
      export RUST_BACKTRACE=1
      
      export TZ=Asia/Tokyo
      
      echo -n 'sync ... '
      stat=$(time sync)
      echo ok
      echo $stat
      
      nohup ./bin/tidb-lightning -config ./conf/tidb-lightning.toml &> log/tidb_lightning_stderr.log &
      
      echo $! > "status/tidb-lightning.pid"
      
    • Start the tidb-lightning tool

      $ cd deploy/
      $ scripts/start_lightning.sh
      
    2. What did you expect to see? The tidb-lightning tool should respect the original data (in the full backup) and import it as it is.

    3. What did you see instead? The tidb-lightning tool altered the values of the timestamp columns. For example,

      Original data (from the full backup):

      $ head -2 ./xxxxx.users.000000001.sql
      INSERT INTO `users` VALUES
      (123456789123456789,'2019-09-03 12:31:02','2019-09-03 12:35:18',...)
      

      Imported data:

      > select id, created_at, updated_at from users where id = 123456789123456789;
      +--------------------+---------------------+---------------------+
      | id                 | created_at          | updated_at          |
      +--------------------+---------------------+---------------------+
      | 123456789123456789 | 2019-09-03 03:31:02 | 2019-09-03 03:35:18 |
      +--------------------+---------------------+---------------------+
      

      So the tidb-lightning tool altered the values of the created_at and updated_at columns: the original values were shifted back by 9 hours.

    4. Versions of the cluster

      • TiDB-Lightning version (run tidb-lightning -V):

        Release Version: v4.0.9
        Git Commit Hash: 56bc32daad19b9dff10104c55300292de959fde3
        Git Branch: heads/refs/tags/v4.0.9
        UTC Build Time: 2020-12-19 04:48:01
        Go Version: go version go1.13 linux/amd64
        
      • TiKV-Importer version (run tikv-importer -V)

        Didn't use
        
      • TiKV version (run tikv-server -V):

        TiKV
        Release Version:   4.0.10
        Edition:           Community
        Git Commit Hash:   2ea4e608509150f8110b16d6e8af39284ca6c93a
        Git Commit Branch: heads/refs/tags/v4.0.10
        UTC Build Time:    2021-01-15 03:16:35
        Rust Version:      rustc 1.42.0-nightly (0de96d37f 2019-12-19)
        Enable Features:   jemalloc mem-profiling portable sse protobuf-codec
        Profile:           dist_release
        
      • TiDB cluster version (execute SELECT tidb_version(); in a MySQL client):

        +---------------------+
        | version()           |
        +---------------------+
        | 5.7.25-TiDB-v4.0.10 |
        +---------------------+
        
      • Other interesting information (system version, hardware config, etc):

        > show variables like '%time_zone%';
        +------------------+--------+
        | Variable_name    | Value  |
        +------------------+--------+
        | system_time_zone | UTC    |
        | time_zone        | SYSTEM |
        +------------------+--------+
        
        $ cat /etc/os-release
        NAME="Amazon Linux"
        VERSION="2"
        ID="amzn"
        ID_LIKE="centos rhel fedora"
        VERSION_ID="2"
        PRETTY_NAME="Amazon Linux 2"
        ANSI_COLOR="0;33"
        CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2"
        HOME_URL="https://amazonlinux.com/"
        
    5. Operation logs

      • Please upload tidb-lightning.log for TiDB-Lightning if possible
      • Please upload tikv-importer.log from TiKV-Importer if possible
      • Other interesting logs
    6. Configuration of the cluster and the task

      • tidb-lightning.toml for TiDB-Lightning if possible
       # lightning Configuration
      
       [lightning]
       file = "/home/tidb/deploy/log/tidb_lightning.log"
       index-concurrency = 2
       io-concurrency = 5
       level = "info"
       max-backups = 14
       max-days = 28
       max-size = 128
       pprof-port = 8289
       table-concurrency = 6
      
       [checkpoint]
       enable = true
       schema = "tidb_lightning_checkpoint"
       driver = "file"
      
       [tikv-importer]
       backend = "local"
       sorted-kv-dir = "/home/tidb/deploy/sorted-kv-dir"
      
       [mydumper]
       data-source-dir = "/home/tidb/deploy/mydumper/scheduled-backup-20210120-044816"
       no-schema = false
       read-block-size = 65536
      
       [tidb]
       build-stats-concurrency = 20
       checksum-table-concurrency = 16
       distsql-scan-concurrency = 100
       host = "TIDB_HOST"
       index-serial-scan-concurrency = 20
       log-level = "error"
       password = "xxxxx"
       port = 4000
       status-port = 10080
       user = "root"
       pd-addr = "PD_HOST:2379"
      
       [post-restore]
       analyze = true
       checksum = true
      
       [cron]
       log-progress = "5m"
       switch-mode = "5m"
      
      • inventory.ini if deployed by Ansible
    7. Screenshot/exported-PDF of Grafana dashboard or metrics' graph in Prometheus for TiDB-Lightning if possible

  • In the TPC-C test, the following error occurred when using local-backend Lightning

    In the TPC-C test, the following error occurred when using local-backend Lightning

    Question

    CSV data: 110 GB. In the TPC-C test, the data was converted to CSV files and then imported into TiDB using local-backend Lightning, and the following error occurred:

    read metadata from /data/tikv-20160/import/4ff45145-3a21-4e78-81da-2bca45104be3_10513_5_350_default.sst "read metadata from /data/tikv-20160/import/4ff45145-3a21-4e78-81da-2bca45104be3_10513_5_350_default.sst: Os { code: 2, kind: NotFound, message: \"No such file or directory\" }")"]

  • panic in tidb backend in strict sql-mode

    panic in tidb backend in strict sql-mode

    Bug Report

    Please answer these questions before submitting your issue. Thanks!

    1. What did you do? If possible, provide a recipe for reproducing the error. Use lightning tidb backend to import data with config:
    [tidb]
    sql-mode = "STRICT_ALL_TABLES"
    

    panic backtrace:

    goroutine 578 [running]:
    github.com/pingcap/tidb/types.(*Datum).ConvertTo(0xc00a0fd220, 0xc0001dcdc0, 0xc0004a0be8, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
      /go/pkg/mod/github.com/pingcap/[email protected]/types/datum.go:843 +0xcff
    github.com/pingcap/tidb/table.CastValue(0x2b48760, 0xc000c6e000, 0x5, 0x0, 0x25442e2, 0xb, 0xc00cb05371, 0x8, 0x8, 0x0, ...)
      /go/pkg/mod/github.com/pingcap/[email protected]/table/column.go:244 +0xf2
    github.com/pingcap/tidb-lightning/lightning/backend.(*tidbEncoder).appendSQL(0xc00810e040, 0xc000c22120, 0xc00a0fd660, 0xc008097770, 0x0, 0x0)
      /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb-lightning/lightning/backend/tidb.go:181 +0x594
    github.com/pingcap/tidb-lightning/lightning/backend.(*tidbEncoder).Encode(0xc00810e040, 0xc019f0a3c0, 0xc00062a480, 0xa, 0x10, 0x1, 0xc00a0ec060, 0xa, 0xb, 0x1f37685, ...)
      /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb-lightning/lightning/backend/tidb.go:251 +0x32d
    github.com/pingcap/tidb-lightning/lightning/restore.(*chunkRestore).encodeLoop(0xc000c46040, 0x2b02ec0, 0xc008097680, 0xc019f0a360, 0xc008063aa0, 0xc019f0a3c0, 0x2adba00, 0xc00810e040, 0xc00a0ec000, 0xc0192fc000, ...)
      /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb-lightning/lightning/restore/restore.go:1902 +0x350
    github.com/pingcap/tidb-lightning/lightning/restore.(*chunkRestore).restore(0xc000c46040, 0x2b02ec0, 0xc008097680, 0xc008063aa0, 0x0, 0xc000170e00, 0xc000bb2180, 0xc0192fc000, 0x0, 0x0)
      /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb-lightning/lightning/restore/restore.go:1976 +0x7a4
    github.com/pingcap/tidb-lightning/lightning/restore.(*TableRestore).restoreEngine.func1(0xc00c3f8fe0, 0xc0192fc000, 0x2b02ec0, 0xc008097680, 0xc008063aa0, 0xc000000000, 0xc000170e00, 0xc000bb2180, 0xc000c46020, 0xc014f65060, ...)
      /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb-lightning/lightning/restore/restore.go:1086 +0x175
    created by github.com/pingcap/tidb-lightning/lightning/restore.(*TableRestore).restoreEngine
      /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb-lightning/lightning/restore/restore.go:1078 +0x64c
    panic: should never happen
    

    Root cause: the tidb backend's FetchRemoteTableModels implementation is not accurate; it only sets Flag in the FieldType and ignores the other fields. So when the tidb backend runs with strict sql-mode, table.CastValue panics because FieldType.Tp is 0 (undefined).
