PgSQL compatible on distributed database TiDB

Last update: Dec 26, 2022

Comments: 16

TiDB for PostgreSQL

Introduction

TiDB for PostgreSQL is an open source launched by Digital China Cloud Base to promote and integrate into the open source community of TiDB. In addition to its high availability, horizontal scalability, and cloud-native features, TiDB for PostgreSQL is compatible with the PostgreSQL protocol. On the one hand, you can use the PostgreSQL client to connect to TiDB for PostgreSQL, which we have basically completed. On the other hand, you can also use the unique grammatical features of PostgreSQL in your business system, which is in the development process.

More details can be found in the Design Document

Background

As the general trend of the massive explosion of information in the current data era becomes more and more obvious, the database is more and more likely to become the bottleneck of the upper-level business system. For this problem, TiDB solves most of migration problems of business system based on MySQL database through its highly compatibility of MySQL protocol. In view of its distributed transactions with strong consistency, flexible scalability, and excellent disaster recovery backup architecture, users prefer to use mature solutions of TiDB rather than scale MySQL by sharding. However, there is no mature solution for the upper-level business system based on PostgreSQL to migrate to TiDB. If you want to migrate a PostgreSQL-based business system to TiDB, you have to modify lots of code of business system. To solve this，we try to refactor the underlying source code of TiDB to make it compatible with the PostgreSQL protocol, making it possible to migrate PostgreSQL-based business systems to TiDB without modifying much code.

Development progress

The largest amount of work for the development of TiDB for PostgreSQL is compatibility with PostgreSQL, which has two things to do. One is to implement the PostgreSQL connection protocol, which we have basically achieved, So PostgreSQL client can connect to TiDB for PostgreSQL. The other is the unique PostgreSQL syntax. TiDB for PostgreSQL can support general sql syntax, but there are a certain amount of work to do to make it compatible with PostgreSQL’s unique syntax. This is one of the reasons why TiDB for PostgreSQL is open sourced. We hope that through the open source, engineers interested in our TiDB for PostgreSQL can work together to make our project bigger and better.

Quick start

First, make sure you have a Go environment, because TiDB for PostgreSQL is based on the Go language.

TiDB for PostgreSQL can be started on a single node without pd and tikv.

If there is no pd and tikv, it will create mock pd and mock tikv to maintain the stable operation of the system.

The following is an example of locally compiling and running TiDB for PostgreSQL on localhost.

mkdir -p  $GOPATH/src/github.com/digitalchina

cd  $GOPATH/src/github.com/digitalchina

git clone https://github.com/DigitalChinaOpenSource/TiDB-for-PostgreSQL.git

cd TiDB-for-PostgreSQL/tidb-server

go run main.go

# If you get an error: export ordinal to larger
# Please run the following cmd:
go run -buildmode=exe  main.go

After starting the main program of TiDB for PostgreSQL , it will run on port 4000 of the host.

How to use PostgreSQL client to connect to TiDB for PostgreSQL? Here, we take the command line tool psql that comes with PostgreSQL as an example.

Server [localhost]:
Database [postgres]: test
Port [5433]: 4000
Username [postgres]: root
psql (13.1, server 8.3.11)
Type "help" to get help information.

test=# show tables;
 Tables_in_test
----------------
 t1
 t2
(2 行记录)

Docker

# Log in to your dockerhub account
docker login

# Pull the image and start the container
docker pull dcleeray/tidb-for-pg
docker run -it --name tidbforpg -p 4000:4000 -d  dcleeray/tidb-for-pg:latest

Cluster deployment

TiDB for PostgreSQL also supports cluster deployment.

Our current modification work does not involve the communication module of each component in the cluster. Therefore, the connection of TiDB for PostgreSQL to pd and tikv will not be affected in any way.

We recommend using binary file to deploy TiDB for PostgreSQL cluster.

First, download the official binary package file and unzip it.

wget http://download.pingcap.org/tidb-v4.0.11-linux-amd64.tar.gz
wget http://download.pingcap.org/tidb-v4.0.11-linux-amd64.sha256

sha256sum -c tidb-v4.0.11-linux-amd64.sha256

tar -xzf tidb-v4.0.11-linux-amd64.tar.gz
cd tidb-v4.0.11-linux-amd64/bin

Second, deploy each node in the cluster in order. According to the cluster architecture of TiDB for PostgreSQL, pd nodes are deployed first, then the tikv node, and finally the TiDB for PostgreSQL node. The number of nodes in the cluster is not specified, but there is at least one of each type.

Deploy a pd node

./pd-server --name=pd1 --data-dir=pd1 --client-urls="http://pdhost:2379" --peer-urls="http://host:2380" -L "info" --log-file=pd.log

Deploy three tikv nodes

./tikv-server --pd="pdhost:2379" --addr="kvhost1:20160"  --data-dir=tikv  --log-file=tikv1.log
./tikv-server --pd="pdhost:2379" --addr="kvhost2:20160"  --data-dir=tikv  --log-file=tikv2.log
./tikv-server --pd="pdhost:2379" --addr="kvhost3:20160"  --data-dir=tikv  --log-file=tikv3.log

Deploy a TiDB for PostgreSQL node. You need to compile the TiDB for PostgreSQL project into a binary file named tidb-server and upload it to this server for replace the original tidb-server

./tidb-server --store=tikv  --path="pdhost:2379" --log-file=tidb.log

With the above done, a TiDB for PostgreSQL cluster is successfully deployed, and you can connect to TiDB for PostgreSQL cluster in the same way as you did with single TiDB for PostgreSQL demonstrated earlier.

Contribute code to TiDB for postgresql

We greatly welcome and appreciate developers who are interested in TiDB for PostgreSQL to make contributions. At present, we provide several directions to get started.

Learning Guide

Contribution Guide

Schema structure

The database structure of PostgreSQL is very different from that of MySQL, which is also a big problem we encountered in our development. It involves the realization of database system tables, system views, and system functions. It’s a lot of work and very hard. Developers are required to be very familiar with the database table structure of both PostgreSQL and MySQL.

Postgresql grammatical features

Compared with MySQL, PostgreSQL has many new syntax features. For example, Returning can return specified column or all columns of the modified row. Implementing a syntax involves modifying codes of the Parser module, TiDB for PostgreSQL's internal plan structure, plan optimization, plan execution, and data write-back, and many other parts. It’s really very hard. We recommend that developers start with clauses such as RETURNING and modify them on the basis of the original code. After being familiar with the logic of TiDB for PostgreSQL planning and execution, you can try to implement a brand new statement.

Version notice

To keep our code improvement running stable, we currently choose a fixed version for development. The following is our selected version:

TiDB v4.0.11

PostgreSQL 13

License

Apache-2.0 License

Owner

DigitalChinaOpenSource

Digital China Group's open source projects, we welcome you to show your talents and grow together here.

https://github.com/DigitalChinaOpenSource/TiDB-for-PostgreSQL

Comments

Executor: reworked SetInsertParamType
Signed-off-by: David [email protected]

What's changed

Reworked SetInsertParamType function in the executor module

Effects

Now more unit tests can be passed, in particular, the following tests should no longer fail or panic:

testSuite.TestBatchInsertWithOnDuplicate

testSuite8.TestShardRowIDBits

seqTestSuite.TestPrepared

seqTestSuite.TestPreparedIssue8644

testSessionSuite.TestAutoIncrementWithRetry

testSessionSuite.TestLastInsertID

testSessionSuite.TestPrepareZero

testSessionSuite.TestPrimaryKeyAutoIncrement
core: add the implemet of setparamtype at show, showddl, datasource, limit plan ; executor: add new case 'LogicalLimit' at Next()
What problem does this PR solve?

Issue Number: close #28

Problem Summary

some interface have not be implemented，so some sql statement can't get parameter type。

What is changed and how it works?

Proposal: xxx

What's Changed:

New implement method 'setparamtype' at show ,showddl, datasource, limit plan

How it Works:

set the parameter type from the plan tree

Related changes

Need to cherry-pick to the release branch

Check List

Tests

Unit test

Integration test

Manual test (add detailed scripts or steps below)

No code

Side effects

Performance regression

Consumes more CPU

Consumes more MEM

Breaking backward compatibility

Release note
makefile: upload unit tests coverage to codecov

What problem does this PR solve?

Problem Summary: upload unit tests coverage to codecov

What is changed and how it works?

Proposal: codecov unit test coverage

What's Changed: add scripts in makefile

How it Works: make upload-coverage
Support for Postgres transaction syntax in Parser

Development Task

Description

The SQL standard syntax for starting a transaction is START TRANSACTION, which both MySQL and PostgreSQL supports. They also support syntax like START TRANSACTION + [modifier], where modifier can be READ WRITE, READ ONLY, etc. For the ease of use, they also support use BEGIN instead of START TRANSACTION However, they treat BEGIN + [modifier] differently. (like BEGIN READ WRITE)

In Postgres this is completely fine, BEGIN/BEGIN WORK/BEGIN TRANSACTION are all treated as aliases for START TRANSACTION

MySQL, on the other hand, does not allow modifier after BEGIN, a limitation which TiDB inherits.

In TiDB, the standard way to start new transaction via go code is

txn, err := db.Begin()

which calls underlying go-mysql-driver via go's database module, and return a transaction handle (the txn above). After we switch the driver from lib/pq, because pq starts transaction by BEGIN READ WRITE when no modifier provided, it will cause parser to complain.

Previously in unit tests fixing, we have been solving the problem by skipping Begin()method and execute SQL statement START TRANSACTION; and execute COMMIT; at the end of query block. But since we are not using Begin(), we can't get txn handle. It will cause problem when multiple transactions are nested or intersected.

The ideal solution is for parser to support begin + modifier syntax.

References

PostgreSQL: Documentation: 9.1: BEGIN MySQL :: MySQL 8.0 Reference Manual :: 13.3.1 START TRANSACTION, COMMIT, and ROLLBACK Statements MySQL Difference between BEGIN and START TRANSACTION - Database Administrators Stack Exchange
Support return result set after deletion
What problem does this PR solve?

Issue Number: close #xxx

Problem Summary:

What is changed and how it works?

Proposal: xxx

What's Changed:

How it Works:

Related changes

Need to cherry-pick to the release branch

Check List

Tests

Unit test

Integration test

Manual test (add detailed scripts or steps below)

No code

Side effects

Performance regression

Consumes more CPU

Consumes more MEM

Breaking backward compatibility

Release note
server: fixed a bug that params description will be sent when no params in handleStmtDescription

What problem does this PR solve?

Fixed a bug that params description will be sent when no params in handleStmtDescription. Translated some comments. Handled flush message.
variable,session: add search_path system variable.

What problem does this PR solve?

Add a PostgreSQL system variable named "search_path".

What is changed and how it works?

What's Changed:

Add a system variable,named 'search_path',default value is 'public'.

How it Works:

Now，we can run the related SQL statements properly，like ' set search_path to public'
Executor: Added support for setting isolation level in begin

What problem does this PR solve?

Issue Number: close #53 Added support for setting isolation level in begin statement

What is changed and how it works?

Changed the executor for begin statement

Test

Manual test
executor: uncommented some previously skipped test
Signed-off-by: AmoebaProtozoa [email protected]

What's Changed:

Uncommented some previously skipped test.

They work fine as long as fail-point is enabled

Check List

Tests

make dev-tmp passed
Is the TiDB going to support ISOLATION operation?

Current I run odoo , an open source ERP with postgresql, and it report that

psycopg2.errors.SyntaxError: syntax error at or near "ISOLATION" LINE 1: BEGIN ISOLATION LEVEL REPEATABLE READ ^

So I'd like to know if it is going to support this key word ?
Explaintest, Server: Changed Explaintest's driver

What's changed

Changed explaintest's driver to lib/pq

Changed command tag of Begin statement

Improved mysql server status translation

Test

make dev-tmp Passed
TiDB4PG panic: runtime error: invalid memory address or nil pointer dereference

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. What did you do?

send the sourcecode to linux. cd to tidb-server directory run: go run main.go &

2. What did you expect to see?

It can keep alive long

3. What did you see instead?

It can start up ,but panic some hours ago.

the error: [root@loader bubble]# panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0xc0 pc=0x2e01153]

goroutine 327269 [running]: github.com/pingcap/tidb/server.(*clientConn).writeError(0xc000113700, 0x3b29c80, 0xc001baedb0, 0x3acad40, 0xc001584050, 0x1, 0x0) /opt/TiDB-for-PostgreSQL/server/conn.go:1138 +0x1b3 github.com/pingcap/tidb/server.(*Server).onConn(0xc0013b46e0, 0xc000113700) /opt/TiDB-for-PostgreSQL/server/server.go:439 +0x147 created by github.com/pingcap/tidb/server.(*Server).Run /opt/TiDB-for-PostgreSQL/server/server.go:370 +0x7c9 exit status 2

[1]- Exit 1 go run main.go (wd: /opt/TiDB-for-PostgreSQL/tidb-server)
Parameter processing in bind message

Development Task

In the extend query, backend server will get parameter values through the bind messages. And tidb-server will use the datum structure to store the values of these parameters.

The datum is only used to store value of mysql data type and each type will have a different datum to store in TiDB.

We need to parse the bind message and store the parameter values in different datum according to different types. That is to improve the function parseBindArgs.

Related Files

server/conn_stmt.go
TiDB dashboard can not login in

Bug Report

I can not open tidb dashboard

1. What did you do?

When i deploy tidb for postgresql，i tried to open tidb dashboard on pdid:2379/dashboard。 I can see login web page。but I can not login in it。

2. What did you expect to see?

I expect to login in to tidb dashboard

3. What did you see instead?

It told that u failed login
Update server information from TiDB to TiDB for PostgreSQL

description

Update some console info like "Welcome to TiDB"。 When I start TiDB for PostgreSQL source code on my local PC。I see “Welcome to TiDB” on the console output

task

Update console inforamtion，It should be like “Welcome to TiDB for PostgreSQL” and so on

expected

It will show TiDB for PostgreSQL server information when started locally or connected by cmd or alomost anywhere
To support PG compliant SQL DML statement
Development Task

Description

Since TiDB is Mysql compliant database, we need to adjust DCParser yacc paser files to remove mysql keywords and add pg keywords, and then implement the keywords logic functions in TiDB4PG project.

Task List

DCParser:

Please check the issue: https://github.com/DigitalChinaOpenSource/DCParser/issues/21

TiDB4PG

Keywords Implement

[ ] Returning Keyword

Related tags

Dumpling is a fast, easy-to-use tool written by Go for dumping data from the database(MySQL, TiDB...) to local/cloud(S3, GCP...) in multifarious formats(SQL, CSV...).

?? Dumpling Dumpling is a tool and a Go library for creating SQL dump from a MySQL-compatible database. It is intended to replace mysqldump and mydump

Nov 9, 2022

simply SQL Parser for Go ( powered by vitess and TiDB )

vitess-sqlparser Simply SQL and DDL parser for Go (powered by vitess and TiDB ) this library inspired by https://github.com/xwb1989/sqlparser (origina

Jan 4, 2023

A tool for finding corrupted data rows in TiDB

tidb-bad-rows A tool for finding corrupted data rows in TiDB. It scans the target table and using a divide and conquer paradigm to locate all corrupte

Nov 17, 2021

A MySQL-compatible relational database with a storage agnostic query engine. Implemented in pure Go.

go-mysql-server go-mysql-server is a SQL engine which parses standard SQL (based on MySQL syntax) and executes queries on data sources of your choice.

Jan 2, 2023

GoRose(go orm), a mini database ORM for golang, which inspired by the famous php framwork laravle's eloquent. It will be friendly for php developer and python or ruby developer. Currently provides six major database drivers: mysql,sqlite3,postgres,oracle,mssql, Clickhouse.

GoRose ORM _______ ______ .______ ______ _______. _______ / _____| / __ \ | _ \ / __ \ / || ____| |

Dec 24, 2022

[mirror] the database client and tools for the Go vulnerability database

The Go Vulnerability Database golang.org/x/vulndb This repository is a prototype of the Go Vulnerability Database. Read the Draft Design. Neither the

Dec 29, 2022

Database - Example project of database realization using drivers and models

database Golang based database realization Description Example project of databa

Feb 10, 2022

An experimental toolkit for injecting alternate authentication strategies into a PostgreSQL-compatible wire format.

PG Auth Proxy This is an experimental toolkit for injecting alternate authentication strategies into a PostgreSQL-compatible wire format. This is a pr

Jan 20, 2022

go mysql driver， support distributed transaction

Go-MySQL-Driver A MySQL-Driver for Go's database/sql package Features Requirements Installation Usage DSN (Data Source Name) Password Protocol Address

Jul 23, 2022

a lightweight distributed transaction management service, support xa tcc saga

Dec 29, 2022

Distributed tracing using OpenTelemetry and ClickHouse

Distributed tracing backend using OpenTelemetry and ClickHouse Uptrace is a dist

Jan 2, 2023

🏋️ dbbench is a simple database benchmarking tool which supports several databases and own scripts

dbbench Table of Contents Description Example Installation Supported Databases Usage Custom Scripts Troubeshooting Development Acknowledgements Descri

Dec 30, 2022

Database wrapper that manage read write connections

rwdb Database wrapper that manage read write connections Install go get github.com/andizzle/rwdb Create connections package main import "github.com/

Dec 10, 2022

Vitess is a database clustering system for horizontal scaling of MySQL.

Vitess Vitess is a database clustering system for horizontal scaling of MySQL through generalized sharding. By encapsulating shard-routing logic, Vite

Jan 3, 2023

sqlx is a library which provides a set of extensions on go's standard database/sql library

sqlx is a library which provides a set of extensions on go's standard database/sql library. The sqlx versions of sql.DB, sql.TX, sql.Stmt, et al. all leave the underlying interfaces untouched, so that their interfaces are a superset on the standard ones. This makes it relatively painless to integrate existing codebases using database/sql with sqlx.

Jan 7, 2023

PgSQL compatible on distributed database TiDB

TiDB for PostgreSQL

Introduction

Background

Development progress

Quick start

Docker

Cluster deployment

Contribute code to TiDB for postgresql

Schema structure

Postgresql grammatical features

Version notice

License

Owner

DigitalChinaOpenSource

Comments

Executor: reworked SetInsertParamType

What's changed

Effects

core: add the implemet of setparamtype at show, showddl, datasource, limit plan ; executor: add new case 'LogicalLimit' at Next()

What problem does this PR solve?

What is changed and how it works?

Related changes

Check List

Release note

makefile: upload unit tests coverage to codecov

What problem does this PR solve?

What is changed and how it works?

Support for Postgres transaction syntax in Parser

Development Task

Description

References

Support return result set after deletion

What problem does this PR solve?

What is changed and how it works?

Related changes

Check List

Release note

server: fixed a bug that params description will be sent when no params in handleStmtDescription

What problem does this PR solve?

variable,session: add search_path system variable.

What problem does this PR solve?

What is changed and how it works?

Executor: Added support for setting isolation level in begin

What problem does this PR solve?

What is changed and how it works?

Test

executor: uncommented some previously skipped test

What's Changed:

Check List

Is the TiDB going to support ISOLATION operation?

Explaintest, Server: Changed Explaintest's driver

What's changed

Test

TiDB4PG panic: runtime error: invalid memory address or nil pointer dereference

Bug Report

1. What did you do?

2. What did you expect to see?

3. What did you see instead?

Parameter processing in bind message

Development Task

Related Files

TiDB dashboard can not login in

Bug Report

1. What did you do?

2. What did you expect to see?

3. What did you see instead?

Update server information from TiDB to TiDB for PostgreSQL

description

task

expected

To support PG compliant SQL DML statement

Development Task

Description

Task List

DCParser:

TiDB4PG

Keywords Implement

Related tags

Dumpling is a fast, easy-to-use tool written by Go for dumping data from the database(MySQL, TiDB...) to local/cloud(S3, GCP...) in multifarious formats(SQL, CSV...).