PaddleDTX is a solution focused on distributed machine learning technology based on decentralized storage.

PaddleDTX

PaddleDTX is a solution focused on distributed machine learning technology based on decentralized storage. It addresses the difficulty of securely storing and exchanging massive amounts of private data, and helps different parties break out of isolated data islands to maximize the value of their data.

Overview of PaddleDTX

The computing layer of PaddleDTX is a network composed of three kinds of nodes: Requester, Executor and DataOwner. The training samples and prediction datasets are stored in a decentralized storage network composed of DataOwner and Storage nodes. Both the decentralized storage network and the computing layer are supported by an underlying blockchain network.

Secure Multi-party Computation Network

The Requester is the party that needs predictions, and the Executor is a party authorized by the DataOwner to access the sample data for model training and prediction. Multiple Executor nodes form an SMPC (secure multi-party computation) network. Requester nodes publish tasks to the blockchain network, and Executor nodes execute a task once it has been authorized. Executor nodes obtain sample data through the DataOwner, so the two are usually deployed together.

The SMPC network is a framework that supports multiple distributed learning processes running in parallel. More vertical federated learning and horizontal federated learning algorithms will be supported in the future.

Decentralized Storage Network

A DataOwner node processes its private data with encryption, segmentation and replication algorithms, and then distributes the encrypted fragments to multiple Storage nodes. A Storage node proves that it honestly holds the data fragments by answering challenges generated by the DataOwner. Through these mechanisms, storage resources can be safely maintained without violating data privacy. Please refer to XuperDB for more about the design principles and implementation.
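The challenge-answer idea can be illustrated with a minimal sketch. It assumes a simple hash-based scheme in which the DataOwner precomputes answers for random nonces before handing the fragment over; this is not XuperDB's actual proof mechanism, and the names `Challenge`, `PrepareChallenges`, `Answer` and `Verify` are hypothetical.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"fmt"
)

// Challenge pairs a random nonce with the answer the data owner expects.
// Hypothetical type for illustration only.
type Challenge struct {
	Nonce    []byte
	expected [32]byte
}

// digest computes SHA-256(nonce || fragment).
func digest(nonce, fragment []byte) [32]byte {
	h := sha256.New()
	h.Write(nonce)
	h.Write(fragment)
	var out [32]byte
	copy(out[:], h.Sum(nil))
	return out
}

// PrepareChallenges is run by the data owner while it still holds the
// encrypted fragment. Each challenge can be used once.
func PrepareChallenges(fragment []byte, count int) ([]Challenge, error) {
	cs := make([]Challenge, count)
	for i := range cs {
		nonce := make([]byte, 16)
		if _, err := rand.Read(nonce); err != nil {
			return nil, err
		}
		cs[i] = Challenge{Nonce: nonce, expected: digest(nonce, fragment)}
	}
	return cs, nil
}

// Answer is run by the storage node on the fragment it actually stores.
func Answer(nonce, storedFragment []byte) [32]byte {
	return digest(nonce, storedFragment)
}

// Verify is run by the data owner on the storage node's response.
func Verify(c Challenge, response [32]byte) bool {
	return c.expected == response
}

func main() {
	fragment := []byte("encrypted fragment bytes")
	challenges, err := PrepareChallenges(fragment, 2)
	if err != nil {
		panic(err)
	}
	honest := Answer(challenges[0].Nonce, fragment)
	tampered := Answer(challenges[1].Nonce, []byte("corrupted fragment"))
	fmt.Println(Verify(challenges[0], honest))   // true
	fmt.Println(Verify(challenges[1], tampered)) // false
}
```

Because the storage node cannot predict the nonce, it can only answer correctly if it still holds the full fragment at challenge time.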

Blockchain Network

Training tasks and prediction tasks are broadcast to the Executor nodes through the blockchain network, and the Executor nodes involved then execute these tasks. The DataOwner node and the Storage node also exchange information through the blockchain network, both when monitoring the health of files and nodes and during the challenge-answer-verify process that proves replicas are being held.

Currently, XuperChain is the only blockchain framework that PaddleDTX supports.

Vertical Federated Learning

The open-source version of PaddleDTX supports two-party vertical federated learning (VFL) algorithms, including Linear Regression and Logistic Regression. More algorithms, such as a two-party Neural Network, will be open sourced soon, along with multi-party VFL and multi-party HFL (horizontal federated learning) algorithms. Please refer to crypto/ml for the background and implementation of the two supported algorithms.

Training and predicting steps of VFL are shown as follows:

Sample Preparation

An FL task needs to specify the sample files that will be used in training or prediction, and these files are stored in the decentralized storage system (XuperDB). Before executing a task, the executor (often the data owner) needs to fetch its own sample files from XuperDB.

Sample Alignment

Both VFL training and prediction tasks require a sample alignment process, which finds the intersection of samples across all participants' ID lists. Training and prediction are then performed on the intersected samples.

The project implements PSI (Private Set Intersection) for sample alignment without leaking any participant's IDs. Refer to crypto/psi for more details about PSI.
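To make the result of alignment concrete, here is a minimal sketch of what the intersection step produces. It is deliberately a plain, non-private intersection: it does not implement PSI and would reveal IDs if run across parties. The function name `Intersect` and the sample IDs are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Intersect returns the sorted IDs that appear in both lists.
// Illustration only: a real PSI protocol computes the same result
// without either party learning the other's non-matching IDs.
func Intersect(a, b []string) []string {
	seen := make(map[string]struct{}, len(a))
	for _, id := range a {
		seen[id] = struct{}{}
	}
	var out []string
	for _, id := range b {
		if _, ok := seen[id]; ok {
			out = append(out, id)
			delete(seen, id) // report each shared ID once
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	partyA := []string{"u1", "u2", "u3", "u5"}
	partyB := []string{"u5", "u2", "u7"}
	fmt.Println(Intersect(partyA, partyB)) // [u2 u5]
}
```

Each party would then keep only the rows whose IDs are in the intersection, in the same order, so that features from both sides line up sample by sample.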

Training Process

Model training is an iterative process that relies on collaborative computation over the two parties' samples. Participants exchange intermediate parameters over many training epochs so that each party obtains a proper local model.

To ensure the confidentiality of each participant's data, the Paillier cryptosystem is used to encrypt and decrypt the exchanged parameters. Paillier is an additively homomorphic scheme, which makes it possible to perform addition and scalar multiplication directly on ciphertexts. Refer to crypto/paillier for more details about Paillier.

Prediction Process

A prediction task requires a model, so the related training task must be completed before the prediction task starts. Models are stored separately in each participant's local storage. Each participant computes a local prediction result using its own model, and the partial prediction results are then gathered to derive the final result.

For linear regression, a destandardization step is performed after all partial results have been gathered. Only the party that holds the labels can perform this step, so all partial results are sent to that party, which derives the final result and stores it as a file in XuperDB for the requester to use.
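The gather-then-destandardize step can be sketched as follows. The sketch assumes z-score standardization of the label during training (the source does not state which scaling is used), and the names `PartialPredict` and `Destandardize` as well as all numbers are hypothetical.

```go
package main

import "fmt"

// PartialPredict computes one party's share of a linear prediction:
// the dot product of its local weights and its local (standardized)
// features. The two slices are assumed to have equal length.
func PartialPredict(weights, features []float64) float64 {
	sum := 0.0
	for i, w := range weights {
		sum += w * features[i]
	}
	return sum
}

// Destandardize is run by the label-holding party. It adds the bias to
// the gathered partial results and maps the standardized prediction
// back to the original label scale, assuming z-score standardization.
func Destandardize(partials []float64, bias, labelMean, labelStd float64) float64 {
	z := bias
	for _, p := range partials {
		z += p
	}
	return z*labelStd + labelMean
}

func main() {
	// Party A holds two features, party B holds one feature and the labels.
	a := PartialPredict([]float64{0.5, -1.0}, []float64{2.0, 1.0}) // 0
	b := PartialPredict([]float64{0.25}, []float64{4.0})           // 1
	fmt.Println(Destandardize([]float64{a, b}, 0.5, 20.0, 10.0))   // 35
}
```

Only the label holder knows the label mean and standard deviation, which is why the final step has to run on that party.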

Installation

There are two ways of installing PaddleDTX:

Run PaddleDTX in docker

We highly recommend running PaddleDTX in Docker. You can install all the components with the Docker images we provide; please refer to starting network. If you want to build the Docker images locally, please refer to building image of PaddleDTX and building image of XuperDB.

Install PaddleDTX from source code

To build PaddleDTX from source code, you need:

  • Go 1.13 or greater

# In the dai directory
make

# In the xdb directory
make

You can get the installation package from ./output and install it manually.

Testing

We provide test scripts for you to test, understand and use PaddleDTX.

Related Work

[1] Konečný J, McMahan H B, Yu F X, et al. Federated learning: Strategies for improving communication efficiency[J]. arXiv preprint arXiv:1610.05492, 2016.

[2] Yang Q, Liu Y, Chen T, et al. Federated machine learning: Concept and applications[J]. ACM Transactions on Intelligent Systems and Technology (TIST), 2019, 10(2): 1-19.

[3] Goodfellow I, Bengio Y, Courville A. Deep learning[M]. MIT press, 2016.

[4] Goodfellow I, Bengio Y, Courville A. Machine learning basics[J]. Deep learning, 2016, 1(7): 98-164.

[5] Paillier P. Public-key cryptosystems based on composite degree residuosity classes[C]//International conference on the theory and applications of cryptographic techniques. Springer, Berlin, Heidelberg, 1999: 223-238.

[6] Lo H K. Insecurity of quantum secure computations[J]. Physical Review A, 1997, 56(2): 1154.

[7] Chen H, Laine K, Rindal P. Fast private set intersection from homomorphic encryption[C]//Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. 2017: 1243-1255.

[8] Shamir A. How to share a secret[J]. Communications of the ACM, 1979, 22(11): 612-613.

[9] https://xuper.baidu.com/n/xuperdoc/general_introduction/brief.html

Comments
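  • Training stays in Processing for a long time after submission and finally fails after tens of minutes

    The PaddleDTX environment was built with a single XuperChain, two executor nodes, two data owner nodes, and two storage nodes. It runs in the AIStudio BMLcolab environment, and the result is the same on both the standard and the premium (GPU) editions. Following the manual at https://paddledtx.readthedocs.io/zh_CN/latest/quickstart/client.html to run the task, the task status changes after each command:

    After publishing the task
    TaskStatus: Confirming
    
    After one node confirms, it is still
    confirming
    
    After both nodes confirm, it becomes ready
    TaskStatus: Ready
    
    After starting the task, it enters Processing
    TaskStatus: Processing
    
    

    However, Processing lasts a very long time, and after roughly tens of minutes to more than an hour the task ends as Failed.

    In a local Docker environment, the same training completes in a few minutes.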

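  • Problem with XuperDB when building and installing from source

    The Docker quick-start example runs fine.

    However, building and installing from source runs into a problem in the AIStudio BML environment. Following the manual's source-build instructions, at the step that deploys XuperDB and starts the storage node, running ./xdb -c conf/config-storage.toml > storage.log reports: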

    ERRO[0000] app exit error="missing log config"

    After changing the configuration file as described in the manual:

    # vim conf/config-dataowner.toml
    # 
    listenAddress = ":8123"
    publicAddress = "127.0.0.1:8123"
    
    # Private key created by genkey; if you are unfamiliar with accounts, using the default account is recommended
    privateKey = "5572e2fa0c259fe798e5580884359a4a6ac938cfff62d027b90f2bac3eceef79"
    
    [dataOwner.blockchain]
        [dataOwner.blockchain.xchain]
            # The mnemonic belongs to the blockchain account created while installing the contract; take the value from ./ukeys/mnemonic
            mnemonic = "充 雄 孔 坝 低 狠 争 短 摸 拜 晨 造"
            contractName = "paddlempc"
            contractAccount = "XC1234567890123456@xuper"
            chainAddress = "127.0.0.1:37101"
            chainName = "xuper"
    

    running it then fails with:

    panic: runtime error: invalid memory address or nil pointer dereference
    [signal SIGSEGV: segmentation violation code=0x1 addr=0xe8 pc=0xa8bb81]
    
    goroutine 1 [running]:
    github.com/spf13/viper.(*Viper).AllKeys(0x0)
    	/home/aistudio/go/pkg/mod/github.com/spf13/[email protected]/viper.go:1824 +0xa1
    github.com/spf13/viper.(*Viper).AllSettings(0x0)
    	/home/aistudio/go/pkg/mod/github.com/spf13/[email protected]/viper.go:1904 +0x45
    github.com/spf13/viper.(*Viper).Unmarshal(0xc0005f1cd8, {0x1165a40, 0xc000a4a8e0}, {0x0, 0x0, 0x0})
    	/home/aistudio/go/pkg/mod/github.com/spf13/[email protected]/viper.go:908 +0x32
    github.com/PaddlePaddle/PaddleDTX/xdb/config.InitConfig({0x7fff43948c94, 0x12})
    	/home/aistudio/PaddleDTX/xdb/config/config.go:87 +0x1aa
    main.init.0()
    	/home/aistudio/PaddleDTX/xdb/main.go:63 +0xbe
    

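    I really cannot read Go error output, so I do not know how to track down and fix the problem.

    My project is public and available here: https://aistudio.baidu.com/aistudio/projectdetail/3255074. Environment: go version go1.17.5 linux/amd64, GNU Make 4.1.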

  • The new version fails to start in Docker script mode

    Starting the executor nodes fails. The error output seen in Docker is:

    
    executor2.node.com  | 2022/02/08 21:21:47 Config yamlFile get error #open ./conf/sdk.yaml: no such file or directory
    
    executor2.node.com  | 2022/02/08 21:21:47 GetConfig: &{10.144.94.18:8848 {false false 10 XBbhR82cB6PvaLJs3D4uB9f12bhmKkHeX TYyA3y8wdFZyzExtcbRNVd7ZZ2XXcfjdw} 100 xchain}
    
    executor2.node.com  | 2022/02/08 21:22:20 Config yamlFile get error #open ./conf/sdk.yaml: no such file or directory
    
    executor2.node.com  | 2022/02/08 21:22:20 GetConfig: &{10.144.94.18:8848 {false false 10 XBbhR82cB6PvaLJs3D4uB9f12bhmKkHeX TYyA3y8wdFZyzExtcbRNVd7ZZ2XXcfjdw} 100 xchain}
    
    executor1.node.com  | 2022/02/08 21:21:40 Config yamlFile get error #open ./conf/sdk.yaml: no such file or directory
    
    executor1.node.com  | 2022/02/08 21:21:40 GetConfig: &{10.144.94.18:8848 {false false 10 XBbhR82cB6PvaLJs3D4uB9f12bhmKkHeX TYyA3y8wdFZyzExtcbRNVd7ZZ2XXcfjdw} 100 xchain}
    
    executor1.node.com  | 2022/02/08 21:22:26 Config yamlFile get error #open ./conf/sdk.yaml: no such file or directory
    
    executor1.node.com  | 2022/02/08 21:22:26 GetConfig: &{10.144.94.18:8848 {false false 10 XBbhR82cB6PvaLJs3D4uB9f12bhmKkHeX TYyA3y8wdFZyzExtcbRNVd7ZZ2XXcfjdw} 100 xchain}
    
    executor2.node.com exited with code 255
    
    executor2.node.com  | 2022/02/08 21:21:47 Config yamlFile get error #open ./conf/sdk.yaml: no such file or directory
    
    executor2.node.com  | 2022/02/08 21:21:47 GetConfig: &{10.144.94.18:8848 {false false 10 XBbhR82cB6PvaLJs3D4uB9f12bhmKkHeX TYyA3y8wdFZyzExtcbRNVd7ZZ2XXcfjdw} 100 xchain}
    
    executor2.node.com  | 2022/02/08 21:22:20 Config yamlFile get error #open ./conf/sdk.yaml: no such file or directory
    
    executor2.node.com  | 2022/02/08 21:22:20 GetConfig: &{10.144.94.18:8848 {false false 10 XBbhR82cB6PvaLJs3D4uB9f12bhmKkHeX TYyA3y8wdFZyzExtcbRNVd7ZZ2XXcfjdw} 100 xchain}
    
    executor2.node.com  | 2022/02/08 21:25:16 Config yamlFile get error #open ./conf/sdk.yaml: no such file or directory
    
    executor2.node.com  | 2022/02/08 21:25:16 GetConfig: &{10.144.94.18:8848 {false false 10 XBbhR82cB6PvaLJs3D4uB9f12bhmKkHeX TYyA3y8wdFZyzExtcbRNVd7ZZ2XXcfjdw} 100 xchain}
    
    executor2.node.com  | 2022/02/08 21:25:33 Config yamlFile get error #open ./conf/sdk.yaml: no such file or directory
    
    executor2.node.com  | 2022/02/08 21:25:33 GetConfig: &{10.144.94.18:8848 {false false 10 XBbhR82cB6PvaLJs3D4uB9f12bhmKkHeX TYyA3y8wdFZyzExtcbRNVd7ZZ2XXcfjdw} 100 xchain}
    
    executor2.node.com exited with code 255
    

    Starting the executor node manually inside Docker also fails.

  • Error when starting a storage node

    After updating xuper and PaddleDTX, starting a storage node reports an error.

    2022/01/29 11:11:41 Config yamlFile get error #open ./conf/sdk.yaml: no such file or directory
    2022/01/29 11:11:41 GetConfig: &{10.144.94.18:8848 {false false 10 XBbhR82cB6PvaLJs3D4uB9f12bhmKkHeX TYyA3y8wdFZyzExtcbRNVd7ZZ2XXcfjdw} 100 xchain}
    t=2022-01-29T11:11:41+0800 lvl=info msg="xchain rpc access request" module=xchain r_call=server.go:1003 r_pid=36569 r_logid=1643425901739830258_286_3387 r_ntce=false rpc_method=/pb.Xchain/PreExec
    t=2022-01-29T11:11:41+0800 lvl=info msg=MetaReservedContracts module=xchain reservedContracts=[]
    t=2022-01-29T11:11:41+0800 lvl=info msg="xchain rpc service done" module=xchain r_call=server.go:1016 r_pid=36569 r_logid=1643425901739830258_286_3387 r_ntce=true cost_time="total: 0.18ms" rpc_method=/pb.Xchain/PreExec resp_error="Vm not exist in vm manager"
    

    Contents of the log file:

    time="2022-01-29T11:11:41+08:00" level=info msg="monitor initialize..." answer-interval=10m0s monitor=challenging request-interval=1h7m0s
    time="2022-01-29T11:11:41+08:00" level=info msg="monitor initialize..." fileclear-interval=24h0m0s fileretain-interval=168h0m0s heartbeat-interval=1m0s monitor=nodemaintainer
    time="2022-01-29T11:11:41+08:00" level=error msg="failed to read blockchain: {\"code\":\"XDAT0001\",\"message\":\"failed to QueryNativeContract: rpc error: code = Unknown desc = Vm not exist in vm manager\"}"
    time="2022-01-29T11:11:41+08:00" level=error msg="app exit" error="failed to read blockchain: {\"code\":\"XDAT0001\",\"message\":\"failed to QueryNativeContract: rpc error: code = Unknown desc = Vm not exist in vm manager\"}"
    
    

    With the Docker approach, starting the second storage node fails (command: bash network_up.sh start):

    ==========> Install paddlempc contract successfully ! 
    ==========> Decentralized storage network start ...
    Creating storage1.node.com   ... done
    Creating dataowner2.node.com ... done
    Creating dataowner1.node.com ... done
    Creating storage2.node.com   ... 
    
    ERROR: for storage2.node.com  UnixHTTPConnectionPool(host='localhost', port=None): Read timed out.
    
    ERROR: for storage2  UnixHTTPConnectionPool(host='localhost', port=None): Read timed out.
    ERROR: An HTTP request took too long to complete. Retry with --verbose to obtain debug information.
    If you encounter this issue regularly because of slow network conditions, consider setting COMPOSE_HTTP_TIMEOUT to a higher value (current value: 60).
    ==========> Decentralized storage network start error ...
    

    After increasing Docker's memory from 2 GB to 4 GB, the error output changed to:

    ==========> Install paddlempc contract successfully ! 
    ==========> Decentralized storage network start ...
    Creating storage1.node.com   ... done
    Creating dataowner2.node.com ... done
    Creating dataowner1.node.com ... done
    Creating storage2.node.com   ... done
    Creating storage3.node.com   ... done
    ==========> Decentralized storage network start error ...
    
  • Error when compiling the blockchain contract

    Command used to compile the blockchain contract: !cd ~/PaddleDTX/dai && go build -o paddlempc ./blockchain/xchain/contract

    Error:

    ../../go/pkg/mod/github.com/!paddle!paddle/!paddle!d!t!x/[email protected]/blockchain/xchain/contract/core/challenging.go:25:2: missing go.sum entry for module providing package github.com/xuperchain/xuperchain/core/contractsdk/go/code (imported by github.com/PaddlePaddle/PaddleDTX/dai/blockchain/xchain/contract); to add:
    	go get github.com/PaddlePaddle/PaddleDTX/dai/blockchain/xchain/contract
    blockchain/xchain/contract/main.go:17:2: missing go.sum entry for module providing package github.com/xuperchain/xuperchain/core/contractsdk/go/driver (imported by github.com/PaddlePaddle/PaddleDTX/dai/blockchain/xchain/contract); to add:
    	go get github.com/PaddlePaddle/PaddleDTX/dai/blockchain/xchain/contract
    

    I tried both Go 1.16 and Go 1.17. I do not know why Go is acting up...

  • The manual's create-namespace command has a wrong parameter

    Creating a namespace fails.

    Location in the manual: https://paddledtx.readthedocs.io/zh_CN/latest/quickstart/client.html, under operating XuperDB, the step that creates a namespace.

    In the AIStudio BML notebook environment, the command used was: !cd ~/PaddleDTX/xdb/output && ./xdb-cli files addns --host http://127.0.0.1:8122 -k eae7344064e1d5b53af6da1a23407b1e7e265d15eaf0442c476e3caac3003406 -n paddlempc -r 2 The error message was:

    err:{"code":"XDAT0004","message":"from xdb api: request url not found"}
    

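    While debugging I was completely stumped, and at one point even suspected that the port was listening on an IPv6 address. But when the services were started earlier, the connection had already been tested and status information was returned.

    Looking more closely, I found that the command has to reach the data owner node, while port 8122 belongs to the storage node. The documentation has the wrong port number.

    Changing the port in the command to 8123 fixed it: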

    !cd ~/PaddleDTX/xdb/output && ./xdb-cli files addns --host http://127.0.0.1:8123 -k eae7344064e1d5b53af6da1a23407b1e7e265d15eaf0442c476e3caac3003406 -n paddlempc -r 2

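  • Error when running a training task in AIStudio

    In AIStudio, XuperChain, two data owner nodes, three storage nodes, and two executor nodes are already running, and the training files have been uploaded. Submitting the training task fails:

    Command: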
    !cd ~/PaddleDTX/localtestdatatmp/executor/node1 && requester-cli task publish -a "linear-vl" -l "MEDV" \
    -k 716ae5ad5a374e54cc9a6770faa03213b03fc0fdeccd9319517383e2837cbacd -t "train" -n "房价预测-训练任务v1" -d "用飞桨,划时代" -p "id,id" \
    --conf ./conf/config.toml \
    -f "6fadfc82-db15-46b7-ad70-3fef902f7b49,2372c2ec-c983-4a33-a0d6-9fb8708b86b8" \
    -e  '127.0.0.1:8011,127.0.0.1:8012'
    
    Error message:
    Publish task failed: failed to get executor node by node name: failed to QueryNativeContract: {"code":"XDAT0004","message":"node not found: rpc error: code = Unknown desc = Key not found"}
    
    

    It looks like the failure already happens when looking up the node by name.
