A go implementation of Verkle trees

Last update: Jan 8, 2023

Comments: 17

go-verkle

A very experimental implementation of Verkle trees. When production-ready, the code is to be split between go-kzg and go-ethereum.

Notes

Nodes have 1024 children. More size should be supported, but it hasn't been tested. #7946dbeb
Generated proofs are currently incorrect. #794f78fb
Proofs are given in pi and rho form, not sigma form #79570617

Running the tests

$ go test .

The test called TestProofVerifyTwoLeaves is currently broken, all other tests should pass.

Owner

Guillaume Ballet

https://github.com/gballet/go-verkle

Comments

Figure out why Kilic is so slow

A mainnet tree conversion with kilic is ~2x as slow as with herumi. I doubt it's entirely due to the library being slower. This is a tracking issue to figure out if the problem is indeed on the kilic side or if it's based on our usage of it.

bench: group to field

BenchmarkGroupToField/single-16                   793231              1506 ns/op
BenchmarkGroupToField/multiple/1-16               539619              1862 ns/op              1862 ns/value          144 B/op          6 allocs/op
BenchmarkGroupToField/multiple/2-16               520746              2251 ns/op              1125 ns/value          336 B/op          8 allocs/op
BenchmarkGroupToField/multiple/4-16               343665              3501 ns/op               875.2 ns/value        736 B/op         12 allocs/op
BenchmarkGroupToField/multiple/8-16               178898              6015 ns/op               751.8 ns/value       1530 B/op         17 allocs/op
BenchmarkGroupToField/multiple/16-16              116660             10384 ns/op               649.0 ns/value       3124 B/op         25 allocs/op
BenchmarkGroupToField/multiple/32-16               67484             17460 ns/op               545.6 ns/value       6312 B/op         39 allocs/op
BenchmarkGroupToField/multiple/64-16               35632             35094 ns/op               548.4 ns/value      12713 B/op         68 allocs/op
BenchmarkGroupToField/multiple/128-16              18733             65418 ns/op               511.1 ns/value      25435 B/op        114 allocs/op
BenchmarkGroupToField/multiple/256-16               8793            137116 ns/op               535.6 ns/value      50922 B/op        210 allocs/op

kililc give different commitments depending on the value of multiExpThreshold

multiExpThreshold == 45:

INFO [05-18|19:42:21.287] Iterating state snapshot                 accounts=114215117 slots=400207177 elapsed=12h21m26.912s eta=2.205s
INFO [05-18|19:42:21.537] Iterated snapshot                        accounts=114215117 slots=400223042 elapsed=12h21m27.162s
INFO [05-18|19:42:21.537] Computed verkle tree                     root hash="542518…3afdb6"
INFO [05-18|19:42:21.539] Number of nodes written to DB
           nodes=174639610

multiExpThreshold == 20:

INFO [05-19|04:00:27.202] Iterating state snapshot                 accounts=114215117 slots=400208359 elapsed=15h45m18.981s eta=3.19s
INFO [05-19|04:00:27.785] Iterated snapshot                        accounts=114215117 slots=400223042 elapsed=15h45m19.564s
INFO [05-19|04:00:27.785] Computed verkle tree                     root hash="2c9607…dcf64f"
INFO [05-19|04:00:27.787] Number of nodes written to DB
           nodes=689077769

Looks the same number of nodes didn't get written to othe db, which certainly explains the time difference. Is it also the reason for obtaining different roots?

Unrelated tests failing in geth CI
See [this build failure]https://ci.appveyor.com/project/ethereum/go-ethereum/builds/41512143/job/ef4490waxtngwnpy)

For instance, we have the following issues:

--- FAIL: TestCheckpointRegister (0.01s) panic: undefined value in witness [recovered] panic: undefined value in witness goroutine 11 [running]:

Other tests point to an uninitialized AccessWitness
BlockCancun commit had to be reverted after switch to IPA
After the switch to IPA, the changes introduced in 688952b18b56dcdcf1bb90ab7646b1a8f330214f seem to no longer work.

In particular, the following command

rm -rf .verkle && go build -tags bignum_kilic ./cmd/geth/... && ./geth init genesis_verkle.json && ./geth --datadir=.verkle --nodiscover --mine --miner.threads=1 --http --http.corsdomain="*" --networkid 86 --http.rpcprefix="/verkle" --miner.etherbase=0x617661d148A52bef51a268C728b3A21B58f94306

fails during the init stage, because it tries to (de)serialize a verkle node as a regular MPT node. Not sure why this failed, but I had to revert it in d95451bdba7a4ebc825e7423e01ad0f0af4ffe6d in order to be able to run it.

Contract creation fails

Sending a transaction to create a contract causes a panic:

panic: trying to produce a commitment for an empty subtree

goroutine 62 [running]:
github.com/gballet/go-verkle.Empty.GetCommitmentsAlongPath(...)
	/home/gballet/go/pkg/mod/github.com/gballet/[email protected]/tree.go:857
github.com/gballet/go-verkle.(*InternalNode).GetCommitmentsAlongPath(0xc0006ac410, {0xc0128788c0, 0x20, 0x20})
	/home/gballet/go/pkg/mod/github.com/gballet/[email protected]/tree.go:580 +0xcc
github.com/gballet/go-verkle.GetCommitmentsForMultiproof({0x75ad18, 0xc0006ac410}, {0xc00030b040, 0xd, 0xd})
	/home/gballet/go/pkg/mod/github.com/gballet/[email protected]/proof.go:189 +0x20a
github.com/gballet/go-verkle.MakeVerkleMultiProof({0x75ad18, 0xc0006ac410}, {0xc00030b040, 0xd, 0xd})
	/home/gballet/go/pkg/mod/github.com/gballet/[email protected]/proof.go:207 +0xc5
github.com/ethereum/go-ethereum/trie.(*VerkleTrie).ProveAndSerialize(0xc0000c0768, {0xc00030b040, 0xd, 0xd}, 0xc012d8cba0)
	/usr/home/gballet/src/go-ethereum/trie/verkle.go:187 +0x76
github.com/ethereum/go-ethereum/miner.(*worker).commit(0xc0000b1b00, {0xc0000709a0, 0x0, 0x2}, 0x0, 0x1, {0xc056bd6b867e7087, 0xbac6fadd9d5, 0x20f5c40})
	/usr/home/gballet/src/go-ethereum/miner/worker.go:1038 +0x22d
github.com/ethereum/go-ethereum/miner.(*worker).commitNewWork(0xc0000b1b00, 0xc012631f98, 0x1, 0x617a7e16)
	/usr/home/gballet/src/go-ethereum/miner/worker.go:1022 +0x13c5
github.com/ethereum/go-ethereum/miner.(*worker).mainLoop(0xc0000b1b00)
	/usr/home/gballet/src/go-ethereum/miner/worker.go:460 +0x505
created by github.com/ethereum/go-ethereum/miner.newWorker
	/usr/home/gballet/src/go-ethereum/miner/worker.go:232 +0x817

Proof verification fails at higher powers of z

VerifyVerkleProof seems unable to accept proofs made by MakeVerkleProofOneLeaf when the polynomial is at a higher degree.

In practice, the values generated for y_is in the verifier are different from the ones generated in the prover. Injecting the y_is from the prover into the verifier solve the problem.
Use CoW in node insert + code simplification
This PR is a major refactor that is needed to introduce various optimizations for the block replay:

Optimization:

Use a "copy-on-write" approach in InternalNode : the commitment of a node is updated from only the commitments of its children that were written to with Insert or Delete. That computation is intended to be performed once, at the end of a block creation/verification.

Use MapToScalarField to speed up some stuff, although this is already covered by #291 so it won't be integrated here.

Refactoring:

StatelessNode and InternalNode no longer use diff-insert, as multiple children will typically be updated during block execution

LeafNode uses diff-insert, as insertion into this type of nodes typically happen only once per block

LeafNodeCommit therefore no longer perform any calculation

InternalNode.Commit on the other hand, will only recompute the commitment contribution from the nodes that were written to.

Simplify the code by merging Insert and InsertStem, as well as InsertOrdered and InsertStemOrdered.

This PR shaves roughly 20% of the execution time, compared to master.
Implement CoW for stateless internal nodes

Implement copy-on-write of commitments for StatelessNode as well.

This effectively breaks stateless leaf nodes, which isn't a problem because it is not used in geth at the moment, and a rewrite of the child node management is necessary before geth can execute statelessly again.
remove the delete method from VerkleNode

Verkle trees do not support deletion. The value should be overwritten with 0s instead. As new code paths get added with e.g. differential insertion, reducing the maintenance scope by removing methods that are not necessary make sense.

Measure impact of Bandersnatch optimizations in go-verkle

This PR is only to show and test further the impact of the optimization I proposed in https://github.com/crate-crypto/go-ipa/pull/18 for this repo, since most of the heavy lifting comes from using that dependency for the Bandersnatch elliptic curve implementation.

Here's a bench comparison between master (i.e: current Bandersnatch implementation):

name                                        old time/op    new time/op    delta
ProofCalculation-16                            18.2s ± 1%      6.6s ± 1%  -63.85%  (p=0.002 n=5+8)
ProofVerification-16                          34.9ms ± 1%    30.3ms ± 1%  -13.23%  (p=0.002 n=5+8)
CommitLeaves/insert/leaves/1000-16             212ms ± 3%      81ms ± 2%  -62.04%  (p=0.002 n=5+8)
CommitLeaves/insertOrdered/leaves/1000-16      214ms ± 0%      81ms ± 1%  -62.31%  (p=0.002 n=5+8)
CommitLeaves/insert/leaves/10000-16            2.10s ± 0%     0.77s ± 1%  -63.37%  (p=0.006 n=4+7)
CommitLeaves/insertOrdered/leaves/10000-16     2.13s ± 1%     0.78s ± 1%  -63.52%  (p=0.002 n=5+8)
CommitFullNode-16                             40.6ms ± 2%    14.9ms ± 2%  -63.36%  (p=0.002 n=5+8)
ModifyLeaves-16                                2.72s ± 1%     1.64s ± 0%  -39.74%  (p=0.003 n=5+7)

name                                        old alloc/op   new alloc/op   delta
ProofCalculation-16                           5.05GB ± 0%    2.75GB ± 0%  -45.56%  (p=0.002 n=5+8)
ProofVerification-16                          7.48MB ± 0%    0.17MB ± 0%  -97.72%  (p=0.004 n=5+6)
CommitLeaves/insert/leaves/1000-16            58.8MB ± 0%    34.2MB ± 0%  -41.81%  (p=0.002 n=5+8)
CommitLeaves/insertOrdered/leaves/1000-16     58.9MB ± 0%    34.3MB ± 0%  -41.77%  (p=0.002 n=5+8)
CommitLeaves/insert/leaves/10000-16            562MB ± 0%     324MB ± 0%  -42.32%  (p=0.002 n=5+8)
CommitLeaves/insertOrdered/leaves/10000-16     563MB ± 0%     325MB ± 0%  -42.28%  (p=0.002 n=5+8)
CommitFullNode-16                             13.4MB ± 0%     8.0MB ± 0%  -40.38%  (p=0.002 n=5+8)
ModifyLeaves-16                                740MB ± 0%     319MB ± 0%  -56.83%  (p=0.002 n=5+8)

name                                        old allocs/op  new allocs/op  delta
ProofCalculation-16                            30.0M ± 0%      1.3M ± 0%  -95.68%  (p=0.003 n=5+7)
ProofVerification-16                            115k ± 0%        1k ± 0%  -99.28%  (p=0.003 n=5+7)
CommitLeaves/insert/leaves/1000-16              328k ± 0%       15k ± 0%  -95.39%  (p=0.002 n=5+8)
CommitLeaves/insertOrdered/leaves/1000-16       331k ± 0%       18k ± 0%  -94.56%  (p=0.000 n=5+8)
CommitLeaves/insert/leaves/10000-16            3.18M ± 0%     0.14M ± 0%  -95.50%  (p=0.002 n=5+8)
CommitLeaves/insertOrdered/leaves/10000-16     3.21M ± 0%     0.17M ± 0%  -94.64%  (p=0.002 n=5+8)
CommitFullNode-16                              71.2k ± 0%      3.3k ± 0%  -95.36%  (p=0.002 n=5+8)
ModifyLeaves-16                                5.95M ± 0%     0.22M ± 0%  -96.32%  (p=0.002 n=5+8)

TL;DR for proof calculation:

A reduction of 63.85% in computation time, most coming from avoiding 95.68% allocations.
Indirectly, we reduced 45.56% memory usage.

TL;DR for proof verification:

A reduction of 13.23% in computation time, most coming from avoiding 99.28% allocations.
Indirectly, we reduced 97.72% memory usage.

I wonder if there's any other way the performance of this repo is being measured. This branch is totally open to exploring that if feels that the work on the elliptic curve implementation can be considered to be merged.

condrieu: method handler crashed

Deploying AAVE v2 on Condrieu fails with the following problem:

ERROR[12-16|12:05:47.482] RPC method eth_estimateGas crashed: gas pool pushed above uint64
WARN [12-16|12:05:47.482] Served eth_estimateGas                   conn=[::1]:42714           reqid=825                                  duration=170.401096ms err="method handler crashed"

Multiple performance improvements

This PR contains multiple performance improvements that will be explained in PR comments.

These changes were motivated by optimizing the benchmarks in this go-ethereum draft PR. It's recommended to see that PR first so you have a top-down approach to understanding the changes.

Of course, this had a noticeable impact on the benchmarks of this repo too:

name                                        old time/op    new time/op    delta
ProofCalculation-16                            1.87s ± 2%     0.08s ± 3%  -95.75%  (p=0.003 n=5+7)
ProofVerification-16                          26.9ms ± 2%    26.5ms ± 1%     ~     (p=0.171 n=5+8)
GroupToField/single-16                        1.53µs ± 2%    1.45µs ± 2%   -5.08%  (p=0.002 n=5+8)
GroupToField/multiple/1-16                    1.61µs ± 4%    1.57µs ± 3%   -2.94%  (p=0.048 n=5+7)
GroupToField/multiple/2-16                    1.83µs ± 4%    1.73µs ± 1%   -5.24%  (p=0.003 n=5+7)
GroupToField/multiple/4-16                    2.53µs ± 2%    2.38µs ± 3%   -5.88%  (p=0.002 n=5+8)
GroupToField/multiple/8-16                    3.96µs ± 4%    3.75µs ± 2%   -5.36%  (p=0.002 n=5+8)
GroupToField/multiple/16-16                   6.26µs ± 3%    6.01µs ± 1%   -4.10%  (p=0.003 n=5+8)
GroupToField/multiple/32-16                   11.2µs ± 4%    10.9µs ± 2%   -2.55%  (p=0.006 n=5+8)
GroupToField/multiple/64-16                   21.5µs ± 3%    20.9µs ± 2%   -2.99%  (p=0.006 n=5+8)
GroupToField/multiple/128-16                  40.2µs ± 2%    39.3µs ± 0%   -2.29%  (p=0.003 n=5+7)
GroupToField/multiple/256-16                  79.8µs ± 4%    77.8µs ± 1%   -2.54%  (p=0.010 n=5+7)
CommitLeaves/insert/leaves/1000-16            47.2ms ± 2%    11.9ms ± 3%  -74.71%  (p=0.002 n=5+8)
CommitLeaves/insertOrdered/leaves/1000-16     66.2ms ± 3%    66.5ms ± 4%     ~     (p=0.833 n=5+8)
CommitLeaves/insert/leaves/10000-16            471ms ± 2%      65ms ± 5%  -86.19%  (p=0.002 n=5+8)
CommitLeaves/insertOrdered/leaves/10000-16     626ms ± 1%     830ms ± 2%  +32.68%  (p=0.002 n=5+8)
CommitFullNode-16                             6.30ms ± 5%    5.12ms ± 2%  -18.66%  (p=0.003 n=5+7)
ModifyLeaves-16                                380ms ± 4%     169ms ± 3%  -55.40%  (p=0.002 n=5+8)

Note: this PR is optimistically sitting on top of another branch of go-ipa that has other optimizations. I'll update the go.mod to the official commit when that gets merged. Only after we do that, the replay block GH action will run correctly for go.mod editing reasons.

conversion: use Copy on Write with `InsertOrdered`
The current behavior of InsertOrdered is to compute the commitment of any subtree as soon as it is determined that values will no longer be inserted. This is to save memory.

Because of this, toFr is called multiple times on a given node, and that is wasteful since toMultipleFr could be called if these subtrees were kept in RAM a bit longer. One idea would be to:

implement copy on write in InsertOrdered too

keep the current behavior of InsertOrdered at depth < 2

not flush nodes at greater depth unless their parent are flushed

use toMultipleFr to save on the division when serializing deeper nodes, hoping to benefit from the performance of serializing several points at the same time
Sateless delete is wrong

In stateless, Delete simply inserts value 0. This is incorrect, as the stateful version will first check if the leaf to delete is present, and will return an error if it's not.
Use a versionned hash for the tree root

In order for rollups to use verkle trees, we are considering adding turning the root of the verkle tree into a versionned hash, so that it can be verified by the same (athough modified) precompile that is used in eip-4844.

The version number remains to be picked.

The first byte of the (pedersen hash of) the root commitment is replaced with the version number, and the dropped byte is added to the proof payload, for the precompile to be able to reconstruct the full tree from the proof.
faster algorithm for lagrange precomputes

Maybe you already know about this faster algorithm. I was not able to find the part of the go-verkle code that generates the precompute.

The faster algorithm seems to be around 128x faster.

I implemented this faster precompute algorithm in the version of the verkle tree I am working on https://github.com/zack-bitcoin/verkle/blob/master/src/crypto/poly.erl#L255 poly:calc_DA

The naive algorithm is like this. There are 256 base polynomials. You multiply up every possible combination of 255 of them, to make another set of 256 polynomials. This second set of polynomials is all added together to calculate the precompute.

In the faster algorithm, we are taking advantage of the fact that many polynomials in the second set, they share factors. So we don't need to multiply those shared factors together more than once.

A simple example. the base polynomials are [a, b, c, d, e, f]. So the second set of polynomials is [abcde, abcdf, abcef, abdef, acdef, bcdef].

the order of multiplications is like ab abc abcd abcde

then from the other direction. fe fed fedc fedcb

Then we can multiply pairs of these things together. f * abcd = abcdf fe * abc = abcef fed * ab = abdef fedc * a = acdef

and now we have the 6 polynomials we need to add together.

I think this is the go code that is doing the precompute https://github.com/crate-crypto/go-ipa/blob/master/banderwagon/precomp_multiexp.go#L156 but I can't understand it very well.

Related tags

Data Structures go-verkle

A Left-Leaning Red-Black (LLRB) implementation of balanced binary search trees for Google Go

GoLLRB GoLLRB is a Left-Leaning Red-Black (LLRB) implementation of 2-3 balanced binary search trees in Go Language. Overview As of this writing and to

Dec 23, 2022

Golang implementation of Radix trees

go-radix Provides the radix package that implements a radix tree. The package only provides a single Tree implementation, optimized for sparse nodes.

Dec 30, 2022

A Go implementation of a radix tree, that uses binary searches to speed up insert, retrieve and delete operations on dense trees

radixs A Go implementation of a radix tree, that uses binary searches to speed up insert, retrieve and delete operations on dense trees. This implemen

Feb 14, 2022

A go implementation of Verkle trees

go-verkle

Notes

Running the tests

Owner

Guillaume Ballet

Comments

Figure out why Kilic is so slow

bench: group to field

kililc give different commitments depending on the value of multiExpThreshold

Unrelated tests failing in geth CI

BlockCancun commit had to be reverted after switch to IPA

Contract creation fails

Proof verification fails at higher powers of z

Use CoW in node insert + code simplification

Implement CoW for stateless internal nodes

remove the delete method from VerkleNode

Measure impact of Bandersnatch optimizations in go-verkle

condrieu: method handler crashed

Multiple performance improvements

conversion: use Copy on Write with `InsertOrdered`

Sateless delete is wrong

Use a versionned hash for the tree root

faster algorithm for lagrange precomputes

Related tags

A Left-Leaning Red-Black (LLRB) implementation of balanced binary search trees for Google Go

Golang implementation of Radix trees

A Go implementation of a radix tree, that uses binary searches to speed up insert, retrieve and delete operations on dense trees

Adaptive Radix Trees implemented in Go

Generic types that are missing from Go, including sets, trees, sorted lists, etc.

A generic Go library for implementations of tries (radix trees), state commitments and proofs of inclusion

Go implementation of Count-Min-Log

A Go implementation of the Elias-Fano encoding

Set is a useful collection but there is no built-in implementation in Go lang.

A skip list implementation in Go

Go implementation of C++ STL iterators and algorithms.

Go implementation to calculate Levenshtein Distance.

A Merkle Tree implementation written in Go.

A prefix tree implementation in go

Package ring provides a high performance and thread safe Go implementation of a bloom filter.

A simple Set data structure implementation in Go (Golang) using LinkedHashMap.

A Go library for an efficient implementation of a skip list: https://godoc.org/github.com/MauriceGit/skiplist

A slice-based implementation of a stack. In Go!

gtreap is an immutable treap implementation in the Go Language