Maybe you already know about this faster algorithm. I was not able to find the part of the go-verkle code that generates the precompute.
The faster algorithm seems to be around 128x faster.
I implemented this faster precompute algorithm in the version of the verkle tree I am working on https://github.com/zack-bitcoin/verkle/blob/master/src/crypto/poly.erl#L255
poly:calc_DA
The naive algorithm is like this.
There are 256 base polynomials.
You multiply up every possible combination of 255 of them, to make another set of 256 polynomials.
This second set of polynomials is all added together to calculate the precompute.
In the faster algorithm, we are taking advantage of the fact that many polynomials in the second set, they share factors. So we don't need to multiply those shared factors together more than once.
A simple example. the base polynomials are [a, b, c, d, e, f]. So the second set of polynomials is [abcde, abcdf, abcef, abdef, acdef, bcdef].
the order of multiplications is like
ab
abc
abcd
abcde
then from the other direction.
fe
fed
fedc
fedcb
Then we can multiply pairs of these things together.
f * abcd = abcdf
fe * abc = abcef
fed * ab = abdef
fedc * a = acdef
and now we have the 6 polynomials we need to add together.
I think this is the go code that is doing the precompute https://github.com/crate-crypto/go-ipa/blob/master/banderwagon/precomp_multiexp.go#L156
but I can't understand it very well.