Lookup Singularity

Thanks CC for review and feedback :slight_smile:

Intro

For years the central component of ZKPs was A* B + D == C. This check was applied repeatedly to build our full circuits. Any computation can be represented in this way. This representation is called R1CS.

It turns out R1CS is prohibitively expensive for certain tasks. This led to increased use of lookup arguments. Currently we use lookups and a variant of R1CS called polynomial constraints to build our circuits.

A lot of things are blocked if we want to build them using only r1cs. So the general flow is that some clever people get together and find efficient ways to arithmetise circuits. They create hand tuned circuits that include both polynomial constraints and lookup arguments. This is circuit specific and is not very scalable.

Polynomial constraints

Polynomial constraints are complicated. The circuit implementer makes many of these polynomial equations. The whole circuit is defined as a system of polynomial equations.
A solution to this system of equations makes a valid proof. It’s hard to reason about the results of systems of equations.
Formal verification tools cannot , at the moment, solve polynomial equations in a prime field.

Lookup arguments

Lookup arguments are set membership checks. They were first used to do efficient big integer arithmetic. Now used for that but also as control flow for VMs and to do operations that are not snark friendly. They are not more efficient for all operations. The cost per lookup introduces some prover cost. We currently try and use lookup arguments rarely because they are expensive in the prover

Why lookup arguments are good

Language

Currently snark friendly languages are very difficult for new programmers to learn. They use prime fields and polynomial constraints which are not a paradigm they are used to. A language that only uses lookup arguments could be much simpler. Current languages are good at doing computation that is specific to snarks but doing traditional computation is much more expensive. A lookup only language would be better on both counts.

Security review

Auditors are not used to finding solutions to a series of polynomial equations. Lookup arguments are much simpler to reason about.

Example:

Take a circuit is an AND of two input variables, the output is the result.

Lookup approach
out = get x,y from AND table
return(out)

The above lookup approach is much simpler than the polynomial approach shown next.

polynomial approach
(x)(x-1) = 0 
y(y-1) = 0 
x*y = out
return(out)

It is likely going to be much simpler to find bugs in lookup only circuits.

Formal verification

Formal verification tools need to solve the series of polynomial equations. Tools that solve these systems do not work well over prime fields. This leads to a lot of extra work being required to make a solver that works over a field. Using only lookup arguments would allow us to use already existing tools. It would also open up other methods for us to explore. Lookup arguments limit the set of valid variables at any point in the circuit. So we go from the set of possible variables being 2^{254} to being 2 or 2^{16}. This could even allow us to do things like state space enumeration to confirm our circuit is correct.

Information theory comparison

In order to effectively describe our program as a circuit we need to make a circuit that maps an input to the correct output. We can look at circuits as information encoded per second of prover time. This seems like a good way to compare the different ways we have of implementing circuits.

Polynomial constraints have limited degree. This is because the degree affects the prover time. The degree limits the information you can encode with them. You can make a polynomial that is degree 5 that can map 5 input values to 5 output values. You can’t include more values in the polynomial unless you increase the degree. In many cases this is fine because we use the structure of the polynomial constraint to do calculations. So multiplication is defined by a polynomial operation A * B == C but XOR is not and needs to be encoded as keys to values.

Lookup arguments can contain much more information. We have previously been limited in table size of 2^{28} elements. But recent work makes the limit of circuit size the maximum trusted setup you can feasibly complete. This limit on the table\_size. Baloo: Nearly Optimal Lookup Arguments. A single polynomial constraint can contain ~ 5 *2^{254} bits of information. Lookup arguments can contain 2^{254} * table\_size bits of information.

Polynomial constraints are very useful where you use the structure of the polynomial constraint. But as tables of bigger size become feasible this advantage will disappear.

Conclusions

If we can efficiently define circuits using lookup arguments only it will lead to simpler tooling and circuits.

Lookup arguments will become more efficient than polynomial constraints for almost everything with time.

We should start building the tooling for a lookup centric zkps

Future work

  • Compare efficiency of circuits right now.

  • Example lookup only language

  • Do a comparison of relative efficiency of various lookup arguments. Try and estimate the pace of improvements

  • Find examples where lookup arguments are better and worse than polynomial constraints

    • Find the worst case examples for both
    • Benchmark many circuits using: lookup arguments, lookup + polynomial constraints and compare them for efficiency
11 Likes

Great post, thanks for sharing! Another interesting idea of similar flavor is to consider recursion protocols as a primitive for cheap structured operations. In particular, with folding scheme based constructions like Nova (that work directly on the level of constraint IOPs), we could witness repeated operations and prove them within a larger SNARK seperately.

For instance, building on your example, suppose we have a large circuit with N AND operations mapping two M bit inputs to one M bit output. Then, we could first extract these N operations from the witness and seperately write a recursive circuit of N steps (verifying one M bit AND op at each step) in Nova, and generate as output a Nova (relaxed R1CS) instance with O(M) constraints + O(1) recursion overhead constraints (~10k or so in case of Nova). Therefore, the prover would do O(N) work outside the main proof (notably the dominating cost is from O(N) MSMs) + O(1) extra checks in their primary circuit, matching the optimal lookup table arguments runtime.

In this setup, we don’t exactly get all the described benefits of security/simplicity lookups have, but it’s not bounded on trusted setup size (as yet?) and efficient implementations of folding schemes like Nova already exist. It would be interesting to try comparing these ideas and benchmarking concrete numbers!

@therealyingtong also noted that similar approaches are used in the ZK EVM circuits (for hashing, for instance)

7 Likes

As Ariel nicely articulated above, I think a hybrid approach makes a lot of sense. One of my biggest pain point of writing Circom today is having to using things like this (don’t know how to call this, Groth16 arithmetization?) as a “poor man’s” lookup table:

2 Likes

I think Ariel is assuming you can’t make a table iwth more than 2**128 elements I think if we can do that then we can do better than native adds and muls especially if you are doing wrong field stuff.

I agree that we need to be able to make very big tables.

3 Likes

The prover work of newer constructions like Baloo is quasi-linear to size
of the set to be opened (rather than the entire table), which is very nice. However, these works still require a linear or quasi-linear setup time with regards to the actual size of the lookup table. Do you think this is an inherent limitation? If so, would that imply making a 2^128 sized lookup table would still be quite prohibitive? (I don’t know what is the real practical size to be, but I would guess something like 2^60 ish).

1 Like

I guess then that with schemes like Baloo, we could get slightly closer the lookup-singularity by increasing the ratio bits/lookup with slightly better pre-processing times and, for sure, much better proving times. But not sure if it gives enough positive expectations for the lookup singularity future. Let me elaborate:

Theoretically, with 4 columns of 2^64 elements each, we should be able to express any 2^256 element. This means that we need 8 columns to represent 2 2^256-elements and 12 columns to represent any operation between them that gives as a result another 2^256 element.

This will look like:

E64 E128 E192 E256 F64 F128 F192 F256 R64 R128 R192 R256
X1 X2 X3 X4 Y1 Y2 Y3 Y4 Z1 Z2 Z3 Z4
XN_1 XN_2 XN_3 XN_4 YN_1 YN_2 YN_3 YN_4 ZN_1 ZN_2 ZN_3 ZN_4

Assuming this, we see that the ratio with tables of 2^64 bit elements for an addition, mul, XOR or any F(x,y)->z is of 12columns/256bit-num operation. So according to @shumo the best case scenario we have now would be of 12columns/256bit-num operation with a feasible-to-build SRS size.

Note that if the lookup can’t be of 2^256 bit elements directly, we’re forced to have some sort of decompositions of the numbers we want to use. This means that we not only need the lookups but also some custom gate or similar thing to perform the decomposition into limbs of base 2^64.

There are two parts where this really concerns me:

  1. Each new operation we want to support (Assume we have Addition and we want Multiplication for example) will require to add 4 more columns with all of the 2^64 possible results of the limb permutation operations that would form the 2^256 possibilities.

This is quite bad and barely optimal. And not in regards proving time specially. But rather setup time. We need to compute 4 new commitments per operation we want to add support for. That means 4 new MSMs and the argument required by the lookup too. And considering we are managing polynomials of 2^64 elements, this might not me a quick task. Even with GPU or similar stuff.

  1. We still need the permutation argument and all the rest of stuff until we can’t lookup directly 2^256-bit numbers. Mainly because there’s no way to compose/decompose the elements into 64-bit limbs unless we use custom gates. Which will then add more commitments. Which will already be extended due to the length of the tables and the Degree finally required by the circuit.

Things I’d like to see/explore:

  • Shouldn’t we be able to express all the possible 8-bit operations we need and then, operate with the byte representations of the Field Elements that we use? I know this sounds a bit crazy. But what if we could define our tables of all of the 8-bit operations and the necessary 16-bit columns that derive into 8-bit limb decomposition when needed?
    So for example, imagine we have a multiplication of 2 bytes. We could first define the columns for Inp1, Inp2 and 16-bit Out. Then, define a decomposition op which would give us the resulting 8-bit limb and the remainder which needs to be added to the other limb etc…
    This is something that came to my mind some times already, and that might be useless, but might also be another perspective to look at the problem from.

  • Could we modelate lookup arguments as Bloom Filters for example? Given an entry (x,y,z) for example where X,Y are inputs and Z an output, run through a high precision Bloom Filter to check that the relation is indeed one of the accepted ones.
    This is also, another idea that sometimes poped on my head. Again, could be completelly useless too. But I think that maybe, lookup singularity might arrive from a change of perspective over the problem, rather than minor optimizations over the previous protocol we had.

2 Likes

We are definitely a ways away from the lookup singularity. One way to get there is to try and find a lookup argument which is better than linear setup time in the size of the table. Tho there might be an impossibility result here. Something about having to include information about every possible result of the computation.

The other way we can over come this is to think about how the elements of our table are very repeated. So we may be able to take advantage of this to build tables with better than linear setup time. For example

  • we want to calculate the a xor b

  • assume a and b are 1 bit

  • Our table is

A B Result
0 0 0
1 0 1
0 1 1
1 1 0
  • Note that the result is commutative. Meaning if you switch A and B the result is the same

  • We can leverage this if our lookup argument can check for A , B AND B , A

  • This also works for addition , multiplication and any other operation that is commutative

  • This should reduce our look table size by 1/2

So its definitely possible to have a sub-linear setup time assuming we can make our lookup arguments allow for for this kind of AND behavior.