@levs57 posted a construction of how one would add lookups to groth16, called UltraGroth. The above linked hackmd has the main details. This thread is for sharing additional discussion between nx, yush, and levs.
nx:
i had some questions after reading ur notes. Sorry if they seem silly, it’s been a while since i’ve messed with groth16
so from what i understand, the main idea here is that we split public and private inputs into multiple rounds, such that private inputs for a round cannot depend on inputs from later rounds?
i’m not really sure why this can’t be done with vanilla groth16. i think it would help if i had an explicit example
levs:
ok, this is almost vanilla groth16, but with slightly modified crs
it has multiple gammas (and I think also multiple deltas if you want multiround public inputs, though I think it is not necessary)
if you just did it with a single delta (i.e. communicate multiple commitments which allegedly correspond to rounds), I’d be unable to see the integrity  i.e. you’d be able to arbitrarily subtract something from commitment to 1st round, and add the same point to 2nd…
which is not good
groth16 lets recall how it works first
we have R1CS witness w_i
and we compute A = [\alpha]_0 + \sum A_i w_i
B = [\beta]_1 + \sum B_i w_i
and C is kinda funky
we have 3 sets of polynomials, and they correspond to rows in matrix repr of R1CS
L_i (x), R_i(x), O_i(x)
“left, right, output”
each C_i is actually (\beta A_i + \alpha B_i + O_i) / delta
and there is also subset of things treated as “public”
and for these C_is are denoted IC_i s
and have different formula
(\beta A_i + \alpha B_i  O_i) / gamma
next, in C there is also some additional stuff related to quotient polynomial
because R1CS equation says that A(x) B(x) = C(x) on our evaluation subset, right?
so it is actually A(x) B(x) = C(x) + Z(x) H(x)
lets denote h_i coefficients of H(x)
so I want to say that A_i = L_i(\tau), B_i = R_i(\tau)
and C_i is kinda funky.
and I forgot signs of course
anyway, into C we also inject terms related to H
so we also have Z_k = \tau^k Z(\tau)/delta
and hence sum h_k Z_k sort of represents term Z(x) H(x), and 1/\delta “marks” it so it is grouped into C
so lets inspect verification eq
e(A, B) = e([\alpha]_1, [\beta]_2) + e(C, \delta) + e(IC, \gamma)
lets ignore public stuff for a moment
e(A, B) opens up into e(\alpha, beta), plus products of Aterms by beta, plus products of Bterms by alpha (all clearly cancelled in RHS)
and actual interesting stuff, which is exactly A(\tau) B(\tau)
and it gets cancelled by C(\tau)/delta * \delta
ok, so now lets try to understand why prover can’t cheat too much
we can do it in algebraic group model, which basically means that we just pretend that these elements are like indeterminates, and every value that prover produces is linear combination of things that it has in common reference string
first, lets observe that lhs must have alpha and beta to produce \alpha \beta term on the right
moreover, because crs doesnt contain [beta]_1 or [alpha]_2, A actually must contain alpha, and B must contain beta
lets also linearly rescale them in a way that A= alpha + stuff, B = beta + stuff
then observe that stuff thats multiplied by alpha can only be cancelled by terms in C, similar to stuff in beta
that will enforce that witness coefficients that are involved in A, B and C are all equal
and then, there will be additional terms related to delta, don’t exactly remember how they are cancelled, but it is actually written in my text
so now lets see what gamma does, functionally
it sort of splits C into 2 parts, such that the union of their coefficients (these are 2 nonintersecting vectors) is the same as coefficients of A, or B
but they can not get intermixed with each other
in groth16, it is used to separate into public and private inputs
but we can use the same principle to separate into just few “rounds”  separate subsets of the circuit, such that the commitment to them is exposed in the proof
now we can hash them, and pass data into public inputs, i.e. use them as actual challenge rounds
which is exactly my proposal
the question of zero knowledge is a bit more subtle than in normal Groth16, which is even rerandomizable  i.e. given a proof, I can make a new proof which is equidistributed in space of all valid proofs
but indeed we can find blinding factors that properly hide everything
There is also a way to get to the same construction from other side, basically starting “modularly”, saying  lets use groth16style construction for linear combinations, and then somehow interface into plonk, which supports rounds.
Then, you can think what functionally works as commitment to data (so you can pass it to plonk ) and apparently find that now you don’t need to pass it, and can just make rounds inside of groth16.
Considering public inputs  I think we can have a single round of public inputs, i.e. multiple gammas, but single delta.
Whats important, though, is that FiatShamir must include actual public inputs. It can be achieved by creating a private signal for each public signal (in a round corresponding to this public signal “real round”), and then applying the constraint that makes them equal.
In reality, public inputs in round 0 are real public inputs, and all others are challenges.
nx:
it’s still not super clear to me what the exact workflow is for a prover here. like say i want to prove some lookup is correct, what do i need to compute and what is public vs private
levs:
oh, considering lookup it is quite easy, we only need 2 rounds
0th round contains all your interesting data, and lookup argument itself (there are a bunch of them, but they all reduce to some sort of randomized check over some array, for example standard halo2 lookup eventually checks permutation argument, which says that \prod (x_i  t) = \prod (y_i  t) for a random t)
now, t is a challenge, and evaluation of this thing is allocated in 1st round
so prover flow is as follows
compute witness to 0th round
compute C of the 0th round
(apply blinding factors here, too, actually)
hash it to obtain t
now compute full witness
t is also substituted as public input in its rightful place
cool, we are now ready to do normal groth16 proof
(except C is like, split)
now verifier does following checks:

<A, B> = <\alpha, \beta> + <C0, \delta_0> + <C1, \delta_1> + <IC, \gamma>

IC.t = Hash(C0) (recall IC are public inputs, so they are sent plainly with the proof)
nx:
is it reasonable to think of this as like “separate” groth16 proofs kinda “chained” together
levs:
nah, more like a single proof with more “unmixable” groups than just private and public inputs
chained thing could be done too, it is just like… worse
i.e. you could imagine 2 groth16 proofs with some sort of interface between them
but to actually move data between them, you need commitment, which, in turn, actually solves the problem already
i.e. “C actually commits to witness”
nx:
i see
is it also technically possible to not reveal the hash challenge and prove it was correctly computed in circuit
so that it just ends up being a normal groth16 proof?
levs:
let me think, thats interesting
I guess no, you will need to somehow pass C0 (parsed as collection of scalar field elements) back into the circuit, still
if not t, you will need to at least pass C0
normal groth proof doesn’t have access to itself in a circuit
I need verifier to pass it as public input
nx:
can’t i do a hash over all the earlier inputs or smth
levs:
you need to hash commitment to private data
you could technically hash EVERYTHING thats in C0
but that would defeat the purpose, i.e. we are like abusing the verifier to give us hash of commitment of stuff to C0
to avoid doing it in circuit
(similar to what happens in plonk, exactly)
instead of doing incircuit hashing, ask verifier to hash the commitment  it is cheaper
levs:
maybe an instructive thing is Plonk + linchecks
you can make an argument, simpler than plonk
that proves the following: for a fixed matrix A (trusted setup data), and committed vector v, it holds that Av = 0
and this can be combined with normal PlonK (even without copy constraints, because we can simulate copy constraints as particular case of lincheck)
so this achieves (in a bit more involved way, but still) the same combo  we have both lookups, and free linear combinations
this even has certain advantages, i.e. you can use gates of higher degree if you use plonkish argument then
and lincheck is quite easy to do  you make normal KZG commitments, and for each basis element e_i you also make something like
e_i / delta + A(e_i) / gamma
and you publish [delta]_2
but do not publish anything related to gamma
so only way to cancel them is make linear combination \sum s_i e_i such that A(\sum s_i e_i) = 0
and then for kzg commitment C (normal one) you ask to find an element C’ such that <C, [1]_2> = <C’, [\delta]_2>
this is like small and really stupid version of groth which is only capable of checking linear equations ;))
nx:
so if this were to be implemented in circom groth16,

r1cs doesn’t need to change, u just need some extra metadata in ur .r1cs file storing what “rounds” each wire belongs to

the witness generation gets a little funky since ur challenges are H(C_i). In the sense that u kind of need to compute ur proof and ur witness together, rather than being able to do them separately
levs:
indeed, exactly
 slightly different format of CRS (normal groth16 crs can be used, but insecure of course, you need different deltas)
considering circom specifically, there are 2 different approaches to syntax that can be taken
you can make “squeeze challenge” primitive, which shifts round to the next  and all signals are allocated to next round after that
it is very simple and clean, but nonmodular  components now become very annoying to use…
second option is allowing user to declare rounds manually, but then they can write something which will not execute (because they made a cycle and require some info depending on a challenge in earlier rounds)  which can be easily detected, though, so the program just will not compile, similar to circular dependency in unassigned signals
yush:
btw whats the main diff between this scheme and bellings? https://ethresear.ch/t/grolupplookupforr1cs/14307
levs:
this is heavier, akin to modular scheme I mentioned
i.e. it is like interfacing from snark to plonk (thats the content of legosnark paper)