ECC Operations Optimizations in R1CS
This note was originally written in collaboration with Wen-Ding Li in March 2022
Our goal is to build groth16 circuits that can verify claims involving ECDSA algorithms.
To do this, we want to implement R1CS circuits for elliptic curve operations (particularly on secp256k1) with a small number of constraints.
Suppose the scalar field of the curve
y^2 = x^3 + b
is F_q where q has 256 bits and the baby Jubjub prime is p which is slightly less 254-bit. Choose n and k so that
- nk >= 256
- 3n + log_2(2k^2) + \epsilon < 254
where \epsilon is a small constant. This means that we can represent 256-bit nonnegative integers in k registers of n bits each (inequality 1), and that in doing arithmetic operations with “overflows,” we won’t overflow the prime field size (inequality 2).
For a definition of overflows and carries, see the “BigInt Multiplication” section of this blog post.
Addition and Multiplication (no carries)
We represent BigInts with k registers of n bits each. For register size X = 2^n, define addition and multiplication without carries by:
- \sum_i a_i X^i +' \sum_i b_i X^i = \sum_i c_i X^i with c_i = a_i + b_i
- \sum_i a_i X^i *' \sum_i b_i X^i = \sum_i d_i X^i with d_i = \sum_{j = 0}^i a_j * b_{i - j}
Tracking overflows
Define n_{z_i} = \lceil log_2(z_i) \rceil for a signal z_i. This function represents the number of bits needed to represent z_i.
For an array z = \sum_i z_i X^i, define n_z = \max_i (n_{z_i}). Tracking n_z allows us to ensure that we are never overflowing the maximum theoretical register size of 253 bits, even when representing numbers in overflow representation.
Additionally, define k_z to be the number of such n_z-bit registers used to represent k.
Suppose that c = a +' b. Then n_c \leq \max(n_a, n_b) + 1, and k_c = \max(k_a, k_b).
Now suppose that d = a *' b. Then n_d \leq n_a + n_b + \lceil log_2(\min(k_a, k_b)) \rceil, and k_d = k_a + k_b - 1.
Handling Carries
See Wen-Ding’s code here: zkREPL
Canonical representation
See Wen-Ding’s code here: zkREPL
PrimeReduce trick
Consider X = 2^{64} and
S = \sum_{i=0}^9 a_i X^i
Where each of the a_i is up to 192+\epsilon bits (probably fine to say \epsilon \leq 6).
We wish to find some S' that is congruent to S modulo p (secp256k1 prime) with few registers.
The p = 2^{256} - k trick is:
2^{256} * Z \equiv k * Z \mod p
Since the secp256k1 prime has p = 2^{256} - 2^{32} - \delta for some small \delta \approx 2^{10}, we can write the following congruence:
Since (2^{32}+\delta)^2 = 2^{64} + 2^{33}\delta + \delta^2.
Let C_i be the coefficient of X^i after terms are accumulated in this last equation. We have:
C_3 = (2^{32}+\delta)a_7 + a_3
C_2 = (2^{32}+\delta)a_6 + a_2
C_1 = (2^{64}+2^{33}\delta+\delta^2)a_9 + (2^{32}+\delta)a_5 + a_1
C_0 = (2^{64}+2^{33}\delta+\delta^2)a_8 + (2^{32}+\delta)a_4 + a_0
Let’s replace 2^{64} with X in the expressions for C_1 and C_0 and re-accumulate:
C_3' = (2^{32}+\delta)a_7 + a_3
C_2' = (2^{32}+\delta)a_6 + a_2 + a_9
C_1' = (2^{33}\delta+\delta^2)a_9 + (2^{32}+\delta)a_5 + a_1 + a_8
C_0' = (2^{33}\delta+\delta^2)a_8 + (2^{32}+\delta)a_4 + a_0
We add an additional 44 bits of overflow at most to the registers (the worst coefficient of an a_i involved in a sum is 2^{33}\delta). Since our a_i are at most 200 bits approximately, the C_i do not overflow the babyjubjub prime (253 bits). So now we have a representation S' = C_3'X^3 + C_2'X^2 + C_1'X + C_0' of a number that is congruent to S, such that all of the C_i' are at most 244 bits. After subtracting off some multiple of p we can check that after carries this is equivalent to 0 in 244 * 4 \approx 1000 constraints.
For reference, let’s see how this plays out for a 7-register number:
S
\equiv a_6X^6 + a_5X^5 + a_4X^4 + a_3X^3 + a_2X^2 + a_1X + a_0
(2^{32}+\delta)a_6X^2 + (2^{32}+\delta)a_5X + (2^{32}+\delta)a_4 + a_3X^3 + a_2X^2 + a_1X + a_0
Here, we get:
C_3 = a_3
C_2 = (2^{32}+\delta)a_6 + a_2
C_1 = (2^{32}+\delta)a_5 + a_1
C_0 = (2^{32}+\delta)a_4 + a_0
AddUnequal
For input points (x_1, y_1) and (x_2, y_2) with sum (x_3, y_3), we verify
- x_3 = \lambda^2 - x_2 - x_1 where \lambda = \frac{y_2-y_1}{x_2-x_1}
- (x_1 + x_2 + x_3)(x_2-x_1)^2 - (y_2-y_1)^2 = 0
- \frac{y_3 + y_1}{x_3 - x_1} = \frac{y_3 + y_2}{x_3 - x_2}, which is equivalent to:
- y_1 * x_3 + (y_3 + y_2) * x_1 = y_2 * x_3 + (y_3 + y_1) * x_2
- x_3, y_3 are 4x64 bit numbers in the range [0, p-1]
To verify each of the first two statements, we do the following:
-
Evaluate both sides using +' and *' so that no carrying takes place.
-
We now wish to verify the identity \sum_i a_i X^i = \sum_i b_i X^i mod q where 0 \leq a_i, b_i < C_1X^3 for a small constant C_1
-
(implementation). Verify that there is a bigint r = \sum_i r_i X^i for which
Carry(\sum_i (a_i - b_i) X^i) = Carry(\sum_i r_i X^i *' \sum_i q_i X^i)
Note that we’ll have to be a little careful on this third substep. We want the quotient r to be nonnegative, so we probably have to add a fat multiple of q to LHS to ensure that LHS is positive. i.e. If registers can be overful to 200 bits, then following the prime trick we end up four-register overful numbers which are about 136 bits overful. This gives us 136+256=392 bit numbers on LHS. So to offset, we probably want to add something like 2^{137}q to LHS, which can be easily done by simply adding the registers of q left shifted by 137 bits to LHS since we’re already in overflow representation.
Finally we do a range check.
- (implementation). constrain x_3 and y_3 to be 64 bits per register and also in [0,p-1] by rangechecking that all registers are in [0,2^{64}-1] and if upper registers are all equal to 2^{64}-1 then checking lowest register is in [0, 2^{64}-2^{32} - 2^9 -2^8 - 2^7 - 2^6 -2^4 -1 -1].
Double
For input points (x_1, y_1) with double (x_3, y_3), we verify
-
y_3^2 = x_3^3 + b
-
\frac{- y_3 - y_1}{x_3 - x_1} = \frac{3 x_1^2}{2y_1}, which is equivalent to:
- -2y_1\cdot (y_3+y_1) = 3x_1^2\cdot(x_3-x_1)
- (x_3,y_3) \neq (x_1,-y_1)
- only need to do x_3 \neq x_1 check
- x_3, y_3 are 4x64 bit numbers and in the range [0,p-1]
Note: If x_1 = x_2 and y_1 + y_2 = 0, then P + Q = O where P = (x_1,y_1) and Q = (x_2, y_2).
Additional Resources
Aztec CRT trick: Aztec emulated field and group operations - HackMD
Previous scratchpad: ECC operation optimizations - HackMD
Onur Kilic’s PLONK strategies: ECC multiplication strategies inside PLONK - HackMD