Arithmetization
This page explains how a ZisK execution becomes a system of polynomial constraints. It introduces AIRs and their trace matrices, the fixed/cached/witness column kinds, interactions and buses, the three argument types (permutation, lookup, copy), bus balancing, and the LogUp reduction that turns global bus balancing into per-AIR partial sums checked at aggregation — all tied together by circuits and ZisK's five system buses.
The previous pages described what a ZisK program looks like and how the machine runs it (ISA & processor), and how a RISC-V binary becomes ZisK code in the first place (Transpilation). This page is about the layer beneath all of that: how the prover turns an execution into something it can actually prove.
The short version is that an execution becomes a system of polynomial constraints, and a valid execution is a satisfying assignment of that system. The long version — what kind of polynomials, over what values, hooked up how — is what fills out this page.
Two layers, one pipeline
Every proving system splits cleanly into two layers, and ZisK is no exception:
| Layer | Responsibility |
|---|---|
| Frontend | Express the computation as a structured mathematical object — typically a system of polynomial constraints over a finite field. |
| Backend | Take that object and produce a cryptographic proof of its satisfiability. |
This page is about the frontend. The backend — Goldilocks-field STARKs — is what the Segments page covered: per-segment proving, aggregation trees, recursion, and the root proof. The two layers are decoupled by design — the same frontend can target different backends, and improvements to either don't require redesigning the other.
ZisK's frontend is shaped by a few goals:
- Separation of concerns — each chip can be designed, implemented, and tested on its own.
- Composability — chips can be reused across different computations or combined in different configurations.
- Scalability — different chips can have different sizes, so resources track actual usage rather than worst-case budgets.
- Backend independence — the arithmetization can be paired with any compatible proof system.
All of those are achieved by one core abstraction: the AIR.
AIRs
An Algebraic Intermediate Representation (AIR) describes a computation as a matrix of values plus a set of polynomial constraints that must hold at every row.
col 0 col 1 col 2 col W-1
┌──────┬──────┬──────┬─── ─┬──────┐
0 │ ... │ ... │ ... │ │ ... │ row 0
1 │ ... │ ... │ ... │ ··· │ ... │ row 1
2 │ ... │ ... │ ... │ │ ... │ row 2
│ ⋮ │ ⋮ │ ⋮ │ │ ⋮ │ ⋮
N-1│ ... │ ... │ ... │ │ ... │ row N-1
↑ trace matrix T ∈ F^(N×W)
Constraints are polynomials over the column values at the current row and a few rows ahead (the depth of the constraint), evaluated row by row. Every constraint has to evaluate to zero at every row of a valid trace. The matrix is allowed to be wide (many columns) or tall (many rows), and different AIRs in the same circuit can have different dimensions.
ZisK extends the standard definition with AIR values — per-instance scalars that aren't indexed by row. AIR values are how global constants like public inputs and aggregation roots become first-class citizens of the constraint system rather than being awkwardly encoded as per-row data.
The formal definition (whitepaper §4.1) is in a footnote
worth, but the idea is just the table above plus "the
constraints can also see a small per-instance scalar vector."
N is always a power of two because the backend uses a
low-degree test over a multiplicative subgroup, and subgroup
sizes have to divide a power of two.
Three kinds of column
Not every column has the same lifecycle. ZisK distinguishes three kinds, and the distinction is what lets the proving system avoid redundant work and stay precise about what the verifier learns:
| Kind | Determined… | Visible to verifier? | Typical use |
|---|---|---|---|
| Fixed | At setup time | Yes — part of the AIR's description. | Lookup tables, instruction encodings. |
| Cached | Per program, not per input | No (committed once, reused across all proofs of that program). | The transpiled ROM instruction trace. |
| Witness | Per execution | No (committed per proof). | Register values, memory accesses, intermediate results. |
A fixed column is the same for every execution anywhere — there's no point committing it again per proof. A cached column depends on the program (its ELF) but not on the inputs — committed once at setup, reused across all proofs of that program. A witness column carries the specific execution and gets freshly committed every time.
When an AIR has fixed columns, the trace height N is also
fixed at setup.
Interactions and buses
A single AIR is self-contained — its constraints can only reference columns within its own trace. They can't reach over to another AIR's columns. But realistic computations cross AIR boundaries constantly: Main fetches an instruction from the ROM AIR, reads a value the Memory AIR wrote, dispatches a hash to a precompile, queries a base-operation chip's lookup table.
ZisK expresses those relationships without merging every AIR into one monolithic constraint system. The mechanism is interactions — row-level messages one AIR sends and another receives, mediated by a shared channel called a bus.
Each interaction is a tuple (b, m, e):
| Component | What it is |
|---|---|
b | The bus identifier — a fixed field element selecting which bus the interaction is on. |
m | The multiplicity polynomial — at each row, a signed integer indicating how many times this row sends (positive) or receives (negative) the message. |
e = (e_1, …, e_ℓ) | The message polynomials — at each row, the ℓ values being communicated. |
Both m and e_j are polynomials over the same variables as
the AIR's constraints, so they can depend on the trace and on
AIR values. A typical sender row has m(r) = +1 and e(r)
equal to the message payload; a typical receiver row has
m(r) = -1 and e(r) equal to the same payload:
Bus balancing
What ties all those messages together is bus balancing —
the single condition that across all AIRs and all rows, the
signed sum of m(r) · e(r) on each bus must be zero.
Equivalently, every message sent on a bus is received exactly
once with matching content. Gaps, duplications, and reorderings
all surface as a non-zero sum.
That one condition is the foundation of every cross-AIR argument in ZisK — lookups, permutations, table queries, instruction dispatches, memory consistency. They're all different patterns of "send this, receive that, multiplicity matters."
Arguments
Interactions define the communication structure; arguments specify the semantic relationship being enforced. An argument is realised as a pair of interactions on a shared bus — one AIR sending with positive multiplicity, another receiving with negative multiplicity — chosen so that bus balancing becomes equivalent to the desired relationship.
ZisK uses three argument types:
| Argument | What it asserts | Used for |
|---|---|---|
| Permutation | Two sequences are permutations of each other — same elements with the same multiplicities, possibly in different order. | Memory's sorted access log; reorderings that preserve content. |
| Lookup | Every element of a query sequence appears in a table sequence (no length or multiplicity match required). | Range checks, table-driven computations, opcode validation. |
| Copy | Designated cells across different parts of a witness hold equal values. Implemented as a permutation argument on the cells within each equivalence class. | Wiring intermediate values between AIRs without duplicating them. |
The Memory chip's sorted-access log is the canonical
permutation example. "This byte is in [0, 255]" is the
canonical lookup. And the result of an ALU operation matching
the value written to a register is the canonical copy
argument.
The LogUp reduction
There's a problem with checking bus balancing naïvely: iterating over every row of every AIR, summing messages, and verifying the totals would force the verifier to re-process the entire witness. That defeats the whole point of succinct proofs.
ZisK uses LogUp to reduce bus balancing to a small set of polynomial constraints the backend can verify in time independent of the trace size. The reduction boils down to two ideas applied in sequence.
First, compress every multi-element message into one field
element. Using a verifier-supplied challenge γ, the message
vector e(r) = (e_1(r), …, e_ℓ(r)) folds into a single value:
ê(r) = b + γ¹·e_1(r) + γ²·e_2(r) + ⋯ + γ^ℓ·e_ℓ(r)
The bus identifier b is included to prevent cancellation
across different buses.
Second, reduce multiset equality to a rational identity.
With another challenge α, two multisets S and R are equal
if and only if
∑_{s ∈ S} 1/(α + s) = ∑_{r ∈ R} 1/(α + r)
This identity is hard to satisfy by accident: the prover has
to commit to its witness before α is sampled, so it can't
just construct values that make the equation balance.
Each AIR carries an extra partial-sum column S_i whose
final value s_i is that AIR's signed contribution to the
bus. Once every AIR has computed its s_i, global bus
balancing reduces to one scalar equation:
∑ᵢ s_i = 0
Each AIR proof exposes its own s_i as a public output. The
aggregation step (covered on Segments → The shared challenge)
sums them and verifies the equation at the root:
The property worth stressing: each AIR proves only its own local constraints plus the running partial sum. The cross-AIR consistency check collapses into a single arithmetic equality at the very end of the pipeline.
Circuits
A circuit is the whole construction — a collection of AIRs each paired with its set of interactions. Different AIRs can have different trace heights, and bus identifiers connect any two interactions that should be talking to each other.
A circuit is satisfied when two conditions both hold:
- Local satisfaction — each AIR's witness satisfies its own constraints at every row.
- Global bus balancing — every bus across the whole circuit balances.
Local satisfaction is verified independently per AIR. Global bus balancing is enforced by the aggregation step at the end of the proving pipeline — the place where every AIR's partial sum lands and the verifier checks they cancel to zero.
The five system buses
ZisK uses five named buses for its cross-chip plumbing:
| Bus | What flows on it | Used by |
|---|---|---|
| Operation Bus | (op, a, b, c, flag, extra) — a delegated operation. | Main → Base Operations / Precompiles. |
| ROM Bus | (pc, a_imm, b_imm, op, …) — an instruction at pc. | Main reads from the ROM chip. |
| Memory Bus | (op_mem, addr, t_mem, width, data) — memory access. | Anything that touches memory; Memory chip checks consistency. |
| Continuation Bus | Per-chip state across segment boundaries. | Chips with persistent state (Main, Memory, …). |
| Table Buses | Per-table lookup queries. | Instruction executors → their lookup tables. |
Each bus has its own message schema. The Specialized chips page describes the chips that send and receive on them, and the Segments page covers how the Continuation Bus carries state across boundaries.
Where this picks up
You now have the vocabulary every other part of the proof
system builds on: AIRs and trace matrices,
fixed/cached/witness columns, interactions (b, m, e), signed
multiplicities, bus balancing, the three argument types, LogUp
turning balancing into a polynomial-degree argument with
per-AIR partial sums, and circuits as collections of AIRs plus
interactions.
The next page, Limits, closes the section with the hard upper bounds the current ZisK setup imposes — how large a program, how long an execution, how much input each circuit can handle — and what fraction of each bound a real workload actually consumes.