Distributed Execution
Generating a ZisK proof means proving the full execution trace of a program. For real workloads, that trace is too large and too slow to prove on a single machine. A ZisK cluster splits the trace into segments, proves each segment in parallel on separate machines, and aggregates the results into a single final proof. Throughput scales with the number of machines you give it, and end-to-end proving latency drops accordingly.
This guide covers the three things you need to run distributed proving: the cluster's architecture, a single-host quickstart that gets a job through the binaries, and the production path that deploys the same binaries on bare Linux hosts with systemd.
Architecture
A ZisK cluster is two binaries: a single zisk-coordinator and one
or more zisk-worker instances.
flowchart TB
Host(["Host application<br/>RemoteClient"])
subgraph Cluster["ZisK cluster"]
Coord["zisk-coordinator<br/>:7000 :50051 :9090"]
W1[zisk-worker 1]
W2[zisk-worker 2]
W3[zisk-worker 3]
Agg[Aggregation tree]
end
Out(["Final proof"])
Host -->|"gRPC :7000 prove request"| Coord
Coord -->|assign segments| W1
Coord -->|assign segments| W2
Coord -->|assign segments| W3
W1 -->|segment proof| Agg
W2 -->|segment proof| Agg
W3 -->|segment proof| Agg
Agg --> Out
Out -->|return proof| Host
The coordinator
The coordinator is the only stateful process in the cluster. It exposes a public gRPC interface that hosts use to submit proof requests, poll job status, and retrieve results. From the host's point of view, the coordinator is the only endpoint it ever talks to; workers are an invisible implementation detail.
Internally, the coordinator splits each job into segments, assigns them to workers, and returns the final proof. It also caches the proving keys derived from each uploaded guest ELF, so subsequent jobs for the same program skip the expensive setup step.
Workers
Workers are the proving processes. Each worker connects outbound to the coordinator and waits for segment assignments. Workers are stateless across jobs, holding only the segments they are currently proving. You can add, remove, or restart them without touching the coordinator or losing cluster state.
The first worker to send its partial proof to the coordinator is automatically promoted to aggregator for that job. The aggregator collects the remaining segment proofs and assembles the final proof, then returns it to the coordinator.
Proving pipeline
Once a job is submitted, the coordinator selects workers from the available pool and runs three phases:
- Partial contributions. Each assigned worker processes its segments and returns partial challenges. The coordinator collects them and derives a single global challenge.
- Prove. The coordinator broadcasts the global challenge to all workers. Each worker computes its partial proofs and returns them.
- Aggregation. As partial proofs arrive, the coordinator builds an opportunistic binary aggregation tree, folding proofs in as they land. The first worker to deliver its partial proof is promoted to aggregator and assembles the final proof.
sequenceDiagram
participant C as Client
participant Coord as Coordinator
participant W as Workers
C->>Coord: prove(request)
Coord->>W: assign segments
Note over Coord,W: Phase 1: Partial contributions
W->>Coord: partial challenges
Note over Coord,W: Phase 2: Prove
Coord->>W: global challenge
W->>Coord: partial proofs
Note over Coord,W: Phase 3: Aggregation
Note over W: First worker to reply<br/>becomes aggregator
Coord->>W: aggregate
W->>Coord: final proof
Coord->>C: return proof
Quickstart: single-host cluster
This brings up one coordinator and one worker on the same machine, then submits a real proving job. It is the smallest deployment that exercises the production binaries end-to-end.
Prerequisites
- Linux x86_64
- Rust toolchain (cargo --version should work)
- ~32 GB free RAM (the Assembly emulator preallocates large shared regions)
Clone the repo:
git clone https://github.com/0xPolygonHermez/zisk.git
cd zisk
Build the binaries
cargo build --release --bin zisk-coordinator --bin zisk-worker
The first build takes several minutes. The output binaries land at:
target/release/zisk-coordinator
target/release/zisk-worker
These are the same binaries used in every later deployment.
Start the coordinator
cargo run --release --bin zisk-coordinator
The coordinator binds three default ports on startup:
| Port | Purpose |
|---|---|
| 7000 | Client-facing gRPC API. Host applications connect here. |
| 50051 | Worker-facing gRPC port. Workers connect here. |
| 9090 | Prometheus metrics endpoint and /health liveness probe. |
If the coordinator exits with Address already in use, override the
offending port:
cargo run --release --bin zisk-coordinator -- \
--api-port 8000 --cluster-port 60000 --metrics-port 5245
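If you are not sure which process holds a conflicting port, standard Linux tooling is enough to find out (nothing ZisK-specific here):
sudo ss -ltnp | grep -E ':7000|:50051|:9090'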
Start a worker
In a second terminal:
cargo run --release --bin zisk-worker -- \
--config distributed/deploy/config/worker.toml
worker.toml points the worker at http://127.0.0.1:50051, advertises
ten compute units, and sets the log level to debug. On a successful
handshake:
INFO connecting to coordinator http://127.0.0.1:50051
INFO registered as worker <random-uuid> (capacity 10)
INFO heartbeat ok
The coordinator logs the matching side:
INFO worker registered: <random-uuid> capacity=10
Health check
With the coordinator and worker both running, verify the cluster in two steps: a liveness probe and an end-to-end proving job.
Liveness probe. In a third terminal:
curl http://127.0.0.1:9090/health
A healthy coordinator returns 200 OK with an empty body.
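The same port also serves the Prometheus scrape endpoint, so a second spot check confirms the coordinator is exporting metrics rather than merely answering the liveness probe (the exact metric names will vary):
curl -s http://127.0.0.1:9090/metrics | head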
Smoke-test proof. Submit a real job from the included example:
cd examples/sha-hasher/host
cargo run --release --bin prove-remote
The prove-remote binary builds a ProverClient::remote("http://127.0.0.1:7000"),
uploads the guest ELF, and waits for the final proof. End-to-end:
the coordinator splits the trace into segments and hands them to the
worker, the worker produces STARK proofs, and the coordinator
aggregates them into the final proof. Terminals 1 and 2 show the
matching coordinator and worker activity.
Deployment with scripts
This section deploys the same two binaries on bare Linux hosts under systemd, the canonical path for a ZisK cluster.
Prerequisites
- One Linux host with sudo for the coordinator
- One or more Linux hosts with sudo for workers, each with 64 GB RAM (Assembly emulator) or 32 GB (Rust emulator)
- Every worker must reach the coordinator; keep all hosts on the same private network for low coordinator/worker latency
Clone the repo on every host:
git clone https://github.com/0xPolygonHermez/zisk.git
cd zisk
Install the coordinator
On the coordinator host:
sudo distributed/deploy/scripts/coordinator/install.sh
The script:
- Creates the zisk-coordinator system user
- Drops the binary at /usr/local/bin/zisk-coordinator
- Writes the config at /etc/zisk/coordinator.toml (mode 640)
- Creates the working directory at /var/lib/zisk
- Installs the systemd unit and runs systemctl enable --now
Verify the service:
sudo systemctl status zisk-coordinator
sudo journalctl -u zisk-coordinator -f
You should see the same two "listening on" lines from the quickstart. If the service has failed, journalctl shows the underlying error (most often a port conflict or a missing config field).
Configure the coordinator
Every setting is optional; the binary falls back to a built-in default for anything you leave out.
Override precedence (later wins): built-in defaults → config file →
ZISK_COORDINATOR_* environment variables → CLI flags.
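For example, the client port named in the [server] table below can be overridden with an environment variable. The sketch shows both a persistent override via a standard systemd drop-in and a one-off foreground run; the value 8000 is illustrative:
# Persist an override for the installed service with a systemd drop-in:
sudo systemctl edit zisk-coordinator
#   [Service]
#   Environment=ZISK_COORDINATOR_API_PORT=8000
sudo systemctl restart zisk-coordinator

# Or run the binary in the foreground with a one-off override
# (stop the service first so the other ports do not clash):
ZISK_COORDINATOR_API_PORT=8000 zisk-coordinator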
Edit /etc/zisk/coordinator.toml:
[service] — coordinator identity.
| Setting | Default | Notes |
|---|---|---|
name | "ZisK Coordinator" | Shown in logs and status output. |
environment | development | One of development, staging, production. Use production. |
[server] — client-facing gRPC API.
| Setting | Default | Notes |
|---|---|---|
host | 0.0.0.0 | Listen address. Bind to a specific interface to restrict access. |
port | 7000 | Client gRPC port. CLI: --api-port, env: ZISK_COORDINATOR_API_PORT. |
shutdown_timeout_seconds | 30 | Drain time after a shutdown signal before forced exit. |
[coordinator] — worker-facing port and core tuning.
| Setting | Default | Notes |
|---|---|---|
port | 50051 | Worker gRPC port. CLI: --cluster-port, env: ZISK_COORDINATOR_CLUSTER_PORT. |
config_file | (none) | Optional path to a coordinator-core tuning file. |
[metrics] — Prometheus endpoint.
| Setting | Default | Notes |
|---|---|---|
enabled | true | Set false to disable /metrics. /health stays available either way. |
host | 0.0.0.0 | Listen address for the scrape endpoint. |
port | 9090 | Scrape port. CLI: --metrics-port, env: ZISK_COORDINATOR_METRICS_PORT. |
[logging] — what gets logged and where.
| Setting | Default | Notes |
|---|---|---|
level | info | trace, debug, info, warn, error. RUST_LOG takes precedence. |
format | pretty | pretty, json (production aggregators), or compact. |
file_path | (none) | Rotating daily log file. Leave unset on systemd hosts; journald captures stdout. |
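Putting the tables together, a production-leaning coordinator.toml could look like the sketch below. Only keys from the tables above appear; anything left out keeps its built-in default, and the values are illustrative rather than required:
sudo tee /etc/zisk/coordinator.toml >/dev/null <<'EOF'
[service]
name = "ZisK Coordinator"
environment = "production"

[server]
port = 7000

[coordinator]
port = 50051

[metrics]
enabled = true
port = 9090

[logging]
level = "info"
format = "json"   # easier for log aggregators to ingest than pretty
EOF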
After editing:
sudo systemctl restart zisk-coordinator
Install workers
Workers need the proving keys on local disk before they can start. On each worker host, download and extract them first:
wget https://storage.googleapis.com/zisk-setup/zisk-provingkey-0.16.0.tar.gz
tar -xzf zisk-provingkey-0.16.0.tar.gz
Then run the installer, pointing it at the extracted directory:
sudo distributed/deploy/scripts/worker/install.sh \
--proving-key /path/to/provingKey --gpu
If you skip --proving-key, the installer falls back to
/var/lib/zisk-worker/provingKey; place (or symlink) the extracted
keys there if you prefer the default. The --gpu flag can be omitted
if you want to run on CPU.
The worker starts immediately, but its default coordinator URL
(http://127.0.0.1:50051) points at localhost — it will keep retrying
this address until you redirect it to your real coordinator.
Point the worker at the coordinator
Edit /etc/zisk/worker.toml and set the coordinator URL:
[coordinator]
url = "http://<coordinator-host>:50051"
sudo systemctl restart zisk-worker
sudo journalctl -u zisk-worker -f
Workers retry the connection every reconnect_interval_seconds
(default 5) until the coordinator answers. Start order does not
matter.
Configure the worker
Annotated example: distributed/crates/worker/config/prod.toml.
Override precedence: built-in defaults → config file →
ZISK_WORKER__* environment variables (double underscore as table
separator) → CLI flags.
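Following that convention, [coordinator].url would map to ZISK_WORKER__COORDINATOR__URL (an inferred name, not one spelled out here), so a one-off override might look like:
ZISK_WORKER__COORDINATOR__URL=http://prod-coord:50051 \
  zisk-worker --config /etc/zisk/worker.toml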
Edit /etc/zisk/worker.toml:
[worker] — identity, capacity, on-disk location.
| Setting | Default | Notes |
|---|---|---|
worker_id | random UUID | Pin to e.g. the hostname so log correlation works at scale. |
compute_capacity.compute_units | 10 | Start at one unit per physical CPU core (minus two for OS overhead), plus one per GPU stream. |
environment | development | development or production. |
inputs_folder | /var/lib/zisk-worker/inputs | Where the worker writes intermediate input files. Override only for a faster disk or separate partition. |
[coordinator] — registration target.
| Setting | Default | Notes |
|---|---|---|
url | http://127.0.0.1:50051 | gRPC URL of the coordinator's worker-facing port. |
[connection] — reaction to network trouble.
| Setting | Default | Notes |
|---|---|---|
reconnect_interval_seconds | 5 | Backoff between reconnect attempts when the coordinator is unreachable. |
heartbeat_timeout_seconds | 30 | How long to wait for a heartbeat before treating the connection dead. |
[logging] — same shape as the coordinator's [logging]
table.
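Assembled from the tables, a worker.toml sketch for a 32-core production host might look like this. Section and key names come from the tables above; the dotted compute_capacity.compute_units form mirrors the name shown there, and every value is illustrative:
sudo tee /etc/zisk/worker.toml >/dev/null <<'EOF'
[worker]
worker_id = "worker-a01"               # pin to the hostname so log correlation works
compute_capacity.compute_units = 30    # 32 physical cores minus 2 for the OS; add 1 per GPU stream
environment = "production"

[coordinator]
url = "http://<coordinator-host>:50051"

[connection]
reconnect_interval_seconds = 5
heartbeat_timeout_seconds = 30

[logging]
level = "info"
format = "json"
EOF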
After editing:
sudo systemctl restart zisk-worker
Add more workers
Run the install script on as many hosts as you want. All workers register against the same coordinator and receive work proportional to their advertised capacity.
flowchart TB
subgraph App["Application host"]
AP["host program<br/>RemoteClient"]
end
subgraph H1["Coordinator host"]
C["zisk-coordinator<br/>:7000 :50051 :9090"]
end
subgraph H2["Worker host A (32 units)"]
WA[zisk-worker]
end
subgraph H3["Worker host B (32 units)"]
WB[zisk-worker]
end
subgraph H4["Worker host C (16 units)"]
WC[zisk-worker]
end
AP -->|":7000"| C
WA -->|":50051"| C
WB -->|":50051"| C
WC -->|":50051"| C
CLI reference
A handful of operational knobs are CLI-only and not exposed in the TOML:
| Flag | Default | Description |
|---|---|---|
--proving-key | ~/.zisk/provingKey | Path to the proving-key folder |
--elf | (none) | Path to the ELF file |
--shared-tables | false | Share tables when running in a cluster |
--verify-constraints | false | Verify constraints after witness gen |
-n, --number-threads-witness | (none) | Threads for witness computation |
-g, --gpu | false | Enable GPU mode (CUDA build only) |
-t, --max-streams | (none) | Maximum GPU streams |
CLI flags override the config file for one-off testing:
zisk-coordinator --api-port 8000 --cluster-port 60000 --log-level debug
zisk-worker --coordinator-url http://prod-coord:50051 --compute-capacity 32