Deploy binaries directly
A step-by-step guide to the lightest prover deployment: the
binaries on your PATH, started by hand, configured via two
short TOML files. By the end of this page a coordinator and one
worker are running on your machine.
When to pick this
Pick this path when you want no service-manager and no container in the loop — you launch the binaries yourself and own their lifecycle. Right shape for:
- A developer workstation testing a real cluster locally.
- Custom orchestrators where systemd / launchd / Docker would get in the way.
For a 24/7 production prover, see Deploy as services; for a Compose stack with Prometheus and Grafana attached, see Deploy with Docker.
Install the binaries
Both install paths, the prebuilt ziskup flow and the
from-source build, are documented end to end in the developer
install guides, alongside system dependencies (apt / brew),
shared-memory tuning, and proving-key setup.
Get the configuration files
The cluster reads two TOMLs: one for the coordinator, one for the
worker. You can either copy them out of a repo clone or write
them yourself with any editor; both routes end with the same
files at ~/zisk-cluster/.
Download from the repo
Grab the two files directly from the GitHub repository:
mkdir -p ~/zisk-cluster
curl -fsSL -o ~/zisk-cluster/coordinator.toml \
https://raw.githubusercontent.com/0xPolygonHermez/zisk/main/distributed/deploy/config/coordinator.toml
curl -fsSL -o ~/zisk-cluster/worker.toml \
https://raw.githubusercontent.com/0xPolygonHermez/zisk/main/distributed/deploy/config/worker.toml
If you already have the repo cloned (e.g. for the source build),
cp distributed/deploy/config/{coordinator,worker}.toml ~/zisk-cluster/ works just the same.
Write them yourself
mkdir -p ~/zisk-cluster && cd ~/zisk-cluster
Open coordinator.toml in nano, vim, or your editor of
choice and paste:
[service]
name = "ZisK Coordinator"
environment = "production" # development | staging | production
[server]
host = "0.0.0.0"
port = 7001 # client-facing gRPC (overridden on the CLI below)
shutdown_timeout_seconds = 30
[coordinator]
port = 50052 # worker-facing gRPC (overridden on the CLI below)
[metrics]
enabled = true
host = "0.0.0.0"
port = 9091 # Prometheus + /health (overridden on the CLI below)
[logging]
level = "info" # trace | debug | info | warn | error
format = "json" # pretty | json | compact
[backend]
mode = "coordinator" # must be "coordinator"
Open worker.toml and paste:
[worker]
# worker_id = "worker-a" # default: random UUID; pin for stable logs
compute_capacity = { compute_units = 10 }
environment = "production"
# inputs_folder = "." # default: current directory
[coordinator]
url = "http://127.0.0.1:50052" # overridden on the CLI below
[connection]
reconnect_interval_seconds = 5
heartbeat_timeout_seconds = 30
[logging]
level = "info"
format = "json"
~/zisk-cluster/ is just a convention used by this page; you
can drop the TOMLs anywhere and point --config at whatever path
you choose.
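Whichever route you took, it can help to confirm that both files actually landed before starting anything. A minimal sketch: the helper name check_configs is made up for this page and is not part of ZisK, and it only checks that the files exist and are non-empty, not that they are valid TOML.

```shell
# check_configs DIR: succeed only if both TOMLs exist in DIR and are non-empty.
# Hypothetical helper for illustration; not shipped with ZisK.
check_configs() {
  dir="$1"
  for name in coordinator.toml worker.toml; do
    # -s is true only for a file that exists and has a size greater than zero.
    if [ ! -s "$dir/$name" ]; then
      echo "missing or empty: $dir/$name" >&2
      return 1
    fi
  done
  echo "both config files present in $dir"
}

# Usage: check_configs ~/zisk-cluster
```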
Configure the coordinator
The table below explains everything zisk-coordinator reads:
| Table | Purpose |
|---|---|
| [service] | Identity used in log labels. environment is a free-form tag (development / staging / production). |
| [server] | The public gRPC API where clients submit jobs. shutdown_timeout_seconds is the drain window applied on SIGTERM. |
| [coordinator] | The cluster gRPC port workers dial to register and stream segments. |
| [metrics] | GET /metrics returns Prometheus text; GET /health returns 200 OK while alive. |
| [logging] | pretty for human-friendly stdout; json for log aggregators (Loki, Datadog). |
| [backend] | Internal; must stay coordinator. |
Edit it freely: service name, log format / level, environment
label, bind addresses, ports. The launch command in this guide
overrides the three ports back to the canonical defaults
(7000 / 50051 / 9090) regardless of what's in the TOML.
Settings resolve in four layers, later wins: built-in defaults →
TOML → env vars (ZISK_COORDINATOR_*, RUST_LOG) → CLI flags.
Full reference on
zisk-coordinator.
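The layering rule can be pictured as a function that starts from the built-in default and lets each later layer overwrite it whenever that layer supplies a value. This only illustrates the precedence order described above; it is not how zisk-coordinator is implemented, and resolve_setting is a hypothetical name.

```shell
# resolve_setting DEFAULT TOML ENV CLI
# Prints the value from the last layer that supplies one.
# An empty string means "this layer did not set the value".
# Hypothetical helper for illustration; not part of ZisK.
resolve_setting() {
  value="$1"                              # built-in default
  if [ -n "$2" ]; then value="$2"; fi     # TOML file
  if [ -n "$3" ]; then value="$3"; fi     # environment variable
  if [ -n "$4" ]; then value="$4"; fi     # CLI flag
  printf '%s\n' "$value"
}

# Metrics port as used in this guide: canonical default 9090,
# the TOML says 9091, no env var is set, and the CLI passes 9090.
resolve_setting 9090 9091 "" 9090   # prints 9090

# Drop the CLI flag and the TOML value wins instead.
resolve_setting 9090 9091 "" ""     # prints 9091
```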
Configure the worker
The supplied worker.toml points the worker at a
coordinator on port 50052. It reads four tables:
| Table | Purpose |
|---|---|
| [worker] | Identity + capacity. compute_units is an abstract weight the coordinator uses to assign segments. |
| [coordinator] | Where the worker dials. |
| [connection] | How the worker reacts to network blips: reconnect backoff and heartbeat deadline. |
| [logging] | Same shape as the coordinator's. |
There's intentionally no [backend] table — backend selection
(Assembly emulator, Rust emulator, GPU, Plonk) is done on the
command line. See zisk-worker for every flag.
Edit it freely: pin a stable worker_id, raise or lower
compute_units, switch the log format, tune the
reconnect/heartbeat windows. The --coordinator-url override in
the launch command in this guide points the worker at the
canonical :50051 regardless of what's in the TOML.
Settings resolve in four layers, later wins: built-in defaults →
TOML → env vars (ZISK_WORKER_*, RUST_LOG) → CLI flags. Full
reference on
zisk-worker.
Start the coordinator
In your first terminal, launch the coordinator in the foreground.
The --*-port flags override the Docker-tuned ports back to
canonical defaults:
zisk-coordinator \
--config ~/zisk-cluster/coordinator.toml \
--api-port 7000 \
--cluster-port 50051 \
--metrics-port 9090
You should see three "listening on" lines for 0.0.0.0:7000,
0.0.0.0:50051, and 0.0.0.0:9090. Sanity-check from any other
shell:
curl -i http://127.0.0.1:9090/health # → HTTP/1.1 200 OK
Leave the terminal open.
If one of these ports is already taken on your machine, override
only the conflicting port (e.g. --metrics-port 9095). Keep the
others on the defaults so the worker and smoke-test client below
find them.
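When a script starts the coordinator instead of a person watching the log, the /health check above can be turned into a small wait loop. A minimal sketch, assuming curl is installed; wait_for_health is a hypothetical helper, and the URL must match whatever metrics port you actually bound.

```shell
# wait_for_health URL [ATTEMPTS]: poll URL once per second until it answers
# with a successful HTTP status, or give up after ATTEMPTS tries (default 30).
# Hypothetical helper for illustration; not shipped with ZisK.
wait_for_health() {
  url="$1"
  attempts="${2:-30}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f makes curl exit non-zero on HTTP errors (status >= 400);
    # -s hides the progress meter; the response body is discarded.
    if curl -fs -o /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "no healthy response from $url after $attempts attempts" >&2
  return 1
}

# Usage: wait_for_health http://127.0.0.1:9090/health && echo "coordinator is up"
```

Gating the worker launch on this check avoids a burst of reconnect attempts while the coordinator is still binding its ports.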
Start the worker
In your second terminal, launch the worker, pointing it at the
coordinator on the canonical :50051 you just bound:
zisk-worker \
--config ~/zisk-cluster/worker.toml \
--coordinator-url http://127.0.0.1:50051
The worker resolves the proving key (default
~/.zisk/provingKey), opens a bidirectional WorkerStream to
127.0.0.1:50051, registers, and starts heartbeating. The
coordinator's log shows:
INFO worker registered: <uuid> capacity=10
You now have a one-coordinator, one-worker cluster running locally.
A handful of flags are worth knowing without flipping to the full reference:
| Flag | Purpose |
|---|---|
| --gpu | Turn on GPU proving. The single biggest performance lever: proving throughput typically scales several-fold versus the CPU/ASM path. |
| --worker-id | Pin a stable id for log correlation instead of the random UUID. |
| --compute-capacity | Raise / lower the advertised weight without editing the TOML. |
| --proving-key | Use a non-default proving-key location. |
On a CUDA-capable host, --gpu is the biggest single performance
unlock the worker offers. Add --max-streams N (start with one
stream per ~8 GB of GPU memory) and raise --compute-capacity
accordingly so the coordinator assigns it proportionally more
segments. See
zisk-worker / Performance tuning.
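The one-stream-per-roughly-8-GB starting point reduces to a rounded division. A rough sketch, assuming you know the GPU's total memory in MiB (for example from nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits); suggest_max_streams is a hypothetical helper, and its output is only a first value for --max-streams to tune from.

```shell
# suggest_max_streams MIB: starting value for --max-streams, using the
# rule of thumb of one stream per ~8 GB of GPU memory.
# Rounds to the nearest whole stream and never returns less than 1.
# Hypothetical helper for illustration; not shipped with ZisK.
suggest_max_streams() {
  mib="$1"
  # 8 GB is 8192 MiB; adding half of that first rounds to nearest.
  streams=$(( (mib + 4096) / 8192 ))
  if [ "$streams" -lt 1 ]; then
    streams=1
  fi
  printf '%s\n' "$streams"
}

# A nominal 24 GB card often reports slightly less, e.g. 24564 MiB.
suggest_max_streams 24564   # prints 3
```

Rounding to nearest rather than down is a choice made here because cards usually report a little under their nominal capacity; round down instead if the worker runs out of GPU memory.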
Submit a smoke-test job
The repo ships a gcd example whose remote-host binary submits
a real proving job over the public gRPC API. Open a third
terminal and pull the repo if you haven't already, then run the
example:
# Skip the clone if you already have one (e.g. from the source-build install)
git clone https://github.com/0xPolygonHermez/zisk.git
cd zisk/examples/gcd/host
cargo run --release --bin remote-host
The binary builds a
ProverClient::remote("http://localhost:7000") internally — the
same code path your own host applications use.
While it runs:
- The coordinator log shows the job lifecycle (Queued → Running → Completed), segment assignments, and the promoted-aggregator pick.
- The worker log shows witness generation, partial proofs, and a job complete message.
When remote-host exits cleanly, the proof is on disk.
Stop the cluster
Each binary runs in the foreground, so Ctrl-C (or SIGTERM) in
the terminal you launched it from triggers a graceful shutdown.
The coordinator stops accepting new jobs and waits up to
[server].shutdown_timeout_seconds (default 30s) for in-flight
jobs to drain — anything still running when that window closes
surfaces as Failed on the client side. The worker stops
accepting new assignments but lets the segments it's currently
proving finish before exiting, so the coordinator marks it
offline only once its heartbeats stop.
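If a script rather than a terminal owns the processes, the same shutdown can be triggered by sending SIGTERM yourself. A minimal sketch, assuming you have the process ID (for instance from $! or pgrep); stop_gracefully is a hypothetical helper, and its default wait is set slightly above the 30 s drain window so the coordinator has time to finish.

```shell
# stop_gracefully PID [TIMEOUT]: send SIGTERM, then wait up to TIMEOUT
# seconds (default 35) for the process to exit.
# Returns 0 once the process is gone, 1 if it is still alive at the deadline.
# Hypothetical helper for illustration; not shipped with ZisK.
stop_gracefully() {
  pid="$1"
  timeout="${2:-35}"
  # kill fails when the process no longer exists; treat that as already stopped.
  kill -TERM "$pid" 2>/dev/null || return 0
  waited=0
  # kill -0 sends no signal; it only tests whether the process is still there.
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "process $pid still running after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Usage: stop_gracefully "$coordinator_pid" 35
```

The helper deliberately never escalates to SIGKILL: a forced kill would skip the drain and leave in-flight jobs to surface as Failed.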