Deploy binaries directly

A step-by-step guide to the lightest prover deployment: binaries on your PATH, started by hand, and configured via two short TOML files. By the end of this page, a coordinator and one worker are running on your machine.

When to pick this

Pick this path when you want no service manager and no container in the loop: you launch the binaries yourself and own their lifecycle. It is the right shape for:

  • A developer workstation testing a real cluster locally.
  • Custom orchestrators where systemd / launchd / Docker would get in the way.

For a 24/7 production prover, see Deploy as services; for a Compose stack with Prometheus and Grafana attached, see Deploy with Docker.


Install the binaries

Both install paths, the prebuilt ziskup flow and the from-source build, are documented end to end in the developer install guides, alongside system dependencies (apt / brew), shared-memory tuning, and proving-key setup.


Get the configuration files

The cluster reads two TOMLs: one for the coordinator, one for the worker. You can either copy them out of a repo clone or write them yourself with any editor; both routes end with the same files at ~/zisk-cluster/.

Download from the repo

Grab the two files directly from the GitHub repository:

mkdir -p ~/zisk-cluster
curl -fsSL -o ~/zisk-cluster/coordinator.toml \
https://raw.githubusercontent.com/0xPolygonHermez/zisk/main/distributed/deploy/config/coordinator.toml
curl -fsSL -o ~/zisk-cluster/worker.toml \
https://raw.githubusercontent.com/0xPolygonHermez/zisk/main/distributed/deploy/config/worker.toml

If you already have the repo cloned (e.g. for the source build), cp distributed/deploy/config/{coordinator,worker}.toml ~/zisk-cluster/ works just the same.

Write them yourself

mkdir -p ~/zisk-cluster && cd ~/zisk-cluster

Open coordinator.toml in nano, vim, or your editor of choice and paste:

~/zisk-cluster/coordinator.toml
[service]
name = "ZisK Coordinator"
environment = "production" # development | staging | production

[server]
host = "0.0.0.0"
port = 7001 # client-facing gRPC (overridden on the CLI below)
shutdown_timeout_seconds = 30

[coordinator]
port = 50052 # worker-facing gRPC (overridden on the CLI below)

[metrics]
enabled = true
host = "0.0.0.0"
port = 9091 # Prometheus + /health (overridden on the CLI below)

[logging]
level = "info" # trace | debug | info | warn | error
format = "json" # pretty | json | compact

[backend]
mode = "coordinator" # must be "coordinator"

Open worker.toml and paste:

~/zisk-cluster/worker.toml
[worker]
# worker_id = "worker-a" # default: random UUID; pin for stable logs
compute_capacity = { compute_units = 10 }
environment = "production"
# inputs_folder = "." # default: current directory

[coordinator]
url = "http://127.0.0.1:50052" # overridden on the CLI below

[connection]
reconnect_interval_seconds = 5
heartbeat_timeout_seconds = 30

[logging]
level = "info"
format = "json"
Tip

~/zisk-cluster/ is just a convention used by this page; you can drop the TOMLs anywhere and point --config at whatever path you choose.
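Whichever route you took, it can help to confirm what a file actually sets before launching anything. The sketch below is one way to do that with plain awk; toml_get is a hypothetical helper written for this page, not part of ZisK, and it only understands flat key = value lines (no '#' inside quoted strings), which is all these two files use.

```shell
# toml_get FILE TABLE KEY (hypothetical helper): print the value of KEY
# inside [TABLE], with any trailing comment and surrounding quotes removed.
toml_get() {
  awk -v table="[$2]" -v key="$3" '
    /^\[/ { in_table = ($0 == table); next }   # track the current table header
    in_table && $1 == key && $2 == "=" {
      sub(/^[^=]*=[ \t]*/, "")                  # drop "key = "
      sub(/[ \t]*#.*$/, "")                     # drop a trailing comment
      gsub(/"/, "")                             # drop quotes
      print
      exit
    }
  ' "$1"
}
```

For example, toml_get ~/zisk-cluster/coordinator.toml server port should print 7001 for the coordinator.toml shown above, and toml_get ~/zisk-cluster/worker.toml coordinator url should print the worker's dial address.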


Configure the coordinator

The table below explains everything zisk-coordinator reads:

Table | Purpose
[service] | Identity used in log labels. environment is a free-form tag (development / staging / production).
[server] | The public gRPC API where clients submit jobs. shutdown_timeout_seconds is the drain window applied on SIGTERM.
[coordinator] | The cluster gRPC port workers dial to register and stream segments.
[metrics] | GET /metrics returns Prometheus text; GET /health returns 200 OK while alive.
[logging] | pretty for human-friendly stdout; json for log aggregators (Loki, Datadog).
[backend] | Internal: must stay coordinator.

Edit it freely: service name, log format and level, environment label, bind addresses, ports. The launch command in this guide overrides the three ports back to the canonical defaults (7000 / 50051 / 9090) regardless of what's in the TOML.

Override precedence

Settings resolve in four layers, and later layers win: built-in defaults → TOML → env vars (ZISK_COORDINATOR_*, RUST_LOG) → CLI flags. See zisk-coordinator for the full reference.
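The layering can be pictured with a few lines of shell. This is only an illustration of "later wins", not how zisk-coordinator is implemented: the resolve helper and its argument order are hypothetical, and an empty argument stands for a layer that leaves the setting unset.

```shell
# resolve DEFAULT TOML ENV CLI (hypothetical helper): print the winning value.
# An empty argument means "this layer did not set it"; later layers win.
resolve() {
  local value="$1"
  local layer
  for layer in "$2" "$3" "$4"; do
    [ -n "$layer" ] && value="$layer"
  done
  printf '%s\n' "$value"
}
```

Treating 7000 as the built-in default for the API port, resolve 7000 7001 "" 7000 prints 7000 because the CLI flag beats the TOML's 7001, while resolve 7000 7001 "" "" prints 7001 because nothing later overrides the TOML.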


Configure the worker

The supplied worker.toml points the worker at the coordinator on port 50052. It reads four tables:

Table | Purpose
[worker] | Identity + capacity. compute_units is an abstract weight the coordinator uses to assign segments.
[coordinator] | Where the worker dials.
[connection] | How the worker reacts to network blips: reconnect backoff and heartbeat deadline.
[logging] | Same shape as the coordinator's.

There's intentionally no [backend] table — backend selection (Assembly emulator, Rust emulator, GPU, Plonk) is done on the command line. See zisk-worker for every flag.

Edit it freely: pin a stable worker_id, raise or lower compute_units, switch the log format, tune the reconnect/heartbeat windows. The --coordinator-url override in the launch command in this guide points the worker at the canonical :50051 regardless of what's in the TOML.

Override precedence

Settings resolve in four layers, and later layers win: built-in defaults → TOML → env vars (ZISK_WORKER_*, RUST_LOG) → CLI flags. See zisk-worker for the full reference.


Start the coordinator

In your first terminal, launch the coordinator in the foreground. The --*-port flags override the Docker-tuned ports in the TOML (7001 / 50052 / 9091) back to the canonical defaults:

zisk-coordinator \
--config ~/zisk-cluster/coordinator.toml \
--api-port 7000 \
--cluster-port 50051 \
--metrics-port 9090

You should see three "listening on" lines for 0.0.0.0:7000, 0.0.0.0:50051, and 0.0.0.0:9090. Sanity-check from any other shell:

curl -i http://127.0.0.1:9090/health # → HTTP/1.1 200 OK

Leave the terminal open.
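If you script the launch rather than watching the log, the same health check can be polled until it passes. The sketch below is a generic retry loop; wait_until is a hypothetical helper, and the one-second interval is an arbitrary choice rather than anything the coordinator requires.

```shell
# wait_until ATTEMPTS CMD [ARGS...] (hypothetical helper): rerun CMD about
# once per second until it succeeds or ATTEMPTS tries have failed.
# Returns 0 on success, 1 on timeout.
wait_until() {
  local attempts="$1"
  shift
  local i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```

A typical call would be wait_until 30 curl -fsS http://127.0.0.1:9090/health, which returns as soon as /health answers with a success status and gives up after roughly 30 seconds otherwise. Adjust the port if you overrode --metrics-port.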

Port already in use

Override only the conflicting port (e.g. --metrics-port 9095). Keep the others on the defaults so the worker and smoke-test client below find them.


Start the worker

In your second terminal, launch the worker, pointing it at the coordinator on the canonical :50051 you just bound:

zisk-worker \
--config ~/zisk-cluster/worker.toml \
--coordinator-url http://127.0.0.1:50051

The worker resolves the proving key (default ~/.zisk/provingKey), opens a bidirectional WorkerStream to 127.0.0.1:50051, registers, and starts heartbeating. The coordinator's log shows:

INFO worker registered: <uuid> capacity=10

You now have a one-coordinator, one-worker cluster running locally.

A handful of flags are worth knowing without flipping to the full reference:

Flag | Purpose
--gpu | Turn on GPU proving. The single biggest performance lever: proving throughput typically scales several-fold versus the CPU/ASM path.
--worker-id | Pin a stable id for log correlation instead of the random UUID.
--compute-capacity | Raise / lower the advertised weight without editing the TOML.
--proving-key | Use a non-default proving-key location.
Reach for GPU first if you care about throughput

On a CUDA-capable host, --gpu is the biggest single performance unlock the worker offers. Add --max-streams N (start with one stream per ~8 GB of GPU memory) and raise --compute-capacity accordingly so the coordinator assigns it proportionally more segments. See zisk-worker / Performance tuning.
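The "one stream per ~8 GB" starting point is simple integer arithmetic. The sketch below spells it out; suggest_max_streams is a hypothetical helper, the 8 GB divisor is only the rule of thumb quoted above, and the result is a first guess to tune from rather than a recommended setting.

```shell
# suggest_max_streams MEM_GB (hypothetical helper): one stream per 8 GB of
# GPU memory, rounded down, but never fewer than one.
suggest_max_streams() {
  local streams=$(( $1 / 8 ))
  [ "$streams" -lt 1 ] && streams=1
  printf '%s\n' "$streams"
}
```

For a 24 GB card, suggest_max_streams 24 prints 3, which you would pass as --max-streams 3. On NVIDIA hosts, nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits reports the total in MiB, so divide by 1024 before feeding it in.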


Submit a smoke-test job

The repo ships a gcd example whose remote-host binary submits a real proving job over the public gRPC API. Open a third terminal and pull the repo if you haven't already, then run the example:

# Skip the clone if you already have one (e.g. from the source-build install)
git clone https://github.com/0xPolygonHermez/zisk.git
cd zisk/examples/gcd/host
cargo run --release --bin remote-host

The binary builds a ProverClient::remote("http://localhost:7000") internally — the same code path your own host applications use.

While it runs:

  • The coordinator log shows the job lifecycle (Queued → Running → Completed), segment assignments, and the promoted-aggregator pick.
  • The worker log shows witness generation, partial proofs, and a job complete message.

When remote-host exits cleanly, the proof is on disk.


Stop the cluster

Each binary runs in the foreground, so Ctrl-C (or SIGTERM) in the terminal you launched it from triggers a graceful shutdown. The coordinator stops accepting new jobs and waits up to [server].shutdown_timeout_seconds (default 30s) for in-flight jobs to drain — anything still running when that window closes surfaces as Failed on the client side. The worker stops accepting new assignments but lets the segments it's currently proving finish before exiting, so the coordinator marks it offline only once its heartbeats stop.
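A custom orchestrator that started the binaries in the background can reproduce the Ctrl-C behaviour by sending SIGTERM and waiting out the drain window. The sketch below is one way to do that; stop_gracefully is a hypothetical helper, and what to do when the deadline passes (for example escalating to SIGKILL) is left to the caller.

```shell
# stop_gracefully PID SECONDS (hypothetical helper): send SIGTERM, then poll
# about once per second for up to SECONDS. Returns 0 if the process exited
# in time, 1 if it is still alive when the deadline passes.
stop_gracefully() {
  local pid="$1" deadline="$2" i=0
  kill -TERM "$pid" 2>/dev/null || return 0   # already gone
  while [ "$i" -lt "$deadline" ]; do
    kill -0 "$pid" 2>/dev/null || return 0    # no longer running
    sleep 1
    i=$((i + 1))
  done
  return 1
}
```

For the coordinator, a deadline slightly above shutdown_timeout_seconds (say 35 with the default of 30) leaves room for the drain to complete before the helper reports a timeout.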