Deploy with Docker

A step-by-step guide to the containerised prover stack. A single docker compose up brings the whole cluster online: one coordinator, N workers (scaled at the command line), Prometheus scraping the coordinator, and Grafana with the ZisK dashboard pre-provisioned. By the end of this page you'll have the full stack running and a real gcd proof on disk.

When to pick this

Pick this path when you want the cluster bundled as containers rather than as host services or hand-launched binaries. It is the right shape for:

A demo or CI smoke test that needs to tear down cleanly.
A single-host staging environment with Prometheus + Grafana attached out of the box.
Shipping the prover as part of a wider docker-compose stack.

For a 24/7 production prover under systemd or launchd, see Deploy as services; for a developer machine where you just want the binaries on PATH, see Deploy the binaries directly.

Prerequisites

Requirement	Notes
Linux or macOS	Tested on Ubuntu 22.04+ and macOS 13+ (Apple Silicon or Intel).
Docker Engine 24+	Must include the Compose plugin v2 (`docker compose version` should print `v2.x`).
~32 GB RAM	Each worker container preallocates the Assembly emulator's shared regions; the host needs enough headroom.
ZisK bundle on host (`/opt/zisk/`)	Worker containers mount the toolchain + proving key read-only from this path. `ziskup` is the easiest way to populate it.
Internet access	First `docker compose build` pulls base images, Rust toolchain, and crates from the network.
Open ports `7001`, `9090`, `9091`, `3000`	Coordinator public API, Prometheus, coordinator metrics, and Grafana on the host side (mapped from containers).

System dependencies (apt / brew packages used to populate /opt/zisk/) live in the developer install guides: Linux, macOS.
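
Before going further, it can help to confirm that none of the four host ports is already taken. The sketch below is one way to do that on Linux, assuming the ss tool from iproute2 is installed; the function name busy_ports is hypothetical and not part of the ZisK tooling. It reads listening sockets on stdin and prints whichever of the given ports are already bound.

```shell
# Print every port from the argument list that already has a listener.
# stdin: output of `ss -ltnH` (listening TCP sockets, numeric, no header).
busy_ports() {
    awk -v want="$*" '
        BEGIN {
            n = split(want, w, " ")
            for (i = 1; i <= n; i++) need[w[i]] = 1
        }
        {
            # Column 4 is "local-address:port"; the port is the last ":" field.
            k = split($4, parts, ":")
            if (parts[k] in need) hit[parts[k]] = 1
        }
        END {
            for (i = 1; i <= n; i++) if (w[i] in hit) print w[i]
        }
    '
}

# Usage (prints nothing when all four ports are free):
#   ss -ltnH | busy_ports 7001 9090 9091 3000
```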

Install Docker

Pick your OS. Linux installs Docker Engine directly; macOS uses Docker Desktop, which also includes the Compose plugin. The numbered steps below cover the Linux path.

1. Install Docker Engine. The upstream convenience script is the fastest path on Ubuntu / Debian:

curl -fsSL https://get.docker.com | sudo sh

For other distros or hardened environments, follow the official Docker Engine install docs.

2. Allow your user to run docker without sudo (optional but recommended):

sudo usermod -aG docker $USER
newgrp docker            # apply the group change without logging out

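3. Install the Docker Compose v2 plugin. Drop the latest release into Docker's CLI-plugins directory so docker compose becomes available as a subcommand. The URL below fetches the x86_64 build; on an arm64 host, use the docker-compose-linux-aarch64 asset instead: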

mkdir -p ~/.docker/cli-plugins
curl -SL https://github.com/docker/compose/releases/latest/download/docker-compose-linux-x86_64 \
    -o ~/.docker/cli-plugins/docker-compose
chmod +x ~/.docker/cli-plugins/docker-compose

4. Verify.

docker version
docker compose version    # must print "Docker Compose version v2.x"
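
If you want to script the version check rather than read it by eye, something like the following would do. It is a minimal sketch: the function name is_compose_v2 is hypothetical, and it only pattern-matches the version banner, accepting both the v2.x form and the bare 2.x form that some distro packages print.

```shell
# Succeed (exit 0) when the given banner reports Docker Compose v2.
# $1: the text printed by `docker compose version`.
is_compose_v2() {
    case "$1" in
        *"version v2."* | *"version 2."*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
#   is_compose_v2 "$(docker compose version)" || echo "Compose v2 plugin not found" >&2
```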

Install ZisK on the host

The worker containers mount the toolchain + proving-key bundle read-only from the host. Run ziskup on the host once so /opt/zisk/ exists with the proving key inside. Follow the developer install guides for Linux or macOS.

At the ziskup prompt, pick option 1 (Install proving key). On macOS the bundle lands at /Library/Application Support/ZisK/ — set ZISK_HOME to that path before docker compose so the worker containers find it (the compose file falls back to /opt/zisk when the env var is unset).
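
The path selection described above can be sketched as a small helper. It assumes only what this page states: an explicit ZISK_HOME wins, macOS uses /Library/Application Support/ZisK, and everything else uses /opt/zisk. The function name resolve_zisk_home is hypothetical.

```shell
# Print the bundle directory the worker containers should mount.
# $1: kernel name as printed by `uname -s` (for example Linux or Darwin).
resolve_zisk_home() {
    if [ -n "${ZISK_HOME:-}" ]; then
        printf '%s\n' "$ZISK_HOME"                       # explicit setting wins
    elif [ "$1" = "Darwin" ]; then
        printf '%s\n' "/Library/Application Support/ZisK" # macOS bundle location
    else
        printf '%s\n' "/opt/zisk"                         # same fallback as the compose file
    fi
}

# Usage, before running docker compose:
#   export ZISK_HOME="$(resolve_zisk_home "$(uname -s)")"
```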

The smoke test also needs cargo-zisk on the host

The gcd host program in the smoke-test step further down is built with cargo run and links against the ZisK toolchain. The same ziskup install puts cargo-zisk on your PATH, so running ziskup on the host satisfies both the bundle-mount requirement and the smoke-test toolchain requirement at once.

Clone the repo

The compose file, the coordinator + worker Dockerfiles, and the Prometheus / Grafana provisioning all live under distributed/deploy/docker/ in the upstream repo. Clone it once and run everything from the workspace root:

git clone https://github.com/0xPolygonHermez/zisk.git
cd zisk

The relevant files:

distributed/deploy/docker/
├── compose.yaml             # the four-service stack
├── Dockerfile.coordinator   # coordinator image
└── Dockerfile.worker        # worker image (CPU/GPU via build arg)
distributed/deploy/config/
├── coordinator.toml         # mounted RO into the coordinator container
└── worker.toml              # mounted RO into each worker replica

Build the images

First-run builds compile the prover stack from source inside the images, which takes several minutes. From the workspace root:

docker compose -f distributed/deploy/docker/compose.yaml build

This builds two images: zisk-coordinator:latest and zisk-worker:latest (CPU build by default).

GPU workers

For a GPU-accelerated worker, rebuild with the GPU build arg:

docker compose -f distributed/deploy/docker/compose.yaml \
    build --build-arg GPU=true worker

The container still needs CUDA exposed via the NVIDIA Container Toolkit — install it on the host and add the relevant deploy.resources.reservations.devices block to the worker service in compose.yaml. See zisk-worker / Backend selection.
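
One way to add the device reservation without editing compose.yaml in place is a Compose override file. The sketch below writes such a file; the file name compose.gpu.yaml and the function name write_gpu_override are hypothetical, and count: all is an assumption (reserve every GPU) that you may want to narrow. Whether the upstream worker service already carries a deploy section has not been checked here.

```shell
# Write a Compose override that reserves NVIDIA GPUs for the worker service.
# $1: output path (defaults to ./compose.gpu.yaml, a hypothetical name).
write_gpu_override() {
    cat > "${1:-./compose.gpu.yaml}" <<'EOF'
services:
  worker:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
EOF
}

# Usage, from the workspace root:
#   write_gpu_override distributed/deploy/docker/compose.gpu.yaml
#   docker compose -f distributed/deploy/docker/compose.yaml \
#       -f distributed/deploy/docker/compose.gpu.yaml up -d --scale worker=1
```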

Start the stack

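--scale worker=N controls how many worker replicas spawn. Four is a reasonable starting point on a host with plenty of RAM and cores; each replica preallocates memory, so size N to what the host can hold: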

docker compose -f distributed/deploy/docker/compose.yaml \
    up -d --scale worker=4

-d runs the stack detached. The stack contains:

Service	Image	Host ports	Notes
`coordinator`	`zisk-coordinator:latest`	`7001`, `9091`	Public gRPC on `:7001`; Prometheus scrape on `:9091`. Workers reach it via Docker DNS at `coordinator:50052`.
`worker`	`zisk-worker:latest`	(none)	Replicas spawned by `--scale`. Each gets an auto-generated `worker_id`. Mounts the host's `${ZISK_HOME:-/opt/zisk}` RO.
`prometheus`	`prom/prometheus:v2.51.0`	`9090`	Auto-scrapes the coordinator's `/metrics`.
`grafana`	`grafana/grafana:11.2.0`	`3000`	Anonymous Admin, datasource + dashboards auto-provisioned. Dev-only; lock down for any non-local deploy.

Watch the stack come up:

docker compose -f distributed/deploy/docker/compose.yaml logs -f

The coordinator emits worker registered: <uuid> capacity=10 lines as each worker dials in.
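
To check registration without watching the log scroll by, you could count those lines. This is a rough sketch that assumes the log text matches the worker registered: prefix quoted above; the function name is hypothetical. A worker that restarts and registers again would be counted twice, so treat the number as an upper bound.

```shell
# Count "worker registered:" lines in coordinator log text read from stdin.
count_registered_workers() {
    # grep -c exits non-zero when the count is 0; "|| true" keeps the exit status clean.
    grep -c 'worker registered:' || true
}

# Usage:
#   docker compose -f distributed/deploy/docker/compose.yaml logs coordinator \
#       | count_registered_workers
```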

Verify

A /health probe from the host confirms the coordinator is reachable on the mapped port:

curl -i http://127.0.0.1:9091/health
# → HTTP/1.1 200 OK
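
Right after up -d the coordinator may still be starting, so a single probe can fail spuriously. A small retry wrapper, sketched below, polls until the probe succeeds or gives up. The function name retry_until_ok and the attempt and delay values in the usage line are illustrative choices, not values taken from the ZisK docs.

```shell
# Run a command until it succeeds.
# $1: maximum attempts, $2: seconds to sleep between attempts, rest: the command.
retry_until_ok() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0                 # command succeeded
        fi
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"           # wait before the next try
        fi
        i=$((i + 1))
    done
    return 1                         # every attempt failed
}

# Usage (curl -f turns HTTP error statuses into a non-zero exit):
#   retry_until_ok 30 2 curl -fsS -o /dev/null http://127.0.0.1:9091/health
```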

Inspect container status:

docker compose -f distributed/deploy/docker/compose.yaml ps

You should see four services (coordinator, worker ×N, prometheus, grafana) all in the running state, with coordinator reporting healthy after the start period (it has a grpc_health_probe healthcheck baked in).

Open Grafana at http://127.0.0.1:3000 — anonymous Admin gets you in without a password, and the pre-provisioned ZisK dashboard is wired to the Prometheus datasource.

Submit a smoke-test job

The repo ships a gcd example whose remote-host binary submits a real proving job. Run it on the host (not inside a container) — it'll talk to the coordinator's mapped :7001.

The remote-host binary targets http://localhost:7000 by default. The Docker stack maps the coordinator to :7001, so edit examples/gcd/host/src/prover-clients/remote.rs and change the URL to http://localhost:7001 before running:

cd examples/gcd/host
cargo run --release --bin remote-host
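
The URL edit that has to happen before cargo run could be scripted instead of done by hand. The sketch below assumes the source file contains the literal string http://localhost:7000, which has not been verified against the repo; the function name is hypothetical. It writes through a temporary file rather than using sed -i, whose syntax differs between GNU and BSD sed.

```shell
# Rewrite the default coordinator URL in a file from port 7000 to port 7001.
# $1: path of the file to edit.
retarget_coordinator_url() {
    tmp="$1.tmp"
    sed 's#http://localhost:7000#http://localhost:7001#g' "$1" > "$tmp" && mv "$tmp" "$1"
}

# Usage, from the workspace root (review the change with `git diff` afterwards):
#   retarget_coordinator_url examples/gcd/host/src/prover-clients/remote.rs
```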

The binary builds a ProverClient::remote("http://localhost:7001") internally and uploads the gcd guest ELF. While it runs, tail the coordinator and worker logs:

docker compose -f distributed/deploy/docker/compose.yaml logs -f coordinator worker

You should see the job move through Queued → Running → Completed, along with segment-assignment lines, witness-generation output, and a final job complete message. When remote-host exits cleanly, the proof is on disk.

Scale the worker pool

Bump --scale worker=N up or down on the fly without touching the coordinator. Re-running up -d is idempotent — only the worker count changes:

docker compose -f distributed/deploy/docker/compose.yaml \
    up -d --scale worker=8

Each replica gets a unique auto-generated worker_id (the mounted worker.toml intentionally leaves worker_id unset). The coordinator distributes segments proportionally to the advertised compute_units (defaults to 10 per replica; edit distributed/deploy/config/worker.toml and restart to change).
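
As a rough worked example of proportional distribution: with 8 replicas at the default 10 compute_units each, the pool advertises 80 units, so a job of 120 segments would give each worker about 120 × 10 / 80 = 15 segments. The helper below just does that arithmetic with integer division. It is an illustration only; how the coordinator actually rounds or rebalances is not described on this page, and the function name is hypothetical.

```shell
# Approximate segments one worker receives under proportional sharing.
# $1: total segments in the job
# $2: this worker's compute_units
# $3: sum of compute_units across all registered workers
segment_share() {
    echo $(( $1 * $2 / $3 ))    # integer division, rounds down
}

# Usage:
#   segment_share 120 10 80
```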

Manage the stack

Day-two operations are handled by Docker Compose directly. All commands run from the workspace root, prefixed with docker compose -f distributed/deploy/docker/compose.yaml.
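
Typing the full prefix every time gets tedious, so a shell function can carry it for you. The name zc and the DOCKER override variable are both hypothetical conveniences, not part of the repo; setting DOCKER=echo lets you preview the command line without running anything.

```shell
# Run Docker Compose against the ZisK compose file, passing all arguments through.
# Set DOCKER=echo to print the resulting command instead of executing it.
zc() {
    "${DOCKER:-docker}" compose -f distributed/deploy/docker/compose.yaml "$@"
}

# Usage, from the workspace root:
#   zc ps
#   zc logs -f coordinator
```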

Inspect status

docker compose -f distributed/deploy/docker/compose.yaml ps

Tail the logs

docker compose -f distributed/deploy/docker/compose.yaml logs -f                 # all services
docker compose -f distributed/deploy/docker/compose.yaml logs -f coordinator     # one service

Restart a service

Pick up a config edit (e.g. after editing coordinator.toml) by restarting that container:

docker compose -f distributed/deploy/docker/compose.yaml restart coordinator

Stop / start the stack

docker compose -f distributed/deploy/docker/compose.yaml stop      # graceful shutdown, containers preserved
docker compose -f distributed/deploy/docker/compose.yaml start     # restart the stopped containers

stop sends SIGTERM, which triggers the binaries' graceful shutdown windows (see zisk-coordinator / Signals).

Updating

Pull the latest commits, rebuild the affected images, and re-converge:

git pull
docker compose -f distributed/deploy/docker/compose.yaml \
    build coordinator worker
docker compose -f distributed/deploy/docker/compose.yaml \
    up -d --scale worker=4

docker compose up -d is idempotent: containers whose image hash hasn't changed are left alone, and only the ones with new images get recreated.
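
The update commands above hard-code worker=4. If you have since scaled the pool to a different size, re-running with 4 would resize it. One option is to read the current replica count first and pass it back in. The sketch below counts container IDs from docker compose ps -q; the function name is hypothetical.

```shell
# Count non-empty lines on stdin (one container ID per line from `ps -q`).
count_nonempty_lines() {
    # grep -c exits non-zero when the count is 0; "|| true" keeps the exit status clean.
    grep -c . || true
}

# Usage, from the workspace root:
#   n=$(docker compose -f distributed/deploy/docker/compose.yaml ps -q worker \
#       | count_nonempty_lines)
#   docker compose -f distributed/deploy/docker/compose.yaml up -d --scale worker="$n"
```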

Tearing down

docker compose -f distributed/deploy/docker/compose.yaml down

Add -v to also remove the Grafana data volume (loses any dashboard edits you made via the UI):

docker compose -f distributed/deploy/docker/compose.yaml down -v

The built images (zisk-coordinator:latest, zisk-worker:latest) stay on disk for next time. Remove them explicitly if you want a clean slate:

docker rmi zisk-coordinator:latest zisk-worker:latest
