Configuration
The full coordinator.toml schema for zisk-coordinator: every table ([service], [server], [coordinator], [metrics], [logging], [backend]) with its settings and defaults, plus the coordinator-core tuning file.
The TOML file describes zisk-coordinator's long-lived
configuration. Every table is optional; if you omit a key, the
built-in default applies. CLI flags and ZISK_COORDINATOR_*
environment variables override values set here. See
CLI & environment for the
full flag list.
[service]
name = "ZisK Coordinator"
environment = "production" # development | staging | production
[server]
host = "0.0.0.0"
port = 7000 # client-facing gRPC
shutdown_timeout_seconds = 30
[coordinator]
port = 50051 # worker-facing gRPC
[metrics]
enabled = true
host = "0.0.0.0"
port = 9090 # Prometheus + /health
[logging]
level = "info" # trace | debug | info | warn | error
format = "pretty" # pretty | json | compact
[backend]
mode = "coordinator"
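Because every table is optional, a working file can be much shorter than the full listing above. As a minimal sketch, assuming a deployment that only wants a non-default client port and JSON logs (both values are illustrative, not recommendations), everything omitted falls back to the built-in defaults:

```toml
# Illustrative minimal coordinator.toml.
# Tables and keys left out here use the built-in defaults.
[server]
port = 7100        # hypothetical non-default client port

[logging]
format = "json"
```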
[service]
Coordinator identity. Used for log labels and status output; no behavioural effect on the proving pipeline.
| Setting | Default | Notes |
|---|---|---|
| name | "ZisK Coordinator" | Shown in logs and status output. |
| environment | development | One of development, staging, production. |
[server]
The public API plane. Clients talk to this port to submit jobs and retrieve proofs.
| Setting | Default | Notes |
|---|---|---|
| host | 0.0.0.0 | Bind address. Bind to a specific interface to restrict access. |
| port | 7000 | Client gRPC port. CLI: --api-port, env: ZISK_COORDINATOR_API_PORT. |
| shutdown_timeout_seconds | 30 | Drain window after SIGTERM. Jobs that don't complete in this window are abandoned at forced exit. |
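As a sketch of the "bind to a specific interface" advice, the fragment below restricts the API plane to one address and widens the drain window. The address 10.0.0.5 and the 60-second timeout are placeholders chosen for illustration; substitute a private interface on your own host:

```toml
[server]
host = "10.0.0.5"               # placeholder private address, not an upstream default
port = 7000
shutdown_timeout_seconds = 60   # illustrative: longer drain window for slow jobs
```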
[coordinator]
The cluster plane. Workers dial this port to register and stream their assignments and proofs.
| Setting | Default | Notes |
|---|---|---|
| port | 50051 | Worker gRPC port. CLI: --cluster-port, env: ZISK_COORDINATOR_CLUSTER_PORT. |
| config_file | (unset) | Path to a coordinator-core tuning file (worker pool size, phase timeouts, heartbeat intervals, job TTL, webhook URL). |
Coordinator-core tuning
The config_file exposes internals the main TOML doesn't: how
many workers can pile onto a single proof job, how long each
proving phase is allowed to run, and how quickly a disconnected
worker's in-flight assignment is failed. The upstream production
baseline lives at
distributed/crates/coordinator/config/prod.toml
and lays these settings out under a [coordinator] table:
[coordinator]
shutdown_timeout_seconds = 30
max_workers_per_job = 20
max_total_workers = 5000
phase1_timeout_seconds = 600 # 10 min
phase2_timeout_seconds = 1200 # 20 min
reconnect_grace_period_ms = 500
| Setting | Default | Notes |
|---|---|---|
| shutdown_timeout_seconds | 30 | Graceful drain window on SIGTERM for the cluster plane. Distinct from [server].shutdown_timeout_seconds, which applies to the public API plane. |
| max_workers_per_job | 20 | Maximum workers assigned to a single proof job. Lower it to spread workers across more parallel jobs; raise it for fewer, larger jobs. |
| max_total_workers | 5000 | Cap on registered workers in the cluster. The coordinator refuses new registrations past this limit. |
| phase1_timeout_seconds | 600 | Maximum duration of Phase 1 (Partial contributions). Workers that don't return their partial challenges in this window fail the job. |
| phase2_timeout_seconds | 1200 | Maximum duration of Phase 2 (Prove). Workers that don't return their partial proofs in this window fail the job. |
| reconnect_grace_period_ms | 500 | Window after a worker's heartbeat stops before its in-flight assignment is failed and reassigned. Larger values absorb transient network blips; smaller values fail fast. |
Point [coordinator].config_file at this TOML to load the
tuning:
[coordinator]
port = 50051
config_file = "/etc/zisk/coordinator-core.toml"
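The tuning file at that path might then be a variation on the production baseline. The sketch below assumes a cluster that prefers many smaller parallel jobs and sits on a network with occasional short drops; the changed numbers are illustrative assumptions, not upstream recommendations, and the remaining keys are copied from the baseline so nothing depends on how the tuning file treats omitted keys:

```toml
# Hypothetical /etc/zisk/coordinator-core.toml
[coordinator]
shutdown_timeout_seconds = 30
max_workers_per_job = 8            # illustrative: fewer workers per job, more jobs in parallel
max_total_workers = 5000
phase1_timeout_seconds = 600
phase2_timeout_seconds = 1200
reconnect_grace_period_ms = 2000   # illustrative: absorb short network blips before reassigning
```

A longer grace period trades faster failure detection for fewer spurious reassignments, so it is worth checking against the phase timeouts before raising it much further.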
[metrics]
A plain HTTP server bound alongside the gRPC ports.
GET /metrics returns a Prometheus text payload (cluster /
worker / job counters); GET /health returns 200 OK while the
binary is alive and is the canonical liveness probe.
| Setting | Default | Notes |
|---|---|---|
| enabled | true | Set false to disable /metrics. /health stays available either way. |
| host | 0.0.0.0 | Listen address for the scrape endpoint. |
| port | 9090 | Scrape port. CLI: --metrics-port, env: ZISK_COORDINATOR_METRICS_PORT. |
Never expose the metrics port to the public Internet. The
/metrics endpoint leaks operational detail and the binary
performs no authentication on it.
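One way to follow that advice is to keep the listener off public interfaces altogether. This is a sketch that assumes Prometheus (or a local scraping agent) runs on the same host; if it scrapes remotely, use a private interface address instead of loopback. Since /health is served from the same HTTP server, a liveness probe would presumably need to reach this address as well:

```toml
[metrics]
enabled = true
host = "127.0.0.1"   # loopback only; illustrative choice, not the default
port = 9090
```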
[logging]
| Setting | Default | Notes |
|---|---|---|
| level | info | trace, debug, info, warn, error. RUST_LOG takes precedence. |
| format | pretty | pretty, json, compact. Use json for log aggregators (Loki, Datadog). |
| file_path | (unset) | Rotating daily log file. Leave unset on systemd hosts; journald captures stdout. |
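Putting the notes in the table together, a logging table for a host that ships logs to an aggregator could look like the sketch below. The file path is hypothetical and is left commented out, matching the advice to leave file_path unset under systemd:

```toml
[logging]
level = "info"    # RUST_LOG, if set, takes precedence over this value
format = "json"   # structured output for aggregators such as Loki or Datadog
# file_path = "/var/log/zisk/coordinator.log"   # hypothetical path; only for hosts without journald
```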
[backend]
Determines the role the binary plays at runtime. The coordinator
must run in coordinator mode.
| Setting | Default | Notes |
|---|---|---|
| mode | coordinator | Must be coordinator for zisk-coordinator. |