Configuration

The full coordinator.toml schema for zisk-coordinator: every table ([service], [server], [coordinator], [metrics], [logging], [backend]) with its settings and defaults, plus the coordinator-core tuning file.

The TOML file describes zisk-coordinator's long-lived configuration. Every table is optional; if you omit a key, the built-in default applies. CLI flags and ZISK_COORDINATOR_* environment variables override values set here. See CLI & environment for the full flag list.

coordinator.toml
```toml
[service]
name = "ZisK Coordinator"
environment = "production" # development | staging | production

[server]
host = "0.0.0.0"
port = 7000 # client-facing gRPC
shutdown_timeout_seconds = 30

[coordinator]
port = 50051 # worker-facing gRPC

[metrics]
enabled = true
host = "0.0.0.0"
port = 9090 # Prometheus + /health

[logging]
level = "info" # trace | debug | info | warn | error
format = "pretty" # pretty | json | compact

[backend]
mode = "coordinator"
```
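
Because every table is optional and omitted keys fall back to their defaults, a much smaller file can also be valid. The fragment below is a minimal sketch that overrides only two settings; the chosen values are illustrative, not recommendations.

```toml
# Minimal coordinator.toml: only two keys are set here.
# All other tables and keys are omitted and use their built-in defaults.
[service]
environment = "staging"

[logging]
level = "debug"
```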

[service]

Coordinator identity. Used for log labels and status output; no behavioural effect on the proving pipeline.

| Setting | Default | Notes |
| --- | --- | --- |
| name | "ZisK Coordinator" | Shown in logs and status output. |
| environment | development | One of development, staging, production. |

[server]

The public API plane. Clients talk to this port to submit jobs and retrieve proofs.

| Setting | Default | Notes |
| --- | --- | --- |
| host | 0.0.0.0 | Bind address. Bind to a specific interface to restrict access. |
| port | 7000 | Client gRPC port. CLI: --api-port, env: ZISK_COORDINATOR_API_PORT. |
| shutdown_timeout_seconds | 30 | Drain window after SIGTERM. Jobs that don't complete in this window are abandoned at forced exit. |
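
As a sketch of the bind-address advice above, the following fragment restricts the client API to a single interface and widens the drain window. The address 10.0.0.5 and the 120-second timeout are hypothetical values chosen for illustration.

```toml
[server]
host = "10.0.0.5"               # hypothetical private interface; replace with your own
port = 7000                     # default client gRPC port
shutdown_timeout_seconds = 120  # illustrative: give in-flight jobs longer to drain
```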

[coordinator]

The cluster plane. Workers dial this port to register and stream their assignments and proofs.

| Setting | Default | Notes |
| --- | --- | --- |
| port | 50051 | Worker gRPC port. CLI: --cluster-port, env: ZISK_COORDINATOR_CLUSTER_PORT. |
| config_file | (unset) | Path to a coordinator-core tuning file (worker pool size, phase timeouts, heartbeat intervals, job TTL, webhook URL). |

Coordinator-core tuning

The config_file exposes internals the main TOML doesn't: how many workers can pile onto a single proof job, how long each proving phase is allowed to run, and how quickly a disconnected worker fails. The upstream production baseline lives at distributed/crates/coordinator/config/prod.toml and lays these settings out under a [coordinator] table:

coordinator-core.toml
```toml
[coordinator]
shutdown_timeout_seconds = 30
max_workers_per_job = 20
max_total_workers = 5000
phase1_timeout_seconds = 600 # 10 min
phase2_timeout_seconds = 1200 # 20 min
reconnect_grace_period_ms = 500
```

| Setting | Default | Notes |
| --- | --- | --- |
| shutdown_timeout_seconds | 30 | Graceful drain window on SIGTERM for the cluster plane. Distinct from [server].shutdown_timeout_seconds, which applies to the public API plane. |
| max_workers_per_job | 20 | Maximum workers assigned to a single proof job. Lower it to spread workers across more parallel jobs; raise it for fewer, larger jobs. |
| max_total_workers | 5000 | Cap on registered workers in the cluster. The coordinator refuses new registrations past this limit. |
| phase1_timeout_seconds | 600 | Maximum duration of Phase 1 (Partial contributions). Workers that don't return their partial challenges in this window fail the job. |
| phase2_timeout_seconds | 1200 | Maximum duration of Phase 2 (Prove). Workers that don't return their partial proofs in this window fail the job. |
| reconnect_grace_period_ms | 500 | Window after a worker's heartbeat stops before its in-flight assignment is failed and reassigned. Larger values absorb transient network blips; smaller values fail fast. |
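
To illustrate how these settings trade off, here is a hedged variant of the tuning file for a cluster that runs fewer, larger jobs over a less reliable network. The two changed numbers are assumptions made for this example, not upstream recommendations; the remaining keys repeat the baseline values.

```toml
[coordinator]
shutdown_timeout_seconds = 30
max_workers_per_job = 40           # illustrative: raised from 20 for fewer, larger jobs
max_total_workers = 5000
phase1_timeout_seconds = 600
phase2_timeout_seconds = 1200
reconnect_grace_period_ms = 2000   # illustrative: tolerate longer network blips before reassignment
```

Assuming each worker serves one job at a time, a per-job cap of 40 with max_total_workers at 5000 leaves room for at most 125 jobs running at full width, versus 250 at the baseline cap of 20.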

Point [coordinator].config_file at this TOML to load the tuning:

coordinator.toml
```toml
[coordinator]
port = 50051
config_file = "/etc/zisk/coordinator-core.toml"
```

[metrics]

A plain HTTP server bound alongside the gRPC ports. GET /metrics returns a Prometheus text payload (cluster / worker / job counters); GET /health returns 200 OK while the binary is alive and is the canonical liveness probe.

| Setting | Default | Notes |
| --- | --- | --- |
| enabled | true | Set false to disable /metrics. /health stays available either way. |
| host | 0.0.0.0 | Listen address for the scrape endpoint. |
| port | 9090 | Scrape port. CLI: --metrics-port, env: ZISK_COORDINATOR_METRICS_PORT. |

Warning: Never expose the metrics port to the public Internet. The /metrics endpoint leaks operational detail and the binary performs no authentication on it.
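
One way to follow this advice is to bind the scrape endpoint to the loopback interface so that only local processes can reach it. This is a sketch that assumes the Prometheus scraper, or a reverse proxy in front of it, runs on the same host.

```toml
[metrics]
enabled = true
host = "127.0.0.1"  # loopback only; assumes a same-host scraper or proxy
port = 9090
```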


[logging]

| Setting | Default | Notes |
| --- | --- | --- |
| level | info | trace, debug, info, warn, error. RUST_LOG takes precedence. |
| format | pretty | pretty, json, compact. Use json for log aggregators (Loki, Datadog). |
| file_path | (unset) | Rotating daily log file. Leave unset on systemd hosts; journald captures stdout. |
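
For example, a host that ships logs to an aggregator and does not run under systemd might use the fragment below. The file path is a hypothetical location chosen for illustration.

```toml
[logging]
level = "info"                               # RUST_LOG, if set, takes precedence
format = "json"                              # structured output for Loki, Datadog, etc.
file_path = "/var/log/zisk/coordinator.log"  # hypothetical path; leave unset on systemd hosts
```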

[backend]

Determines the role the binary plays at runtime. The coordinator must run in coordinator mode.

| Setting | Default | Notes |
| --- | --- | --- |
| mode | coordinator | Must be coordinator for zisk-coordinator. |