Logs

Where metrics tell you the cluster is unhappy, logs tell you why a specific job failed. The coordinator and worker both emit tracing logs to stdout/stderr, and your deployment routes them into whatever log surface the host uses. This page walks through log levels and formats, the three ways to configure them, where the logs land on each supported deployment, and the filters worth keeping in muscle memory.

Log levels

Both binaries use the standard five-level tracing hierarchy. Each level includes everything more severe than itself, so picking debug also emits info, warn, and error.

| Level | When to use |
| --- | --- |
| error | Production-quiet operation. Only failures. |
| warn | Production default for noisy environments. |
| info | Recommended production default. One line per significant lifecycle event. |
| debug | Active investigation. Phase transitions, worker assignments, heartbeats. |
| trace | Deep debugging. Very verbose; can noticeably slow hot proving paths. |

Warning: trace and debug levels on a worker generate a lot of output per proving segment. On a busy cluster this can slow proving and overwhelm the log aggregator. Raise the level temporarily and drop back to info as soon as you have what you need.
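
The inclusion rule between levels can be sketched as a simple rank comparison. This is only an illustration of the hierarchy; the function names level_rank and level_enabled are hypothetical and are not part of either binary.

```shell
# Rank each level by verbosity: a higher rank means a noisier level.
level_rank() {
  case "$1" in
    error) echo 1 ;;
    warn)  echo 2 ;;
    info)  echo 3 ;;
    debug) echo 4 ;;
    trace) echo 5 ;;
    *)     echo 0 ;;  # unknown level
  esac
}

# Succeeds when an event at level $2 is emitted under configured level $1.
level_enabled() {
  threshold=$(level_rank "$1")
  event=$(level_rank "$2")
  [ "$event" -ge 1 ] && [ "$event" -le "$threshold" ]
}

# Show what a configured level of info lets through.
for lvl in error warn info debug trace; do
  if level_enabled info "$lvl"; then
    echo "$lvl: emitted"
  else
    echo "$lvl: dropped"
  fi
done
```

With info configured, the loop reports error, warn, and info as emitted and debug and trace as dropped.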


Log formats

The same logs can be rendered three ways:

| Format | Best for | Notes |
| --- | --- | --- |
| pretty | Terminals during development | Default. Colored, multi-line, human-readable. |
| compact | Local non-TTY pipes | Single line per event, no color. |
| json | Production aggregation | Every field stays structured; aggregators index without regex. |

JSON is the right choice in production because it lets your aggregator query by job ID, level, or any other emitted field without a fragile regex pass.


Configuring logging

Logging is controlled in three places, in precedence order from lowest to highest:

  1. The [logging] section in coordinator.toml or worker.toml. This is the persistent default.
  2. The RUST_LOG environment variable. Standard tracing-subscriber env filter syntax; overrides the TOML level per crate.
  3. The --log-level CLI flag on the coordinator. Overrides both of the above.
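
The precedence order can be sketched as a fall-through over the three sources. This is a simplified illustration, not the binaries' actual resolution code: the function name effective_level is hypothetical, it treats RUST_LOG as a single value even though the real filter applies per crate, and the "unset" result is a placeholder for whatever built-in default applies when nothing is configured.

```shell
# Pick the effective level from the three sources, highest precedence first.
# Arguments (empty string = not set): $1 CLI flag, $2 RUST_LOG, $3 TOML level.
effective_level() {
  if [ -n "$1" ]; then
    echo "$1"        # --log-level wins over everything (coordinator only)
  elif [ -n "$2" ]; then
    echo "$2"        # RUST_LOG overrides the TOML value
  elif [ -n "$3" ]; then
    echo "$3"        # persistent default from [logging]
  else
    echo "unset"     # placeholder: no source configured
  fi
}

effective_level ""      ""                       "info"   # prints: info
effective_level ""      "info,zisk_worker=debug" "info"   # prints: info,zisk_worker=debug
effective_level "debug" "info,zisk_worker=debug" "info"   # prints: debug
```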

Persistent config

Put this block in both coordinator.toml and worker.toml before promoting them to production:

coordinator.toml / worker.toml
[logging]
level = "info"
format = "json"
file_path = "" # empty = stdout/stderr

Temporary override with RUST_LOG

RUST_LOG follows the standard tracing-subscriber env-filter syntax. The most useful pattern is "everything at info, this one crate at debug":

bash
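# systemctl does not pass its own environment to the unit, so set RUST_LOG on the service manager
sudo systemctl set-environment RUST_LOG="info,zisk_coordinator=debug"
sudo systemctl restart zisk-coordinator
sudo systemctl unset-environment RUST_LOG  # clear it so the next restart returns to the TOML level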

For a one-shot foreground run during an investigation:

bash
RUST_LOG="info,zisk_worker=debug" \
./target/release/zisk-worker \
--config /etc/zisk/worker.toml
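
To see how a filter string breaks down, the sketch below splits it into its comma-separated directives. It covers only the two forms used on this page, a bare level and target=level; the full env-filter syntax also allows span and field filters, which this helper does not parse. The function name explain_filter is hypothetical.

```shell
# Split an env-filter string into one directive per line.
# A bare level sets the default; target=level overrides one crate.
explain_filter() {
  printf '%s\n' "$1" | tr ',' '\n' | while IFS= read -r directive; do
    case "$directive" in
      *=*) echo "target ${directive%%=*} -> ${directive#*=}" ;;
      "")  ;;  # skip empty segments such as a trailing comma
      *)   echo "default -> $directive" ;;
    esac
  done
}

explain_filter "info,zisk_worker=debug"
# prints:
#   default -> info
#   target zisk_worker -> debug
```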

CLI flag (coordinator only)

bash
zisk-coordinator --config /etc/zisk/coordinator.toml --log-level debug

Tip: Always restore info after an incident is resolved. Forgetting to drop the level is the single most common cause of unexpected log-storage bills and degraded worker throughput.
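
A post-incident check can catch a level that was left raised in the persistent config. The sketch below reads the level from the [logging] table of a TOML file and fails on debug or trace. It is a minimal text scan, not a TOML parser, and it only sees the file: a RUST_LOG or --log-level override still in effect is outside its view. The function names and the temporary file are illustrative.

```shell
# Print the level set in the [logging] table of a TOML file.
# Handles only the simple  level = "value"  form shown on this page.
logging_level() {
  awk '
    /^\[/ { in_logging = ($0 ~ /^\[logging\]/) }   # track the current table
    in_logging && /^[[:space:]]*level[[:space:]]*=/ {
      sub(/^[^"]*"/, "")                           # drop everything up to the opening quote
      sub(/".*$/, "")                              # drop the closing quote and any comment
      print
      exit
    }
  ' "$1"
}

# Succeeds unless the file sets debug or trace.
level_is_restored() {
  case "$(logging_level "$1")" in
    debug|trace) return 1 ;;
    *)           return 0 ;;
  esac
}

# Demo against a throwaway file that still has a raised level.
cfg=$(mktemp)
printf '[logging]\nlevel = "debug"\nformat = "json"\n' > "$cfg"
level_is_restored "$cfg" || echo "level is still $(logging_level "$cfg"); restore info"
rm -f "$cfg"
```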


Where the logs land

The binaries write to stdout/stderr; each deployment path inherits its host's log routing. Use the subsection for your deployment to pick the right tail command.

Linux with systemd

The bare-metal install script registers both binaries as systemd units, so logs flow into journald:

coordinator host
sudo journalctl -u zisk-coordinator -f
worker host
sudo journalctl -u zisk-worker -f

macOS with launchd

The macOS install path registers launchd plists and pipes logs to flat files under /var/log/, rotated by newsyslog at 100 MB with 5 rotations kept:

bash
tail -f /var/log/zisk-coordinator/zisk-coordinator.log
tail -f /var/log/zisk-worker/zisk-worker.log

Docker Compose

Logs go to the Docker daemon. Use the Compose subcommand to follow them with service names rather than container IDs:

bash
docker compose logs -f coordinator
docker compose logs -f worker

To follow both at once interleaved:

bash
docker compose logs -f coordinator worker

Kubernetes

The Helm chart ships only the worker; the coordinator is deployed separately. Tail across all worker pods at once with a label selector:

bash
kubectl logs -n zisk -l app.kubernetes.io/name=zisk-worker -f

For a single worker pod or to inspect the previous container after a crash loop:

bash
kubectl logs -n zisk <pod-name> -f
kubectl logs -n zisk <pod-name> --previous

Useful filters

journald can filter by priority on its own, while Docker Compose logs filter natively only by time and need grep or jq for anything else. The recipes below are the ones to remember.

Filter by level

systemd
# Errors only on the coordinator
sudo journalctl -u zisk-coordinator -p err -f

# Warnings and errors on a worker
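sudo journalctl -u zisk-worker -p warning -f

-p err and -p warning take standard syslog priority names (journald spells the level "warning", not "warn") and work for any systemd-managed service on the host. Priority filtering only helps when journald has a priority recorded for each line; output captured from plain stdout/stderr is typically stored at the unit's default priority, so if -p returns nothing, filter on the level text with grep or jq instead.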

On Docker Compose, filter by piping through grep (or, with JSON format, jq):

docker compose
docker compose logs --since 10m coordinator | grep -E 'ERROR|WARN'

Filter by phase

The coordinator emits phase= fields on every relevant log line during a job. With JSON logs, the cleanest filter uses jq:

bash
sudo journalctl -u zisk-coordinator -o cat \
| jq 'select(.fields.phase == "Prove")'

Without JSON, fall back to plain grep:

bash
sudo journalctl -u zisk-coordinator | grep 'phase=Prove'

The three phase names emitted by the coordinator are Contributions, Prove, and Aggregate, matching the job state machine in the cluster API.
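
For a quick sense of how far a job got, the plain-text recipe extends to a per-phase line count. The sketch below assumes each relevant line carries a phase=<Name> token, as the grep example does; the function name count_phases and the sample lines are made up for illustration and do not reproduce real coordinator output.

```shell
# Count log lines per phase from plain-text logs read on stdin.
count_phases() {
  awk '
    match($0, /phase=[A-Za-z]+/) {
      name = substr($0, RSTART + 6, RLENGTH - 6)   # strip the "phase=" prefix
      count[name]++
    }
    END {
      # Report the three documented phases in a fixed order.
      n = split("Contributions Prove Aggregate", order, " ")
      for (i = 1; i <= n; i++) print order[i], count[order[i]] + 0
    }
  '
}

# Hypothetical sample lines; in practice pipe journalctl output in instead.
printf '%s\n' \
  'INFO job started phase=Contributions' \
  'INFO segment done phase=Prove' \
  'INFO segment done phase=Prove' | count_phases
# prints:
#   Contributions 1
#   Prove 2
#   Aggregate 0
```

On a systemd host the same function can be fed with sudo journalctl -u zisk-coordinator | count_phases.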

Filter by job ID

Production investigations usually start from a job ID, not a log line. The recipe is the same on every deployment: grep the coordinator first, then take the worker IDs it logged and grep on those worker hosts.

coordinator host
sudo journalctl -u zisk-coordinator | grep <job-id>

The output names which workers received which segments. Switch to the relevant worker host and grep again, either for the job ID or for the segment named on that line:

worker host
sudo journalctl -u zisk-worker | grep <job-id>

With JSON logs flowing into an aggregator, the same recipe collapses to a single query like job_id="<job-id>", returning the end-to-end trail across every host in one view. This is the practical reason to insist on JSON in production; without it, correlation is host-by-host.
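
Without an aggregator, the first hop of the host-by-host recipe can be scripted: pull the distinct worker IDs out of the coordinator lines for one job, then visit those hosts. The sketch below assumes plain-text lines that carry a worker_id=<id> token. That field name is an assumption, as are the function name workers_for_job and the sample lines, so adjust the pattern to whatever the coordinator actually logs.

```shell
# List the distinct workers mentioned on coordinator lines for one job.
# Reads plain-text log lines on stdin; $1 is the job ID.
# ASSUMPTION: lines carry a worker_id=<id> token; the real field name may differ.
workers_for_job() {
  grep -F -- "$1" \
    | sed -n 's/.*worker_id=\([^ ]*\).*/\1/p' \
    | sort -u
}

# Hypothetical sample lines standing in for journalctl output.
printf '%s\n' \
  'job_id=job-1 segment=0 worker_id=w-a' \
  'job_id=job-1 segment=1 worker_id=w-b' \
  'job_id=job-2 segment=0 worker_id=w-c' | workers_for_job job-1
# prints:
#   w-a
#   w-b
```

The match is a fixed-string search, so a job ID that is a prefix of another ID matches both; pass the full ID to avoid that.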

Note: Job IDs are minted by the coordinator and returned in the JobRequest response. Log the returned ID on the client side alongside whatever business context (request ID, user ID, batch ID) is meaningful in your environment, because the cluster does not know that context.


Next steps

Continue to Troubleshooting for the runbook that maps the most common symptoms surfaced by metrics and logs to their concrete fixes.