Monitoring
A running ZisK cluster exposes two observability surfaces: a Prometheus `/metrics` endpoint on the coordinator and structured logs from every binary. This section walks through both, shows the scrape config and dashboards shipped with the repo, and ends with a runbook for the failure modes operators hit most often.
Observability
Metrics and alerts
Scrape the coordinator's Prometheus endpoint, walk the full metric catalogue, load the bundled Grafana dashboard, and bootstrap alerting rules from a starter set of PromQL queries.
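A `/metrics` endpoint normally serves the Prometheus text exposition format. If the coordinator's endpoint does the same, you can run a quick scripted health check without a Prometheus server. Below is a minimal parser built on that assumption. This section does not give the coordinator's host and port, so the URL passed to `fetch_metrics` is a placeholder. The metric names in the test data (`example_jobs_total`, `example_up`) are hypothetical and do not come from the real catalogue. The parser skips `# HELP`/`# TYPE` comments, leaves label values unescaped, and does not handle `}` inside label values.

```python
import re
import urllib.request

# Assumption: the endpoint returns the Prometheus text exposition format.
# A sample line looks like: name{label="value",...} 12.5 [timestamp]
_SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'   # optional label block (no "}" inside values)
    r'\s+(?P<value>\S+)'            # value, e.g. 7, 0.25, NaN, +Inf
    r'(?:\s+(?P<ts>-?\d+))?\s*$'    # optional millisecond timestamp, ignored
)
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return {(name, frozenset(label_items)): float} for every sample line."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        m = _SAMPLE_RE.match(line)
        if m is None:
            continue  # not a sample line this sketch understands
        labels = dict(_LABEL_RE.findall(m.group("labels") or ""))
        key = (m.group("name"), frozenset(labels.items()))
        samples[key] = float(m.group("value"))  # float() accepts NaN/+Inf/-Inf
    return samples


def sample_value(samples, name, **labels):
    """Look up one sample by metric name and exact label set; None if absent."""
    return samples.get((name, frozenset(labels.items())))


def fetch_metrics(url, timeout=5.0):
    """Fetch and parse a /metrics page; the URL is a placeholder you supply."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

For example, call `fetch_metrics("http://<coordinator-host>:<port>/metrics")` and then read a single series with `sample_value(samples, "<metric>", <label>="<value>")`. For anything beyond ad hoc checks, point a real Prometheus scrape job at the endpoint.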
Observability
Logs
Read coordinator and worker logs on every deployment shape (systemd, launchd, Docker Compose, Kubernetes), switch to JSON output for production log aggregation, and filter by level, phase, or job ID.
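Once logs are in JSON, you can filter by level, phase, or job ID with a short script. The sketch below does this, but the log schema of the ZisK binaries is not described here, so its field names are assumptions. The hypothetical keys `level`, `phase`, and `job_id` are looked up first at the top level and then inside a nested `fields` object, which is how some JSON log formatters group event fields. Lines that are not valid JSON are skipped.

```python
import json


def _field(record, key):
    """Find key at the top level, then under a nested "fields" object (assumed layout)."""
    if key in record:
        return record[key]
    fields = record.get("fields")
    if isinstance(fields, dict):
        return fields.get(key)
    return None


def filter_logs(lines, level=None, phase=None, job_id=None):
    """Yield parsed JSON records matching every filter that is not None.

    Field names "level", "phase" and "job_id" are hypothetical; adjust them
    to whatever keys the actual log output uses.
    """
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines such as plain-text banners
        if not isinstance(record, dict):
            continue
        # Level comparison is case-insensitive ("error" matches "ERROR").
        if level is not None and str(_field(record, "level")).upper() != level.upper():
            continue
        if phase is not None and _field(record, "phase") != phase:
            continue
        if job_id is not None and _field(record, "job_id") != job_id:
            continue
        yield record
```

`lines` can be any iterable of strings, such as an open log file or `sys.stdin` fed from `journalctl`, `docker compose logs`, or `kubectl logs`.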
Runbook
Troubleshooting
Concrete diagnoses and fixes for the failure modes operators hit most often: stuck workers, port conflicts, phase timeouts, heartbeat drops, mismatched proving keys, and coordinator restart loss.