Introduction
ZisK is a high-performance zkVM (Zero-Knowledge Virtual Machine) designed to generate zero-knowledge proofs of arbitrary program execution. It enables developers to prove the correctness of a computation without revealing its internal state, making ZisK a powerful tool for privacy-preserving and verifiable computation.
Proving systems traditionally involve complex cryptographic operations that require deep expertise and significant computational resources. ZisK abstracts these complexities by providing an optimized toolstack that minimizes computational overhead, making ZK technology accessible to a broader range of developers. With Rust-based execution and planned multi-language support, ZisK is designed to be developer-friendly while maintaining high performance and robust security.
Why ZisK?
- High-performance architecture optimized for low-latency proof generation.
- Rust-based zkVM, with future support for additional languages.
- No recompilation required across different programs.
- Standardized prover interface (JSON-RPC, gRPC, CLI).
- Flexible integration: usable as a standalone service or as a library.
- Decentralized architecture for trustless proof generation.
- Optimized proof generation costs for real-world applications.
- Fully open-source and backed by Polygon zkEVM and Plonky3 technology.
Installation Guide
ZisK can be installed from prebuilt binaries (recommended) or by building the ZisK tools, toolchain and setup files from source.
System Requirements
ZisK currently supports Linux x86_64 and macOS platforms (see note below).
Note: On macOS, proof generation is not yet optimized, so some proofs may take longer to generate.
Required Tools
Ensure the following tools are installed:
- Rust
- Git
- To enable GPU support in ZisK, you must have NVIDIA Driver version 525.60.13 or later installed.
- If you use the zisk-sdk crate, you must also have CUDA Toolkit version 12.9 or later installed.
Installing Dependencies
Ubuntu
Ubuntu 22.04 or higher is required.
Install all required dependencies with:
sudo apt-get install -y xz-utils jq curl build-essential qemu-system libomp-dev libgmp-dev nlohmann-json3-dev protobuf-compiler uuid-dev libgrpc++-dev libsecp256k1-dev libsodium-dev libpqxx-dev nasm libopenmpi-dev openmpi-bin openmpi-common libclang-dev clang gcc-riscv64-unknown-elf
ZisK uses shared memory to exchange data between processes. The system must be configured to allow enough locked memory per process:
$ ulimit -l
unlimited
One way to achieve this is to edit the file /etc/systemd/system.conf, add the line DefaultLimitMEMLOCK=infinity, and reboot for the change to take effect.
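If you want to check this setting from code (for example, in a preflight script), the sketch below reads the "Max locked memory" row of Linux's /proc/self/limits. It is a minimal std-only illustration, not part of ZisK. The function names are hypothetical, and it assumes the usual /proc/<pid>/limits layout (label, soft limit, hard limit, units). Note that /proc reports bytes, while ulimit -l reports KiB.

```rust
use std::fs;

/// Returns the soft limit for locked memory from the text of a
/// /proc/<pid>/limits file, e.g. "unlimited" or "8388608" (bytes).
fn memlock_soft_limit(limits: &str) -> Option<String> {
    limits
        .lines()
        .find(|line| line.starts_with("Max locked memory"))
        // After the three-word label come: soft limit, hard limit, units.
        .and_then(|line| line.split_whitespace().nth(3).map(str::to_string))
}

/// Checks the current process (Linux only). Child processes inherit
/// this limit from the shell that starts them.
fn current_memlock_is_unlimited() -> std::io::Result<bool> {
    let text = fs::read_to_string("/proc/self/limits")?;
    Ok(memlock_soft_limit(&text).as_deref() == Some("unlimited"))
}
```

Running the check from the same shell that will launch ZisK gives a representative answer, because resource limits are inherited across process creation.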
macOS
macOS 14 or higher is required.
You must have Homebrew and Xcode installed.
Install all required dependencies with:
brew reinstall jq curl libomp protobuf openssl nasm pkgconf open-mpi libffi nlohmann-json libsodium riscv-tools
Installing ZisK
Option 1: Prebuilt Binaries (Recommended)
- To install ZisK using ziskup, run the following command in your terminal:
curl https://raw.githubusercontent.com/0xPolygonHermez/zisk/main/ziskup/install.sh | bash -
During installation, ziskup will detect whether CUDA is available on your machine. If so, it will install ZisK binaries with GPU support. Otherwise, you will be prompted to choose between CPU binaries (default) and GPU binaries.
- During installation, you will also be prompted to select a setup option. You can choose from the following:
  - Install proving key (default) – Required for generating and verifying proofs.
  - Install proving key (no constant tree files) – Installs the proving key but skips generating the constant tree files.
  - Install verify key – Needed only if you want to verify proofs.
  - None – Choose this if you only want to compile programs and execute them using the ZisK emulator.
- Verify the Rust toolchain (which includes support for the riscv64ima-zisk-zkvm compilation target):
rustup toolchain list
The output should include an entry for zisk, similar to this:
stable-x86_64-unknown-linux-gnu (default)
nightly-x86_64-unknown-linux-gnu
zisk
- Verify the cargo-zisk CLI tool:
cargo-zisk --version
It should show cargo-zisk X.X.X [gpu] if the GPU version is installed, or cargo-zisk X.X.X [cpu] otherwise.
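If you script the installation, both checks can be automated. A minimal sketch, assuming the output shapes shown above (one toolchain name per line, optionally followed by a suffix such as "(default)", and a trailing [gpu] or [cpu] tag in the version string). The helper names are hypothetical.

```rust
/// Returns true if `rustup toolchain list` output contains a toolchain named `zisk`.
fn has_zisk_toolchain(toolchain_list: &str) -> bool {
    toolchain_list
        .lines()
        // Lines may carry a suffix such as " (default)", so compare only the first word.
        .filter_map(|line| line.split_whitespace().next())
        .any(|name| name == "zisk")
}

/// Extracts the build flavor ("gpu" or "cpu") from `cargo-zisk --version`
/// output, assuming the "cargo-zisk X.X.X [gpu]" shape shown above.
fn build_flavor(version_output: &str) -> Option<&str> {
    let start = version_output.find('[')?;
    let end = version_output[start..].find(']')? + start;
    Some(&version_output[start + 1..end])
}
```

Feed these helpers the captured stdout of rustup toolchain list and cargo-zisk --version, for example through std::process::Command.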
Updating ZisK
To update ZisK to the latest version, simply run:
bash ziskup
You can use the flags --provingkey, --verifykey or --nokey to specify the installation setup and skip the selection prompt.
To install the PLONK proving key (provingKeySnark), run:
bash ziskup setup_snark
Option 2: Building from Source
Build ZisK
- Clone the ZisK repository:
git clone https://github.com/0xPolygonHermez/zisk.git
cd zisk
- Build ZisK tools:
cargo build --release
Note: The build process automatically detects whether CUDA is available on your machine. If so, it builds the GPU-enabled binaries; otherwise, it builds the CPU version. To force the CPU version, use the --features cpu-only flag.
Note: By default, the build process auto-detects the GPU architecture of the host machine. Use the CUDA_ARCHS environment variable to control which architectures are compiled:
CUDA_ARCHS="89" cargo build --release    # single architecture (faster build), e.g. Ada Lovelace sm_89 / RTX 4090
CUDA_ARCHS="89,90" cargo build --release    # multiple architectures, e.g. Ada + Hopper
CUDA_ARCHS="major" cargo build --release    # all major architectures (sm_80, sm_86, sm_89, sm_90, sm_100, sm_120 + PTX forward compatibility): portable binary for distribution, but takes significantly longer to compile
- Copy the tools to the ~/.zisk/bin directory:
mkdir -p $HOME/.zisk/bin
cp target/release/cargo-zisk target/release/ziskemu target/release/riscv2zisk target/release/zisk-coordinator target/release/zisk-worker target/release/libziskclib.a $HOME/.zisk/bin
- Copy the files required for the assembly ROM setup:
Note: This is only needed on Linux x86_64, since assembly execution is not supported on macOS.
mkdir -p $HOME/.zisk/zisk/emulator-asm
cp -r ./emulator-asm/src $HOME/.zisk/zisk/emulator-asm
cp ./emulator-asm/Makefile $HOME/.zisk/zisk/emulator-asm
cp -r ./lib-c $HOME/.zisk/zisk
- Add ~/.zisk/bin to your system PATH. If you are using bash or zsh:
PROFILE=$([[ "$(uname)" == "Darwin" ]] && echo ".zshenv" || echo ".bashrc")
echo >>$HOME/$PROFILE && echo "export PATH=\"\$PATH:$HOME/.zisk/bin\"" >> $HOME/$PROFILE
source $HOME/$PROFILE
- Install the ZisK Rust toolchain:
cargo-zisk toolchain install
Note: This command installs the ZisK Rust toolchain from prebuilt binaries. If you prefer to build the toolchain from source, follow these steps:
  - Ensure all dependencies required to build the Rust toolchain from source are installed.
  - Build and install the Rust ZisK toolchain:
    cargo-zisk toolchain build
- Verify the installation:
rustup toolchain list
Confirm that zisk appears in the list of installed toolchains.
- Verify the cargo-zisk CLI tool:
cargo-zisk --version
It should show cargo-zisk X.X.X [gpu] if the GPU version is built, or cargo-zisk X.X.X [cpu] otherwise.
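The three CUDA_ARCHS forms above map to SM targets as follows. This hypothetical helper only mirrors the documented values to make them explicit. It is not ZisK's build script, and it does not model the PTX forward-compatibility output of the major setting.

```rust
/// Expands a CUDA_ARCHS value into the SM targets it names.
/// "major" expands to the architecture set listed in the build notes.
fn expand_cuda_archs(value: &str) -> Vec<String> {
    const MAJOR: [&str; 6] = ["80", "86", "89", "90", "100", "120"];
    let value = value.trim();
    if value == "major" {
        return MAJOR.iter().map(|a| format!("sm_{a}")).collect();
    }
    // Otherwise: a comma-separated list of compute capabilities.
    value
        .split(',')
        .map(str::trim)
        .filter(|a| !a.is_empty())
        .map(|a| format!("sm_{a}"))
        .collect()
}
```

Building for a single architecture is faster because the CUDA kernels are compiled once per listed target.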
Build Setup
Please note that the process can be long, taking approximately 45-60 minutes depending on the machine used.
NodeJS version 20.x or higher is required to build the setup files.
- Clone the following repositories in the parent folder of the zisk folder created in the previous section:
git clone https://github.com/0xPolygonHermez/pil2-compiler.git
git clone https://github.com/0xPolygonHermez/pil2-proofman.git
git clone https://github.com/0xPolygonHermez/pil2-proofman-js
- Install packages:
(cd pil2-compiler && npm i)
(cd pil2-proofman-js && npm i)
- All subsequent commands must be executed from the zisk folder created in the previous section:
cd zisk
- Generate fixed data:
cargo run --release --bin arith_frops_fixed_gen
cargo run --release --bin binary_basic_frops_fixed_gen
cargo run --release --bin binary_extension_frops_fixed_gen
- Compile ZisK PIL:
node --max-old-space-size=16384 ../pil2-compiler/src/pil.js pil/zisk.pil -I pil,../pil2-proofman/pil2-components/lib/std/pil,state-machines,precompiles -o pil/zisk.pilout -u tmp/fixed -O fixed-to-file
This command creates the pil/zisk.pilout file.
- Generate setup data (this step may take 30-45 minutes):
node --max-old-space-size=16384 --stack-size=8192 ../pil2-proofman-js/src/main_setup.js -a ./pil/zisk.pilout -b build -t ../pil2-proofman/pil2-components/lib/std/pil -u tmp/fixed -r -s ./state-machines/starkstructs.json
This command generates the build/provingKey directory.
Additionally, to generate the SNARK wrapper:
node ../pil2-proofman-js/src/main_setup_snark.js -b build -t ../pil2-proofman/pil2-components/lib/std/pil -f -w ../powersOfTau28_hez_final_27.ptau -p ./state-machines/publics.json -n plonk
It is stored under the build/provingKeySnark directory.
- Copy (or move) the build/provingKey directory to the $HOME/.zisk directory:
cp -R build/provingKey $HOME/.zisk
Uninstall ZisK
- Uninstall the ZisK toolchain:
rustup uninstall zisk
- Delete the ZisK folder:
rm -rf $HOME/.zisk
Quickstart
In this guide, you will learn how to install ZisK, create a simple program and run it using ZisK.
Installation
ZisK currently supports Linux x86_64 and macOS platforms (see note below).
Note: On macOS, proof generation is not yet optimized, so some proofs may take longer to generate.
Ubuntu 22.04 or higher is required.
macOS 14 or higher with Xcode installed is required.
- Make sure you have Rust installed.
- Install all required dependencies with:
  - Ubuntu:
    sudo apt-get install -y xz-utils jq curl build-essential qemu-system libomp-dev libgmp-dev nlohmann-json3-dev protobuf-compiler uuid-dev libgrpc++-dev libsecp256k1-dev libsodium-dev libpqxx-dev nasm libopenmpi-dev openmpi-bin openmpi-common libclang-dev clang gcc-riscv64-unknown-elf
  - macOS:
    brew reinstall jq curl libomp protobuf openssl nasm pkgconf open-mpi libffi nlohmann-json libsodium
- To install ZisK using ziskup, run the following command in your terminal:
curl https://raw.githubusercontent.com/0xPolygonHermez/zisk/main/ziskup/install.sh | bash
Create a Project
The first step is to generate a new example project using the cargo-zisk new <name> command. This command creates a new directory named <name> in your current directory. For example:
cargo-zisk new sha_hasher
cd sha_hasher
This will create a project with the following structure:
.
├── common
| ├── src
| | └── main.rs
| └── Cargo.toml
├── guest
| ├── src
| | └── main.rs
| └── Cargo.toml
├── host
| ├── src
| | └── main.rs
| ├── bin
| | ├── execute.rs
| | ├── minimal.rs
| | ├── prove.rs
| | ├── plonk.rs
| | └── run.rs
| ├── Cargo.toml
| └── build.rs
└── Cargo.toml
The example program takes a number n as input and computes the SHA-256 hash n times.
Build
The next step is to build the program to generate an ELF file (RISC-V), which will be used later to generate the proof. Execute:
cargo build --release
This command builds the program using the zkvm target. The resulting sha_hasher ELF file (without extension) is generated in the ./target/elf/riscv64ima-zisk-zkvm-elf/release directory.
Execute
Before generating a proof, you can test the program using the ZisK emulator to ensure its correctness:
cargo run --release --bin execute
The emulator will execute the program and display the public outputs:
Public outputs:
Hash: 0x36c1cb4f826ae42ceba848227e0c5f786178ca9dceca6772e5d728d09c30a2f6
Iterations: 1000
Magic number: 0xdeadbeef
These outputs should match the native execution, confirming the program works correctly.
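To cross-check the emulator against a native run, the guest's loop can be reproduced in plain Rust. The sketch below reimplements SHA-256 (FIPS 180-4) with std only, instead of the sha2 crate the guest uses, so it has no dependencies. The function names are hypothetical. It mirrors the guest logic: start from 32 zero bytes and rehash the 32-byte digest n times.

```rust
/// SHA-256 round constants (FIPS 180-4).
const K: [u32; 64] = [
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
];

/// Minimal SHA-256 over a byte slice.
fn sha256(data: &[u8]) -> [u8; 32] {
    let mut h: [u32; 8] = [
        0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
        0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19,
    ];
    // Padding: 0x80, zeros up to 56 mod 64, then the bit length as big-endian u64.
    let mut msg = data.to_vec();
    let bit_len = (data.len() as u64).wrapping_mul(8);
    msg.push(0x80);
    while msg.len() % 64 != 56 {
        msg.push(0);
    }
    msg.extend_from_slice(&bit_len.to_be_bytes());

    for block in msg.chunks(64) {
        // Message schedule: 16 input words extended to 64.
        let mut w = [0u32; 64];
        for i in 0..16 {
            w[i] = u32::from_be_bytes([block[4 * i], block[4 * i + 1], block[4 * i + 2], block[4 * i + 3]]);
        }
        for i in 16..64 {
            let s0 = w[i - 15].rotate_right(7) ^ w[i - 15].rotate_right(18) ^ (w[i - 15] >> 3);
            let s1 = w[i - 2].rotate_right(17) ^ w[i - 2].rotate_right(19) ^ (w[i - 2] >> 10);
            w[i] = w[i - 16].wrapping_add(s0).wrapping_add(w[i - 7]).wrapping_add(s1);
        }
        // Compression: 64 rounds over the working variables.
        let [mut a, mut b, mut c, mut d, mut e, mut f, mut g, mut hh] = h;
        for i in 0..64 {
            let s1 = e.rotate_right(6) ^ e.rotate_right(11) ^ e.rotate_right(25);
            let ch = (e & f) ^ (!e & g);
            let t1 = hh.wrapping_add(s1).wrapping_add(ch).wrapping_add(K[i]).wrapping_add(w[i]);
            let s0 = a.rotate_right(2) ^ a.rotate_right(13) ^ a.rotate_right(22);
            let maj = (a & b) ^ (a & c) ^ (b & c);
            let t2 = s0.wrapping_add(maj);
            hh = g;
            g = f;
            f = e;
            e = d.wrapping_add(t1);
            d = c;
            c = b;
            b = a;
            a = t1.wrapping_add(t2);
        }
        for (x, y) in h.iter_mut().zip([a, b, c, d, e, f, g, hh]) {
            *x = x.wrapping_add(y);
        }
    }
    let mut out = [0u8; 32];
    for (i, word) in h.iter().enumerate() {
        out[4 * i..4 * i + 4].copy_from_slice(&word.to_be_bytes());
    }
    out
}

/// Mirrors the guest loop: start from 32 zero bytes and hash `n` times.
fn iterated_hash(n: u32) -> [u8; 32] {
    let mut hash = [0u8; 32];
    for _ in 0..n {
        hash = sha256(&hash);
    }
    hash
}

/// Lowercase hex encoding for comparing against printed outputs.
fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{b:02x}")).collect()
}
```

For example, to_hex(&iterated_hash(1000)) computes the digest for the same n as the sample run, which you can compare with the Hash line printed by the emulator.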
Prove
To generate a cryptographic proof of execution, run:
mkdir tmp
cargo run --release --bin prove
This will:
- Execute the program and generate the execution trace
- Compute witness values for all state machines
- Generate the polynomial commitments
- Create the zk-STARK proof
The proof will be saved in the ./tmp directory. This process may take several minutes depending on the program complexity.
Compressed Proof (Optional)
After generating the proof, you can optionally create a compressed version to reduce the proof size:
cargo run --release --bin minimal
This generates an additional compressed proof on top of the existing one using recursive composition. The compressed proof is significantly smaller while maintaining the same security guarantees.
Writing Programs
This document explains how to write or modify a Rust program for execution in ZisK.
Setup
Code changes
Writing a Rust program for ZisK is similar to writing a standard Rust program, with a few minor modifications. Follow these steps:
- Modify the main.rs file:
Add the following code to mark the main function as the entry point for ZisK:
#![no_main]
ziskos::entrypoint!(main);
- Modify the Cargo.toml file:
Add the ziskos crate as a dependency:
[dependencies]
ziskos = { git = "https://github.com/0xPolygonHermez/zisk.git" }
Let's show these changes using the example program from the Quickstart section.
Example program
main.rs:
// This example program takes a number `n` as input and computes the SHA-256 hash `n` times sequentially.
// Mark the main function as the entry point for ZisK
#![no_main]
ziskos::entrypoint!(main);
use alloy_sol_types::SolValue;
use common::Output;
use sha2::{Digest, Sha256};
fn main() {
    // Read the input data
    let n: u32 = ziskos::io::read();
    let mut hash = [0u8; 32];
    // Compute SHA-256 hashing 'n' times
    for _ in 0..n {
        let mut hasher = Sha256::new();
        hasher.update(hash);
        let digest = &hasher.finalize();
        hash = Into::<[u8; 32]>::into(*digest);
    }
    let output = Output {
        hash: hash.into(),
        iterations: n,
        magic_number: 0xDEADBEEF,
    };
    println!("Computed hash: {:02x?}", output.hash);
    println!("Iterations: {}", output.iterations);
    let bytes = output.abi_encode();
    println!("Bytes to commit: {:?}", bytes);
    // Write raw ABI-encoded bytes directly (no bincode serialization)
    ziskos::io::commit_slice(&bytes);
}
Cargo.toml:
[package]
name = "guest"
version = "0.1.0"
edition = "2024"
[dependencies]
byteorder = "1.5.0"
sha2 = "0.10.8"
serde = { version = "1.0", default-features = false, features = ["derive"] }
ziskos = { workspace = true }
alloy-sol-types = "1.5.7"
common = { path = "../common" }
Input/Output Data
To read input data in your ZisK program, use the ziskos::io::read() function, which deserializes data from the input:
// Read a u32 value from input
let n: u32 = ziskos::io::read();
You can also read custom types that implement the Deserialize trait:
// Read a custom struct from input
let my_data: MyStruct = ziskos::io::read();
To write public output data, use the ziskos::io::commit_slice() function, which commits a slice to the output:
let bytes = output.abi_encode();
println!("Bytes to commit: {:?}", bytes);
// Write raw ABI-encoded bytes directly (no bincode serialization)
ziskos::io::commit_slice(&bytes);
You can also use the commit() function to output any type that implements the Serialize trait. The data is serialized and made available as public outputs that anyone verifying the proof can check.
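To see what commit_slice publishes in the example program, the sketch below reproduces Solidity ABI encoding for a tuple of static types. It assumes, without confirmation from this guide, that Output corresponds to the Solidity tuple (bytes32 hash, uint32 iterations, uint32 magic_number). Under that assumption, every field occupies one 32-byte word, so the committed bytes are 96 bytes long. The Rust Output struct here is a hypothetical mirror, not the one from the common crate.

```rust
/// Hypothetical mirror of the guest's `Output`, assuming the Solidity
/// layout (bytes32 hash, uint32 iterations, uint32 magic_number).
#[derive(Debug, PartialEq)]
struct Output {
    hash: [u8; 32],
    iterations: u32,
    magic_number: u32,
}

/// Left-pads a big-endian u32 into one 32-byte ABI word.
fn u32_word(v: u32) -> [u8; 32] {
    let mut word = [0u8; 32];
    word[28..].copy_from_slice(&v.to_be_bytes());
    word
}

/// ABI-encodes a tuple of static types: each field fills one 32-byte word.
fn abi_encode(out: &Output) -> Vec<u8> {
    let mut bytes = Vec::with_capacity(96);
    bytes.extend_from_slice(&out.hash);
    bytes.extend_from_slice(&u32_word(out.iterations));
    bytes.extend_from_slice(&u32_word(out.magic_number));
    bytes
}

/// Decodes the 96-byte layout produced by `abi_encode`.
/// This sketch rejects words whose padding bytes are not zero.
fn abi_decode(bytes: &[u8]) -> Option<Output> {
    if bytes.len() != 96 {
        return None;
    }
    let read_u32 = |word: &[u8]| -> Option<u32> {
        if word[..28].iter().any(|&b| b != 0) {
            return None;
        }
        Some(u32::from_be_bytes([word[28], word[29], word[30], word[31]]))
    };
    let mut hash = [0u8; 32];
    hash.copy_from_slice(&bytes[..32]);
    Some(Output {
        hash,
        iterations: read_u32(&bytes[32..64])?,
        magic_number: read_u32(&bytes[64..96])?,
    })
}
```

Committing raw ABI bytes keeps the public outputs directly decodable by Solidity-aware tooling, which is presumably why the example avoids bincode serialization.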
Build
Before compiling your program for ZisK, you can test it on the native architecture just like any regular Rust program using the cargo command.
Once your program is ready to run on ZisK, compile it into an ELF file (RISC-V architecture), using the cargo-zisk CLI tool from the guest project folder:
cargo-zisk build
This command compiles the program using the zisk target. The resulting guest ELF file (without extension) is generated in the ./target/elf/riscv64ima-zisk-zkvm-elf/debug directory.
For production, compile the ELF file with the --release flag, similar to how you compile Rust projects:
cargo-zisk build --release
In this case, the guest ELF file will be generated in the ./target/elf/riscv64ima-zisk-zkvm-elf/release directory.
Execute
You can test your compiled program using the emulator before generating a proof. Use the -i (--inputs) flag to specify the location of the input file:
cargo-zisk run --release -i ../host/tmp/input.bin
If the program requires a large number of ZisK steps, you might encounter the following error:
Error during emulation: EmulationNoCompleted
Error: Error executing Run command
To resolve this, use ziskemu directly and increase the number of execution steps using the -n (--max-steps) flag. For example:
ziskemu -e target/elf/riscv64ima-zisk-zkvm-elf/release/guest -i ../host/tmp/input.bin -n 10000000000
Metrics and Statistics
Performance Metrics
You can get performance metrics related to the program execution in ZisK using the -m (--log-metrics) flag in ziskemu tool:
ziskemu -e target/elf/riscv64ima-zisk-zkvm-elf/release/guest -i ../host/tmp/input.bin -m
The output will include details such as execution time, throughput, and clock cycles per step:
process_rom() steps=4450270 duration=0.0436 tp=102.0505 Msteps/s freq=3504.0000 34.3359 clocks/step
...
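The fields of this line are related by simple arithmetic, which helps when comparing runs. Reading freq as the CPU clock in MHz is an assumption, but it is consistent with the sample: 3504 / 102.0505 ≈ 34.3359 clocks/step. The printed duration is rounded, so recomputing tp from it gives about 102.07 rather than 102.0505. The helper names below are hypothetical.

```rust
/// Throughput in millions of steps per second.
fn msteps_per_sec(steps: u64, duration_s: f64) -> f64 {
    steps as f64 / duration_s / 1e6
}

/// Clock cycles per step, given the CPU frequency in MHz and the throughput
/// in Msteps/s: (1e6 cycles/s) / (1e6 steps/s) = cycles per step.
fn clocks_per_step(freq_mhz: f64, tp_msteps: f64) -> f64 {
    freq_mhz / tp_msteps
}
```

A lower clocks/step value means the emulator is doing less host work per ZisK step.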
Execution Statistics
You can get statistics related to the program execution in ZisK using the -p (--profiling) flag of cargo-zisk with the summary option:
cargo-zisk run --release -i ../host/tmp/input.bin -p summary
The output will include details such as cost definitions, total cost, opcode statistics, etc:
╔══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╗
║ ◆ REPORT SUMMARY ║
╠══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╣
║ STEPS 4,450,270 ║
║ COST 787,338,404 ║
║ RAM 0.00 MB / 507.75 MB ║
╚══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╝
╔══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╗
║ ◆ COST DISTRIBUTION SUMMARY ║
╠══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╣
║ CATEGORY COST % ║
║ ┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄ ║
║ Base █████████████████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 293,601,280 37.3% ║
║ Main ██████████████████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 302,618,360 38.4% ║
║ Opcodes █████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 174,799,164 22.2% ║
║ Precompiles ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 234,155 0.0% ║
║ Memory ██░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 16,085,445 2.0% ║
║ ┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄ ║
║ Total 787,338,404 100.0% ║
╚══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╝
╔══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╗
║ ◆ COST DISTRIBUTION BY OPCODE ║ ◆ OPS vs FROPS ║
╠══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╣
║ OPCODE COST % ║ OPS + FROPS FROPS % ║
║ ┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄ ║ ┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄ ║
║ xor █░░░░░░░░░░░░░░░░░░░░░░ 41,398,920 5.3% ║ 42,240,480 841,560 2.0% ║
║ or █░░░░░░░░░░░░░░░░░░░░░░ 36,646,620 4.7% ║ 38,881,560 2,234,940 5.7% ║
║ srl_w █░░░░░░░░░░░░░░░░░░░░░░ 34,606,615 4.4% ║ 36,040,000 1,433,385 4.0% ║
║ sll █░░░░░░░░░░░░░░░░░░░░░░ 30,019,783 3.8% ║ 34,007,662 3,987,879 11.7% ║
║ add ░░░░░░░░░░░░░░░░░░░░░░░ 16,846,475 2.1% ║ 16,998,100 151,625 0.9% ║
║ and ░░░░░░░░░░░░░░░░░░░░░░░ 12,917,580 1.6% ║ 13,456,080 538,500 4.0% ║
║ signextend_w ░░░░░░░░░░░░░░░░░░░░░░░ 849,590 0.1% ║ 849,590 0 0.0% ║
║ signextend_b ░░░░░░░░░░░░░░░░░░░░░░░ 848,053 0.1% ║ 848,053 0 0.0% ║
║ srl ░░░░░░░░░░░░░░░░░░░░░░░ 429,883 0.1% ║ 439,953 10,070 2.3% ║
║ dma_xmemset ░░░░░░░░░░░░░░░░░░░░░░░ 200,496 0.0% ║ ║
║ ┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄ ║ ┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄ ║
║ Total 175,033,319 22.2% ║ 184,735,683 9,702,364 5.3% ║
╚══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╝
╔══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╗
║ ◆ TOP COST FUNCTIONS ║
╠══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╣
║ 0 sha2::sha256::compress256 ████████████░░░░░░░░ 473,976,966 60.2% ║
║ 1 std::io::stdio::_print ░░░░░░░░░░░░░░░░░░░░ 4,290,957 0.5% ║
║ 2 core::fmt::write ░░░░░░░░░░░░░░░░░░░░ 4,258,155 0.5% ║
║ 3 <alloc::vec::Vec<u8> as core::fmt::Debug>::fmt ░░░░░░░░░░░░░░░░░░░░ 3,852,860 0.5% ║
║ 4 <core::fmt::builders::DebugSet>::entry ░░░░░░░░░░░░░░░░░░░░ 3,746,448 0.5% ║
║ 5 <std::..::Adapter<…> as core::fmt::Write>::write_str ░░░░░░░░░░░░░░░░░░░░ 2,549,696 0.3% ║
║ 6 <&u8 as core::fmt::Debug>::fmt ░░░░░░░░░░░░░░░░░░░░ 2,193,178 0.3% ║
║ 7 <u8 as core::fmt::Display>::fmt ░░░░░░░░░░░░░░░░░░░░ 2,105,434 0.3% ║
║ 8 <std::..::LineWriterShim<…> as std::io::Write>::write_all ░░░░░░░░░░░░░░░░░░░░ 1,953,802 0.2% ║
║ 9 <core::fmt::Formatter>::pad_integral ░░░░░░░░░░░░░░░░░░░░ 1,820,586 0.2% ║
║ 10 core::slice::memchr::memrchr ░░░░░░░░░░░░░░░░░░░░ 843,066 0.1% ║
║ 11 memset ░░░░░░░░░░░░░░░░░░░░ 499,356 0.1% ║
║ 12 <std::io::buffered::bufwriter::BufWriter<…>>::flush_buf ░░░░░░░░░░░░░░░░░░░░ 202,008 0.0% ║
║ 13 sys_write ░░░░░░░░░░░░░░░░░░░░ 196,791 0.0% ║
║ 14 <core::fmt::Formatter>::pad_integral::write_prefix ░░░░░░░░░░░░░░░░░░░░ 190,411 0.0% ║
║ 15 memcpy ░░░░░░░░░░░░░░░░░░░░ 117,529 0.0% ║
║ 16 ziskos::io::commit_slice ░░░░░░░░░░░░░░░░░░░░ 85,079 0.0% ║
║ 17 <alloy_primitives::..::FixedBytes<…> as core::fmt::Debug>::fmt ░░░░░░░░░░░░░░░░░░░░ 57,891 0.0% ║
║ 18 <u32 as core::fmt::Display>::fmt ░░░░░░░░░░░░░░░░░░░░ 29,674 0.0% ║
║ 19 <core::fmt::Formatter as core::fmt::Write>::write_str ░░░░░░░░░░░░░░░░░░░░ 19,363 0.0% ║
║ 20 <core::fmt::Formatter>::debug_list ░░░░░░░░░░░░░░░░░░░░ 13,582 0.0% ║
║ 21 <core::fmt::builders::DebugList>::finish ░░░░░░░░░░░░░░░░░░░░ 13,189 0.0% ║
║ 22 <…>::initialize::<…> ░░░░░░░░░░░░░░░░░░░░ 7,830 0.0% ║
║ 23 <u32>::_fmt_inner ░░░░░░░░░░░░░░░░░░░░ 7,338 0.0% ║
║ 24 std::io::stdio::print_to_buffer_if_capture_used ░░░░░░░░░░░░░░░░░░░░ 6,165 0.0% ║
╚══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╝
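Each percentage in the report is a cost divided by the total cost, and the five categories sum exactly to the reported total (787,338,404). The sketch below recomputes the COST DISTRIBUTION SUMMARY from the sample values. It illustrates the arithmetic only and is not ZisK's reporting code; the helper names are hypothetical. The same ratio gives the 60.2% shown for sha2::sha256::compress256, which dominates this program's cost.

```rust
/// Cost categories copied from the sample report above.
const CATEGORIES: [(&str, u64); 5] = [
    ("Base", 293_601_280),
    ("Main", 302_618_360),
    ("Opcodes", 174_799_164),
    ("Precompiles", 234_155),
    ("Memory", 16_085_445),
];

/// Share of `cost` in `total`, in percent.
fn percent(cost: u64, total: u64) -> f64 {
    100.0 * cost as f64 / total as f64
}

/// Returns the total cost and each category's share of it.
fn cost_shares(categories: &[(&'static str, u64)]) -> (u64, Vec<(&'static str, f64)>) {
    let total: u64 = categories.iter().map(|&(_, cost)| cost).sum();
    let shares = categories
        .iter()
        .map(|&(name, cost)| (name, percent(cost, total)))
        .collect();
    (total, shares)
}
```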
Prove
Program Setup
Before generating a proof, you need to generate the program setup files. This must be done the first time after building the program ELF file, or any time it changes:
cargo-zisk program-setup
The program setup files will be generated in the cache directory located at $HOME/.zisk.
To clean the cache directory content, use the following command:
cargo-zisk utils clean-cache --all
Generate Proof
To generate a proof, run the following command:
cargo-zisk prove -i ../host/tmp/input.bin -o proof.bin
In this command:
- -i (--input) specifies the input file location.
- -o (--output) specifies the output location for the proof (proof.bin in this example).
Note: If you have installed the GPU version of the ZisK binaries, you can use the --gpu flag to enable GPU acceleration during proof generation.
If the process is successful, you should see a message similar to:
...
INFO: --- PROVE SUMMARY ------------------------
INFO: Proof Time: 5.097 seconds
INFO: Execution completed in 5097ms, steps: 4450272
INFO: Execution summary: Proofman 4910ms + Execution 34ms + Count&Plan 17ms + Count&Plan MO 0ms
Concurrent Proof Generation
ZisK proofs can be generated using multiple processes concurrently to improve performance and scalability. The standard MPI (Message Passing Interface) approach is used to launch these processes, which can run either on the same server or across multiple servers.
To generate a ZisK proof using multiple processes, use the following command:
mpirun --bind-to none -np <num_processes> -x OMP_NUM_THREADS=<num_threads_per_process> -x RAYON_NUM_THREADS=<num_threads_per_process> target/release/cargo-zisk <zisk arguments>
In this command:
- <num_processes> specifies the number of processes to launch.
- <num_threads_per_process> sets the number of threads used by each process via the OMP_NUM_THREADS and RAYON_NUM_THREADS environment variables.
- --bind-to none prevents binding processes to specific cores, allowing the operating system to schedule them dynamically for better load balancing.
Running a ZisK proof with multiple processes enables efficient workload distribution across multiple servers. On a single server with many cores, splitting execution into smaller subsets of cores generally improves performance by increasing concurrency. As a general rule, <num_processes> * <num_threads_per_process> should match the number of available CPU cores, or double that if hyperthreading is enabled.
The total memory requirement increases proportionally with the number of processes. If each process requires approximately 25GB of memory, running P processes will require roughly (25 * P)GB of memory. Ensure that the system has sufficient available memory to accommodate all running processes.
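The sizing rules above can be captured in a small helper that also renders the mpirun command. This is a hypothetical sketch, not a ZisK tool. It additionally requires an even split of hardware threads across processes, a simplification the text does not require, and it uses the ~25 GB per process figure only as an example input.

```rust
#[derive(Debug, PartialEq)]
struct Plan {
    processes: u32,
    threads_per_process: u32,
}

fn plan(
    cores: u32,
    hyperthreading: bool,
    processes: u32,
    mem_available_gb: u32,
    mem_per_process_gb: u32,
) -> Result<Plan, String> {
    // processes * threads should equal the cores, doubled with hyperthreading.
    let hw_threads = if hyperthreading { cores * 2 } else { cores };
    // Simplification: require an even split of threads across processes.
    if processes == 0 || hw_threads % processes != 0 {
        return Err(format!(
            "{processes} processes do not evenly divide {hw_threads} hardware threads"
        ));
    }
    // Memory grows linearly with the number of processes.
    let needed_gb = processes * mem_per_process_gb;
    if needed_gb > mem_available_gb {
        return Err(format!(
            "needs ~{needed_gb} GB but only {mem_available_gb} GB is available"
        ));
    }
    Ok(Plan {
        processes,
        threads_per_process: hw_threads / processes,
    })
}

/// Renders the mpirun command documented above for a given plan.
fn mpirun_command(p: &Plan, zisk_args: &str) -> String {
    format!(
        "mpirun --bind-to none -np {np} -x OMP_NUM_THREADS={t} -x RAYON_NUM_THREADS={t} target/release/cargo-zisk {zisk_args}",
        np = p.processes,
        t = p.threads_per_process,
    )
}
```

For example, a 64-core machine without hyperthreading and 256 GB of memory can run 4 processes of 16 threads each, while 16 processes would exceed memory at ~25 GB per process.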
Verify Proof
To verify a generated proof, use the following command:
cargo-zisk verify -p proof.bin
In this command:
- -p (--proof) specifies the final proof file generated with cargo-zisk prove.
- The remaining flags specify the files required for verification; they are optional and default to the files found in the $HOME/.zisk directory.
Precompiles
Precompiles are built-in system functions within ZisK’s operating system that accelerate computationally expensive and frequently used operations such as the Keccak-f permutation and Secp256k1 addition and doubling.
These precompiles improve proving efficiency by offloading intensive computations from ZisK programs to dedicated, pre-integrated sub-processors.
How Precompiles Work
Precompiles are primarily used to patch third-party crates that implement these operations, replacing costly computations with system calls. The patched crates are then used as dependencies in the ZisK programs we write, so commonly used cryptographic primitives such as Keccak hashing and elliptic curve operations execute efficiently within ZisK.
See the patched tiny-keccak crate for an example.
Available Precompiles in ZisK
Below is a summary of the precompiles currently available in ZisK:
- syscall_add256: Addition over 256-bit non-negative integers.
- syscall_arith256: Multiplication followed by addition over 256-bit non-negative integers.
- syscall_arith256_mod: Modular multiplication followed by addition over 256-bit non-negative integers.
- syscall_keccak_f: Keccak-f 1600 permutation function from the Keccak cryptographic hash function.
- syscall_sha256_f: Extend and compress function of the SHA-256 cryptographic hash function.
- syscall_blake2br: Round function of the BLAKE2b cryptographic hash function.
- syscall_poseidon2: Compression function of the Poseidon2 cryptographic hash function.
- syscall_secp256k1_add: Elliptic curve point addition over the Secp256k1 curve.
- syscall_secp256k1_dbl: Elliptic curve point doubling over the Secp256k1 curve.
- syscall_secp256r1_add: Elliptic curve point addition over the Secp256r1 curve.
- syscall_secp256r1_dbl: Elliptic curve point doubling over the Secp256r1 curve.
- syscall_bn254_curve_add: Elliptic curve point addition over the Bn254 curve.
- syscall_bn254_curve_dbl: Elliptic curve point doubling over the Bn254 curve.
- syscall_bn254_complex_add: Complex addition within the quadratic extension built over the base field of the Bn254 curve.
- syscall_bn254_complex_sub: Complex subtraction within the quadratic extension built over the base field of the Bn254 curve.
- syscall_bn254_complex_mul: Complex multiplication within the quadratic extension built over the base field of the Bn254 curve.
- syscall_arith384_mod: Modular multiplication followed by addition over 384-bit non-negative integers.
- syscall_bls12_381_curve_add: Elliptic curve point addition over the BLS12-381 curve.
- syscall_bls12_381_curve_dbl: Elliptic curve point doubling over the BLS12-381 curve.
- syscall_bls12_381_complex_add: Complex addition within the quadratic extension built over the base field of the BLS12-381 curve.
- syscall_bls12_381_complex_sub: Complex subtraction within the quadratic extension built over the base field of the BLS12-381 curve.
- syscall_bls12_381_complex_mul: Complex multiplication within the quadratic extension built over the base field of the BLS12-381 curve.
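The list above describes the precompiles by behavior only. The sketch below gives plain-Rust reference models of what two of them compute, syscall_add256 and syscall_arith256, which can be handy for testing code that relies on them. It does not call ZisK. The little-endian u64 limb layout, the carry-in flag, and the (low, high) result split are assumptions for illustration; the actual syscall signatures in ziskos may differ.

```rust
/// Reference model for 256-bit addition: a + b + carry_in, with operands
/// as little-endian u64 limbs. Returns the 256-bit sum and the carry-out.
fn add256(a: &[u64; 4], b: &[u64; 4], carry_in: bool) -> ([u64; 4], bool) {
    let mut out = [0u64; 4];
    let mut carry = carry_in as u64;
    for i in 0..4 {
        let sum = a[i] as u128 + b[i] as u128 + carry as u128;
        out[i] = sum as u64;
        carry = (sum >> 64) as u64;
    }
    (out, carry == 1)
}

/// Reference model for "multiplication followed by addition": a * b + c
/// over 256-bit operands. The result can need up to 512 bits, so it is
/// returned as (low, high) 256-bit halves.
fn arith256(a: &[u64; 4], b: &[u64; 4], c: &[u64; 4]) -> ([u64; 4], [u64; 4]) {
    // Eight limbs hold the full 512-bit result; start with c in the low half.
    let mut acc = [0u64; 8];
    acc[..4].copy_from_slice(c);
    for i in 0..4 {
        // Schoolbook multiplication; each step fits in u128 without overflow.
        let mut carry: u128 = 0;
        for j in 0..4 {
            let t = acc[i + j] as u128 + (a[i] as u128) * (b[j] as u128) + carry;
            acc[i + j] = t as u64;
            carry = t >> 64;
        }
        // Propagate the remaining carry into the higher limbs.
        let mut k = i + 4;
        while carry != 0 && k < 8 {
            let t = acc[k] as u128 + carry;
            acc[k] = t as u64;
            carry = t >> 64;
            k += 1;
        }
    }
    let mut low = [0u64; 4];
    let mut high = [0u64; 4];
    low.copy_from_slice(&acc[..4]);
    high.copy_from_slice(&acc[4..]);
    (low, high)
}
```

The maximum case, (2^256 - 1)^2 + (2^256 - 1), equals (2^256 - 1) * 2^256, so the result always fits in 512 bits and no carry is lost.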
Distributed Execution
Generating a ZisK proof means proving the full execution trace of a program. For real workloads, that trace is too large and too slow to prove on a single machine. A ZisK cluster splits the trace into pieces, proves each in parallel on separate machines, and aggregates the results into a single final proof. Throughput and latency scale with the number of machines you give it.
This guide covers the three things you need to run distributed proving: the cluster's architecture, a single-host quickstart that runs a job end to end through the binaries, and the production path that deploys the same binaries on bare Linux hosts with systemd.
Architecture
A ZisK cluster consists of two binaries: a single zisk-coordinator and one or more zisk-worker instances.
┌─────────────────────────┐
│ Host application │
│ (RemoteClient) │
└────────────┬────────────┘
│
│ gRPC :7000
│ prove request
▼
╔════════════════════════════════════════════════════════╗
║ ZisK cluster ║
║ ║
║ ┌──────────────────────────────────┐ ║
║ │ zisk-coordinator │ ║
║ │ :7000 :50051 :9090 │ ║
║ └───┬──────────┬──────────┬────────┘ ║
║ │ │ │ ║
║ assign │ assign │ assign │ ║
║ segments │ segments │ segments │ ║
║ ▼ ▼ ▼ ║
║ ┌──────────┐ ┌──────────┐ ┌──────────┐ ║
║ │ worker 1 │ │ worker 2 │ │ worker 3 │ ║
║ └─────┬────┘ └─────┬────┘ └─────┬────┘ ║
║ │ │ │ ║
║ segment │ segment │ segment │ ║
║ proof │ proof │ proof │ ║
║ ▼ ▼ ▼ ║
║ ┌──────────────────────────────┐ ║
║ │ Aggregation tree │ ║
║ └──────────────┬───────────────┘ ║
╚════════════════════════│═══════════════════════════════╝
│
▼
┌─────────────────┐
│ Final proof │
└────────┬────────┘
│
│ return proof
▼
┌─────────────────────────┐
│ Host application │
└─────────────────────────┘
The coordinator
The coordinator is the only stateful process in the cluster. It exposes a public gRPC interface that hosts use to submit proof requests, poll job status, and retrieve results. From the host's point of view, the coordinator is the only endpoint it ever talks to; workers are an invisible implementation detail.
Internally, the coordinator splits each job into segments, assigns them to workers, and returns the final proof. It also caches the proving keys derived from each uploaded guest ELF, so subsequent jobs for the same program skip the expensive setup step.
Workers
Workers are the proving processes. Each worker connects outbound to the coordinator and waits for proof assignments. Workers are stateless across jobs, holding only the segments they are currently proving. You can add, remove, or restart them without touching the coordinator or losing cluster state.
The first worker to send its partial proof to the coordinator is automatically promoted to aggregator for that job. The aggregator collects the remaining segment proofs and assembles the final proof, then returns it to the coordinator.
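The aggregator's job of folding segment proofs into one can be pictured as a binary tree that grows as proofs arrive (Phase 3 of the pipeline describes it as a binary aggregation tree). The sketch below is a hypothetical model: Proof is a placeholder that only records which segments it covers, and aggregate stands in for real recursive proving. ZisK's actual data structures and scheduling may differ.

```rust
/// Hypothetical placeholder for a proof: records which segments it covers.
#[derive(Debug, Clone, PartialEq)]
struct Proof(String);

/// Hypothetical merge step: combines two child proofs into a parent.
fn aggregate(left: Proof, right: Proof) -> Proof {
    Proof(format!("({}+{})", left.0, right.0))
}

/// Folds partial proofs into a binary tree as they arrive. Each incoming
/// proof enters at level 0; whenever the top two pending nodes share a
/// level, they are merged one level up, like carries in a binary counter.
struct Aggregator {
    pending: Vec<(u32, Proof)>,
}

impl Aggregator {
    fn new() -> Self {
        Aggregator { pending: Vec::new() }
    }

    fn push(&mut self, proof: Proof) {
        let mut node = (0u32, proof);
        while let Some(&(level, _)) = self.pending.last() {
            if level != node.0 {
                break;
            }
            let (_, left) = self.pending.pop().expect("checked by last()");
            node = (level + 1, aggregate(left, node.1));
        }
        self.pending.push(node);
    }

    /// After the last partial proof arrives, folds the remaining subtrees
    /// (at most one per level) into the final proof.
    fn finish(mut self) -> Option<Proof> {
        let (_, mut acc) = self.pending.pop()?;
        while let Some((_, left)) = self.pending.pop() {
            acc = aggregate(left, acc);
        }
        Some(acc)
    }
}
```

Merging eagerly means aggregation overlaps with proving: by the time the last segment proof lands, most of the tree is already built, and n proofs always take exactly n - 1 merges.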
Proving pipeline
Once a job is submitted, the coordinator selects workers from the available pool and runs three phases:
- Partial contributions. Each assigned worker processes its segments and returns partial challenges. The coordinator collects them and derives a single global challenge.
- Prove. The coordinator broadcasts the global challenge to all workers. Each worker computes its partial proofs and returns them.
- Aggregation. The first worker to deliver its partial proof is promoted to aggregator and builds a binary aggregation tree, folding the remaining partial proofs in as they arrive and returning the final proof to the coordinator.
Client Coordinator Workers
│ │ │
│ prove(request) │ │
├───────────────────>│ │
│ │ assign segments │
│ ├───────────────────>│
│ │ │
│ ╔════════════╧════════════════════╧════════════╗
│ ║ Phase 1: Partial contributions ║
│ ╚════════════╤════════════════════╤════════════╝
│ │ partial challenges │
│ │<───────────────────┤
│ │ │
│ ╔════════════╧════════════════════╧════════════╗
│ ║ Phase 2: Prove ║
│ ╚════════════╤════════════════════╤════════════╝
│ │ global challenge │
│ ├───────────────────>│
│ │ │
│ │ partial proofs │
│ │<───────────────────┤
│ │ │
│ ╔════════════╧════════════════════╧════════════╗
│ ║ Phase 3: Aggregation ║
│ ║ ┌──────────────────────────────┐ ║
│ ║ │ First worker to reply │ ║
│ ║ │ becomes aggregator │ ║
│ ║ └──────────────────────────────┘ ║
│ ╚════════════╤════════════════════╤════════════╝
│ │ aggregate │
│ ├───────────────────>│
│ │ │
│ │ final proof │
│ │<───────────────────┤
│ return proof │ │
│<───────────────────┤ │
│ │ │
▼ ▼ ▼
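The binary aggregation tree in Phase 3 can be illustrated with a small sketch. It is not ZisK's aggregator. Proofs are modeled as strings, and the hypothetical aggregate function only records which pair was folded. For simplicity, the whole list is folded level by level, whereas the real aggregator folds proofs in as they arrive.

```rust
/// Hypothetical stand-in for aggregating two proofs into one.
/// A real aggregator produces a recursive proof; here we only record
/// the pairing so the shape of the tree is visible.
fn aggregate(left: &str, right: &str) -> String {
    format!("({left}+{right})")
}

/// Folds segment proofs into a single proof with a binary tree:
/// each level pairs neighbours, and an unpaired proof moves up unchanged.
fn aggregate_tree(mut level: Vec<String>) -> Option<String> {
    if level.is_empty() {
        return None;
    }
    while level.len() > 1 {
        level = level
            .chunks(2)
            .map(|pair| match pair {
                [l, r] => aggregate(l, r),
                [single] => single.clone(),
                _ => unreachable!("chunks(2) yields one or two items"),
            })
            .collect();
    }
    level.pop()
}
```

For four segment proofs this yields ((p0+p1)+(p2+p3)). With an odd count, the unpaired proof is carried up to the next level.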
Quickstart: single-host cluster
This brings up one coordinator and one worker on the same machine, then submits a real proving job. It is the smallest deployment that exercises the production binaries end-to-end.
Prerequisites
- Rust toolchain (cargo --version should work)
- ~32 GB free RAM (the Assembly emulator preallocates large shared regions)
- ZisK installed. Follow the installation guide.
Clone the repo:
git clone https://github.com/0xPolygonHermez/zisk.git
cd zisk
Start the coordinator
zisk-coordinator
The coordinator binds three default ports on startup:
| Port | Purpose |
|---|---|
| 7000 | Client-facing gRPC API. Host applications connect here. |
| 50051 | Worker-facing gRPC port. Workers connect here. |
| 9090 | Prometheus metrics endpoint and /health liveness probe. |
If the coordinator exits with Address already in use, override the
offending port:
zisk-coordinator --api-port 8000 --cluster-port 60000 --metrics-port 5245
Start a worker
In a second terminal:
zisk-worker --config distributed/deploy/config/worker.toml
If you built ZisK with CUDA support and want the worker to use the
GPU, append --gpu.
worker.toml points the worker at http://127.0.0.1:50051, advertises
ten compute units, and sets the log level to info. On a successful
handshake:
INFO registered as worker <random-uuid> (capacity 10)
The coordinator logs the matching side:
INFO worker registered: <random-uuid> capacity=10
Health check
With the coordinator and worker both running, verify the cluster in two steps: a liveness probe and an end-to-end proving job.
Liveness probe. In a third terminal:
curl http://127.0.0.1:9090/health
A healthy coordinator returns 200 OK with an empty body.
Smoke-test proof. Submit a real job from the included example:
cd examples/sha-hasher/host
cargo run --release --bin prove-remote
The prove-remote binary builds a ProverClient::remote("http://127.0.0.1:7000"), uploads the guest ELF, and waits for the final proof. End-to-end, the coordinator splits the trace into segments and hands them to the worker, and the worker produces the STARK proofs. Terminals 1 and 2 show the matching coordinator and worker activity.
CLI reference
A handful of operational knobs are CLI-only and not exposed in the TOML:
| Flag | Default | Description |
|---|---|---|
--proving-key | ~/.zisk/provingKey | Path to the proving-key folder |
--elf | (none) | Path to the ELF file |
--shared-tables | false | Share tables when running in a cluster |
--verify-constraints | false | Verify constraints after witness gen |
-n, --number-threads-witness | (none) | Threads for witness computation |
-g, --gpu | false | Enable GPU mode (CUDA build only) |
-t, --max-streams | (none) | Maximum GPU streams |
CLI flags override the config file for one-off testing:
zisk-coordinator --api-port 8000 --cluster-port 60000 --log-level debug
zisk-worker --coordinator-url http://prod-coord:50051 --compute-capacity 32
Deployment with scripts
This section deploys the same two binaries on bare hosts under systemd, the canonical path for a ZisK cluster.
Prerequisites
- ~32 GB free RAM (for Assembly emulator to preallocate large shared regions)
Install the coordinator
On the coordinator host run:
curl https://raw.githubusercontent.com/0xPolygonHermez/zisk/refs/heads/main/distributed/deploy/scripts/coordinator/install.sh | sudo bash
The script:
- Creates the zisk system user and group (home /var/empty, no login)
- Drops the zisk-coordinator-server binary at /usr/local/bin/
- Writes the config to /etc/zisk/coordinator.toml (or installs the example if none provided)
- Creates the working directory at /var/lib/zisk with a pre-made .zisk/cache subdir, owned by the service user
- Writes a hardened systemd unit at /etc/systemd/system/zisk-coordinator.service (Linux) or a launchd plist at /Library/LaunchDaemons/ plus a newsyslog rotation rule (macOS)
- Runs systemctl enable --now (or launchctl load) unless --no-start / --no-enable is passed
Verify the service:
- In Linux:
sudo systemctl status zisk-coordinator
sudo journalctl -u zisk-coordinator -f
- In macOS:
sudo launchctl print system/com.zisk.coordinator
sudo tail -f /var/log/zisk/zisk-coordinator-server.log
If the service is in a failed state, the logs above show the underlying error (most often a port conflict or a missing config field).
Configure the coordinator
Every setting is optional; the binary falls back to a built-in default for anything you leave out.
Override precedence (later wins): built-in defaults → config file →
ZISK_COORDINATOR_* environment variables → CLI flags.
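The precedence rule amounts to "the last layer that sets a value wins." The sketch below applies it to a single setting, the client API port. The function name and the way each layer is passed in are illustrative assumptions, not the coordinator's actual config loader.

```rust
use std::num::ParseIntError;

/// Resolves one setting from its layers, lowest precedence first:
/// built-in default, config file, environment variable, CLI flag.
/// The last layer that provides a value wins.
fn resolve_port(
    default: u16,
    config_file: Option<u16>,
    env_var: Option<&str>,
    cli_flag: Option<u16>,
) -> Result<u16, ParseIntError> {
    let mut port = default;
    if let Some(p) = config_file {
        port = p;
    }
    if let Some(raw) = env_var {
        // Environment variables arrive as strings,
        // e.g. ZISK_COORDINATOR_API_PORT="8000".
        port = raw.trim().parse()?;
    }
    if let Some(p) = cli_flag {
        port = p;
    }
    Ok(port)
}
```

In a real binary the env_var argument would come from something like std::env::var("ZISK_COORDINATOR_API_PORT").ok().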
Edit /etc/zisk/coordinator.toml:
[service] — coordinator identity.
| Setting | Default | Notes |
|---|---|---|
name | "ZisK Coordinator" | Shown in logs and status output. |
environment | development | One of development, staging, production. Use production for live clusters. |
[server] — client-facing gRPC API.
| Setting | Default | Notes |
|---|---|---|
host | 0.0.0.0 | Listen address. Bind to a specific interface to restrict access. |
port | 7000 | Client gRPC port. CLI: --api-port, env: ZISK_COORDINATOR_API_PORT. |
shutdown_timeout_seconds | 30 | Drain time after a shutdown signal before forced exit. |
[coordinator] — worker-facing port and core tuning.
| Setting | Default | Notes |
|---|---|---|
port | 50051 | Worker gRPC port. CLI: --cluster-port, env: ZISK_COORDINATOR_CLUSTER_PORT. |
config_file | (none) | Optional path to a coordinator-core tuning file. |
[metrics] — Prometheus endpoint.
| Setting | Default | Notes |
|---|---|---|
enabled | true | Set false to disable /metrics. /health stays available either way. |
host | 0.0.0.0 | Listen address for the scrape endpoint. |
port | 9090 | Scrape port. CLI: --metrics-port, env: ZISK_COORDINATOR_METRICS_PORT. |
[logging] — what gets logged and where.
| Setting | Default | Notes |
|---|---|---|
level | info | trace, debug, info, warn, error. RUST_LOG takes precedence. |
format | pretty | pretty, json (production aggregators), or compact. |
file_path | (none) | Rotating daily log file. Leave unset on systemd hosts; journald captures stdout. |
After editing:
- In Linux:
sudo systemctl restart zisk-coordinator
- In macOS:
sudo launchctl kickstart -k system/com.zisk.coordinator
Install workers
Run the installer on each worker host:
- In Linux:
curl https://raw.githubusercontent.com/0xPolygonHermez/zisk/refs/heads/main/distributed/deploy/scripts/worker/install.sh | sudo bash
- In macOS:
curl https://raw.githubusercontent.com/0xPolygonHermez/zisk/refs/heads/main/distributed/deploy/scripts/worker/install.sh | sudo bash -s -- --no-mpi
This script:
- Creates the zisk system user and group (home /var/empty, no login)
- Drops the zisk-worker binary at /usr/local/bin/
- Writes the config to /etc/zisk/worker.toml (or installs the example if none provided)
- Creates the working directory at /var/lib/zisk with a pre-made .zisk/cache subdir, owned by the service user
- Writes a hardened systemd unit at /etc/systemd/system/zisk-worker.service (Linux) or a launchd plist at /Library/LaunchDaemons/ plus a newsyslog rotation rule (macOS)
- Runs systemctl enable --now (or launchctl load) unless --no-start / --no-enable is passed
Verify the service:
- In Linux:
sudo systemctl status zisk-worker
sudo journalctl -u zisk-worker -f
- In macOS:
sudo launchctl print system/com.zisk.worker
sudo tail -f /var/log/zisk/zisk-worker-server.log
The worker starts immediately and uses its default coordinator URL
(http://127.0.0.1:50051).
Note: the default URL only works when the worker runs on the same host as the coordinator. When deploying workers on separate hosts, edit [coordinator].url in /etc/zisk/worker.toml to point at the coordinator's worker-facing port (50051 by default), then restart the service. Confirm registration in the coordinator log:
INFO worker registered: <random-uuid> capacity=10
Configure the worker
Every setting is optional; the binary falls back to a built-in default for anything you leave out.
Override precedence (later wins): built-in defaults → config file →
ZISK_WORKER_* environment variables → CLI flags.
Edit /etc/zisk/worker.toml:
[worker] — identity, capacity, on-disk location.
| Setting | Default | Notes |
|---|---|---|
worker_id | random UUID | Pin to e.g. the hostname so log correlation works at scale. |
compute_capacity.compute_units | 10 | Start at one unit per physical CPU core (minus two for OS overhead), plus one per GPU stream. |
environment | development | development or production. |
inputs_folder | /var/lib/zisk-worker/inputs | Where the worker writes intermediate input files. Override only for a faster disk or separate partition. |
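The compute_units guideline is simple arithmetic. The following is a sketch, assuming you already know the host's physical core count and the number of GPU streams you plan to run. The helper name is hypothetical, and the result is only a starting point to tune.

```rust
/// Starting value for compute_capacity.compute_units, following the
/// guideline: one unit per physical core minus two reserved for the OS,
/// plus one unit per GPU stream.
fn suggested_compute_units(physical_cores: u32, gpu_streams: u32) -> u32 {
    // saturating_sub keeps very small hosts from underflowing.
    physical_cores.saturating_sub(2) + gpu_streams
}
```

For example, a 34-core CPU-only host lands at 32 units, like hosts A and B in the diagram further down.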
[coordinator] — registration target.
| Setting | Default | Notes |
|---|---|---|
url | http://127.0.0.1:50051 | gRPC URL of the coordinator's worker-facing port. |
[connection] — reaction to network trouble.
| Setting | Default | Notes |
|---|---|---|
reconnect_interval_seconds | 5 | Backoff between reconnect attempts when the coordinator is unreachable. |
heartbeat_timeout_seconds | 30 | How long to wait for a heartbeat before treating the connection as dead. |
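reconnect_interval_seconds describes a fixed-interval retry loop. The sketch below shows that behaviour with a caller-supplied connect closure. The function name, the closure, and the max_attempts cap (added only so the example terminates) are illustrative assumptions, not the worker's internal code.

```rust
use std::thread;
use std::time::Duration;

/// Calls `connect` until it succeeds, sleeping `interval` between failed
/// attempts. Returns the connection and the attempt count, or None after
/// `max_attempts` failures.
fn connect_with_retry<T, E>(
    mut connect: impl FnMut() -> Result<T, E>,
    interval: Duration,
    max_attempts: u32,
) -> Option<(T, u32)> {
    for attempt in 1..=max_attempts {
        match connect() {
            Ok(conn) => return Some((conn, attempt)),
            // Wait before the next attempt, but not after the last one.
            Err(_) if attempt < max_attempts => thread::sleep(interval),
            Err(_) => {}
        }
    }
    None
}
```

With reconnect_interval_seconds = 5, the call would pass Duration::from_secs(5) as the interval.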
[logging] — same shape as the coordinator's [logging]
table.
After editing:
- In Linux:
sudo systemctl restart zisk-worker
- In macOS:
sudo launchctl kickstart -k system/com.zisk.worker
Add more workers
Run the install script on as many hosts as you want. All workers register against the same coordinator and receive work proportional to their advertised capacity.
┌──────────────────────────────┐
│ Application host │
│ ┌────────────────────────┐ │
│ │ host program │ │
│ │ (RemoteClient) │ │
│ └───────────┬────────────┘ │
└──────────────│───────────────┘
│
│ :7000
▼
┌──────────────────────────────┐
│ Coordinator host │
│ ┌────────────────────────┐ │
│ │ zisk-coordinator │ │
│ │ :7000 :50051 :9090 │ │
│ └───────────▲────────────┘ │
└──────────────│───────────────┘
│
┌─────────┼─────────┐
│ :50051 │ :50051 │ :50051
│ │ │
┌────┴────┐┌───┴─────┐┌──┴──────┐
│ Worker ││ Worker ││ Worker │
│ host A ││ host B ││ host C │
│(32 unit)││(32 unit)││(16 unit)│
│┌───────┐││┌───────┐││┌───────┐│
││zisk- ││││zisk- ││││zisk- ││
││worker ││││worker ││││worker ││
│└───────┘││└───────┘││└───────┘│
└─────────┘└─────────┘└─────────┘
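One way to make "work proportional to advertised capacity" concrete is the largest-remainder method. This choice is an assumption for illustration, not the coordinator's documented scheduler, and the function is hypothetical. With the three hosts above (32, 32, and 16 units), host C receives roughly half as many segments as A or B.

```rust
/// Splits `segments` across workers in proportion to `capacities`
/// using the largest-remainder method, so the counts always sum to `segments`.
fn distribute(segments: u64, capacities: &[u64]) -> Vec<u64> {
    let total: u64 = capacities.iter().sum();
    if total == 0 {
        return vec![0; capacities.len()];
    }
    // Integer share per worker, plus the remainder lost to rounding.
    let mut shares: Vec<u64> = capacities.iter().map(|c| segments * c / total).collect();
    let mut remainders: Vec<(u64, usize)> = capacities
        .iter()
        .enumerate()
        .map(|(i, c)| (segments * c % total, i))
        .collect();
    // Give leftover segments to the largest remainders; ties go to the lower index.
    remainders.sort_by(|a, b| b.0.cmp(&a.0).then(a.1.cmp(&b.1)));
    let assigned: u64 = shares.iter().sum();
    for &(_, i) in remainders.iter().take((segments - assigned) as usize) {
        shares[i] += 1;
    }
    shares
}
```

For example, 7 segments over capacities [32, 32, 16] become [3, 3, 1].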
Hints Stream
The hints stream accelerates proof generation by offloading expensive operations outside the zkVM execution, then feeding the results back as verifiable data through a high-performance, parallel pipeline. Hints are preprocessed results that allow operations to be handled externally while remaining fully verifiable inside the VM. The system supports two categories of hints:
- Precompile hints: Cryptographic operations (SHA-256, Keccak-256, elliptic curve operations, pairings, etc.) that are computationally expensive inside a zkVM.
- Input hints: Data that needs to be passed to the zkVM as input during execution.
The system is designed around three core principles:
- Pre-computing results outside the VM: The guest program emits hint requests describing the operation and its inputs.
- Streaming results back: A dedicated pipeline processes these requests in parallel, maintaining order, and feeds results to the prover via shared memory.
- Verifying inside the VM: The zkVM circuits verify that the precomputed results are correct, avoiding the cost of computing them inside the zkVM.
flowchart LR
A["Guest program<br/><small>Emits hint requests</small>"] --> B["ZiskStream"]
B --> C["HintsProcessor<br/><small>Parallel engine</small>"]
C --> D["StreamSink<br/><small>ASM emulator/file output</small>"]
Table of Contents
- Hint Format and Protocol
- Consuming Hints
- Hints in Distributed Execution
- Custom Hint Handlers
- Generating Hints in Guest Programs
1. Hint Format and Protocol
1.1. Hint Request Format
Hints are transmitted as a stream of u64 values. Each hint request consists of a header (1 u64) followed by data (N u64 values).
┌─────────────────────────────────────────────────────────────┐
│ Header (u64) │
├·····························································┤
│ Hint Code (32 bits) Length (32 bits) │
├─────────────────────────────────────────────────────────────┤
│ Data[0] (u64) │
├─────────────────────────────────────────────────────────────┤
│ Data[1] (u64) │
├─────────────────────────────────────────────────────────────┤
│ ... │
├─────────────────────────────────────────────────────────────┤
│ Data[N-1] (u64) │
└─────────────────────────────────────────────────────────────┘
where N = ceil(Length / 8)
- Hint Code (upper 32 bits): Control code or Data Hint Type
- Length (lower 32 bits): Payload data size in bytes. The last u64 may contain padding bytes.
1.2. Control Hint Types
The following control codes are defined:
- 0x00 (START): Start a new hint stream. Resets processor state and sequence counters. Must be the first hint in the first batch.
- 0x01 (END): End the current hint stream. The processor waits for all pending hints to be processed before returning. Must be the last hint in its batch; only a CTRL_START may follow in a subsequent batch.
- 0x02 (CANCEL): [Reserved for future use] Cancel the current stream and stop processing further hints.
- 0x03 (ERROR): [Reserved for future use] Indicate that an error has occurred; stop processing further hints.
Control codes carry no data, so their Length field should be zero.
1.3. Data Hint Types
For data hints, the hint code (32 bits) is structured as follows:
- Bit 31 (MSB): Pass-through flag. When set, the data bypasses computation and is forwarded directly to the sink.
- Bits 0-30: The hint type identifier (a built-in, input, or custom code, e.g., HINT_SHA256, HINT_BN254_G1_ADD, HINT_SECP256K1_RECOVER).
Example: A SHA-256 hint (0x0100) with a 32-byte input:
Header: 0x00000100_00000020
Data[0]: first_8_input_bytes_as_u64
Data[1]: next_8_input_bytes_as_u64
Data[2]: next_8_input_bytes_as_u64
Data[3]: last_8_input_bytes_as_u64
The same hint with the pass-through flag set (bit 31), forwarding pre-computed data directly to the sink without invoking the SHA-256 handler:
Header: 0x80000100_00000020
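The header layout and word packing fit in a few lines of Rust. This is an illustrative sketch, not the ziskos encoder. The function names are hypothetical, and the sketch assumes payload bytes are packed into each u64 in little-endian order with zero padding in the last word, since the text above does not state the byte order.

```rust
/// Bit 31 of the hint code marks a pass-through hint.
const PASS_THROUGH: u32 = 0x8000_0000;

/// Builds the header word: hint code in the upper 32 bits,
/// payload length in bytes in the lower 32 bits.
fn header(code: u32, len_bytes: u32) -> u64 {
    ((code as u64) << 32) | len_bytes as u64
}

/// Splits a header word back into (hint code, payload length in bytes).
fn parse_header(word: u64) -> (u32, u32) {
    ((word >> 32) as u32, word as u32)
}

/// Encodes one hint as a header followed by ceil(len / 8) data words.
/// Assumption: little-endian packing, zero padding in the last word.
fn encode_hint(code: u32, payload: &[u8]) -> Vec<u64> {
    let mut words = vec![header(code, payload.len() as u32)];
    for chunk in payload.chunks(8) {
        let mut buf = [0u8; 8];
        buf[..chunk.len()].copy_from_slice(chunk);
        words.push(u64::from_le_bytes(buf));
    }
    words
}
```

For a 32-byte SHA-256 input, encode_hint(0x0100, &input) yields the header 0x00000100_00000020 followed by four data words, matching the example above.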
1.3.1 Stream Batching
The hints protocol supports chunking for individual hints that exceed the transport’s message size limit (currently 128 KB). Each message in the stream contains either a single complete hint or one chunk of a larger hint — hints are never combined in the same message.
When a hint exceeds the size limit, it must be split into multiple sequential chunks, each sent as a separate message. Each chunk includes a header specifying the total length of the complete hint, allowing the receiver to reassemble all chunks before processing. For example, a hint with a 300 KB payload would be split into three messages:
Message 1: Header (code + total length), Data[0..N] (first 128 KB chunk)
Message 2: Header (code + total length), Data[0..N] (second 128 KB chunk)
Message 3: Header (code + total length), Data[0..M] (final 44 KB chunk)
The receiver buffers incoming chunks and reassembles them based on the total length specified in the header before invoking the hint handler. This allows the system to handle arbitrarily large hints while respecting transport limitations.
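The split/reassemble cycle can be sketched as follows. The sketch assumes the 128 KB limit applies to the payload bytes of each message (the header is not counted) and models messages with a hypothetical Chunk struct carrying the hint code, the total length, and one slice of the payload. The real transport framing may differ.

```rust
/// Per-message payload limit (128 KB), assuming 1 KB = 1024 bytes.
const MAX_CHUNK: usize = 128 * 1024;

/// One message: the header repeats the *total* hint length.
#[derive(Debug, Clone, PartialEq)]
struct Chunk {
    code: u32,
    total_len: u32,
    data: Vec<u8>,
}

/// Splits a hint payload into messages of at most MAX_CHUNK bytes each.
fn split_hint(code: u32, payload: &[u8]) -> Vec<Chunk> {
    payload
        .chunks(MAX_CHUNK)
        .map(|part| Chunk { code, total_len: payload.len() as u32, data: part.to_vec() })
        .collect()
}

/// Buffers chunks until the total length announced in the header is reached.
#[derive(Default)]
struct Reassembler {
    buf: Vec<u8>,
}

impl Reassembler {
    /// Returns the complete (code, payload) once the last chunk arrives.
    fn push(&mut self, chunk: Chunk) -> Option<(u32, Vec<u8>)> {
        self.buf.extend_from_slice(&chunk.data);
        if self.buf.len() as u32 >= chunk.total_len {
            Some((chunk.code, std::mem::take(&mut self.buf)))
        } else {
            None
        }
    }
}
```

A 300 KB payload produces three messages, the last carrying 44 KB, and the reassembler hands the handler the original payload only after the third arrives.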
1.3.2 Pass-Through Hints
When bit 31 of the hint code is set (e.g., 0x8000_0000 | actual_code), the hint is marked as pass-through:
- The data payload is forwarded directly to the sink without invoking any handler.
- No worker thread is spawned; the data is queued immediately in the reorder buffer.
- This is useful for pre-computed results that don't need processing.
1.4. Hint Code Types
| Category | Code Range | Description |
|---|---|---|
| Control | 0x0000-0x000F | Stream lifecycle management |
| Built-in | 0x0100-0x0800 | Cryptographic precompile operations |
| Input | 0xF0000 | Input data hints |
| Custom | User-defined | Application-specific handlers |
Note: Custom hint codes can technically use any value not occupied by control, built-in, or input codes. By convention, codes 0xA000-0xFFFF are recommended for custom use to avoid future conflicts as new built-in types are added. The processor does not enforce a range restriction; any unrecognized code is treated as custom.
1.4.1. Control Codes
Control codes manage the stream lifecycle and do not carry computational data:
| Code | Name | Description |
|---|---|---|
0x0000 | CTRL_START | Resets processor state. Must be the first hint in the first batch. |
0x0001 | CTRL_END | Signals end of stream. Blocks until all pending hints complete. Must be the last hint. |
0x0002 | CTRL_CANCEL | [Reserved for future use] Cancels the current stream. Sets error flag and stops processing. |
0x0003 | CTRL_ERROR | [Reserved for future use] External error signal. Sets error flag and stops processing. |
1.4.2. Built-in Hint Types
| Code | Name | Description |
|---|---|---|
0x0100 | Sha256 | SHA-256 hash computation |
0x0200 | Bn254G1Add | BN254 G1 point addition |
0x0201 | Bn254G1Mul | BN254 G1 scalar multiplication |
0x0205 | Bn254PairingCheck | BN254 pairing check |
0x0300 | Secp256k1EcdsaAddressRecover | Secp256k1 ECDSA address recovery |
0x0301 | Secp256k1EcdsaVerifyAddressRecover | Secp256k1 ECDSA verify + address recovery |
0x0380 | Secp256r1EcdsaVerify | Secp256r1 (P-256) ECDSA verification |
0x0400 | Bls12_381G1Add | BLS12-381 G1 point addition |
0x0401 | Bls12_381G1Msm | BLS12-381 G1 multi-scalar multiplication |
0x0405 | Bls12_381G2Add | BLS12-381 G2 point addition |
0x0406 | Bls12_381G2Msm | BLS12-381 G2 multi-scalar multiplication |
0x040A | Bls12_381PairingCheck | BLS12-381 pairing check |
0x0410 | Bls12_381FpToG1 | BLS12-381 map field element to G1 |
0x0411 | Bls12_381Fp2ToG2 | BLS12-381 map Fp2 field element to G2 |
0x0500 | ModExp | Modular exponentiation |
0x0600 | VerifyKzgProof | KZG polynomial commitment proof verification |
0x0700 | Keccak256 | Keccak-256 hash computation |
0x0800 | Blake2bCompress | Blake2b compression function |
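Combining the pass-through flag with the code tables, a receiver could classify an incoming hint code roughly as shown below. This is a hypothetical sketch. The Route enum and route function are illustrative, and the built-in set is copied from the table above rather than read from ZisK's source.

```rust
const PASS_THROUGH: u32 = 0x8000_0000;
const INPUT: u32 = 0xF0000;
/// Built-in codes from the table above.
const BUILT_IN: [u32; 18] = [
    0x0100, 0x0200, 0x0201, 0x0205, 0x0300, 0x0301, 0x0380, 0x0400, 0x0401,
    0x0405, 0x0406, 0x040A, 0x0410, 0x0411, 0x0500, 0x0600, 0x0700, 0x0800,
];

#[derive(Debug, PartialEq)]
enum Route {
    Control(u32),
    PassThrough(u32),
    BuiltIn(u32),
    Input,
    Custom(u32),
}

fn route(code: u32) -> Route {
    if (code & PASS_THROUGH) != 0 {
        // Forwarded straight to the sink: no handler, no worker thread.
        return Route::PassThrough(code & !PASS_THROUGH);
    }
    match code {
        0x0000..=0x000F => Route::Control(code),
        INPUT => Route::Input,
        c if BUILT_IN.contains(&c) => Route::BuiltIn(c),
        // Anything else is custom; it fails later if no handler is registered.
        c => Route::Custom(c),
    }
}
```

An unlisted code such as 0x0250 is routed as custom, consistent with the note that unrecognized codes are treated as custom.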
1.4.3. Input Hint Type
Input hints allow passing data to the zkVM during execution. Unlike precompile hints that are processed by worker threads, input hints are forwarded directly to a separate inputs sink.
| Code | Name | Description |
|---|---|---|
0xF0000 | Input | Input data for the zkVM |
The input hint payload format is:
- First 8 bytes: Length of the input data (as a little-endian u64)
- Remaining bytes: The actual input data, padded to 8-byte alignment
Input hints are not processed by the parallel worker pool; instead, they are immediately submitted to the inputs sink for consumption by the zkVM.
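The input payload layout can be produced with a small helper. This sketch uses a hypothetical function name and assumes zero bytes for the alignment padding, which the format above does not specify. It builds only the payload, which would follow a header carrying code 0xF0000.

```rust
/// Builds an input-hint payload: an 8-byte little-endian length prefix,
/// then the input bytes, zero-padded to a multiple of 8 bytes.
fn input_hint_payload(input: &[u8]) -> Vec<u8> {
    let mut payload = Vec::with_capacity(8 + input.len() + 7);
    payload.extend_from_slice(&(input.len() as u64).to_le_bytes());
    payload.extend_from_slice(input);
    // Pad to 8-byte alignment (assumed padding value: 0).
    let padded_len = (payload.len() + 7) / 8 * 8;
    payload.resize(padded_len, 0);
    payload
}
```

The length prefix records the unpadded size, so the consumer can discard the padding.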
1.4.4. Custom Hint Types
Custom hint types allow users to define their own hint handlers for application-specific logic. Users can register custom handlers via the HintsProcessor builder API, providing a mapping from hint code to a processing function (see Custom Hint Handlers). By convention, codes in the range 0xA000-0xEFFFF are recommended for custom use to avoid conflicts with current and future built-in types. If a data hint is received with an unregistered code, the processor returns an error and stops processing immediately.
1.5. Stream Protocol
A valid hint stream follows this protocol:
CTRL_START ← Reset state, begin stream
[Hint_1] [Hint_2] ... [Hint_N] ← Data hints (precompile, input, or custom)
CTRL_END ← Wait for completion, end stream
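A receiver can check that a decoded sequence of hint codes follows this protocol. The sketch below works on hint codes only (headers already parsed), uses hypothetical names, and covers a single stream. It simplifies the batching rules from 1.2 to "CTRL_START first, CTRL_END last, no control codes in between."

```rust
const CTRL_START: u32 = 0x0000;
const CTRL_END: u32 = 0x0001;

/// Checks that `codes` is CTRL_START, zero or more data hints, then CTRL_END.
/// Returns the number of data hints on success.
fn validate_stream(codes: &[u32]) -> Result<usize, String> {
    match codes {
        [] => Err("empty stream".into()),
        [CTRL_START, data @ .., CTRL_END] => {
            // Data hints must not use the control range 0x0000-0x000F.
            if let Some(bad) = data.iter().find(|&&c| c <= 0x000F) {
                return Err(format!("unexpected control code {bad:#06x} inside stream"));
            }
            Ok(data.len())
        }
        [first, ..] if *first != CTRL_START => Err("stream must begin with CTRL_START".into()),
        _ => Err("stream must end with CTRL_END".into()),
    }
}
```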
2. Consuming Hints
Once a guest program has produced a hints binary file (see Section 5), you can feed it to the prover either programmatically through the ZisK SDK or via the ZisK CLI.
Note: Hints are only supported with the Assembly executor. The emulator-based executor does not use the hints pipeline.
2.1 SDK
Load the file with ZiskHints::from_file and pass it to .hints(...) on the executor:
use anyhow::Result;
use zisk_sdk::{ExecutorKind, GuestProgram, ProverClient, ZiskStdin, ZiskHints};

#[tokio::main]
async fn main() -> Result<()> {
    // Guest ELF and the hints file produced by its hint generation.
    let elf_path = "hints/example/zec-reth.elf";
    let program = GuestProgram::from_uri(elf_path)?;
    let hints_path = "hints/example/24654300_hints.bin";
    let hints = ZiskHints::from_file(hints_path)?;

    // Hints are only supported with the Assembly executor.
    let client = ProverClient::embedded()
        .executor(ExecutorKind::Assembly)
        .build()?;

    // Setup must enable hint support in the generated assembly ROM.
    client.upload(&program).run()?;
    client.setup(&program).with_hints().run()?.await?;

    let result = client
        .execute(&program, ZiskStdin::new())
        .hints(hints)
        .executor(ExecutorKind::Assembly)
        .run()?
        .await?;

    println!(
        "Program executed successfully: {} cycles in {:.2?} ms",
        result.get_execution_steps(),
        result.get_execution_time()
    );
    Ok(())
}
Notes:
- Setup must be run with .with_hints() so the assembly ROM is generated with hint support enabled. Without it, the prover will not consume the hints stream.
- ZiskHints::from_file loads the binary produced by the guest's hint generation. The returned value can be reused across multiple .execute(...) / .prove(...) calls.
- The same pattern works for the prove, verify-constraints, and stats operations exposed by ProverClient.
A complete runnable example is available at examples/hints/host/src/main.rs.
2.2 CLI
Four cargo-zisk commands accept a --hints flag pointing to the hints file: execute, prove, verify-constraints, and stats. Pass the path with the file:// scheme:
--hints file://path → File stream reader
Example:
cargo-zisk prove --elf program.elf --hints file:///abs/path/hints.bin
--hints is mutually exclusive with --inputs (-i): if you provide hints, the inputs are recovered from the hint stream itself rather than from a separate input file.
3. Hints in Distributed Execution
In the distributed proving system, hints are received by the coordinator and broadcast to all workers via gRPC. The coordinator runs a relay that validates incoming hint messages, assigns sequence numbers for ordering, and dispatches them to workers asynchronously. Workers buffer incoming messages and reorder them by sequence number before processing. The processed hints are then submitted to the sink in the correct order.
There is another mode where workers can load hints from a local path/URI instead of streaming from the coordinator, which is useful for debugging.
3.1. Architecture
flowchart TD
A["Guest program<br/><small>Emits hint requests</small>"] --> B
subgraph H["Coordinator"]
B["ZiskStream"]
B --> C["Hints Relay<br/><small>Validates<br>Broadcast to all workers (async)</small>"]
end
C --> E["Worker 1<br/><small>Stream incoming hints + Reorder</small>"]
C --> F["Worker 2<br/><small>Stream incoming hints + Reorder</small>"]
C --> G["Worker N<br/><small>Stream incoming hints + Reorder</small>"]
E --> E1["HintsProcessor<br/><small>Parallel engine</small>"]
E1 --> E2["StreamSink<br/><small>ASM emulator/file output</small>"]
F --> F1["HintsProcessor<br/><small>Parallel engine</small>"]
F1 --> F2["StreamSink<br/><small>ASM emulator/file output</small>"]
G --> G1["HintsProcessor<br/><small>Parallel engine</small>"]
G1 --> G2["StreamSink<br/><small>ASM emulator/file output</small>"]
style H fill:transparent,stroke-dasharray: 5 5
When the coordinator receives a hint request from the guest program, it parses the incoming u64 stream, validates control codes, assigns sequence numbers for ordering, and broadcasts the data to all workers.
Three message types are sent over gRPC to workers:
| StreamMessageKind | When | Payload |
|---|---|---|
Start | On CTRL_START | None |
Data | For each data batch | Sequence number + raw bytes |
End | On CTRL_END | None |
Each worker receives the stream of hints, buffers them if they arrive out of order, and sends them to the HintsProcessor for parallel processing. The HintsProcessor ensures that results are submitted to the sink in the original order.
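The worker-side reordering can be sketched with a BTreeMap keyed by sequence number. The struct and method names are hypothetical. The sketch assumes sequence numbers start at 0 and increase by one per Data message, and it only shows how out-of-order arrivals are held back until the gap is filled.

```rust
use std::collections::BTreeMap;

/// Holds out-of-order Data messages and releases them in sequence order.
struct ReorderBuffer {
    next_seq: u64,
    pending: BTreeMap<u64, Vec<u8>>,
}

impl ReorderBuffer {
    fn new() -> Self {
        Self { next_seq: 0, pending: BTreeMap::new() }
    }

    /// Accepts one message and returns every payload that is now in order.
    fn push(&mut self, seq: u64, payload: Vec<u8>) -> Vec<Vec<u8>> {
        // Messages older than next_seq were already released; ignore them.
        if seq >= self.next_seq {
            self.pending.insert(seq, payload);
        }
        let mut ready = Vec::new();
        while let Some(next) = self.pending.remove(&self.next_seq) {
            ready.push(next);
            self.next_seq += 1;
        }
        ready
    }
}
```

If messages 1 and 2 arrive before 0, both are held, and all three are released together once 0 arrives.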
3.2. Hints Mode Configuration
When a job is submitted to the coordinator with .hints(...), the workers prepare to receive hints for that job. Hints can be supplied in two ways:
- Streaming mode: Workers receive hints from the coordinator via gRPC. This is the default and recommended mode for production, as it allows real-time processing of hints as they are generated.
- Path mode: Workers load hints from a local path/URI. This is useful for debugging or when hints are pre-generated and stored in a file. In this mode, the coordinator does not send hints to workers; instead, each worker reads the hints directly from the specified path.
3.2.1 Coordinator Hints Streaming Mode
The transport for the live hints stream is chosen on the SDK side by constructing a ZiskStream and passing it as the hints source on the execute/prove call.
| Constructor | Transport |
|---|---|
ZiskStream::unix() | Unix domain socket at an auto-assigned path under /tmp/ |
ZiskStream::unix_at("/path") | Unix domain socket at an explicit path |
ZiskStream::quic("quic://host:port") | QUIC transport (use quic://127.0.0.1:0 to let the OS pick a port) |
ZiskStream::grpc() | gRPC push transport (data pushed to the coordinator via PushJobInput) |
Example launching a prove job with hints streamed over a Unix socket:
use anyhow::Result;
use zisk_sdk::{ExecutorKind, GuestProgram, ProverClient, ZiskStdin, ZiskStream};

#[tokio::main]
async fn main() -> Result<()> {
    let program = GuestProgram::from_uri("hints/example/zec-reth.elf")?;

    // Connect to the coordinator's client-facing API.
    let client = ProverClient::remote("http://127.0.0.1:7000").build()?;

    // Stream hints over a Unix domain socket at an auto-assigned path under /tmp/.
    let hints = ZiskStream::unix();

    let prove_handle = client
        .prove(&program, ZiskStdin::new())
        .hints(hints.clone())
        .executor(ExecutorKind::Assembly)
        .run()?;

    // Wait for the coordinator to return the final proof.
    let proof = prove_handle.await?;
    Ok(())
}
Switching transports is a one-line change at the call site — replace ZiskStream::unix() with ZiskStream::grpc() or ZiskStream::quic("quic://0.0.0.0:0").
3.2.2 Worker Hints non-Streaming Mode
Non-streaming mode is also selected from the SDK call. Instead of constructing a ZiskStream, build a ZiskHints from a pre-generated file (or in-memory bytes) and pass it to .hints(...). The coordinator skips broadcasting in this case — each worker loads the hints directly from the URI baked into the ZiskHints value. This is useful for debugging or when hints are pre-generated.
| Constructor | Source |
|---|---|
ZiskHints::from_file("/path") | Hints binary on disk (file path or file:// URI) |
ZiskHints::memory(bytes) | Hints already loaded into memory |
ZiskHints::from(&value) | Serializable Rust value (encoded with bincode) |
Example launching a prove job that loads hints from a file:
use anyhow::Result;
use zisk_sdk::{ExecutorKind, GuestProgram, ProverClient, ZiskStdin, ZiskHints};

#[tokio::main]
async fn main() -> Result<()> {
    let program = GuestProgram::from_uri("hints/example/zec-reth.elf")?;
    let client = ProverClient::remote("http://127.0.0.1:7000").build()?;

    // Each worker loads this file itself; the coordinator does not broadcast hints.
    let hints = ZiskHints::from_file("/var/lib/zisk/hints/24654300_hints.bin")?;

    let proof = client
        .prove(&program, ZiskStdin::new())
        .hints(hints)
        .executor(ExecutorKind::Assembly)
        .run()?
        .await?;
    Ok(())
}
The same ZiskHints value can be reused across multiple .execute(...) / .prove(...) calls. As with streaming mode, no coordinator or worker flags are required to switch between sources — the SDK call decides.
4. Custom Hint Handlers
Register custom handlers via the builder pattern:
let processor = HintsProcessor::builder(my_sink)
    .custom_hint(0xA000, |data: &[u64]| -> Result<Vec<u64>> {
        // Custom processing logic
        Ok(vec![data[0] * 2])
    })
    .custom_hint(0xA001, |data| {
        // Another custom handler
        Ok(transform(data))
    })
    .build()?;
Requirements:
- Handler function must be Fn(&[u64]) -> Result<Vec<u64>> + Send + Sync + 'static.
- Custom hint codes should not conflict with control or built-in codes (0x0000-0x0800) or with the input code (0xF0000). By convention, use codes in the range 0xA000-0xFFFF.
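To make the dispatch rules concrete (a registered code runs its handler, while an unregistered data code is an error), here is a self-contained model of a handler table. It is not the HintsProcessor builder shown above. The Registry type and its methods are hypothetical, and errors are plain Strings instead of anyhow::Error. It keeps the handler bound from the requirements: Fn(&[u64]) -> Result<...> + Send + Sync + 'static.

```rust
use std::collections::HashMap;

/// Handler shape from the requirements above, with String as the error type.
type Handler = Box<dyn Fn(&[u64]) -> Result<Vec<u64>, String> + Send + Sync + 'static>;

/// Hypothetical model of a custom-hint table keyed by hint code.
#[derive(Default)]
struct Registry {
    handlers: HashMap<u32, Handler>,
}

impl Registry {
    /// Registers `f` for `code`, builder style.
    fn custom_hint<F>(mut self, code: u32, f: F) -> Self
    where
        F: Fn(&[u64]) -> Result<Vec<u64>, String> + Send + Sync + 'static,
    {
        self.handlers.insert(code, Box::new(f));
        self
    }

    /// Runs the handler for `code`; an unregistered code is an error,
    /// mirroring the "returns an error and stops processing" rule.
    fn process(&self, code: u32, data: &[u64]) -> Result<Vec<u64>, String> {
        match self.handlers.get(&code) {
            Some(handler) => handler(data),
            None => Err(format!("no handler registered for hint code {code:#x}")),
        }
    }
}
```

Unlike the data[0] * 2 example above, a handler written against this model can return an error on empty input instead of panicking, for example with data.first().ok_or(...).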
5. Generating Hints in Guest Programs
To generate hints from the guest program you need to follow these steps and requirements:
- Emit hint requests: Patch your code or dependent crates to call the external FFI Hints helper functions that generate the hint input data later required by the HintsProcessor. See FFI Hints Helper Functions for the list of available built-in FFI Hints helper functions, or Custom Hints Generation to learn how to generate custom hints from the guest program.
- Add the ziskos crate to your guest Cargo.toml.
- Initialize and finalize the hint stream: Call the hints init and close functions immediately before and after the section of code that executes precompile logic.
- Enable hints at compile time: Compile your guest program with RUSTFLAGS='--cfg zisk_hints' for the native target to activate hint code generation and FFI helper functions in the ziskos crate.
- Ensure deterministic execution: Verify that both the native execution that generates hints and the guest compiled for the zkvm/zisk target execute deterministically and produce/consume hints in the exact same order. See Deterministic Execution Requirement.
To illustrate these steps, consider the zec-reth guest program, which executes and verifies Ethereum Mainnet blocks using the ZisK zkVM:
https://github.com/0xPolygonHermez/zisk-eth-client/tree/main-reth/bin/guest
5.1 Emit Hint Requests
zec-reth relies on reth crates, which expose a Crypto trait that allows a guest program to override precompile implementations. This enables zkVM-optimized implementations while also emitting hints so the computation can be performed outside the zkVM.
For example, the BN254 elliptic curve addition (bn254_g1_add) implementation for the Crypto trait can be found here:
https://github.com/0xPolygonHermez/zisk-eth-client/blob/86b71b39d35efb9894696cab115a1177f3e47dbf/crates/guest-reth/src/crypto/impls.rs#L87
In that file, two target-specific implementations are provided: one for zkvm/zisk and one for native (non-zkVM) targets. When compiling with --cfg zisk_hints for the native target, the zkVM-specific implementation emits a hint request using the FFI helper:
#[cfg(zisk_hints)]
unsafe extern "C" {
    // FFI helper that records a bn254_g1_add hint request for the two input points.
    pub fn hint_bn254_g1_add(p1: *const u8, p2: *const u8);
}
This call generates the hint input data using the exact input values that will later be used by the ZisK zkVM when executing the zkvm/zisk target code. This hint input data is consumed later by the HintsProcessor, allowing the bn254_g1_add computation to be performed outside the zkVM while remaining fully verifiable inside the circuit.
After the hint generation, execution continues in the native target code to compute the bn254_g1_add result.
From the guest program, we generate hints containing the input data for the corresponding zisklib functions (in this example, the bn254_g1_add_c function). These zisklib functions may internally invoke one or more precompiles to produce the final result.
When the hints are processed by the HintsProcessor, it executes the same zisklib function using the implementation code for the zkvm/zisk target. This produces the exact precompile results expected when executing the guest ELF inside the zkVM.
As a result, for each zisklib function invocation, the HintsProcessor may generate one or more precompile hint results corresponding to the precompile inputs originally emitted by the guest.
5.2 Initialize/Finalize Hint Stream
When using the ziskos::entrypoint!(main) macro, hint generation is initialized and finalized automatically around your guest entry function. You only need to compile with --cfg zisk_hints (see 5.3) and, optionally, set environment variables to control the output paths.
The macro expands to roughly:
fn main() {
    zkvm_init();         // initialize hints
    super::ZISK_ENTRY(); // your guest entry function
    zkvm_deinit();       // closes hints
}
zkvm_init and zkvm_deinit are also exposed as extern "C" symbols so they can be called from C guest programs (see 5.7 Using Hints from C Guest Programs).
5.2.1 Environment Variables
| Variable | Description | Default |
|---|---|---|
| ZISK_HINTS_OUTPUT | Path to the hints binary file written by zkvm_init. | ./tmp/hints.bin |
| ZISK_INPUT_FILE | Path to the input file consumed by read_input_slice. | build/input.bin |
The ./tmp/ directory is created automatically if it does not exist.
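The following plain-Rust sketch shows how the two variables and their defaults interact. It is not the ziskos code. It assumes that an empty value is treated the same as an unset one, and it mirrors the documented automatic creation of the ./tmp/ directory.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

// Defaults as listed in the table above.
const DEFAULT_HINTS_OUTPUT: &str = "./tmp/hints.bin";
const DEFAULT_INPUT_FILE: &str = "build/input.bin";

/// Uses the environment value if it is set and non-empty, otherwise the default.
/// (Treating an empty value as unset is an assumption of this sketch.)
fn path_or_default(value: Option<String>, default: &str) -> PathBuf {
    match value {
        Some(v) if !v.is_empty() => PathBuf::from(v),
        _ => PathBuf::from(default),
    }
}

/// Resolves the hints output path and creates its parent directory if missing.
fn resolve_hints_output() -> io::Result<PathBuf> {
    let path = path_or_default(std::env::var("ZISK_HINTS_OUTPUT").ok(), DEFAULT_HINTS_OUTPUT);
    if let Some(parent) = path.parent() {
        fs::create_dir_all(parent)?;
    }
    Ok(path)
}

fn main() -> io::Result<()> {
    let hints = resolve_hints_output()?;
    let input = path_or_default(std::env::var("ZISK_INPUT_FILE").ok(), DEFAULT_INPUT_FILE);
    println!("hints file: {}", hints.display());
    println!("input file: {}", input.display());
    Ok(())
}
```

For example, running the native guest with ZISK_HINTS_OUTPUT=/data/hints.bin redirects the hints file, while leaving it unset falls back to ./tmp/hints.bin.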
5.2.2 Manual API
If you need finer control (e.g., streaming hints over a Unix socket, configuring a debug file, providing a synchronization signal to the host), call the lower-level functions directly instead of relying on the entrypoint! macro.
pub fn init_hints_file(hints_file_path: PathBuf, ready: Option<oneshot::Sender<()>>) -> Result<()>
Stores the generated hints in the file specified by hints_file_path.
pub fn init_hints_socket(
    socket_path: PathBuf,
    debug_file: Option<PathBuf>,
    write_flush_threshold: Option<usize>,
    ready: Option<oneshot::Sender<()>>,
) -> Result<()>
Sends the hints through the Unix socket specified by socket_path.
- The optional debug_file stores a copy of the hints sent through the socket, which is useful for later debugging.
- The optional write_flush_threshold controls the buffered-write flush size; None uses the default.
- The optional ready parameter can be used for synchronization with the host when the guest is executed in a separate thread to generate hints in parallel. It signals ready when the writer is ready to start sending hints over the socket.
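The ready handshake can be modeled with standard-library threads. This is a hedged sketch: std::sync::mpsc channels stand in for the oneshot ready sender and for the Unix socket. No ziskos call is made, and run_demo is a hypothetical name.

```rust
use std::sync::mpsc;
use std::thread;

/// Simulates the guest/host handshake. mpsc channels stand in for the
/// oneshot `ready` sender and for the Unix socket that carries hints.
fn run_demo() -> Vec<Vec<u8>> {
    let (ready_tx, ready_rx) = mpsc::channel::<()>();
    let (hint_tx, hint_rx) = mpsc::channel::<Vec<u8>>();

    // Guest running in a separate thread to generate hints in parallel.
    let guest = thread::spawn(move || {
        // ... writer setup would happen here ...
        ready_tx.send(()).expect("host stopped waiting for ready");
        for i in 0u8..3 {
            hint_tx.send(vec![i]).expect("host stopped reading hints");
        }
        // hint_tx is dropped when the thread ends, closing the stream
        // (the role close_hints() plays in the real API).
    });

    // Host: block until the writer signals ready, then consume hints in order.
    ready_rx.recv().expect("guest exited before signalling ready");
    let received: Vec<Vec<u8>> = hint_rx.iter().collect();
    guest.join().expect("guest thread panicked");
    received
}

fn main() {
    for hint in run_demo() {
        println!("received hint {:?}", hint);
    }
}
```

The host does not start consuming until the guest reports that its writer is ready, so no hint is produced before the consumer side is prepared for it.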
To close hints generation:
pub fn close_hints() -> Result<()>
Place these calls under #[cfg(zisk_hints)] so they are only compiled into the native target used for hints generation:
#[cfg(zisk_hints)]
{
    // Initialization / finalization code
    ...
}
You can review how hints generation is initialized and finalized in the zec-reth guest here:
https://github.com/0xPolygonHermez/zisk-eth-client/blob/main-reth/bin/guest/src/main.rs
5.3 Enable Hints at Compile Time
Once the guest program is set up to generate hints for the native target, it must be compiled with the zisk_hints configuration flag enabled:
RUSTFLAGS='--cfg zisk_hints' cargo build --release
After compiling, running the guest program generates the hints. By default (when relying on the entrypoint! macro), the binary file is written to ./tmp/hints.bin; set ZISK_HINTS_OUTPUT to override the path. If you use the manual API instead, the hints go to the file or socket passed to init_hints_file or init_hints_socket.
If a hints file was generated, it can be consumed using the --hints flag in the cargo-zisk commands that support hints (as explained in Hints in CLI Execution).
If you want to display metrics in the console about the number of hints generated during native guest execution, you can additionally compile the guest with the --cfg zisk_hints_metrics flag.
To enable hint support when executing the guest inside the zkVM (ELF guest), you must pass the --hints flag when generating the assembly ROM using the cargo-zisk rom-setup command.
NOTE: Hint processing is not supported when executing the guest ELF file in emulation mode.
5.4 Deterministic Execution Requirement
An important requirement of the hints generation flow is that the native execution that generates the hints must be fully deterministic and always produce hints in the exact same order.
Furthermore, the order of hints generated during native execution must match the order in which the guest program compiled for the zkvm/zisk target expects to receive them. Since the zkVM execution is also deterministic, any divergence in hint ordering between native execution and zkVM execution will result in incorrect behavior.
To guarantee deterministic hint generation, the code paths that directly or indirectly generate hints must avoid:
- The use of threads or parallel execution.
- Data structures such as HashMap (or any structure based on randomized hash seeds) when iterated in loops that directly or indirectly call precompile/hint functions.
Using threads or iterating over non-deterministically ordered data structures may cause the hint generation order to vary between runs, breaking the required alignment between native and zkVM executions.
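Two common ways to keep hint order stable are to iterate a BTreeMap or to sort the keys of a HashMap before the loop. The sketch below shows both. emit_hint is a hypothetical stand-in that only records call order; it does not emit a real hint.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical stand-in for a call that emits a hint: it just records the order.
fn emit_hint(log: &mut Vec<String>, key: &str) {
    log.push(key.to_string());
}

/// Deterministic: BTreeMap always iterates in ascending key order.
fn hints_from_btreemap(map: &BTreeMap<String, u64>) -> Vec<String> {
    let mut log = Vec::new();
    for key in map.keys() {
        emit_hint(&mut log, key);
    }
    log
}

/// Deterministic with a HashMap: sort the keys first. Iterating the map
/// directly follows a randomly seeded hash order that can change between runs.
fn hints_from_hashmap(map: &HashMap<String, u64>) -> Vec<String> {
    let mut keys: Vec<&String> = map.keys().collect();
    keys.sort();
    let mut log = Vec::new();
    for key in keys {
        emit_hint(&mut log, key);
    }
    log
}

fn main() {
    let pairs = [("carol", 3u64), ("alice", 1), ("bob", 2)];
    let btree: BTreeMap<String, u64> = pairs.iter().map(|(k, v)| (k.to_string(), *v)).collect();
    let hash: HashMap<String, u64> = pairs.iter().map(|(k, v)| (k.to_string(), *v)).collect();
    println!("{:?}", hints_from_btreemap(&btree));
    println!("{:?}", hints_from_hashmap(&hash));
}
```

Both functions emit alice, bob, carol on every run. The same applies to anything called inside the loop that reaches a precompile or hint function.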
5.5 FFI Hints Helper Functions
| Code | Function |
|---|---|
| 0x0100 | fn hint_sha256(f_ptr: *const u8, f_len: usize); |
| 0x0200 | fn hint_bn254_g1_add(p1: *const u8, p2: *const u8); |
| 0x0201 | fn hint_bn254_g1_mul(point: *const u8, scalar: *const u8); |
| 0x0205 | fn hint_bn254_pairing_check(pairs: *const u8, num_pairs: usize); |
| 0x0300 | fn hint_secp256k1_ecdsa_address_recover(sig: *const u8, recid: *const u8, msg: *const u8); |
| 0x0301 | fn hint_secp256k1_ecdsa_verify_and_address_recover(sig: *const u8, msg: *const u8, pk: *const u8); |
| 0x0380 | fn hint_secp256r1_ecdsa_verify(msg: *const u8, sig: *const u8, pk: *const u8); |
| 0x0400 | fn hint_bls12_381_g1_add(a: *const u8, b: *const u8); |
| 0x0401 | fn hint_bls12_381_g1_msm(pairs: *const u8, num_pairs: usize); |
| 0x0405 | fn hint_bls12_381_g2_add(a: *const u8, b: *const u8); |
| 0x0406 | fn hint_bls12_381_g2_msm(pairs: *const u8, num_pairs: usize); |
| 0x040A | fn hint_bls12_381_pairing_check(pairs: *const u8, num_pairs: usize); |
| 0x0410 | fn hint_bls12_381_fp_to_g1(fp: *const u8); |
| 0x0411 | fn hint_bls12_381_fp2_to_g2(fp2: *const u8); |
| 0x0500 | fn hint_modexp_bytes(base_ptr: *const u8, base_len: usize, exp_ptr: *const u8, exp_len: usize, modulus_ptr: *const u8, modulus_len: usize); |
| 0x0600 | fn hint_verify_kzg_proof(z: *const u8, y: *const u8, commitment: *const u8, proof: *const u8); |
| 0x0700 | fn hint_keccak256(input_ptr: *const u8, input_len: usize); |
| 0x0800 | fn hint_blake2b_compress(...); |
| 0xF0000 | fn hint_input_data(input_data_ptr: *const u8, input_data_len: usize); |
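Most helpers take raw pointer/length pairs. A common pattern is to wrap them in a safe function that accepts a slice, because a slice always carries a valid pointer and length. In the sketch below, hint_keccak256 is a hypothetical local stand-in with the signature from the table. In a real guest the function comes from ziskos through an extern "C" declaration. The stand-in only records the code and length.

```rust
use std::cell::RefCell;

thread_local! {
    // Records (hint code, input length) for each simulated hint request.
    static HINT_LOG: RefCell<Vec<(u32, usize)>> = RefCell::new(Vec::new());
}

/// Hypothetical stand-in for the ziskos FFI helper of the same name (code 0x0700).
unsafe fn hint_keccak256(input_ptr: *const u8, input_len: usize) {
    // Caller must guarantee input_ptr points to input_len readable bytes.
    let bytes = unsafe { std::slice::from_raw_parts(input_ptr, input_len) };
    HINT_LOG.with(|log| log.borrow_mut().push((0x0700, bytes.len())));
}

/// Safe wrapper: a slice always provides a valid (pointer, length) pair.
fn keccak256_hint(input: &[u8]) {
    unsafe { hint_keccak256(input.as_ptr(), input.len()) }
}

fn recorded_hints() -> Vec<(u32, usize)> {
    HINT_LOG.with(|log| log.borrow().clone())
}

fn main() {
    keccak256_hint(b"hello");
    keccak256_hint(&[]);
    println!("{:?}", recorded_hints());
}
```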
5.6 Custom Hints Generation
To extend the built-in hints, you can generate custom hints for new operations. The first step is to register the new hint in the HintsProcessor, as explained in section Custom Hint Handlers. Once the hint is registered, you can generate hints for it from the guest program using the following FFI function:
fn hint_custom(hint_id: u32, data_ptr: *const u8, data_len: usize, is_result: u8);
When calling it, follow the same guidelines described for the built-in FFI hint helper functions.
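Whatever bytes the guest passes to hint_custom must be decoded identically by the handler registered in the HintsProcessor. A fixed, explicit byte layout makes that easier to guarantee. The sketch rests on several assumptions:

- MY_HINT_ID is a made-up id.
- The layout (little-endian u64 values plus a length-prefixed blob) is one arbitrary choice.
- Passing 0 for is_result to mark an input hint is an assumption.
- hint_custom is a local stand-in that records its arguments.

```rust
use std::cell::RefCell;

thread_local! {
    // (hint_id, payload bytes, is_result) for each simulated call.
    static CUSTOM_LOG: RefCell<Vec<(u32, Vec<u8>, u8)>> = RefCell::new(Vec::new());
}

/// Hypothetical id; it must be the id registered for your handler in the HintsProcessor.
const MY_HINT_ID: u32 = 0x1000;

/// Hypothetical stand-in with the same signature as the ziskos hint_custom function.
unsafe fn hint_custom(hint_id: u32, data_ptr: *const u8, data_len: usize, is_result: u8) {
    let data = unsafe { std::slice::from_raw_parts(data_ptr, data_len) }.to_vec();
    CUSTOM_LOG.with(|log| log.borrow_mut().push((hint_id, data, is_result)));
}

/// Serializes the inputs in a fixed layout the handler must decode the same way:
/// a (u64 LE) | b (u64 LE) | blob length (u64 LE) | blob bytes.
fn encode_inputs(a: u64, b: u64, blob: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(24 + blob.len());
    out.extend_from_slice(&a.to_le_bytes());
    out.extend_from_slice(&b.to_le_bytes());
    out.extend_from_slice(&(blob.len() as u64).to_le_bytes());
    out.extend_from_slice(blob);
    out
}

fn emit_my_hint(a: u64, b: u64, blob: &[u8]) {
    let payload = encode_inputs(a, b, blob);
    // is_result = 0 is assumed here to mark an input hint; check the HintsProcessor docs.
    unsafe { hint_custom(MY_HINT_ID, payload.as_ptr(), payload.len(), 0) }
}

fn main() {
    emit_my_hint(7, 9, b"xyz");
    CUSTOM_LOG.with(|log| println!("{:?}", log.borrow()));
}
```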
5.7 Using Hints from C Guest Programs
The ziskos crate is published as both an rlib and a staticlib, so C guest programs can link against the resulting .a archive and call the hint lifecycle functions through the C ABI. Two symbols are exposed:
extern void zkvm_init(void);
extern void zkvm_deinit(void);
zkvm_init initializes the hint stream and zkvm_deinit finalizes it. They are no-ops when compiled without --cfg zisk_hints, so the same C code works for both native (hint generation) and zkVM target builds without modification.
A minimal C guest program looks like:
extern void zkvm_init(void);
extern void zkvm_deinit(void);
int main(void) {
zkvm_init();
// Guest logic, including any FFI hint calls
// (hint_sha256, hint_keccak256, hint_input_data, ...)
zkvm_deinit();
return 0;
}
When linking the C guest against ziskos for native hint generation, the same environment variables described in 5.2.1 (ZISK_HINTS_OUTPUT, ZISK_INPUT_FILE) control the file paths used by zkvm_init and the input reader.
The FFI hint helper functions listed in 5.5 FFI Hints Helper Functions are all extern "C" and use the same signatures from C — declare them with extern in your C source and link against the same ziskos archive.
Ziskof
Riscof tests
The following test generates the riscof test files, converts the corresponding .elf files into ZisK ROMs, and executes them, printing the output to stdout for comparison against a reference RISC-V implementation. This process is not trivial and has been semi-automated.
First, compile the ZisK Emulator:
$ cargo clean
$ cargo build --release
Second, download and run a docker image from the riscof repository to generate and run the riscof tests:
$ docker run --rm -v ./target/release/ziskemu:/program -v ./riscof/:/workspace/output/ -ti hermeznetwork/ziskof:latest
The test can take a few minutes to complete. Any error would be displayed in red.
Profiling Programs with ZiskEmu
ZiskEmu provides powerful profiling capabilities to analyze the cost and performance characteristics of your programs. This guide explains how to use these features to identify hotspots, optimize your code, and understand resource consumption.
What This Guide Covers
This guide walks you through ZiskEmu's profiling capabilities, progressing from high-level overviews to detailed analysis:
- Introduction: Understanding profiling costs vs. final costs, symbol-based analysis, and detecting optimization opportunities
- Basic Profiling: Global statistics showing cost distribution across major categories (base, main, opcodes, precompiles, memory)
- SDK Report Mode: Streamlined, compact output format ideal for CI/CD and quick checks, with selective section display options
- Function Name Display Options: Configure how long function names are displayed with compact and no-compact modes
- Profile Tags: Instrument your code to measure specific sections, with immediate or deferred reporting of steps and costs
- Firefox Profiler Integration: Export profiling data for advanced visualization and interactive analysis
- Function-Level Profiling: Identifying which functions consume the most resources with cumulative analysis
- Customizing ROI Display: Controlling how many functions to show and filtering by patterns
- Detailed Caller Analysis: In-depth breakdown showing which operations are expensive within each function and who calls them
- Tracking Function Calls: Logging individual call parameters to analyze usage patterns and optimize for common cases
- PC Histogram Analysis: Low-level view of the most frequently executed RISC-V instruction sequences
- Additional Options: Quick reference for other useful flags (steps, progress indicators, formatting)
- Practical Example: Real-world case study analyzing Ethereum opcode costs in a block validator
Introduction
Understanding Profiling Costs vs. Final Costs
When profiling a program in ZisK, it's important to understand the difference between profiling costs and final costs:
Profiling Costs
Profiling costs represent the individual operational cost accrued directly within a function's own instructions, based on the best-case cost model for each operation. These costs:
- Exclude padding and aggregation costs
- Reflect a direct cause-and-effect relationship between code changes and cost variations
- Use the optimal cost for each operation type
- Allow you to observe how small program modifications affect performance
- Are ideal for optimization work because they show the direct impact of your code changes
For example, when you replace a function with a precompiled function or optimize a loop, the profiling cost will immediately reflect this improvement, making it easy to validate that your optimization is working as expected.
Final Costs
Final costs represent the real and exact cost of a specific execution, accounting for the actual resource consumption in the ZisK proving system. The key difference is that final costs measure cost at the instance granularity, not at the individual operation level.
In ZisK's architecture, multiple operations are grouped into instances (execution units in state machines), and the cost is determined by these instances:
- Instance-based granularity: If you use 1 Keccak operation or 5,242 Keccak operations, you pay for one full Keccak instance. However, if you use 5,243 operations, you need a second instance, effectively doubling the cost for that single additional operation.
- Planner strategies: The ZisK planner dynamically chooses execution strategies based on the operation mix. For example, depending on how many additions and binary operations you have, the planner might use a Binary state machine, a BinaryAdd state machine, or both. These decisions affect the final cost since each instance type has a different cost structure.
- Aggregation across function calls: Final costs include both the function's own profiling cost and all costs from functions it calls, summed at the instance level.
Why use profiling costs for optimization? Because profiling costs provide a predictable and proportional metric directly tied to your code changes. When optimizing, you want to see the immediate effect of your changes at the operation level. Final costs, while representing the true execution cost, can show non-linear behavior due to instance boundaries and planning strategies. Once you've optimized based on profiling costs, the final costs will reflect the real resource savings in the proving system.
Example: Keccak Operations
Consider a program that performs Keccak hash operations:
Scenario 1: Using 1,000 Keccak operations
- Profiling cost: Proportional to 1,000 operations
- Final cost: 1 Keccak instance (fits within instance capacity)
Scenario 2: Using 5,000 Keccak operations
- Profiling cost: 5× the cost of Scenario 1 (proportional to operations)
- Final cost: Still 1 Keccak instance (if capacity is 5,242 operations)
Scenario 3: Using 5,243 Keccak operations
- Profiling cost: ~5.24× the cost of Scenario 1 (proportional increase)
- Final cost: 2 Keccak instances (crossed the instance boundary with just 1 extra operation!)
The profiling cost grows linearly with the number of operations, making it easy to predict the impact of adding or removing operations. The final cost, however, stays constant until you cross an instance boundary, then jumps significantly. This is why profiling costs are better for optimization: you can see the effect of every change, while final costs help you understand the actual proving cost in production.
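The three scenarios can be reproduced with a tiny model. Profiling cost is ops × per-op cost, and the instance count is ceil(ops / capacity). The capacity of 5,242 is the illustrative figure used above. The model deliberately ignores planner strategies and treats final cost as simply proportional to the number of instances.

```rust
/// Operations per Keccak instance, as used in the example above (illustrative).
const KECCAK_CAPACITY: u64 = 5_242;

/// Profiling cost grows linearly: every operation adds the same amount.
fn profiling_cost(ops: u64, cost_per_op: u64) -> u64 {
    ops * cost_per_op
}

/// Instances needed to hold `ops` operations: ceil(ops / capacity).
fn instances(ops: u64, capacity: u64) -> u64 {
    (ops + capacity - 1) / capacity
}

fn main() {
    let unit = 1; // abstract per-operation cost; only the ratios matter here
    for ops in [1_000u64, 5_000, 5_243] {
        println!(
            "{ops:>5} ops -> profiling cost {:>5}, instances {}",
            profiling_cost(ops, unit),
            instances(ops, KECCAK_CAPACITY)
        );
    }
}
```

Going from 5,242 to 5,243 operations moves the profiling cost by a single unit, while the instance count jumps from 1 to 2.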
Example: Comparing Optimization Alternatives
Suppose you have implemented two different optimizations for your program, and you need to decide which one is better. The difference between them is 1 million operations:
- Option A: Uses 1M 64-bit ADD operations
- Option B: Uses 1M 64-bit OR operations
In ZisK's architecture, there are specialized instances for 64-bit additions (BinaryAdd) that are much cheaper than the general binary instances (Binary) that can perform ADD, SUB, AND, OR, XOR, and other operations.
Analysis with Profiling Costs:
- Option A (ADD): Lower profiling cost (uses efficient specialized instances)
- Option B (OR): Higher profiling cost (requires general binary instances)
- Clear winner: Option A is better ✓
Analysis with Final Costs (Small Program):
If your program is small and doesn't fill a Binary instance:
- Both options may end up using the same Binary instance
- Final cost: Same for both options (no clear winner)
- Misleading conclusion: No difference between optimizations ✗
Analysis with Final Costs (Large Program):
If your program is larger and already uses separate instances:
- Option A uses a dedicated BinaryAdd instance (cheaper)
- Option B uses a Binary instance (more expensive)
- Final cost: Option A is clearly cheaper ✓
- Correct conclusion: Matches profiling cost analysis
Lesson: Profiling costs consistently show that Option A is better, regardless of program size. Final costs may give conflicting signals depending on whether instance boundaries are crossed. This is why profiling costs are the reliable metric for making optimization decisions—they provide a consistent signal that doesn't depend on the overall program context.
Symbol-Based Analysis
One of ZiskEmu's key advantages is that profiling works on any ELF file without requiring special instrumentation or debug information. The profiler uses symbol information already present in the binary, which means:
- Works with release builds (optimized binaries)
- No need to recompile with special flags
- No runtime overhead during execution
- Analyzes production-ready binaries, as long as their symbols have not been stripped
Detecting Optimization Opportunities
One of the most powerful uses of ZiskEmu's profiling is identifying where to apply patches and optimizations. The profiling costs help you answer critical questions:
Which crates/libraries are most performant for proof generation?
- Compare different library implementations to see their effect on verification costs
- Test alternative dependencies to find the most ZisK-efficient options
- Evaluate different algorithm implementations (e.g., hash libraries, cryptographic crates, serialization libraries) to determine which performs best in the ZisK proving system
- Make data-driven decisions when choosing between equivalent functionality from different crates
Validating optimizations:
- After applying an optimization or patch, run the profiler again to confirm that the profiling cost decreased
- Compare before/after profiles to ensure the optimization is effective
Is patching being applied correctly?
- Verify that precompiles are being used where expected
- Detect cases or paths where generic code is running instead of optimized ZisK-specific implementations
- Identify functions that should be patched but aren't
Where should you apply patches?
- Find hotspot functions that would benefit most from ZisK precompiles
- Identify expensive cryptographic operations (SHA-256, Keccak, etc.) that could use hardware acceleration
- Locate arithmetic-heavy code that could leverage ZisK's optimized arithmetic operations
Example workflow:
- Profile your program to identify expensive functions
- Look for patterns that match available precompiles (hashing, big integer math, etc.)
- Patch the code to use:
  - ZisK-optimized implementations
  - Precompiles
- Change operations or how they're used, keeping in mind that you are optimizing for the ZisK architecture, not for hardware
- Re-profile to verify the profiling cost reduction
This iterative approach, guided by profiling costs, ensures your optimizations target the right areas and produce measurable improvements.
Basic Profiling (statistics)
The simplest way to profile your program is to use the -X (or --stats) flag. This provides an overview of execution statistics including total costs, memory operations, and opcode usage.
Command
ziskemu -e <elf> -i <input> -X
Output Explanation
REPORT
----------------------------------------
STEPS 92,875,129
COST DISTRIBUTION COST %
------------------------------------------------
BASE 293,601,280 2.57%
MAIN 6,315,508,772 55.22%
OPCODES 1,334,639,984 11.67%
PRECOMPILES 2,565,960,716 22.43%
MEMORY 927,932,629 8.11%
TOTAL 11,437,643,381 100.00%
FROPS 963,440,253 8.42%
RAM USAGE 18,465,008 3.47%
Understanding the Report:
STEPS: The number of processor cycles (instructions) executed during program execution. This indicates how long the execution is: more steps mean a longer execution.
COST DISTRIBUTION: This shows the profiling cost (see the Understanding Profiling Costs section for detailed explanation). Each operation is costed individually using the proof area as the metric, which is the best indicator of proof generation time—higher cost means longer proof generation.
The cost is broken down into these categories:
- BASE: Cost of fixed components such as tables, range checks, and other constant overhead that exists regardless of program logic.
- MAIN: Cost of the processor itself without operation costs. This is directly proportional to the steps count and represents the base cost of executing instructions.
- OPCODES: Cost of simple operations performed by the processor (additions, subtractions, etc.) in the format a operation b = c, flag, where a, b, and c are 64-bit values. These are basic arithmetic and logical operations.
- PRECOMPILES: Cost of complex operations whose parameters don't fit in 64 bits, requiring memory as an exchange system. Examples include:
  - 256-bit additions
  - Elliptic curve operations
  - Keccak hashing
  - DMA operations
- MEMORY: Cost of direct memory operations (read, write) and the additional state machines required for non-aligned memory access. This includes cases where:
  - The address is not aligned to 8 bytes
  - Operations don't work with 8-byte chunks (e.g., reading a single byte)
- TOTAL: Sum of all costs. Each category shows the percentage (%) it represents of the total cost.
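The sketch below recomputes the percentages from the raw category costs in the example report and checks that TOTAL is their sum. With these numbers, MAIN also works out to exactly 68 per step. That fits the statement that MAIN is proportional to STEPS, but the factor is inferred from this one report, not a documented constant.

```rust
// Category costs and step count copied from the example report above.
const CATEGORIES: [(&str, u64); 5] = [
    ("BASE", 293_601_280),
    ("MAIN", 6_315_508_772),
    ("OPCODES", 1_334_639_984),
    ("PRECOMPILES", 2_565_960_716),
    ("MEMORY", 927_932_629),
];
const STEPS: u64 = 92_875_129;

/// TOTAL is the plain sum of the category costs.
fn total_cost() -> u64 {
    CATEGORIES.iter().map(|&(_, cost)| cost).sum()
}

/// Share of the total, in percent.
fn percentage(cost: u64, total: u64) -> f64 {
    cost as f64 * 100.0 / total as f64
}

fn main() {
    let total = total_cost();
    for (name, cost) in CATEGORIES {
        println!("{name:<12} {cost:>15} {:>7.2}%", percentage(cost, total));
    }
    println!("{:<12} {total:>15}", "TOTAL");
    // MAIN cost divided by the step count (exact in this report).
    println!("MAIN / STEPS = {}", CATEGORIES[1].1 / STEPS);
}
```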
FROPS (FRequent OPerationS): These are operations that are very frequently used by the processor, such as:
- Adding 1 to a relatively small number (common in loop counters)
- Adding 8 to an address (typical for pointer arithmetic)
- Working with values < 256
These frequent operations are analyzed, detected, and pre-calculated, becoming part of the BASE cost but representing significant savings. In this example, FROPS show 8.42%: this is the cost the program would have incurred if these optimizations were not applied. The actual savings are already reflected in the lower costs of the affected operations.
RAM USAGE: The amount of memory used out of the total available. This information is only available with the default allocator (bump allocator), which:
- Never frees memory - always allocates new memory
- Avoids the CPU cycles needed to manage the entire heap (typically >10% overhead)
- Is recommended as long as sufficient memory is available
- Provides better performance by eliminating heap management costs
Detailed Opcode Breakdown:
Below the summary, you'll see a detailed breakdown of each operation:
COST BY OPCODE COUNT % COST % RANK
-----------------------------------------------------------------------------
OP ltu 1,767,360 1.90% 106,041,600 0.93%
OP lt 389,360 0.42% 23,361,600 0.20%
OP eq 543,251 0.58% 32,595,060 0.28%
OP add 7,086,411 7.63% 177,160,275 1.55% #4
OP sub 693,157 0.75% 41,589,420 0.36%
OP and 3,740,044 4.03% 224,402,640 1.96% #3
OP or 7,482,273 8.06% 448,936,380 3.93% #2
OP xor 1,027,290 1.11% 61,637,400 0.54%
OP add_w 15,804 0.02% 948,240 0.01%
OP sub_w 4,085 0.00% 245,100 0.00%
OP sll 1,551,879 1.67% 82,249,587 0.72%
OP srl 611,361 0.66% 32,402,133 0.28%
OP sra 807,976 0.87% 42,822,728 0.37%
OP srl_w 84,289 0.09% 4,467,317 0.04%
OP sra_w 62 0.00% 3,286 0.00%
OP signextend_b 121,977 0.13% 6,464,781 0.06%
OP signextend_h 1,684 0.00% 89,252 0.00%
OP signextend_w 27,460 0.03% 1,455,380 0.01%
OP pubout 32 0.00% 0 0.00%
OP muluh 86,682 0.09% 8,234,790 0.07%
OP mul 409,765 0.44% 38,927,675 0.34%
OP divu 6,368 0.01% 604,960 0.01%
OP remu 4 0.00% 380 0.00%
OP dma_memcpy 302,551 0.33% 12,707,142 0.11%
OP dma_memcmp 91,454 0.10% 3,841,068 0.03%
OP dma_inputcpy 90 0.00% 3,780 0.00%
OP dma_xmemset 32,381 0.03% 1,360,002 0.01%
OP _dma_pre 140,043 0.15% 12,323,784 0.11%
OP _dma_post 164,752 0.18% 14,498,176 0.13%
OP keccak 32,650 0.04% 2,466,707,500 21.57% #1
OP arith256_mod 714 0.00% 1,016,736 0.01%
OP secp256k1_add 17,688 0.02% 25,187,712 0.22%
OP secp256k1_dbl 19,884 0.02% 28,314,816 0.25%
OP fcall_param 652 0.00% 0 0.00%
OP fcall 172 0.00% 0 0.00%
OP fcall_get 156 0.00% 0 0.00%
FROPS BY OPCODE COUNT HIT COST % RANK
----------------------------------------------------------------------------
FROP ltu 942,288 34.78% 56,537,280 0.49% #4
FROP lt 641,963 62.25% 38,517,780 0.34%
FROP eq 3,273,419 85.77% 196,405,140 1.72% #2
FROP add 1,597,142 18.39% 39,928,550 0.35%
FROP sub 357,871 34.05% 21,472,260 0.19%
FROP and 471,898 11.20% 28,313,880 0.25%
FROP or 1,303,629 14.84% 78,217,740 0.68% #3
FROP xor 105,118 9.28% 6,307,080 0.06%
FROP add_w 75,366 82.67% 4,521,960 0.04%
FROP sub_w 2,177 34.77% 130,620 0.00%
FROP sll 8,729,869 84.91% 462,683,057 4.05% #1
FROP srl 376,620 38.12% 19,960,860 0.17%
FROP sra 5,962 0.73% 315,986 0.00%
FROP srl_w 66,935 44.26% 3,547,555 0.03%
FROP sra_w 60 49.18% 3,180 0.00%
FROP muluh 25,590 22.79% 2,431,050 0.02%
FROP mul 43,603 9.62% 4,142,285 0.04%
FROP divu 42 0.66% 3,990 0.00%
COST BY OPCODE Table:
This table shows detailed statistics for each operation or precompile executed:
- COUNT: Number of times this operation was called
- %: Percentage of steps (cycles) that use this operation
- COST: Total profiling cost for all executions of this operation
- %: Percentage of total cost that this operation represents
- RANK: The top 4 most expensive operations are marked with #1, #2, #3, #4
Important: Operations are not sorted by cost. They maintain a consistent order across executions to facilitate comparison between different runs. Look for the #N markers to identify the most expensive operations.
For example, in this output, keccak was executed 32,650 times (0.04% of steps) but accounts for 21.57% of the total cost, making it the #1 most expensive operation. This indicates that Keccak operations dominate the cost despite being relatively infrequent.
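Dividing COST by COUNT gives the average profiling cost per execution, which makes operations directly comparable. The rows below are copied from the example table, and other programs will show their own values. In this run, add costs 25 per execution and or costs 60. That lines up with the earlier ADD vs OR comparison: 1M ORs would cost about 35M more than 1M ADDs. Keccak costs 75,550 per call, which explains how a few tens of thousands of calls dominate the total.

```rust
// (opcode, COUNT, COST) rows copied from the COST BY OPCODE example above.
const ROWS: [(&str, u64, u64); 3] = [
    ("add", 7_086_411, 177_160_275),
    ("or", 7_482_273, 448_936_380),
    ("keccak", 32_650, 2_466_707_500),
];

/// Average profiling cost per execution of an operation.
fn cost_per_op(count: u64, cost: u64) -> f64 {
    cost as f64 / count as f64
}

fn main() {
    for (name, count, cost) in ROWS {
        println!("{name:<8} {:>10.2} per op", cost_per_op(count, cost));
    }
}
```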
FROPS BY OPCODE Table:
FROPS (Frequently-used OPerationS) are highly common operations that have been analyzed and optimized through pre-calculation. These include operations like:
- Incrementing by 1 (loop counters)
- Adding 8 (pointer arithmetic)
- Working with small values (< 256)
The table shows:
- COUNT: Number of times the FROP variant was executed
- HIT: Hit rate percentage - how often the frequent operation pattern was matched and the optimization applied
- COST: The cost these operations would have incurred without the pre-calculation (in the example, this column sums to the FROPS total shown in the summary)
- %: Percentage of total cost
- RANK: Top ranked FROPS by cost
High hit rates indicate that the program uses these common patterns frequently, benefiting from the pre-calculated optimizations. The FROPS total shown earlier (8.42% in this example) represents the cost that would be added if these optimizations were not available.
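The HIT column is not defined by a formula above. However, the example numbers are consistent with HIT = FROP count / (FROP count + regular OP count) for the same opcode. Treat this as an inference from the sample output, not a documented definition. The sketch reproduces three of the rows.

```rust
/// Hit rate in percent, using the relation that reproduces the example tables:
/// FROP executions / (FROP executions + regular executions).
fn hit_rate(frop_count: u64, op_count: u64) -> f64 {
    100.0 * frop_count as f64 / (frop_count + op_count) as f64
}

fn main() {
    // (opcode, OP COUNT, FROP COUNT) copied from the example tables above.
    let rows = [
        ("ltu", 1_767_360u64, 942_288u64),
        ("eq", 543_251, 3_273_419),
        ("sll", 1_551_879, 8_729_869),
    ];
    for (name, op_count, frop_count) in rows {
        println!("{name:<4} HIT {:.2}%", hit_rate(frop_count, op_count));
    }
}
```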
Key Insights from Statistics:
Use this information to:
- Identify which operation types dominate your program's cost
- Find operations with high count but disproportionate cost (optimization candidates)
- Verify that precompiles are being used where expected
- Understand the balance between computation (OPCODES), memory access (MEMORY), and complex operations (PRECOMPILES)
SDK Report Mode
For a cleaner, more compact output ideal for continuous integration or quick checks, use the --sdk flag. This provides a streamlined report with only the essential summary information.
Command
ziskemu -e <elf> -i <input> --sdk
Output Example
╔══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╗
║ ◆ REPORT SUMMARY ║
╠══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╣
║ STEPS 92,875,129 ║
║ COST 11,437,643,381 ║
║ RAM 17.61 MB / 64.00 MB ║
╚══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╝
╔══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╗
║ ◆ COST DISTRIBUTION SUMMARY ║
╠══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╣
║ CATEGORY ∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙ COST % ║
║ ┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄ ║
║ Base ▎∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙ 293,601,280 2.6% ║
║ Main ███████████████████████████████████████████████████████∙∙∙∙ 6,315,508,772 55.2% ║
║ Opcodes ████████████▊∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙ 1,334,639,984 11.7% ║
║ Precompiles █████████████████████████▊∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙ 2,565,960,716 22.4% ║
║ Memory █████████▎∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙ 927,932,629 8.1% ║
║ ┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄ ║
║ Total 11,437,643,381 100.0% ║
╚══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╝
The SDK report provides:
- Clean visual layout with box-drawing characters
- Progress bars showing the proportional cost of each category
- Essential metrics only: steps, total cost, RAM usage, and cost distribution
- No detailed breakdowns - ideal for automated testing or quick cost checks
SDK Selective Sections
By default, the SDK report shows only the summary. You can selectively enable additional sections:
Show Opcode Details (--opcodes)
Adds a section showing the top 10 most expensive opcodes with their cost distribution and FROPS hit rates:
ziskemu -e <elf> -i <input> --sdk --opcodes
This adds a COST DISTRIBUTION BY OPCODE section comparing regular operations vs frequent operations (FROPS).
Show Top Functions (--top-functions)
Lists the functions with the highest cost. This section relies on symbol information (-S):
ziskemu -e <elf> -i <input> --sdk --top-functions -S
This adds a TOP COST FUNCTIONS section with automatic compacting of long function names.
Note: Using --top-functions automatically enables symbol reading (-S), so you can omit the -S flag if you only need it for this feature.
Show Profile Tags (--profile-tags)
Displays accumulated profile tag measurements from your code. Requires profile tags in your program (see Profile Tags section):
ziskemu -e <elf> -i <input> --sdk --profile-tags
This shows sections like STEPS PROFILE TAGS and COST PROFILE TAGS if you've instrumented your code with profile markers.
Combining Options
You can combine multiple flags to customize the report:
# Show summary + opcodes + top functions
ziskemu -e <elf> -i <input> --sdk --opcodes --top-functions -S
# Show all optional sections
ziskemu -e <elf> -i <input> --sdk --opcodes --top-functions --profile-tags -S
Behavior Note: If you specify any of the selective flags (--opcodes, --top-functions, --profile-tags), only the summary plus the explicitly requested sections will be shown. If you don't specify any selective flags, you get only the summary.
SDK Width Configuration
Control the width of the SDK report output with --sdk-width:
# Use wider report (150 characters)
ziskemu -e <elf> -i <input> --sdk --sdk-width=150
# Use narrower report (100 characters)
ziskemu -e <elf> -i <input> --sdk --sdk-width=100
Default width: 120 characters. Wider reports provide more space for progress bars and function names, while narrower reports fit better in smaller terminals or log viewers.
Function Name Display Options
When displaying function-level profiling information with -S, function names can become very long, especially in Rust with its fully-qualified paths and generic parameters. ZiskEmu provides options to control how these names are displayed.
Compact Names (Default)
By default, long function names are automatically shortened to 160 characters using intelligent compacting:
# Default behavior - compact to 160 characters
ziskemu -e <elf> -i <input> -X -S
The compacting algorithm:
- Collapses nested generic parameters: <A<B<C>>> → <A<…>>
- Elides intermediate path segments: std::io::default_write_fmt::Adapter → std::..::Adapter
- Maintains readability while reducing length
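The two transformations can be approximated as follows. This is an illustrative sketch that reproduces the two examples, not the algorithm ZiskEmu actually uses. It does not apply the character limit, and it does not special-case "->" inside generic arguments.

```rust
/// Replaces everything nested deeper than `max_depth` angle brackets with a single '…'.
/// Naive: a '>' from "->" inside generics would be miscounted.
fn collapse_generics(name: &str, max_depth: usize) -> String {
    let mut out = String::with_capacity(name.len());
    let mut depth = 0usize;
    for ch in name.chars() {
        match ch {
            '<' => {
                depth += 1;
                if depth <= max_depth + 1 {
                    out.push('<');
                }
                if depth == max_depth + 1 {
                    out.push('…'); // start of the elided region
                }
            }
            '>' => {
                if depth <= max_depth + 1 {
                    out.push('>');
                }
                depth = depth.saturating_sub(1);
            }
            _ if depth <= max_depth => out.push(ch),
            _ => {} // inside the elided region
        }
    }
    out
}

/// Keeps only the first and last `::` segments: a::b::c::D -> a::..::D.
fn elide_path(path: &str) -> String {
    let segments: Vec<&str> = path.split("::").collect();
    if segments.len() <= 2 {
        return path.to_string();
    }
    format!("{}::..::{}", segments[0], segments[segments.len() - 1])
}

fn main() {
    println!("{}", collapse_generics("<A<B<C>>>", 1));
    println!("{}", elide_path("std::io::default_write_fmt::Adapter"));
}
```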
Custom Compact Length
Specify a different maximum length:
# Compact to 80 characters
ziskemu -e <elf> -i <input> -X -S --compact-names=80
# Compact to 200 characters
ziskemu -e <elf> -i <input> -X -S --compact-names=200
Disable Compacting
To see complete, uncompacted function names:
ziskemu -e <elf> -i <input> -X -S --no-compact-names
When to use each option:
- Default (160 chars): Good balance for most terminal widths and readability
- Shorter (80-100 chars): When viewing in narrow terminals or when you want very concise output
- Longer (200+ chars): When you need more context from the function path
- No compacting: When you need to see the complete, exact function signatures (e.g., for copy-pasting into code searches)
Profile Tags
Profile tags allow you to instrument your code to measure specific code sections, loops, or algorithms. This is useful when you want to:
- Measure the cost or steps of a specific algorithm
- Compare different implementation approaches
- Track performance of critical sections across multiple calls
- Identify hotspots within a single function
How Profile Tags Work
You add markers in your guest code using macros provided by ziskos. These markers:
- Have zero overhead when not running in the ZiskEmu profiler
- Work at the source code level - you decide what to measure
- Can measure either steps (execution cycles) or cost (profiling cost)
- Can either print immediately or accumulate for a summary report
Setting Up Profile Tags
In your guest code's Cargo.toml, add the ziskos dependency:
[dependencies]
ziskos = { path = "../../ziskos" } # Adjust path as needed
In your guest source code:
use ziskos::{profile_start, profile_end};
use ziskos::{profile_report_start, profile_report_end};
use ziskos::{profile_steps_start, profile_steps_end};
use ziskos::{profile_report_steps_start, profile_report_steps_end};

fn main() {
    // Example usage in your code
    profile_start!(hash_computation);
    let result = expensive_hash_function(&data);
    profile_end!(hash_computation);
    // ... more code
}
Profile Tag Macros
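There are 8 macros (four start/end pairs) organized along 2 dimensions:
Dimension 1 - What to measure:
- Cost macros (profile_start!/profile_end!): Measure profiling cost
- Steps macros (profile_steps_start!/profile_steps_end!): Measure execution steps
Dimension 2 - When to report:
- Immediate (profile_start!/profile_end!): Print the result after each end! call
- Report (profile_report_start!/profile_report_end!): Accumulate and show at program end
Combining both dimensions gives the four pairs: profile_start!/profile_end! (cost, immediate), profile_steps_start!/profile_steps_end! (steps, immediate), profile_report_start!/profile_report_end! (cost, report) and profile_report_steps_start!/profile_report_steps_end! (steps, report).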
Immediate Output Macros
Print the measurement immediately after the end! call:
// Measure and print COST after each execution
profile_start!(my_algorithm);
run_my_algorithm();
profile_end!(my_algorithm);
// Prints: [my_algorithm] 12345

// Measure and print STEPS after each execution
profile_steps_start!(my_loop);
for i in 0..1000 {
    expensive_operation(i);
}
profile_steps_end!(my_loop);
// Prints: [my_loop] 45678
Use case: When you want to track each individual execution, or when the measured section is called only once or a few times.
Report Macros
Accumulate measurements and show statistics at the end:
for batch in batches {
    profile_report_start!(process_batch);
    process_batch(&batch);
    profile_report_end!(process_batch);
}
// No output during execution.
// At program end (when running with --profile-tags), you'll see accumulated
// statistics: total, average, min and max over all executions.
Use case: When measuring sections called many times (loops, repeated operations) and you want aggregate statistics rather than individual measurements.
Complete Example
use ziskos::{
    profile_start, profile_end,
    profile_report_start, profile_report_end,
    profile_steps_start, profile_steps_end,
    profile_report_steps_start, profile_report_steps_end,
};

fn main() {
    // Measure total cost once
    profile_start!(total_execution);

    // Accumulate statistics for repeated calls
    for i in 0..100 {
        profile_report_steps_start!(loop_iteration);
        expensive_computation(i);
        profile_report_steps_end!(loop_iteration);
    }

    // Nested measurements
    profile_steps_start!(data_processing);
    profile_report_start!(hash_phase);
    for item in items {
        compute_hash(item);
    }
    profile_report_end!(hash_phase);
    profile_steps_end!(data_processing);

    profile_end!(total_execution);
}
Viewing Profile Tag Results
To see the accumulated profile tag statistics, add --profile-tags to your command:
# With standard report
ziskemu -e <elf> -i <input> -X --profile-tags
# With SDK report
ziskemu -e <elf> -i <input> --sdk --profile-tags
The output shows aggregated statistics for all profile tags used with the report variants. The figures below are sample values and do not come from running the Complete Example above:
PROFILE TAGS STEPS (STEPS, % STEPS, CALLS, AVG, MIN, MAX)
----------------------------------------------------------
10,234,567 11.02% 100 102,345 98,123 125,678 loop_iteration
3,456,789 3.72% 50 69,135 45,000 89,000 hash_phase
PROFILE TAGS COST (COST, % COST, CALLS, AVG, MIN, MAX)
-------------------------------------------------------
1,234,567,890 10.79% 100 12,345,678 10,000,000 15,000,000 total_execution
456,789,012 3.99% 50 9,135,780 5,000,000 12,000,000 hash_phase
Statistics shown:
- STEPS / COST: Sum of all measurements for the tag
- % STEPS / % COST: Percentage of the program's total steps or cost
- CALLS: Number of times the tag was executed
- AVG: Average per call
- MIN: Minimum value observed
- MAX: Maximum value observed
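The report columns can be modelled with a simple per-tag accumulator. The sketch below is hypothetical (TagStats and its methods are illustrative names, not part of ziskos or ZiskEmu); it assumes AVG uses truncating integer division, which matches the sample rows above, for example 10,234,567 / 100 shown as 102,345.

```rust
/// Hypothetical accumulator mirroring the PROFILE TAGS columns.
/// Not the real ziskos/ZiskEmu data structure.
#[derive(Debug, Default)]
struct TagStats {
    total: u64,       // STEPS or COST column
    calls: u64,       // CALLS column
    min: Option<u64>, // MIN column (None until the first measurement)
    max: u64,         // MAX column
}

impl TagStats {
    /// Record one start/end measurement (steps or cost).
    fn record(&mut self, value: u64) {
        self.total += value;
        self.calls += 1;
        self.min = Some(self.min.map_or(value, |m| m.min(value)));
        self.max = self.max.max(value);
    }

    /// AVG column, assumed to use truncating integer division.
    fn avg(&self) -> u64 {
        if self.calls == 0 { 0 } else { self.total / self.calls }
    }

    /// % STEPS or % COST: share of the whole program's total, in percent.
    fn percent_of(&self, program_total: u64) -> f64 {
        100.0 * self.total as f64 / program_total as f64
    }
}
```

Each report-variant start/end pair would call record once per execution, which is why tags inside loops end up with a large CALLS value and meaningful MIN/MAX spreads.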
Best Practices
- Use descriptive tag names: hash_computation is better than tag1
- Choose report vs. immediate based on frequency:
  - Few calls (1-10): Use immediate variants
  - Many calls (100+): Use report variants
- Match start/end pairs: Always use matching macro pairs (same tag name, same variant)
- Don't nest same tag names: Each tag should represent a unique code section
- Combine with function profiling: Profile tags show "what", function profiling shows "where"
Firefox Profiler Integration
ZiskEmu can export profiling data to Firefox Profiler format, enabling advanced visualization and analysis of your program's execution.
Generating Profiler Data
Use --profiler-output to specify the output file:
# Generate compressed profiler data (recommended)
ziskemu -e <elf> -i <input> -X -S --profiler-output=profile.json.gz
# Generate uncompressed JSON
ziskemu -e <elf> -i <input> -X -S --profiler-output=profile.json
Requirements: The -S flag is required to load symbol information. The -X flag is recommended for complete profiling data.
Default: If you use -X -S without specifying --profiler-output, a file named profile.json.gz is created automatically.
Viewing in Firefox Profiler
- Go to https://profiler.firefox.com
- Click "Load a profile from file"
- Select your profile.json.gz file
The Firefox Profiler provides:
- Call tree visualization showing the function call hierarchy
- Flame graphs for identifying performance hotspots
- Timeline view showing execution progress over time
- Function details with cumulative costs
- Search and filtering capabilities
Use Cases
Firefox Profiler is particularly useful when:
- You need to visualize complex call graphs
- Standard text reports are too verbose
- You want to share profiling results with team members
- You need to compare multiple profiling runs
- You want interactive exploration of the call stack
File Format
The exported file follows the Firefox Profiler format specification, making it compatible with other tools that support this format.
Function-Level Profiling
To understand which functions contribute most to your program's cost, add the -S (or --read-symbols) flag to read symbol information from the ELF file.
Command
ziskemu -e target/elf/riscv64ima-zisk-zkvm-elf/release/guest -i input.bin -X -S
Output Explanation
When symbol reading is enabled, ZiskEmu simulates a call stack to evaluate functions cumulatively. This means it tracks not only the cycles and cost of each function's own code, but also all the calls made within that function. This cumulative analysis provides a complete picture of each function's contribution to the total execution cost.
Note: Initial calls to _start or _main are filtered out as they represent 100% of the program and don't provide useful optimization insights.
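The cumulative attribution described above can be sketched with a stack of active frames. This is a simplified model under stated assumptions: the Event type is hypothetical (ZiskEmu derives calls and returns from the executed instructions and the symbol table, which is not modelled here), and a function's cumulative steps are taken as the steps elapsed between its entry and its return, so nested work is counted in every caller on the stack.

```rust
use std::collections::HashMap;

/// Hypothetical event stream; not ZiskEmu's internal representation.
enum Event {
    Call(&'static str), // entering a function
    Step,               // one executed instruction
    Ret,                // returning from the innermost function
}

#[derive(Debug, Default, PartialEq)]
struct FnStats {
    steps: u64, // cumulative: own steps plus all nested calls
    calls: u64,
}

fn cumulative_steps(events: &[Event]) -> HashMap<&'static str, FnStats> {
    let mut stats: HashMap<&'static str, FnStats> = HashMap::new();
    // Each frame remembers the global step counter at entry.
    let mut stack: Vec<(&'static str, u64)> = Vec::new();
    let mut now: u64 = 0;
    for ev in events {
        match ev {
            Event::Call(name) => stack.push((*name, now)),
            Event::Step => now += 1,
            Event::Ret => {
                if let Some((name, entered)) = stack.pop() {
                    let entry = stats.entry(name).or_default();
                    entry.steps += now - entered; // everything since entry
                    entry.calls += 1;
                }
            }
        }
    }
    stats
}
```

In this simplified version a recursive function is counted once per active frame, so its cumulative total can exceed the program total; a real profiler has to decide how to handle that case.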
ZiskEmu provides two complementary analyses:
1. TOP STEP FUNCTIONS - Analysis by execution cycles:
TOP STEP FUNCTIONS (STEPS, % STEPS, CALLS, STEPS/CALL, FUNCTION)
----------------------------------------------------------------
54,831,894 59.04% 1 54,831,894 <reth_evm::execute::BasicBlockExecutor<&reth_evm
53,951,767 58.09% 1 53,951,767 <alloy_evm::eth::block::EthBlockExecutor<alloy_e
52,133,363 56.13% 70 744,762 <revm_handler::mainnet_handler::MainnetHandler<r
48,406,973 52.12% 41,793 1,158 <zeth_mpt::mpt::node::Node<zeth_mpt::mpt::memoiz
26,004,168 28.00% 1 26,004,168 <zeth_mpt_state::SparseState as stateless::trie:
21,389,831 23.03% 41,590 514 <zeth_mpt::mpt::node::Node<zeth_mpt::mpt::memoiz
16,104,120 17.34% 1,039 15,499 <revm_context::journal::inner::JournalInner<revm
15,999,662 17.23% 841 19,024 <revm_context::journal::inner::JournalInner<revm
15,635,579 16.84% 1,239 12,619 <revm_database::states::state::State<stateless::
15,498,490 16.69% 388 39,944 <&mut revm_database::states::state::State<statel
15,014,347 16.17% 770 19,499 <revm_context::context::Context<revm_context::bl
14,994,327 16.14% 770 19,473 <revm_context::journal::Journal<&mut revm_databa
14,299,020 15.40% 618 23,137 revm_interpreter::instructions::contract::call_h
14,253,493 15.35% 618 23,063 revm_interpreter::instructions::contract::call_h
14,230,009 15.32% 618 23,025 revm_interpreter::instructions::contract::call_h
13,714,388 14.77% 10,505 1,305 ziskos::zisklib::lib::keccak256::keccak256
Shows for each function:
- STEPS: Total cumulative cycles used by the function (including all nested calls)
- % STEPS: Percentage of total program cycles this function represents
- CALLS: Number of times this function was called
- STEPS/CALL: Average cycles per call to this function
- FUNCTION: Function name from symbol table
2. TOP COST FUNCTIONS - Analysis by profiling cost:
TOP COST FUNCTIONS (COST, % COST, CALLS, COST/CALL, FUNCTION)
-------------------------------------------------------------
5,255,204,123 45.95% 1 5,255,204,123 <reth_evm::execute::BasicBlockExecutor<&reth_evm
5,172,696,823 45.23% 1 5,172,696,823 <alloy_evm::eth::block::EthBlockExecutor<alloy_e
4,997,989,104 43.70% 70 71,399,844 <revm_handler::mainnet_handler::MainnetHandler<r
4,530,507,470 39.61% 41,793 108,403 <zeth_mpt::mpt::node::Node<zeth_mpt::mpt::memoiz
4,014,605,785 35.10% 1 4,014,605,785 <zeth_mpt_state::SparseState as stateless::trie:
3,759,934,537 32.87% 10,505 357,918 ziskos::zisklib::lib::keccak256::keccak256
Shows for each function:
- COST: Total cumulative profiling cost of the function (including all nested calls)
- % COST: Percentage of total program cost this function represents
- CALLS: Number of times this function was called
- COST/CALL: Average profiling cost per call to this function
- FUNCTION: Function name from symbol table
Key insights:
Both tables show cumulative metrics - each function includes the cost/cycles of everything it calls. This helps identify:
- Which high-level functions consume the most resources
- Whether optimization should focus on a function's implementation or the functions it calls
- Functions with high cost per call that might benefit from caching or optimization
- Functions called frequently that could benefit from batching or precompiles
By comparing the STEPS and COST analyses, you can distinguish functions that execute many cycles at relatively low cost (cheap operations) from functions with a high cost per cycle (expensive operations such as precompiles).
For example, ziskos::zisklib::lib::keccak256::keccak256 shows:
- Called 10,505 times
- 13,714,388 steps (14.77% of total) with ~1,305 steps/call
- 3,759,934,537 cost (32.87% of total) with ~357,918 cost/call
This indicates that while Keccak uses 14.77% of cycles, it represents 32.87% of the total cost - showing it's an expensive operation relative to its cycle count, typical of precompile operations.
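The comparison can be reduced to a single ratio: % COST divided by % STEPS. The helpers below are an illustrative sketch (the names are hypothetical) using the keccak values from the tables above. Because the percentages are rounded, the implied program totals are approximate.

```rust
// Illustrative helpers; inputs are copied from the tables above.

/// Average profiling cost per executed step.
fn cost_per_step(cost: u64, steps: u64) -> f64 {
    cost as f64 / steps as f64
}

/// Relative cost intensity: share of total cost divided by share of total steps.
/// A value above 1.0 means the function's steps cost more than the program average.
fn cost_intensity(pct_cost: f64, pct_steps: f64) -> f64 {
    pct_cost / pct_steps
}

/// Approximate program-wide total implied by a value and its rounded percentage.
fn implied_total(value: u64, pct: f64) -> f64 {
    value as f64 / (pct / 100.0)
}
```

For keccak this gives about 274 cost per step and an intensity of about 2.23: each Keccak step is roughly 2.2 times as expensive as the average step of the whole program, which works out to about 123 cost per step from the implied totals (about 92.85 million steps and 11.44 billion cost).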
Customizing ROI Display
Showing More or Fewer Functions
Use the -T (or --top-roi) flag to control how many top functions are displayed:
# Show top 50 functions
ziskemu -e target/elf/riscv64ima-zisk-zkvm-elf/release/guest -i input.bin -X -S -T 50
# Show only top 10 functions
ziskemu -e target/elf/riscv64ima-zisk-zkvm-elf/release/guest -i input.bin -X -S -T 10
Specifying the Main Entry Point
If your program's entry point isn't named main, use the -M (or --main-name) flag:
ziskemu -e target/elf/riscv64ima-zisk-zkvm-elf/release/guest -i input.bin -X -S -M custom_entry
Filtering Functions by Pattern
For large programs, you may want to focus analysis on specific functions or modules. Use the --roi-filter flag with a regular expression pattern to mark functions of interest:
# Filter functions containing "sha256" in their name
ziskemu -e target/elf/riscv64ima-zisk-zkvm-elf/release/guest -i input.bin -X -S --roi-filter "sha256"
# Filter multiple patterns
ziskemu -e target/elf/riscv64ima-zisk-zkvm-elf/release/guest -i input.bin -X -S --roi-filter "hash|crypto|encode"
When combined with --top-roi-filter, the display will show only functions that match the specified pattern:
# Show only functions matching the filter pattern
ziskemu -e target/elf/riscv64ima-zisk-zkvm-elf/release/guest -i input.bin -X -S \
--roi-filter "keccak" --top-roi-filter
This is useful when you want to:
- Focus optimization efforts on a specific subsystem or module
- Analyze only cryptographic functions
- Compare different implementations of similar functionality
- Filter out noise from unrelated code
Detailed Caller Analysis
The -D (or --top-roi-detail) flag provides an in-depth breakdown of each top function, showing exactly where costs come from and who calls the function. This detailed analysis helps pinpoint optimization opportunities at a granular level.
Command
ziskemu -e target/elf/riscv64ima-zisk-zkvm-elf/release/guest -i input.bin -X -S -D
What This Shows
For each top function, the detailed analysis provides:
- Overall metrics: Total steps and cost for the function
- Cost by opcode: Breakdown showing which operations (opcodes and precompiles) consume the most resources within this function, with ranking of the top 4 most expensive operations
- Top step callers: List of functions that call this function, showing:
- Number of calls from each caller
- Total steps attributed to calls from that caller
- Percentage of this function's total steps coming from each caller
This information helps you understand:
- What makes a function expensive (which operations dominate)
- Who is responsible for calling it (caller distribution)
- Where to focus optimization (expensive operations vs. frequent callers)
Output Explanation
DETAIL FUNCTION ziskos::zisklib::lib::keccak256::keccak256
----------------------------------------------------------
STEPS 13,714,388 14.77%
COST 3,759,934,537 32.87%
| COST BY OPCODE COUNT COST % RANK
| ---------------------------------------------------------------------
| OP ltu 28,516 1,710,960 0.05%
| OP add 169,207 4,230,175 0.11%
| OP sub 3,644 218,640 0.01%
| OP and 94,545 5,672,700 0.15%
| OP or 2,489,249 149,354,940 3.97% #2
| OP xor 492,192 29,531,520 0.79% #3
| OP sll 360,008 19,080,424 0.51% #4
| OP dma_memcpy 21,010 882,420 0.02%
| OP dma_xmemset 21,010 882,420 0.02%
| OP _dma_pre 2,346 206,448 0.01%
| OP _dma_post 9,863 867,944 0.02%
| OP keccak 32,650 2,466,707,500 65.61% #1
| TOP STEP CALLERS (calls, steps)
| -------------------------------
| 3,974 9,749,694 71.09% <zeth_mpt_state::SparseState as stateless::trie::State
| 2,332 2,778,890 20.26% <zeth_mpt::mpt::node::Node<zeth_mpt::mpt::memoize::Cac
| 1,284 217,150 1.58% revm_interpreter::instructions::system::keccak256::<re
| 1,266 188,634 1.38% <revm_database::states::state::State<stateless::witnes
| 720 107,280 0.78% <alloy_primitives::bits::bloom::Bloom>::accrue_log
| 429 63,921 0.47% <reth_trie_common::hashed_state::HashedPostState>::fro
| 202 30,098 0.22% <revm_database::states::state::State<stateless::witnes
| 144 350,053 2.55% <alloy_trie::hash_builder::HashBuilder>::update
| 66 102,536 0.75% stateless::recover_block::verify_and_compute_sender
| 58 110,681 0.81% alloy_primitives::utils::keccak256_impl
Understanding the detailed report:
Function Header:
DETAIL FUNCTION ziskos::zisklib::lib::keccak256::keccak256
----------------------------------------------------------
STEPS 13,714,388 14.77%
COST 3,759,934,537 32.87%
Shows the total cumulative steps and profiling cost for this function (including nested calls).
COST BY OPCODE section:
| COST BY OPCODE COUNT COST % RANK
| ---------------------------------------------------------------------
| OP keccak 32,650 2,466,707,500 65.61% #1
| OP or 2,489,249 149,354,940 3.97% #2
| OP xor 492,192 29,531,520 0.79% #3
Breaks down which operations consume resources within this function:
- COUNT: Number of times each operation was executed
- COST: Total profiling cost for all executions
- %: Percentage of this function's total cost
- RANK: The top 4 most expensive operations, marked #1 through #4
This shows that the keccak precompile dominates this function's cost at 65.61%, making it the primary optimization target.
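Dividing COST by COUNT gives the cost of a single execution of each operation, which explains why a comparatively small number of keccak operations dominates. The sketch below uses a subset of the rows above; OpRow, unit_cost, top_n and share_pct are illustrative names, and the share is computed against the function's total COST (3,759,934,537), as the % column appears to be.

```rust
/// One row of the COST BY OPCODE table (illustrative type).
struct OpRow {
    name: &'static str,
    count: u64,
    cost: u64,
}

/// Cost of a single execution of the operation.
fn unit_cost(row: &OpRow) -> u64 {
    row.cost / row.count
}

/// Names of the `n` most expensive operations by total cost, highest first.
fn top_n(rows: &[OpRow], n: usize) -> Vec<&'static str> {
    let mut sorted: Vec<&OpRow> = rows.iter().collect();
    sorted.sort_by(|a, b| b.cost.cmp(&a.cost));
    sorted.into_iter().take(n).map(|r| r.name).collect()
}

/// Percentage of the function's total cost taken by this operation.
fn share_pct(row: &OpRow, function_cost: u64) -> f64 {
    100.0 * row.cost as f64 / function_cost as f64
}
```

In this sample a single keccak operation costs 75,550, while or and xor cost 60 each, so 32,650 keccak operations outweigh 2,489,249 or operations by more than an order of magnitude.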
TOP STEP CALLERS section:
| TOP STEP CALLERS (calls, steps)
| -------------------------------
| 3,974 9,749,694 71.09% <zeth_mpt_state::SparseState...
| 2,332 2,778,890 20.26% <zeth_mpt::mpt::node::Node...
Shows which functions call this function and how steps are distributed:
- First column: Number of calls from this caller
- Second column: Total steps consumed when called from this caller
- Percentage: How much of this function's total steps come from this caller
- Function name: The calling function
This reveals that SparseState is responsible for 71% of this function's execution, making it the primary call path to analyze.
Controlling Detail Level
Use the -C (or --roi-callers) flag to control how many callers are shown in the detailed analysis for each function:
# Show top 20 callers for each function in the detailed report
ziskemu -e target/elf/riscv64ima-zisk-zkvm-elf/release/guest -i input.bin -X -S -D -C 20
# Show only top 5 callers for each function
ziskemu -e target/elf/riscv64ima-zisk-zkvm-elf/release/guest -i input.bin -X -S -D -C 5
The default value is 10 callers per function. Increasing this number provides more complete call path information but may make the output more verbose.
Tracking Function Calls
Sometimes you need to analyze each individual call to a function to understand:
- Which parameter values are most frequently used
- What patterns exist in the arguments
- Which specific input values trigger expensive code paths
This information is valuable for optimization strategies. For example, if you discover that certain parameter values are very common, you could:
- Add fast paths for those frequent values
- Use lookup tables or caching for common inputs
- Optimize the general case based on typical parameter distributions
How It Works
Use the --track-call-args feature combined with --roi-filter to log parameter values for each call to matching functions:
- --roi-filter "pattern": Specifies which functions to track (using a regular expression)
- --track-call-args N: Specifies how many parameters to log (up to 8, corresponding to the RISC-V registers a0-a7)
Important limitation: The tool logs the raw parameter values from registers. This means:
- For scalar values (integers, booleans): You get the actual value
- For pointers/addresses: You get only the address itself, not the data it points to
- This makes tracking most useful for functions with scalar parameters or when you're interested in address patterns
Command
# Track calls to filtered functions, logging first 4 parameters
ziskemu -e target/elf/riscv64ima-zisk-zkvm-elf/release/guest -i input.bin -S \
--roi-filter "hash_function" --track-call-args 4 --track-output-path ./traces
Options
- --roi-filter "pattern": Regular expression matching the function names you want to track (required)
- --track-call-args N: Number of parameters to log (1-8, corresponding to the RISC-V registers a0-a7)
- --track-separator "SEP": Character used to separate parameter values in the output (default: ;)
- --track-output-path PATH: Directory where tracking files are written (default: current directory)
Output
For each matched function, a text file is created (<function_name>.txt) with one line per call:
# ROI: hash_function (PC: 0x00012a0-0x00012f8)
# Separator: ';'
# Parameters: a0-a3
0x7fff8200;0x00000100;0x7fff8400;0x00000000
0x7fff8300;0x00000040;0x7fff8400;0x00000001
0x7fff8450;0x00000080;0x7fff8400;0x00000002
Each line contains the parameter values (in hexadecimal) for one function call, separated by the chosen separator. You can then analyze this file to:
- Find the most common parameter combinations
- Identify patterns in memory addresses
- Detect outliers or unusual parameter values
- Build histograms of value distributions
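Because each line is plain separator-delimited hex, tracking files are easy to post-process. The sketch below assumes the layout shown above ('#' header lines, 0x-prefixed values); parse_calls and histogram are hypothetical helpers. It parses a file's contents and counts how often each value of one parameter appears.

```rust
use std::collections::HashMap;
use std::num::ParseIntError;

/// Parse the contents of a tracking file into one Vec<u64> per call.
/// Lines starting with '#' are headers; values are hex, 0x-prefixed.
fn parse_calls(text: &str, sep: &str) -> Result<Vec<Vec<u64>>, ParseIntError> {
    text.lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .map(|line| {
            line.split(sep)
                .map(|v| u64::from_str_radix(v.trim().trim_start_matches("0x"), 16))
                .collect::<Result<Vec<u64>, _>>()
        })
        .collect()
}

/// Count occurrences of parameter `idx` (0 = a0), most common first.
fn histogram(calls: &[Vec<u64>], idx: usize) -> Vec<(u64, usize)> {
    let mut counts: HashMap<u64, usize> = HashMap::new();
    for call in calls {
        if let Some(&v) = call.get(idx) {
            *counts.entry(v).or_insert(0) += 1;
        }
    }
    let mut out: Vec<(u64, usize)> = counts.into_iter().collect();
    // Descending count, then ascending value, for a deterministic order.
    out.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(&b.0)));
    out
}
```

The text can be loaded with std::fs::read_to_string. For the sample above, a2 (index 2) is 0x7fff8400 in all three calls, while a1 takes a different value every time, which suggests a2 is a shared buffer and a1 a varying length.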
PC Histogram Analysis
The -H (or --histogram) flag provides a low-level view of the most frequently executed code positions in your program. Unlike function-level profiling, this analysis operates at the program counter (PC) level, showing you the exact assembly instructions that execute most often.
What This Shows
This analysis:
- Identifies the most executed individual instructions by their program counter address
- Groups consecutive instructions together automatically
- Attributes these instruction groups to their parent function (when symbols are loaded with -S)
- Helps identify hot loops, critical paths, and instruction-level bottlenecks
This is particularly useful for:
- Understanding which specific code sequences dominate execution time
- Identifying tight loops that could benefit from optimization
- Verifying that optimizations are affecting the intended code paths
- Finding unexpected hotspots at the instruction level
Command
# Show top 50 most executed instruction groups
ziskemu -e target/elf/riscv64ima-zisk-zkvm-elf/release/guest -i input.bin -X -S -H 50
The histogram requires -S to display function names. The number after -H controls how many instruction groups to display.
Output Explanation
TOP PC HISTOGRAM (EXECUTIONS, % EXECUTIONS, PC)
-----------------------------------------------
796,670 0.86% 0x801230b8: lbu r16, 0x0(r14)
796,670 0.86% 0x801230bc: beq r16, r12, 0xffffffd4
1,593,340 1.72% ----------- <revm_bytecode::legacy::raw::LegacyRawBytecode>::into_analyzed
755,644 0.81% 0x801230c0: slli r17, r16, 0x38
755,644 0.81% 0x801230c4: srai r17, r17, 0x38
755,644 0.81% 0x801230c8: bge r15, r17, 0x14
2,266,932 2.44% ----------- <revm_bytecode::legacy::raw::LegacyRawBytecode>::into_analyzed
547,858 0.59% 0x801230dc: addi r14, r14, 0x1
547,858 0.59% 0x801230e0: bltu r14, r10, 0xffffffd8
1,095,716 1.18% ----------- <revm_bytecode::legacy::raw::LegacyRawBytecode>::into_analyzed
429,174 0.46% 0x800a38ec: ld r10, 0x60(r21)
429,174 0.46% 0x800a38f0: lbu r11, 0x0(r10)
429,174 0.46% 0x800a38f4: addi r10, r10, 0x1
429,174 0.46% 0x800a38f8: sd r10, 0x60(r21)
429,174 0.46% 0x800a38fc: slli r10, r11, 0x4
429,174 0.46% 0x800a3900: add r10, r19, r10
429,174 0.46% 0x800a3904: ld r11, 0x8(r10)
429,174 0.46% 0x800a3908: ld r12, 0x180(r21)
429,174 0.46% 0x800a390c: sub r13, r12, r11
429,174 0.46% 0x800a3910: sd r13, 0x180(r21)
429,174 0.46% 0x800a3914: bltu r12, r11, 0x20
429,174 0.46% 0x800a3918: ld r12, 0x0(r10)
429,174 0.46% 0x800a391c: addi r10, r21, 0x0 => copyb
429,174 0.46% 0x800a3920: addi r11, r9, 0x0 => copyb
429,174 0.46% 0x800a3924: jalr r1, r12, 0x0
429,174 0.46% 0x800a3928: lbu r10, 0x68(r21)
429,174 0.46% 0x800a392c: bne r10, r0, 0xffffffc0
7,295,958 7.86% ----------- <revm_handler::mainnet_handler::MainnetHandler<revm_context::evm::Ev
Understanding the histogram:
The output is organized into instruction groups, where each group consists of:
- Individual instruction lines: Each shows:
  - EXECUTIONS: Number of times this specific instruction was executed
  - % EXECUTIONS: Percentage of total program steps
  - PC: Program counter address in hexadecimal
  - Instruction: The RISC-V assembly instruction at that address
- Group summary line (with dashes):
  - Total executions: Sum of all instructions in this group
  - % EXECUTIONS: Cumulative percentage for the entire group
  - Function name: The function to which these instructions belong
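The exact grouping rule is not spelled out here, but the sample output is consistent with grouping runs of adjacent instructions (4 bytes apart, since riscv64ima has no compressed instructions) that share the same execution count: 0x801230bc and 0x801230c0 are adjacent yet land in different groups because their counts differ. The sketch below implements that inferred rule; it is an assumption, not ZiskEmu's code, and it does not reproduce the order in which ZiskEmu prints groups.

```rust
/// One histogram entry: instruction address and execution count.
#[derive(Clone, Copy)]
struct PcHit {
    pc: u64,
    count: u64,
}

/// Group adjacent instructions (pc + 4) with identical execution counts.
/// Returns (first_pc, instruction_count, total_executions) per group.
/// Grouping rule inferred from the sample output; an assumption.
fn group_runs(mut hits: Vec<PcHit>) -> Vec<(u64, usize, u64)> {
    hits.sort_by_key(|h| h.pc);
    let mut groups: Vec<(u64, usize, u64)> = Vec::new();
    let mut prev: Option<PcHit> = None;
    for h in hits {
        let extends = matches!(prev, Some(p) if h.pc == p.pc + 4 && h.count == p.count);
        if extends {
            let g = groups.last_mut().expect("a run is open");
            g.1 += 1;
            g.2 += h.count;
        } else {
            groups.push((h.pc, 1, h.count));
        }
        prev = Some(h);
    }
    groups
}
```

Fed the addresses and counts from the example, this yields the same four group totals (1,593,340, 2,266,932, 1,095,716 and 7,295,958), the last one spanning 17 instructions.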
Key insights from the example:
The first group shows a simple loop checking bytes:
796,670 0.86% 0x801230b8: lbu r16, 0x0(r14) # Load byte
796,670 0.86% 0x801230bc: beq r16, r12, 0xffffffd4 # Branch if equal
1,593,340 1.72% ----------- <revm_bytecode::legacy::raw::LegacyRawBytecode>::into_analyzed
This tight 2-instruction sequence executed 796,670 times, representing 1.72% of total execution.
The large group at the bottom represents a complex instruction dispatcher:
429,174 0.46% 0x800a38ec: ld r10, 0x60(r21) # Load from context
...
429,174 0.46% 0x800a392c: bne r10, r0, 0xffffffc0 # Loop back
7,295,958 7.86% ----------- <revm_handler::mainnet_handler::MainnetHandler...
This 17-instruction sequence accounts for 7.86% of total execution, making it a prime optimization target.
When to use histogram analysis:
- After function-level profiling: Once you identify expensive functions, use histograms to see which specific instruction sequences within those functions dominate
- Validating compiler optimizations: Verify that loops are unrolled or optimized as expected
- Finding unexpected hotspots: Sometimes a small instruction sequence accounts for disproportionate execution time
- Comparing implementations: See how different code structures affect instruction-level execution patterns
Additional Options
Show Steps Without Full Statistics
For a quick check of the total step count without generating full statistics, use the --steps flag:
ziskemu -e target/elf/riscv64ima-zisk-zkvm-elf/release/guest -i input.bin --steps
Progress Indicators
For long-running programs, show progress updates every 16M steps with --with-progress:
ziskemu -e target/elf/riscv64ima-zisk-zkvm-elf/release/guest -i input.bin --with-progress
Disable Thousands Separator
For machine-readable output, disable the thousands separator with --no-thousands-sep:
ziskemu -e target/elf/riscv64ima-zisk-zkvm-elf/release/guest -i input.bin -X --no-thousands-sep
Complete Example: Comprehensive Profiling
Here's a complete example that uses most profiling features together:
ziskemu -e target/elf/riscv64ima-zisk-zkvm-elf/release/guest \
-i input.bin \
-X \
-S \
-D \
-T 30 \
-C 15 \
-H 50 \
--roi-filter "sha256|hash" \
--track-call-args 6 \
--track-output-path ./profiling_data \
-m
This command will:
- Generate full statistics (-X)
- Read and use symbol information (-S)
- Show detailed caller analysis (-D)
- Display the top 30 functions in each ranking (-T 30)
- Show the top 15 callers for each function (-C 15)
- Display the top 50 most executed instruction groups (-H 50)
- Filter to sha256/hash-related functions (--roi-filter)
- Track the first 6 parameters of filtered function calls (--track-call-args 6)
- Save tracking data to the ./profiling_data directory (--track-output-path)
- Show performance metrics (-m)
Tips for Effective Profiling
Start Simple, Add Detail
Begin with basic statistics (-X) to get an overview, then progressively add more detailed analysis:
- Basic: ziskemu -e program.elf -i input.bin -X
- Functions: ziskemu -e program.elf -i input.bin -X -S
- Callers: ziskemu -e program.elf -i input.bin -X -S -D
- Detailed: Add -H as needed
Focus on High Impact
Use the final_cost percentage to identify functions with the highest impact. Optimizing a function that represents 50% of execution time will have much more effect than optimizing one at 1%.
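The difference in impact follows Amdahl's law. Treating a function's percentage as its share of total cost, this small helper (remaining_fraction is an illustrative name) estimates the new total, relative to the old one, after making that function faster by a given factor:

```rust
/// Amdahl's law: a function takes `share` (0.0..=1.0) of total cost and its
/// cost is divided by `speedup`. Returns new total / old total.
fn remaining_fraction(share: f64, speedup: f64) -> f64 {
    (1.0 - share) + share / speedup
}
```

Halving a function at 50% leaves 75% of the original total, while halving one at 1% leaves 99.5%. Even removing the 1% function entirely cannot save more than 1%. Because the reported percentages are cumulative, a function and its callees overlap, so do not add their shares together.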
Understand Profiling Cost vs. Final Cost
When a function has high final cost but low profiling cost, the optimization opportunity lies in the functions it calls, not in the function itself. Focus your optimization efforts where profiling costs are highest, as these represent direct computational work that can be improved through code changes or patching with precompiles.
Use Filtering for Large Codebases
In programs with hundreds of functions, use --roi-filter to focus on specific subsystems or modules of interest.
Track Representative Inputs
Profile with realistic, representative inputs. The cost distribution can vary significantly based on input characteristics.
Practical Example: Analyzing Ethereum Opcode Costs
This example demonstrates how to analyze the cost distribution of Ethereum opcodes in a real-world client implementation. By filtering for the EVM instruction interpreter functions, we can obtain a detailed breakdown of which Ethereum operations consume the most resources during block validation.
Scenario
You want to understand which Ethereum opcodes are most expensive in terms of ZisK proving costs when validating a specific block. This information helps you:
- Identify which EVM operations would benefit most from optimization
- Understand the cost profile of real-world Ethereum transactions
- Guide decisions about which precompiles or patches to prioritize
Command
target/release/ziskemu \
-S \
-X \
-e ../zisk-eth-client/bin/guests/stateless-validator-reth/target/riscv64ima-zisk-zkvm-elf/release/zec-reth \
-i ../data/benchmark_inputs/24654304_30c8b8.bin \
--roi-filter "revm_interpreter::instructions::" \
--top-roi-filter \
-T 200
What this does:
- -S: Load symbol information from the ELF file
- -X: Generate full statistics with cost breakdown
- -e <path>: Path to the compiled Ethereum client (reth implementation)
- -i <input>: Block data to validate (block 24,654,304)
- --roi-filter "revm_interpreter::instructions::": Filter to show only functions in the EVM instruction interpreter namespace (where all Ethereum opcodes are implemented)
- --top-roi-filter: Display only the filtered functions in the top ROI lists
- -T 200: Show the top 200 functions (to capture all EVM opcodes)
Expected Output
The output will show the TOP COST FUNCTIONS filtered to only include EVM instruction implementations, giving you a clear view of which Ethereum opcodes dominate the proving cost for this specific block:
TOP COST FUNCTIONS (COST, % COST, CALLS, COST/CALL, FUNCTION)
-------------------------------------------------------------
9,433,353,231 10.32% 5,824 1,619,737 revm_interpreter::instructions::contract::call_helpers::load_acc_
9,396,093,086 10.28% 5,824 1,613,340 revm_interpreter::instructions::contract::call_helpers::load_acco
9,377,741,662 10.26% 5,824 1,610,189 revm_interpreter::instructions::contract::call_helpers::load_acco
8,344,978,788 9.13% 1,695 4,923,291 revm_interpreter::instructions::contract::call::<revm_interpreter
4,599,658,812 5.03% 342,951 13,412 revm_interpreter::instructions::stack::swap::<1, revm_interpreter
2,772,734,752 3.03% 128,956 21,501 revm_interpreter::instructions::memory::mload::<revm_interpreter:
2,580,388,569 2.82% 10,675 241,722 revm_interpreter::instructions::host::sload::<revm_interpreter::i
1,726,257,923 1.89% 105,903 16,300 revm_interpreter::instructions::memory::mstore::<revm_interpreter
1,599,904,068 1.75% 119,289 13,412 revm_interpreter::instructions::stack::swap::<2, revm_interpreter
1,576,416,043 1.72% 13,627 115,683 revm_interpreter::instructions::arithmetic::mulmod::<revm_interpr
1,499,796,900 1.64% 111,825 13,412 revm_interpreter::instructions::stack::swap::<3, revm_interpreter
1,430,041,088 1.56% 106,624 13,412 revm_interpreter::instructions::stack::swap::<4, revm_interpreter
1,045,628,445 1.14% 2,201 475,069 revm_interpreter::instructions::contract::static_call::<revm_inte
896,353,301 0.98% 184,312 4,863 revm_interpreter::instructions::control::jumpi::<revm_interpreter
812,869,552 0.89% 561,374 1,448 revm_interpreter::instructions::stack::push::<1, revm_interpreter
806,652,474 0.88% 465,922 1,731 revm_interpreter::instructions::stack::push::<2, revm_interpreter
763,874,190 0.84% 6,781 112,649 revm_interpreter::instructions::host::sstore::<revm_interpreter::
691,435,073 0.76% 5,682 121,688 revm_interpreter::instructions::system::keccak256::<revm_interpre
669,514,638 0.73% 245,798 2,723 revm_interpreter::instructions::arithmetic::add::<revm_interprete
638,632,995 0.70% 102,549 6,227 revm_interpreter::instructions::arithmetic::mul::<revm_interprete
620,675,903 0.68% 239,701 2,589 revm_interpreter::instructions::control::jump::<revm_interpreter:
527,546,726 0.58% 83,391 6,326 revm_interpreter::instructions::bitwise::shr::<revm_interpreter::
452,376,936 0.49% 302,391 1,496 revm_interpreter::instructions::stack::dup::<2, revm_interpreter:
325,487,994 0.36% 41,683 7,808 revm_interpreter::instructions::bitwise::sar::<revm_interpreter::
311,851,955 0.34% 25,502 12,228 revm_interpreter::instructions::system::codecopy::<revm_interpret
289,141,110 0.32% 120,407 2,401 revm_interpreter::instructions::bitwise::iszero::<revm_interprete
264,613,976 0.29% 176,881 1,496 revm_interpreter::instructions::stack::dup::<3, revm_interpreter:
262,969,735 0.29% 18,608 14,132 revm_interpreter::instructions::system::calldataload::<revm_inter
252,430,047 0.28% 41,031 6,152 revm_interpreter::instructions::bitwise::sgt::<revm_interpreter::
248,940,076 0.27% 1,928 129,118 revm_interpreter::instructions::contract::delegate_call::<revm_in
242,086,315 0.26% 192 1,260,866 revm_interpreter::instructions::host::extcodesize::<revm_interpre
229,785,355 0.25% 10,852 21,174 revm_interpreter::instructions::stack::push::<32, revm_interprete
This filtered view allows you to quickly identify:
- Most expensive opcodes: Which EVM operations have the highest total cost
- Frequently called opcodes: Operations with many calls but lower individual cost
- Optimization targets: Opcodes that would benefit most from ZisK-specific optimizations or precompiles
Important note: With this method, no modification to the ELF file is required. The profiling works directly on the compiled binary using existing symbol information. However, you do need to know the naming convention used for the functions that implement each opcode. In this case, the REVM interpreter uses the namespace revm_interpreter::instructions:: consistently, making it easy to filter all opcode implementations with a single pattern.
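Several opcodes are split across monomorphized variants (swap::<1, ...>, swap::<2, ...> and so on), so summing rows per opcode family can give a clearer picture. The sketch below relies on the naming convention just described and strips everything from the first "::<" onward. opcode_family and cost_by_family are hypothetical helpers, and the rows are copied from the table above.

```rust
use std::collections::BTreeMap;

const PREFIX: &str = "revm_interpreter::instructions::";

/// "revm_interpreter::instructions::stack::swap::<1, ..." -> "stack::swap".
/// Returns None for functions outside the interpreter namespace.
fn opcode_family(function: &str) -> Option<&str> {
    let rest = function.strip_prefix(PREFIX)?;
    // Drop generic parameters such as "::<1, revm_interpreter...".
    Some(rest.split("::<").next().unwrap_or(rest))
}

/// Sum COST per opcode family from (cost, function name) rows.
fn cost_by_family<'a>(rows: &[(u64, &'a str)]) -> BTreeMap<&'a str, u64> {
    let mut totals = BTreeMap::new();
    for &(cost, name) in rows {
        if let Some(family) = opcode_family(name) {
            *totals.entry(family).or_insert(0) += cost;
        }
    }
    totals
}
```

For the four swap variants in the table, the combined cost is 9,129,400,868 (about 10% of the total), on par with the most expensive individual entries. Only sum functions that do not call each other: the call and call_helpers rows are likely nested, so their cumulative costs overlap.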
Conclusion
ZiskEmu's profiling capabilities provide deep insights into your program's resource consumption and performance characteristics. By understanding profiling and final costs, analyzing regions of interest, and using the various filtering and tracking options, you can effectively identify optimization opportunities and improve the efficiency of your ZisK programs.
Use profiling costs as your primary optimization metric, as they provide a direct cause-and-effect relationship with code changes. This makes them ideal for detecting where patches should be applied, validating that optimizations are working correctly, and ensuring that precompiles are being used where expected.
Remember that profiling works on any ELF file with symbols, including release builds, making it easy to analyze production-ready code without special compilation flags or instrumentation.