Introduction

ZisK is a high-performance zkVM (Zero-Knowledge Virtual Machine) designed to generate zero-knowledge proofs of arbitrary program execution. It enables developers to prove the correctness of a computation without revealing its internal state, making ZisK a powerful tool for privacy-preserving and verifiable computation.

Proving systems traditionally involve complex cryptographic operations that require deep expertise and significant computational resources. ZisK abstracts these complexities by providing an optimized toolstack that minimizes computational overhead, making ZK technology accessible to a broader range of developers. With Rust-based execution and planned multi-language support, ZisK is designed to be developer-friendly while maintaining high performance and robust security.

Why ZisK?

  • High-performance architecture optimized for low-latency proof generation.
  • Rust-based zkVM, with future support for additional languages.
  • No recompilation required across different programs.
  • Standardized prover interface (JSON-RPC, gRPC, CLI).
  • Flexible integration: usable as a standalone service or as a library.
  • Decentralized architecture for trustless proof generation.
  • Optimized proof generation costs for real-world applications.
  • Fully open-source and backed by Polygon zkEVM and Plonky3 technology.

Installation Guide

ZisK can be installed from prebuilt binaries (recommended) or by building the ZisK tools, toolchain and setup files from source.

System Requirements

ZisK currently supports Linux x86_64 and macOS platforms (see note below).

Note: On macOS, proof generation is not yet optimized, so some proofs may take longer to generate.

Required Tools

Ensure the following tools are installed:

  • Rust
  • Git
  • To enable GPU support in ZisK, you must have NVIDIA Driver version 525.60.13 or later installed.
  • If you use the zisk-sdk crate, you must also have CUDA Toolkit version 12.9 or later installed.

Installing Dependencies

Ubuntu

Ubuntu 22.04 or higher is required.

Install all required dependencies with:

sudo apt-get install -y xz-utils jq curl build-essential qemu-system libomp-dev libgmp-dev nlohmann-json3-dev protobuf-compiler uuid-dev libgrpc++-dev libsecp256k1-dev libsodium-dev libpqxx-dev nasm libopenmpi-dev openmpi-bin openmpi-common libclang-dev clang gcc-riscv64-unknown-elf

ZisK uses shared memory to exchange data between processes. The system must be configured to allow enough locked memory per process:

$ ulimit -l
unlimited

One way to achieve this is to add the line DefaultLimitMEMLOCK=infinity to the file /etc/systemd/system.conf and reboot for the change to take effect.
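The same limit can be checked from a program. The sketch below is an illustration, not part of ZisK: it assumes a Linux host, where the kernel lists per-process limits in /proc/self/limits, and the names MemlockLimit and parse_memlock are hypothetical. Note that this file reports the limit in bytes, whereas ulimit -l reports it in kilobytes.

```rust
use std::fs;

/// Soft limit for locked memory, as listed in a /proc/<pid>/limits dump.
#[derive(Debug, PartialEq)]
enum MemlockLimit {
    Unlimited,
    Bytes(u64),
}

/// Extract the soft "Max locked memory" limit from the text of /proc/<pid>/limits.
fn parse_memlock(limits: &str) -> Option<MemlockLimit> {
    let name = "Max locked memory";
    let line = limits.lines().find(|l| l.starts_with(name))?;
    // Columns after the name: soft limit, hard limit, units.
    let soft = line[name.len()..].split_whitespace().next()?;
    if soft == "unlimited" {
        Some(MemlockLimit::Unlimited)
    } else {
        soft.parse::<u64>().ok().map(MemlockLimit::Bytes)
    }
}

fn main() {
    // Linux only: other platforms do not expose /proc/self/limits.
    match fs::read_to_string("/proc/self/limits") {
        Ok(text) => match parse_memlock(&text) {
            Some(MemlockLimit::Unlimited) => println!("locked memory: unlimited"),
            Some(MemlockLimit::Bytes(n)) => println!("locked memory is capped at {n} bytes"),
            None => println!("no 'Max locked memory' entry found"),
        },
        Err(err) => println!("could not read /proc/self/limits: {err}"),
    }
}
```

If the program reports a cap, the limit has not been raised for the current session.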

macOS

macOS 14 or higher is required.

You must have Homebrew and Xcode installed.

Install all required dependencies with:

brew reinstall jq curl libomp protobuf openssl nasm pkgconf open-mpi libffi nlohmann-json libsodium riscv-tools

Installing ZisK

  1. To install ZisK using ziskup, run the following command in your terminal:

    curl https://raw.githubusercontent.com/0xPolygonHermez/zisk/main/ziskup/install.sh  | bash
    
  2. During installation, ziskup will detect whether CUDA is available on your machine. If so, it will install ZisK binaries with GPU support. Otherwise, you will be prompted to choose between CPU binaries (default) or GPU binaries.

  3. Also during the installation, you will be prompted to select a setup option. You can choose from the following:

    1. Install proving key (default) – Required for generating and verifying proofs.
    2. Install proving key (no constant tree files) – Installs the proving key without generating the constant tree files.
    3. Install verify key – Needed only if you want to verify proofs.
    4. None – Choose this if you only want to compile programs and execute them using the ZisK emulator.
  4. Verify the Rust toolchain (which includes support for the riscv64ima-zisk-zkvm compilation target):

    rustup toolchain list
    

    The output should include an entry for zisk, similar to this:

    stable-x86_64-unknown-linux-gnu (default)
    nightly-x86_64-unknown-linux-gnu
    zisk
    
  5. Verify the cargo-zisk CLI tool:

    cargo-zisk --version
    

    It should show cargo-zisk X.X.X [gpu] if the GPU version is installed, or cargo-zisk X.X.X [cpu] otherwise.
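The two verification steps can also be scripted. The following is a minimal sketch, assuming rustup and cargo-zisk are on the PATH and print output in the shapes shown above; the helper names (has_zisk_toolchain, backend_tag, stdout_of) are made up for this illustration and are not part of any ZisK tool.

```rust
use std::process::Command;

/// True if `rustup toolchain list` output contains a toolchain named `zisk`.
fn has_zisk_toolchain(toolchains: &str) -> bool {
    toolchains
        .lines()
        .any(|line| line.split_whitespace().next() == Some("zisk"))
}

/// Extract the backend tag ("gpu" or "cpu") from a line like `cargo-zisk X.X.X [gpu]`.
fn backend_tag(version_line: &str) -> Option<&str> {
    let start = version_line.find('[')? + 1;
    let end = start + version_line[start..].find(']')?;
    Some(&version_line[start..end])
}

/// Run a command and return its stdout as text, or None if it could not be started.
fn stdout_of(program: &str, args: &[&str]) -> Option<String> {
    let output = Command::new(program).args(args).output().ok()?;
    Some(String::from_utf8_lossy(&output.stdout).into_owned())
}

fn main() {
    match stdout_of("rustup", &["toolchain", "list"]) {
        Some(list) => println!("zisk toolchain installed: {}", has_zisk_toolchain(&list)),
        None => println!("rustup could not be run"),
    }
    match stdout_of("cargo-zisk", &["--version"]) {
        Some(version) => println!("backend: {:?}", backend_tag(&version)),
        None => println!("cargo-zisk could not be run"),
    }
}
```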

Updating ZisK

To update ZisK to the latest version, simply run: bash ziskup

You can use the flags --provingkey, --verifykey or --nokey to specify the installation setup and skip the selection prompt.

To install the PLONK proving key (provingKeySnark), run: bash ziskup setup_snark

Option 2: Building from Source

Build ZisK

  1. Clone the ZisK repository:

    git clone https://github.com/0xPolygonHermez/zisk.git
    cd zisk
    
  2. Build ZisK tools:

    cargo build --release
    

    Note: The build process will automatically detect whether CUDA is available on your machine. If so, it will build the GPU-enabled binaries; otherwise, it will build the CPU version. To force the CPU version, use the --features cpu-only flag.

    Note: By default, the build process auto-detects the GPU architecture of the host machine. Use the CUDA_ARCHS environment variable to control which architectures are compiled:

    # Single architecture (faster build — e.g. Ada Lovelace sm_89 / RTX 4090)
    CUDA_ARCHS="89" cargo build --release
    
    # Multiple architectures (e.g. Ada + Hopper)
    CUDA_ARCHS="89,90" cargo build --release
    
    # All major architectures — portable binary for distribution
    # (sm_80, sm_86, sm_89, sm_90, sm_100, sm_120 + PTX forward compatibility)
    # Note: this takes significantly longer to compile
    CUDA_ARCHS="major" cargo build --release
    
  3. Copy the tools to ~/.zisk/bin directory:

    mkdir -p $HOME/.zisk/bin
    cp target/release/cargo-zisk target/release/ziskemu target/release/riscv2zisk target/release/zisk-coordinator target/release/zisk-worker target/release/libziskclib.a $HOME/.zisk/bin
    
  4. Copy required files for assembly rom setup:

    Note: This is only needed on Linux x86_64, since assembly execution is not supported on macOS.

    mkdir -p $HOME/.zisk/zisk/emulator-asm
    cp -r ./emulator-asm/src $HOME/.zisk/zisk/emulator-asm
    cp ./emulator-asm/Makefile $HOME/.zisk/zisk/emulator-asm
    cp -r ./lib-c $HOME/.zisk/zisk
    
  5. Add ~/.zisk/bin to your system PATH:

    If you are using bash or zsh:

    PROFILE=$([[ "$(uname)" == "Darwin" ]] && echo ".zshenv" || echo ".bashrc")
    echo >>$HOME/$PROFILE && echo "export PATH=\"\$PATH:$HOME/.zisk/bin\"" >> $HOME/$PROFILE
    source $HOME/$PROFILE
    
  6. Install the ZisK Rust toolchain:

    cargo-zisk toolchain install
    

    Note: This command installs the ZisK Rust toolchain from prebuilt binaries. If you prefer to build the toolchain from source, follow these steps:

    1. Ensure all dependencies required to build the Rust toolchain from source are installed.

    2. Build and install the Rust ZisK toolchain:

    cargo-zisk toolchain build
    
  7. Verify the installation:

    rustup toolchain list
    

    Confirm that zisk appears in the list of installed toolchains.

  8. Verify the cargo-zisk CLI tool:

    cargo-zisk --version
    

    It should show cargo-zisk X.X.X [gpu] if the GPU version is built, or cargo-zisk X.X.X [cpu] otherwise.

Build Setup

Please note that the process can be long, taking approximately 45-60 minutes depending on the machine used.

NodeJS version 20.x or higher is required to build the setup files.

  1. Clone the following repositories in the parent folder of the zisk folder created in the previous section:

    git clone https://github.com/0xPolygonHermez/pil2-compiler.git
    git clone https://github.com/0xPolygonHermez/pil2-proofman.git
    git clone https://github.com/0xPolygonHermez/pil2-proofman-js
    
  2. Install packages:

    (cd pil2-compiler && npm i)
    (cd pil2-proofman-js && npm i)
    
  3. All subsequent commands must be executed from the zisk folder created in the previous section:

    cd zisk
    
  4. Generate fixed data:

    cargo run --release --bin arith_frops_fixed_gen
    cargo run --release --bin binary_basic_frops_fixed_gen
    cargo run --release --bin binary_extension_frops_fixed_gen
    
  5. Compile ZisK PIL:

    node --max-old-space-size=16384 ../pil2-compiler/src/pil.js pil/zisk.pil -I pil,../pil2-proofman/pil2-components/lib/std/pil,state-machines,precompiles -o pil/zisk.pilout -u tmp/fixed -O fixed-to-file
    

    This command creates the pil/zisk.pilout file.

  6. Generate setup data (this step may take 30-45 minutes):

    node --max-old-space-size=16384 --stack-size=8192 ../pil2-proofman-js/src/main_setup.js -a ./pil/zisk.pilout -b build -t ../pil2-proofman/pil2-components/lib/std/pil -u tmp/fixed -r -s ./state-machines/starkstructs.json
    

    This command generates the build/provingKey directory.

    Additionally, to generate the snark wrapper:

    node  ../pil2-proofman-js/src/main_setup_snark.js -b build -t ../pil2-proofman/pil2-components/lib/std/pil -f -w ../powersOfTau28_hez_final_27.ptau -p ./state-machines/publics.json -n plonk
    

    It is stored under the build/provingKeySnark directory.

  7. Copy (or move) the build/provingKey directory to $HOME/.zisk directory:

    cp -R build/provingKey $HOME/.zisk
    

Uninstall ZisK

  1. Uninstall the ZisK toolchain:

    rustup uninstall zisk
    
  2. Delete the ZisK folder:

    rm -rf $HOME/.zisk
    

Quickstart

In this guide, you will learn how to install ZisK, create a simple program and run it using ZisK.

Installation

ZisK currently supports Linux x86_64 and macOS platforms (see note below).

Note: On macOS, proof generation is not yet optimized, so some proofs may take longer to generate.

Ubuntu 22.04 or higher is required.

macOS 14 or higher with Xcode installed is required.

  1. Make sure you have Rust installed.

  2. Install all required dependencies with:

    • Ubuntu:
      sudo apt-get install -y xz-utils jq curl build-essential qemu-system libomp-dev libgmp-dev nlohmann-json3-dev protobuf-compiler uuid-dev libgrpc++-dev libsecp256k1-dev libsodium-dev libpqxx-dev nasm libopenmpi-dev openmpi-bin openmpi-common libclang-dev clang gcc-riscv64-unknown-elf
      
    • macOS:
      brew reinstall jq curl libomp protobuf openssl nasm pkgconf open-mpi libffi nlohmann-json libsodium
      
  3. To install ZisK using ziskup, run the following command in your terminal:

    curl https://raw.githubusercontent.com/0xPolygonHermez/zisk/main/ziskup/install.sh | bash
    

Create a Project

The first step is to generate a new example project using the cargo-zisk new <name> command. This command creates a new directory named <name> in your current directory. For example:

cargo-zisk new sha_hasher
cd sha_hasher

This will create a project with the following structure:

.
├── common
│   ├── src
│   │   └── main.rs
│   └── Cargo.toml
├── guest
│   ├── src
│   │   └── main.rs
│   └── Cargo.toml
├── host
│   ├── src
│   │   └── main.rs
│   ├── bin
│   │   ├── execute.rs
│   │   ├── minimal.rs
│   │   ├── prove.rs
│   │   ├── plonk.rs
│   │   └── run.rs
│   ├── Cargo.toml
│   └── build.rs
└── Cargo.toml

The example program takes a number n as input and computes the SHA-256 hash n times.

Build

The next step is to build the program to generate an ELF file (RISC-V), which will be used later to generate the proof. Execute:

cargo build --release

This command builds the program using the zkvm target. The resulting sha_hasher ELF file (without extension) is generated in the ./target/elf/riscv64ima-zisk-zkvm-elf/release directory.

Execute

Before generating a proof, you can test the program using the ZisK emulator to ensure its correctness:

cargo run --release --bin execute

The emulator will execute the program and display the public outputs:

Public outputs:
  Hash: 0x36c1cb4f826ae42ceba848227e0c5f786178ca9dceca6772e5d728d09c30a2f6
  Iterations: 1000
  Magic number: 0xdeadbeef

These outputs should match the native execution, confirming the program works correctly.

Prove

To generate a cryptographic proof of execution, run:

mkdir tmp
cargo run --release --bin prove

This will:

  1. Execute the program and generate the execution trace
  2. Compute witness values for all state machines
  3. Generate the polynomial commitments
  4. Create the zk-STARK proof

The proof will be saved in the ./tmp directory. This process may take several minutes depending on the program complexity.

Compressed Proof (Optional)

After generating the proof, you can optionally create a compressed version to reduce the proof size:

cargo run --release --bin minimal

This generates an additional compressed proof on top of the existing one using recursive composition. The compressed proof is significantly smaller while maintaining the same security guarantees.

Writing Programs

This document explains how to write or modify a Rust program for execution in ZisK.

Setup

Code changes

Writing a Rust program for ZisK is similar to writing a standard Rust program, with a few minor modifications. Follow these steps:

  1. Modify main.rs file:

    Add the following code to mark the main function as the entry point for ZisK:

    
    #![no_main]
    ziskos::entrypoint!(main);
    
  2. Modify Cargo.toml file:

    Add the ziskos crate as a dependency:

    [dependencies]
    ziskos = { git = "https://github.com/0xPolygonHermez/zisk.git" }
    

Let's show these changes using the example program from the Quickstart section.

Example program

main.rs:

// This example program takes a number `n` as input and computes the SHA-256 hash `n` times sequentially.

// Mark the main function as the entry point for ZisK
#![no_main]
ziskos::entrypoint!(main);

use alloy_sol_types::SolValue;
use common::Output;
use sha2::{Digest, Sha256};

fn main() {
    // Read the input data
    let n: u32 = ziskos::io::read();

    let mut hash = [0u8; 32];

    // Compute SHA-256 hashing 'n' times
    for _ in 0..n {
        let mut hasher = Sha256::new();
        hasher.update(hash);
        let digest = &hasher.finalize();
        hash = Into::<[u8; 32]>::into(*digest);
    }

    let output = Output {
        hash: hash.into(),
        iterations: n,
        magic_number: 0xDEADBEEF,
    };

    println!("Computed hash: {:02x?}", output.hash);
    println!("Iterations: {}", output.iterations);

    let bytes = output.abi_encode();

    println!("Bytes to commit: {:?}", bytes);

    // Write raw ABI-encoded bytes directly (no bincode serialization)
    ziskos::io::commit_slice(&bytes);
}

Cargo.toml:

[package]
name = "guest"
version = "0.1.0"
edition = "2024"

[dependencies]
byteorder = "1.5.0"
sha2 = "0.10.8"
serde = { version = "1.0", default-features = false, features = ["derive"] }
ziskos = { workspace = true }
alloy-sol-types = "1.5.7"
common = { path = "../common" }

Input/Output Data

To read input data in your ZisK program, use the ziskos::io::read() function, which deserializes data from the input:


// Read a u32 value from input
let n: u32 = ziskos::io::read();

You can also read custom types that implement the Deserialize trait:


// Read a custom struct from input
let my_data: MyStruct = ziskos::io::read();

To write public output data, use the ziskos::io::commit_slice() function, which commits a slice to the output:


let bytes = output.abi_encode();

println!("Bytes to commit: {:?}", bytes);

// Write raw ABI-encoded bytes directly (no bincode serialization)
ziskos::io::commit_slice(&bytes);

You can also use the commit() function to output any type that implements the Serialize trait. The data is serialized and made available as public outputs that anyone checking the proof can verify.

Build

Before compiling your program for ZisK, you can test it on the native architecture just like any regular Rust program using the cargo command.

Once your program is ready to run on ZisK, compile it into an ELF file (RISC-V architecture), using the cargo-zisk CLI tool from the guest project folder:

cargo-zisk build

This command compiles the program using the zisk target. The resulting guest ELF file (without extension) is generated in the ./target/elf/riscv64ima-zisk-zkvm-elf/debug directory.

For production, compile the ELF file with the --release flag, similar to how you compile Rust projects:

cargo-zisk build --release

In this case, the guest ELF file will be generated in the ./target/elf/riscv64ima-zisk-zkvm-elf/release directory.
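For scripts that need to locate the ELF file, the two output locations differ only in the profile directory. A small sketch of that layout follows; the function elf_path and the constant ZISK_TARGET_DIR are hypothetical names, and the directory structure is taken from the paths quoted above.

```rust
use std::path::PathBuf;

/// Target directory name quoted in the text above.
const ZISK_TARGET_DIR: &str = "riscv64ima-zisk-zkvm-elf";

/// Where the ELF for `package` is expected, relative to the project folder.
fn elf_path(package: &str, release: bool) -> PathBuf {
    let profile = if release { "release" } else { "debug" };
    ["target", "elf", ZISK_TARGET_DIR, profile, package]
        .iter()
        .collect()
}

fn main() {
    for release in [false, true] {
        let path = elf_path("guest", release);
        println!("{} (exists: {})", path.display(), path.exists());
    }
}
```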

Execute

You can test your compiled program using the emulator before generating a proof. Use the -i (--inputs) flag to specify the location of the input file:

cargo-zisk run --release -i ../host/tmp/input.bin

If the program requires a large number of ZisK steps, you might encounter the following error:

Error during emulation: EmulationNoCompleted
Error: Error executing Run command

To resolve this, use ziskemu directly and increase the number of execution steps using the -n (--max-steps) flag. For example:

ziskemu -e target/elf/riscv64ima-zisk-zkvm-elf/release/guest -i ../host/tmp/input.bin -n 10000000000

Metrics and Statistics

Performance Metrics

You can get performance metrics for a program execution in ZisK using the -m (--log-metrics) flag of the ziskemu tool:

ziskemu -e target/elf/riscv64ima-zisk-zkvm-elf/release/guest -i ../host/tmp/input.bin -m

The output will include details such as execution time, throughput, and clock cycles per step:

process_rom() steps=4450270 duration=0.0436 tp=102.0505 Msteps/s freq=3504.0000 34.3359 clocks/step
...
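The fields in this line are related by simple arithmetic. The sketch below reproduces them from the sample values, assuming that freq is the clock frequency in MHz and tp is the throughput in millions of steps per second; those units are an interpretation of the output, not something ziskemu documents here.

```rust
/// Throughput in millions of steps per second.
fn throughput_msteps(steps: u64, duration_secs: f64) -> f64 {
    steps as f64 / duration_secs / 1e6
}

/// Clock cycles spent per step, given the frequency in MHz and the throughput in Msteps/s.
fn clocks_per_step(freq_mhz: f64, throughput_msteps: f64) -> f64 {
    freq_mhz / throughput_msteps
}

fn main() {
    // Values copied from the sample line above.
    let (steps, duration, tp, freq) = (4_450_270u64, 0.0436, 102.0505, 3504.0);
    println!(
        "throughput from the printed duration: {:.4} Msteps/s",
        throughput_msteps(steps, duration)
    );
    println!("clocks/step: {:.4}", clocks_per_step(freq, tp));
}
```

Dividing freq by tp gives the printed 34.3359 clocks/step. Recomputing the throughput from the printed duration gives roughly 102.07 rather than 102.0505, presumably because the duration is shown rounded to four decimals.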

Execution Statistics

You can get statistics about a program execution in ZisK by passing the -p (--profiling) flag with the summary argument to cargo-zisk:

cargo-zisk run --release -i ../host/tmp/input.bin -p summary

The output will include details such as cost definitions, total cost, opcode statistics, etc:

╔══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╗
║  ◆ REPORT SUMMARY                                                                                                    ║
╠══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╣
║  STEPS                                                                                                    4,450,270  ║
║  COST                                                                                                   787,338,404  ║
║  RAM                                                                                            0.00 MB / 507.75 MB  ║
╚══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╝
╔══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╗
║  ◆ COST DISTRIBUTION SUMMARY                                                                                         ║
╠══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╣
║  CATEGORY                                                                                               COST      %  ║
║  ┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄  ║
║  Base         █████████████████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░     293,601,280  37.3%  ║
║  Main         ██████████████████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░     302,618,360  38.4%  ║
║  Opcodes      █████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░     174,799,164  22.2%  ║
║  Precompiles  ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░         234,155   0.0%  ║
║  Memory       ██░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░      16,085,445   2.0%  ║
║  ┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄  ║
║  Total                                                                                           787,338,404 100.0%  ║
╚══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╝
╔══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╗
║  ◆ COST DISTRIBUTION BY OPCODE                                            ║  ◆ OPS vs FROPS                          ║
╠══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╣
║  OPCODE                                                      COST      %  ║      OPS + FROPS           FROPS      %  ║
║  ┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄  ║  ┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄  ║
║  xor                      █░░░░░░░░░░░░░░░░░░░░░░      41,398,920   5.3%  ║       42,240,480         841,560   2.0%  ║
║  or                       █░░░░░░░░░░░░░░░░░░░░░░      36,646,620   4.7%  ║       38,881,560       2,234,940   5.7%  ║
║  srl_w                    █░░░░░░░░░░░░░░░░░░░░░░      34,606,615   4.4%  ║       36,040,000       1,433,385   4.0%  ║
║  sll                      █░░░░░░░░░░░░░░░░░░░░░░      30,019,783   3.8%  ║       34,007,662       3,987,879  11.7%  ║
║  add                      ░░░░░░░░░░░░░░░░░░░░░░░      16,846,475   2.1%  ║       16,998,100         151,625   0.9%  ║
║  and                      ░░░░░░░░░░░░░░░░░░░░░░░      12,917,580   1.6%  ║       13,456,080         538,500   4.0%  ║
║  signextend_w             ░░░░░░░░░░░░░░░░░░░░░░░         849,590   0.1%  ║          849,590               0   0.0%  ║
║  signextend_b             ░░░░░░░░░░░░░░░░░░░░░░░         848,053   0.1%  ║          848,053               0   0.0%  ║
║  srl                      ░░░░░░░░░░░░░░░░░░░░░░░         429,883   0.1%  ║          439,953          10,070   2.3%  ║
║  dma_xmemset              ░░░░░░░░░░░░░░░░░░░░░░░         200,496   0.0%  ║                                          ║
║  ┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄  ║  ┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄  ║
║  Total                                                175,033,319  22.2%  ║      184,735,683       9,702,364   5.3%  ║
╚══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╝
╔══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╗
║  ◆ TOP COST FUNCTIONS                                                                                                ║
╠══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╣
║   0 sha2::sha256::compress256                                           ████████████░░░░░░░░     473,976,966  60.2%  ║
║   1 std::io::stdio::_print                                              ░░░░░░░░░░░░░░░░░░░░       4,290,957   0.5%  ║
║   2 core::fmt::write                                                    ░░░░░░░░░░░░░░░░░░░░       4,258,155   0.5%  ║
║   3 <alloc::vec::Vec<u8> as core::fmt::Debug>::fmt                      ░░░░░░░░░░░░░░░░░░░░       3,852,860   0.5%  ║
║   4 <core::fmt::builders::DebugSet>::entry                              ░░░░░░░░░░░░░░░░░░░░       3,746,448   0.5%  ║
║   5 <std::..::Adapter<…> as core::fmt::Write>::write_str                ░░░░░░░░░░░░░░░░░░░░       2,549,696   0.3%  ║
║   6 <&u8 as core::fmt::Debug>::fmt                                      ░░░░░░░░░░░░░░░░░░░░       2,193,178   0.3%  ║
║   7 <u8 as core::fmt::Display>::fmt                                     ░░░░░░░░░░░░░░░░░░░░       2,105,434   0.3%  ║
║   8 <std::..::LineWriterShim<…> as std::io::Write>::write_all           ░░░░░░░░░░░░░░░░░░░░       1,953,802   0.2%  ║
║   9 <core::fmt::Formatter>::pad_integral                                ░░░░░░░░░░░░░░░░░░░░       1,820,586   0.2%  ║
║  10 core::slice::memchr::memrchr                                        ░░░░░░░░░░░░░░░░░░░░         843,066   0.1%  ║
║  11 memset                                                              ░░░░░░░░░░░░░░░░░░░░         499,356   0.1%  ║
║  12 <std::io::buffered::bufwriter::BufWriter<…>>::flush_buf             ░░░░░░░░░░░░░░░░░░░░         202,008   0.0%  ║
║  13 sys_write                                                           ░░░░░░░░░░░░░░░░░░░░         196,791   0.0%  ║
║  14 <core::fmt::Formatter>::pad_integral::write_prefix                  ░░░░░░░░░░░░░░░░░░░░         190,411   0.0%  ║
║  15 memcpy                                                              ░░░░░░░░░░░░░░░░░░░░         117,529   0.0%  ║
║  16 ziskos::io::commit_slice                                            ░░░░░░░░░░░░░░░░░░░░          85,079   0.0%  ║
║  17 <alloy_primitives::..::FixedBytes<…> as core::fmt::Debug>::fmt      ░░░░░░░░░░░░░░░░░░░░          57,891   0.0%  ║
║  18 <u32 as core::fmt::Display>::fmt                                    ░░░░░░░░░░░░░░░░░░░░          29,674   0.0%  ║
║  19 <core::fmt::Formatter as core::fmt::Write>::write_str               ░░░░░░░░░░░░░░░░░░░░          19,363   0.0%  ║
║  20 <core::fmt::Formatter>::debug_list                                  ░░░░░░░░░░░░░░░░░░░░          13,582   0.0%  ║
║  21 <core::fmt::builders::DebugList>::finish                            ░░░░░░░░░░░░░░░░░░░░          13,189   0.0%  ║
║  22 <…>::initialize::<…>                                                ░░░░░░░░░░░░░░░░░░░░           7,830   0.0%  ║
║  23 <u32>::_fmt_inner                                                   ░░░░░░░░░░░░░░░░░░░░           7,338   0.0%  ║
║  24 std::io::stdio::print_to_buffer_if_capture_used                     ░░░░░░░░░░░░░░░░░░░░           6,165   0.0%  ║
╚══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╝
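The percentages in the cost distribution summary are each category's share of the total cost. As a quick sanity check, the sketch below recomputes them from the figures in the report; the names CATEGORIES, total_cost and percent are illustrative only.

```rust
/// Category costs copied from the COST DISTRIBUTION SUMMARY above.
const CATEGORIES: [(&str, u64); 5] = [
    ("Base", 293_601_280),
    ("Main", 302_618_360),
    ("Opcodes", 174_799_164),
    ("Precompiles", 234_155),
    ("Memory", 16_085_445),
];

/// Sum of all category costs.
fn total_cost(categories: &[(&str, u64)]) -> u64 {
    categories.iter().map(|(_, cost)| cost).sum()
}

/// Share of `part` in `total`, as a percentage.
fn percent(part: u64, total: u64) -> f64 {
    part as f64 * 100.0 / total as f64
}

fn main() {
    let total = total_cost(&CATEGORIES);
    for (name, cost) in CATEGORIES {
        println!("{name:<12} {cost:>12} {:>5.1}%", percent(cost, total));
    }
    println!("{:<12} {total:>12}", "Total");
}
```

The categories sum to 787,338,404, the COST value in the report summary. In this sample, the total of the per-opcode table (175,033,319) equals the Opcodes and Precompiles categories added together.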

Prove

Program Setup

Before generating a proof, you need to generate the program setup files. This must be done the first time after building the program ELF file, or any time it changes:

cargo-zisk program-setup

The program setup files will be generated in the cache directory located at $HOME/.zisk.

To clean the cache directory content, use the following command:

cargo-zisk utils clean-cache --all

Generate Proof

To generate a proof, run the following command:

cargo-zisk prove -i ../host/tmp/input.bin -o proof.bin

In this command:

  • -i (--input) specifies the input file location.
  • -o (--output) specifies where the proof is written (in this example, proof.bin).

Note: If you have installed the GPU version of the ZisK binaries, you can use the --gpu flag to enable GPU acceleration during proof generation.

If the process is successful, you should see a message similar to:

...
INFO: --- PROVE SUMMARY ------------------------
INFO: Proof Time: 5.097 seconds
INFO: Execution completed in 5097ms, steps: 4450272
INFO: Execution summary: Proofman 4910ms + Execution 34ms + Count&Plan 17ms + Count&Plan MO 0ms

Concurrent Proof Generation

ZisK proofs can be generated using multiple processes concurrently to improve performance and scalability. The standard MPI (Message Passing Interface) approach is used to launch these processes, which can run either on the same server or across multiple servers.

To generate a ZisK proof using multiple processes, use the following command:

mpirun --bind-to none -np <num_processes> -x OMP_NUM_THREADS=<num_threads_per_process> -x RAYON_NUM_THREADS=<num_threads_per_process> target/release/cargo-zisk <zisk arguments>

In this command:

  • <num_processes> specifies the number of processes to launch.
  • <num_threads_per_process> sets the number of threads used by each process via the OMP_NUM_THREADS and RAYON_NUM_THREADS environment variables.
  • --bind-to none prevents binding processes to specific cores, allowing the operating system to schedule them dynamically for better load balancing.

Running a ZisK proof with multiple processes enables efficient workload distribution across multiple servers. On a single server with many cores, splitting execution into smaller subsets of cores generally improves performance by increasing concurrency. As a general rule, <num_processes> * <num_threads_per_process> should match the number of available CPU cores, or twice that number if hyperthreading is enabled.

The total memory requirement increases proportionally with the number of processes. If each process requires approximately 25GB of memory, running P processes will require roughly (25 * P)GB of memory. Ensure that the system has sufficient available memory to accommodate all running processes.
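The sizing rules above can be turned into a small planning helper. This is a sketch under stated assumptions: MpiPlan, plan and mpirun_command are hypothetical names, the choice of four processes is arbitrary, and the 25 GB per process figure is the example value from the text rather than a measured requirement.

```rust
/// A process/thread split for one host, following the rule of thumb above.
#[derive(Debug, PartialEq)]
struct MpiPlan {
    processes: usize,
    threads_per_process: usize,
    memory_gb: usize,
}

/// Split `hardware_threads` evenly across `processes`.
/// `gb_per_process` is the estimated memory footprint of a single process.
fn plan(hardware_threads: usize, processes: usize, gb_per_process: usize) -> Option<MpiPlan> {
    if processes == 0 || processes > hardware_threads {
        return None;
    }
    Some(MpiPlan {
        processes,
        threads_per_process: hardware_threads / processes,
        memory_gb: gb_per_process * processes,
    })
}

/// Render the mpirun invocation from the text with the plan's values filled in.
fn mpirun_command(plan: &MpiPlan) -> String {
    format!(
        "mpirun --bind-to none -np {p} -x OMP_NUM_THREADS={t} -x RAYON_NUM_THREADS={t} target/release/cargo-zisk <zisk arguments>",
        p = plan.processes,
        t = plan.threads_per_process,
    )
}

fn main() {
    // available_parallelism() usually reports logical CPUs (hyperthreads included),
    // possibly reduced by CPU affinity or container limits.
    let hardware_threads = std::thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);
    let processes = hardware_threads.min(4); // arbitrary choice for the example
    match plan(hardware_threads, processes, 25) {
        Some(p) => {
            println!("{}", mpirun_command(&p));
            println!("estimated memory: {} GB", p.memory_gb);
        }
        None => println!("no valid plan for {hardware_threads} hardware threads"),
    }
}
```

On a host with 64 hardware threads and four processes, this yields 16 threads per process and an estimated 100 GB of memory.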

Verify Proof

To verify a generated proof, use the following command:

cargo-zisk verify -p proof.bin

In this command:

  • -p (--proof) specifies the final proof file generated with cargo-zisk prove.
  • The remaining flags specify the files required for verification; they are optional, set by default to the files found in the $HOME/.zisk directory.

Precompiles

Precompiles are built-in system functions within ZisK’s operating system that accelerate computationally expensive and frequently used operations such as the Keccak-f permutation and Secp256k1 addition and doubling.

These precompiles improve proving efficiency by offloading intensive computations from ZisK programs to dedicated, pre-integrated sub-processors.

How Precompiles Work

Precompiles are primarily used to patch third-party crates, replacing costly operations with system calls. The patched crates are then used as dependencies of ZisK programs, so commonly used cryptographic primitives such as Keccak hashing and elliptic curve operations execute efficiently.

One example is the patched tiny-keccak crate.

Available Precompiles in ZisK

Below is a summary of the precompiles currently available in ZisK:

Distributed Execution

Generating a ZisK proof means proving the full execution trace of a program. For real workloads, that trace is too large and too slow to prove on a single machine. A ZisK cluster splits the trace into pieces, proves each in parallel on separate machines, and aggregates the results into a single final proof. Throughput and latency scale with the number of machines you give it.

This guide covers the three things you need to run distributed proving: the cluster's architecture, a single-host quickstart that gets a job through the binaries, and the production path that deploys the same binaries on bare Linux hosts with systemd.


Architecture

A ZisK cluster is two binaries: a single zisk-coordinator and one or more zisk-worker instances.

                    ┌─────────────────────────┐
                    │    Host application     │
                    │     (RemoteClient)      │
                    └────────────┬────────────┘
                                 │
                                 │ gRPC :7000
                                 │ prove request
                                 ▼
    ╔════════════════════════════════════════════════════════╗
    ║                    ZisK cluster                        ║
    ║                                                        ║
    ║         ┌──────────────────────────────────┐           ║
    ║         │        zisk-coordinator          │           ║
    ║         │     :7000   :50051   :9090       │           ║
    ║         └───┬──────────┬──────────┬────────┘           ║
    ║             │          │          │                    ║
    ║      assign │   assign │   assign │                    ║
    ║    segments │ segments │ segments │                    ║
    ║             ▼          ▼          ▼                    ║
    ║      ┌──────────┐ ┌──────────┐ ┌──────────┐            ║
    ║      │ worker 1 │ │ worker 2 │ │ worker 3 │            ║
    ║      └─────┬────┘ └─────┬────┘ └─────┬────┘            ║
    ║            │            │            │                 ║
    ║    segment │    segment │    segment │                 ║
    ║      proof │      proof │      proof │                 ║
    ║            ▼            ▼            ▼                 ║
    ║         ┌──────────────────────────────┐               ║
    ║         │      Aggregation tree        │               ║
    ║         └──────────────┬───────────────┘               ║
    ╚════════════════════════│═══════════════════════════════╝
                             │
                             ▼
                    ┌─────────────────┐
                    │   Final proof   │
                    └────────┬────────┘
                             │
                             │ return proof
                             ▼
                    ┌─────────────────────────┐
                    │    Host application     │
                    └─────────────────────────┘

The coordinator

The coordinator is the only stateful process in the cluster. It exposes a public gRPC interface that hosts use to submit proof requests, poll job status, and retrieve results. From the host's point of view, the coordinator is the only endpoint it ever talks to; workers are an invisible implementation detail.

Internally, the coordinator splits each job into segments, assigns them to workers, and returns the final proof. It also caches the proving keys derived from each uploaded guest ELF, so subsequent jobs for the same program skip the expensive setup step.

Workers

Workers are the proving processes. Each worker connects outbound to the coordinator and waits for proof assignments. Workers are stateless across jobs, holding only the segments they are currently proving. You can add, remove, or restart them without touching the coordinator or losing cluster state.

The first worker to send its partial proof to the coordinator is automatically promoted to aggregator for that job. The aggregator collects the remaining segment proofs and assembles the final proof, then returns it to the coordinator.

Proving pipeline

Once a job is submitted, the coordinator selects workers from the available pool and runs three phases:

  1. Partial contributions. Each assigned worker processes its segments and returns partial challenges. The coordinator collects them and derives a single global challenge.
  2. Prove. The coordinator broadcasts the global challenge to all workers. Each worker computes its partial proofs and returns them.
  3. Aggregation. The first worker to deliver its partial proof is promoted to aggregator and builds a binary aggregation tree, folding the remaining partial proofs in as they land and returning the final proof to the coordinator.

   Client            Coordinator            Workers
     │                    │                    │
     │  prove(request)    │                    │
     ├───────────────────>│                    │
     │                    │  assign segments   │
     │                    ├───────────────────>│
     │                    │                    │
     │       ╔════════════╧════════════════════╧════════════╗
     │       ║   Phase 1: Partial contributions             ║
     │       ╚════════════╤════════════════════╤════════════╝
     │                    │ partial challenges │
     │                    │<───────────────────┤
     │                    │                    │
     │       ╔════════════╧════════════════════╧════════════╗
     │       ║   Phase 2: Prove                             ║
     │       ╚════════════╤════════════════════╤════════════╝
     │                    │  global challenge  │
     │                    ├───────────────────>│
     │                    │                    │
     │                    │   partial proofs   │
     │                    │<───────────────────┤
     │                    │                    │
     │       ╔════════════╧════════════════════╧════════════╗
     │       ║   Phase 3: Aggregation                       ║
     │       ║   ┌──────────────────────────────┐           ║
     │       ║   │  First worker to reply       │           ║
     │       ║   │  becomes aggregator          │           ║
     │       ║   └──────────────────────────────┘           ║
     │       ╚════════════╤════════════════════╤════════════╝
     │                    │     aggregate      │
     │                    ├───────────────────>│
     │                    │                    │
     │                    │    final proof     │
     │                    │<───────────────────┤
     │   return proof     │                    │
     │<───────────────────┤                    │
     │                    │                    │
     ▼                    ▼                    ▼

Quickstart: single-host cluster

This brings up one coordinator and one worker on the same machine, then submits a real proving job. It is the smallest deployment that exercises the production binaries end-to-end.

Prerequisites

  • Rust toolchain (cargo --version should work)
  • ~32 GB free RAM (the Assembly emulator preallocates large shared regions)
  • ZisK installed (see the Installation Guide)

Clone the repo:

git clone https://github.com/0xPolygonHermez/zisk.git
cd zisk

Start the coordinator

zisk-coordinator

The coordinator binds three default ports on startup:

| Port  | Purpose                                                 |
|-------|---------------------------------------------------------|
| 7000  | Client-facing gRPC API. Host applications connect here. |
| 50051 | Worker-facing gRPC port. Workers connect here.          |
| 9090  | Prometheus metrics endpoint and /health liveness probe. |

If the coordinator exits with Address already in use, override the offending port:

zisk-coordinator --api-port 8000 --cluster-port 60000 --metrics-port 5245

Start a worker

In a second terminal:

zisk-worker --config distributed/deploy/config/worker.toml

If you built ZisK with CUDA support and want the worker to use the GPU, append --gpu.

worker.toml points the worker at http://127.0.0.1:50051, advertises ten compute units, and sets the log level to info. On a successful handshake:

INFO registered as worker <random-uuid> (capacity 10)

The coordinator logs the matching side:

INFO worker registered: <random-uuid> capacity=10

Health check

With the coordinator and worker both running, verify the cluster in two steps: a liveness probe and an end-to-end proving job.

Liveness probe. In a third terminal:

curl http://127.0.0.1:9090/health

A healthy coordinator returns 200 OK with an empty body.

Smoke-test proof. Submit a real job from the included example:

cd examples/sha-hasher/host
cargo run --release --bin prove-remote

The prove-remote binary builds a ProverClient::remote("http://127.0.0.1:7000"), uploads the guest ELF, and waits for the final proof. End to end, the coordinator splits the trace into segments and hands them to the worker, and the worker produces the STARK proofs. Terminals 1 and 2 show the matching coordinator and worker activity.

CLI references

A handful of operational knobs are CLI-only and not exposed in the TOML:

| Flag                         | Default            | Description                            |
|------------------------------|--------------------|----------------------------------------|
| --proving-key                | ~/.zisk/provingKey | Path to the proving-key folder         |
| --elf                        | (none)             | Path to the ELF file                   |
| --shared-tables              | false              | Share tables when running in a cluster |
| --verify-constraints         | false              | Verify constraints after witness gen   |
| -n, --number-threads-witness | (none)             | Threads for witness computation        |
| -g, --gpu                    | false              | Enable GPU mode (CUDA build only)      |
| -t, --max-streams            | (none)             | Maximum GPU streams                    |

CLI flags override the config file for one-off testing:

zisk-coordinator --api-port 8000 --cluster-port 60000 --log-level debug
zisk-worker --coordinator-url http://prod-coord:50051 --compute-capacity 32

Deployment with scripts

This section deploys the same two binaries on bare hosts under systemd, the canonical path for a ZisK cluster.

Prerequisites

  • ~32 GB free RAM (for Assembly emulator to preallocate large shared regions)

Install the coordinator

On the coordinator host run:

curl https://raw.githubusercontent.com/0xPolygonHermez/zisk/refs/heads/main/distributed/deploy/scripts/coordinator/install.sh | sudo bash

The script:

  • Creates the zisk system user and group (home /var/empty, no login)
  • Drops the zisk-coordinator-server binary at /usr/local/bin/
  • Writes the config to /etc/zisk/coordinator.toml (or installs the example if none provided)
  • Creates the working directory at /var/lib/zisk with a pre-made .zisk/cache subdir, owned by the service user
  • Writes a hardened systemd unit at /etc/systemd/system/zisk-coordinator.service (Linux) or a launchd plist at /Library/LaunchDaemons/ plus a newsyslog rotation rule (macOS)
  • Runs systemctl enable --now (or launchctl load) unless --no-start / --no-enable is passed

Verify the service:

  • In Linux:
sudo systemctl status zisk-coordinator
sudo journalctl -u zisk-coordinator -f
  • In macOS:
sudo launchctl print system/com.zisk.coordinator
sudo tail -f /var/log/zisk/zisk-coordinator-server.log

If the service has failed, the logs above show the underlying error (most often a port conflict or a missing config field).

Configure the coordinator

Every setting is optional; the binary falls back to a built-in default for anything you leave out.

Override precedence (later wins): built-in defaults → config file → ZISK_COORDINATOR_* environment variables → CLI flags.
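
The precedence chain above can be sketched in a few lines. This is only an illustration of "later wins" for a single setting, assuming each layer either provides a value or leaves it unset; the PortLayers struct and resolve_port function are hypothetical names, not part of the coordinator's code.

```rust
/// One setting as seen by each configuration layer; `None` means "not set here".
/// Hypothetical type for illustration only, not a ZisK type.
#[derive(Default)]
struct PortLayers {
    config_file: Option<u16>,
    env_var: Option<u16>,
    cli_flag: Option<u16>,
}

/// Resolve the effective value: the CLI flag wins, then the environment
/// variable, then the config file, then the built-in default.
fn resolve_port(layers: &PortLayers, built_in_default: u16) -> u16 {
    layers
        .cli_flag
        .or(layers.env_var)
        .or(layers.config_file)
        .unwrap_or(built_in_default)
}
```

With port = 7100 in the config file and --api-port 8000 on the command line, the sketch resolves to 8000; with neither set, it falls back to the built-in 7000.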

Edit /etc/zisk/coordinator.toml:

[service] — coordinator identity.

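| Setting     | Default            | Notes                                                       |
|-------------|--------------------|-------------------------------------------------------------|
| name        | "ZisK Coordinator" | Shown in logs and status output.                            |
| environment | development        | One of development, staging, production. Use production.    |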

[server] — client-facing gRPC API.

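| Setting                  | Default | Notes                                                              |
|--------------------------|---------|--------------------------------------------------------------------|
| host                     | 0.0.0.0 | Listen address. Bind to a specific interface to restrict access.   |
| port                     | 7000    | Client gRPC port. CLI: --api-port, env: ZISK_COORDINATOR_API_PORT. |
| shutdown_timeout_seconds | 30      | Drain time after a shutdown signal before forced exit.             |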

[coordinator] — worker-facing port and core tuning.

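| Setting     | Default | Notes                                                                      |
|-------------|---------|----------------------------------------------------------------------------|
| port        | 50051   | Worker gRPC port. CLI: --cluster-port, env: ZISK_COORDINATOR_CLUSTER_PORT. |
| config_file | (none)  | Optional path to a coordinator-core tuning file.                           |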

[metrics] — Prometheus endpoint.

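| Setting | Default | Notes                                                                 |
|---------|---------|-----------------------------------------------------------------------|
| enabled | true    | Set false to disable /metrics. /health stays available either way.    |
| host    | 0.0.0.0 | Listen address for the scrape endpoint.                               |
| port    | 9090    | Scrape port. CLI: --metrics-port, env: ZISK_COORDINATOR_METRICS_PORT. |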

[logging] — what gets logged and where.

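| Setting   | Default | Notes                                                                            |
|-----------|---------|----------------------------------------------------------------------------------|
| level     | info    | trace, debug, info, warn, error. RUST_LOG takes precedence.                      |
| format    | pretty  | pretty, json (production aggregators), or compact.                               |
| file_path | (none)  | Rotating daily log file. Leave unset on systemd hosts; journald captures stdout. |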

After editing:

  • In Linux:
sudo systemctl restart zisk-coordinator
  • In macOS:
sudo launchctl kickstart -k system/com.zisk.coordinator

Install workers

Run the installer with the following command:

  • In Linux:
curl https://raw.githubusercontent.com/0xPolygonHermez/zisk/refs/heads/main/distributed/deploy/scripts/worker/install.sh | sudo bash
  • In macOS:
curl https://raw.githubusercontent.com/0xPolygonHermez/zisk/refs/heads/main/distributed/deploy/scripts/worker/install.sh | sudo bash -s -- --no-mpi

This script:

  • Creates the zisk system user and group (home /var/empty, no login)
  • Drops the zisk-worker binary at /usr/local/bin/
  • Writes the config to /etc/zisk/worker.toml (or installs the example if none provided)
  • Creates the working directory at /var/lib/zisk with a pre-made .zisk/cache subdir, owned by the service user
  • Writes a hardened systemd unit at /etc/systemd/system/zisk-worker.service (Linux) or a launchd plist at /Library/LaunchDaemons/ plus a newsyslog rotation rule (macOS)
  • Runs systemctl enable --now (or launchctl load) unless --no-start / --no-enable is passed

Verify the service:

  • In Linux:
sudo systemctl status zisk-worker
sudo journalctl -u zisk-worker -f
  • In macOS:
sudo launchctl print system/com.zisk.worker
sudo tail -f /var/log/zisk/zisk-worker-server.log

The worker starts immediately and uses its default coordinator URL (http://127.0.0.1:50051).

Note: the default URL only works when the worker runs on the same host as the coordinator. When deploying workers on separate hosts, edit [coordinator].url in /etc/zisk/worker.toml to point at the coordinator's worker-facing port (50051 by default), then restart the service. Confirm registration in the coordinator log:

INFO worker registered: <random-uuid> capacity=10

Configure the worker

Every setting is optional; the binary falls back to a built-in default for anything you leave out.

Override precedence (later wins): built-in defaults → config file → ZISK_WORKER_* environment variables → CLI flags.

Edit /etc/zisk/worker.toml:

[worker] — identity, capacity, on-disk location.

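| Setting                        | Default                     | Notes                                                                                                     |
|--------------------------------|-----------------------------|-----------------------------------------------------------------------------------------------------------|
| worker_id                      | random UUID                 | Pin to e.g. the hostname so log correlation works at scale.                                               |
| compute_capacity.compute_units | 10                          | Start at one unit per physical CPU core (minus two for OS overhead), plus one per GPU stream.             |
| environment                    | development                 | development or production.                                                                                |
| inputs_folder                  | /var/lib/zisk-worker/inputs | Where the worker writes intermediate input files. Override only for a faster disk or separate partition. |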
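
The starting point suggested for compute_units is simple arithmetic. A minimal sketch, assuming you already know the host's physical core count and the number of GPU streams you intend to use (the function name is hypothetical, and the result is a starting value to tune, not something the worker computes itself):

```rust
/// Suggested starting value for `compute_capacity.compute_units`:
/// one unit per physical core, minus two reserved for the OS,
/// plus one unit per GPU stream. Hypothetical helper for illustration.
fn suggested_compute_units(physical_cores: u32, gpu_streams: u32) -> u32 {
    // saturating_sub avoids underflow on hosts with fewer than two cores.
    physical_cores.saturating_sub(2) + gpu_streams
}
```

For example, a 34-core CPU-only host would start at 32 units, and a 16-core host with 4 GPU streams at 18.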

[coordinator] — registration target.

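| Setting | Default                | Notes                                            |
|---------|------------------------|--------------------------------------------------|
| url     | http://127.0.0.1:50051 | gRPC URL of the coordinator's worker-facing port. |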

[connection] — reaction to network trouble.

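| Setting                    | Default | Notes                                                                    |
|----------------------------|---------|--------------------------------------------------------------------------|
| reconnect_interval_seconds | 5       | Backoff between reconnect attempts when the coordinator is unreachable.  |
| heartbeat_timeout_seconds  | 30      | How long to wait for a heartbeat before treating the connection as dead. |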

[logging] — same shape as the coordinator's [logging] table.

After editing:

  • In Linux:
sudo systemctl restart zisk-worker
  • In macOS:
sudo launchctl kickstart -k system/com.zisk.worker

Add more workers

Run the install script on as many hosts as you want. All workers register against the same coordinator and receive work proportional to their advertised capacity.

   ┌──────────────────────────────┐
   │      Application host        │
   │  ┌────────────────────────┐  │
   │  │     host program       │  │
   │  │     (RemoteClient)     │  │
   │  └───────────┬────────────┘  │
   └──────────────│───────────────┘
                  │
                  │ :7000
                  ▼
   ┌──────────────────────────────┐
   │      Coordinator host        │
   │  ┌────────────────────────┐  │
   │  │    zisk-coordinator    │  │
   │  │  :7000  :50051  :9090  │  │
   │  └───────────▲────────────┘  │
   └──────────────│───────────────┘
                  │
        ┌─────────┼─────────┐
        │ :50051  │ :50051  │ :50051
        │         │         │
   ┌────┴────┐┌───┴─────┐┌──┴──────┐
   │ Worker  ││ Worker  ││ Worker  │
   │ host A  ││ host B  ││ host C  │
   │(32 unit)││(32 unit)││(16 unit)│
   │┌───────┐││┌───────┐││┌───────┐│
   ││zisk-  ││││zisk-  ││││zisk-  ││
   ││worker ││││worker ││││worker ││
   │└───────┘││└───────┘││└───────┘│
   └─────────┘└─────────┘└─────────┘
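
To make "work proportional to advertised capacity" concrete, here is one possible reading as a sketch. It assumes a largest-remainder split of a fixed number of segments; the coordinator's actual scheduling policy is not described in this guide, and split_by_capacity is a hypothetical helper.

```rust
/// Split `total_segments` across workers in proportion to their capacities.
/// Illustrative only: a largest-remainder split, not the coordinator's algorithm.
fn split_by_capacity(total_segments: usize, capacities: &[usize]) -> Vec<usize> {
    let total_capacity: usize = capacities.iter().sum();
    if total_capacity == 0 {
        return vec![0; capacities.len()];
    }
    // Every worker first gets the floor of its proportional share.
    let mut shares: Vec<usize> = capacities
        .iter()
        .map(|c| total_segments * c / total_capacity)
        .collect();
    // Leftover segments go to the workers with the largest remainders.
    let assigned: usize = shares.iter().sum();
    let mut order: Vec<usize> = (0..capacities.len()).collect();
    order.sort_by_key(|&i| std::cmp::Reverse((total_segments * capacities[i]) % total_capacity));
    for &i in order.iter().take(total_segments - assigned) {
        shares[i] += 1;
    }
    shares
}
```

With the three hosts in the diagram (32, 32, and 16 units), 10 segments would split as 4, 4, and 2 under this sketch.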

Hints Stream

The hints stream accelerates proof generation by offloading expensive operations outside the zkVM execution, then feeding the results back as verifiable data through a high-performance, parallel pipeline. Hints are preprocessed results that allow operations to be handled externally while remaining fully verifiable inside the VM. The system supports two categories of hints:

  1. Precompile hints: Cryptographic operations (SHA-256, Keccak-256, elliptic curve operations, pairings, etc.) that are computationally expensive inside a zkVM.
  2. Input hints: Data that needs to be passed to the zkVM as input during execution.

The system is designed around three core principles:

  1. Pre-computing results outside the VM: The guest program emits hint requests describing the operation and its inputs.
  2. Streaming results back: A dedicated pipeline processes these requests in parallel, maintaining order, and feeds results to the prover via shared memory.
  3. Verifying inside the VM: The zkVM circuits verify that the precomputed results are correct, avoiding the cost of computing them inside the zkVM.

flowchart LR
    A["Guest program<br/><small>Emits hints request</small>"] --> B["ZiskStream"]
    B --> C["HintsProcessor<br/><small>Parallel engine</small>"]
    C --> D["StreamSink<br/><small>ASM emulator/file output</small>"]

Table of Contents

  1. Hint Format and Protocol
  2. Consuming Hints
  3. Hints in Distributed Execution
  4. Custom Hint Handlers
  5. Generating Hints in Guest Programs

1. Hint Format and Protocol

1.1. Hint Request Format

Hints are transmitted as a stream of u64 values. Each hint request consists of a header (1 u64) followed by data (N u64 values).

┌─────────────────────────────────────────────────────────────┐
│                         Header (u64)                        │
├·····························································┤
│      Hint Code (32 bits)           Length (32 bits)         │
├─────────────────────────────────────────────────────────────┤
│                        Data[0] (u64)                        │
├─────────────────────────────────────────────────────────────┤
│                        Data[1] (u64)                        │
├─────────────────────────────────────────────────────────────┤
│                             ...                             │
├─────────────────────────────────────────────────────────────┤
│                       Data[N-1] (u64)                       │
└─────────────────────────────────────────────────────────────┘
where N = ceil(Length / 8)
  • Hint Code (upper 32 bits): Control code or Data Hint Type
  • Length (lower 32 bits): Payload data size in bytes. The last u64 may contain padding bytes.
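
The header layout can be expressed as a few bit operations. The sketch below follows the format described above (code in the upper 32 bits, byte length in the lower 32 bits, N = ceil(Length / 8) data words); the function names are hypothetical and are not taken from the ZisK crates.

```rust
/// Pack a hint code and a payload length in bytes into one header word.
fn encode_header(code: u32, length_bytes: u32) -> u64 {
    ((code as u64) << 32) | length_bytes as u64
}

/// Split a header word back into (hint code, payload length in bytes).
fn decode_header(header: u64) -> (u32, u32) {
    ((header >> 32) as u32, header as u32)
}

/// Number of u64 data words that follow the header: ceil(length / 8).
fn data_words(length_bytes: u32) -> usize {
    (length_bytes as usize + 7) / 8
}
```

For instance, code 0x0100 with a 32-byte payload packs to 0x00000100_00000020 and is followed by four data words; a 33-byte payload needs five, with seven padding bytes in the last word.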

1.2. Control Hint Types

The following control codes are defined:

  • 0x00 (START): Start a new hint stream. Resets processor state and sequence counters. Must be the first hint in the first batch.
  • 0x01 (END): End the current hint stream. The processor will wait for all pending hints to be processed before returning. Must be the last hint in its batch; only a CTRL_START may follow in a subsequent batch.
  • 0x02 (CANCEL): [Reserved for future use] Cancel current stream and stop processing further hints.
  • 0x03 (ERROR): [Reserved for future use] Indicate an error has occurred; stop processing further hints.

Control codes are for control only and do not have any associated data (Length should be zero).

1.3. Data Hint Types

For data hints, the hint code (32 bits) is structured as follows:

  • Bit 31 (MSB): Pass-through flag. When set, the data bypasses computation and is forwarded directly to the sink.
  • Bits 0-30: The hint type identifier, which is a control, built-in, or custom code (e.g., HINT_SHA256, HINT_BN254_G1_ADD, HINT_SECP256K1_RECOVER).

Example: A SHA-256 hint (0x0100) with a 32-byte input:

Header: 0x00000100_00000020
Data[0]: first_8_input_bytes_as_u64
Data[1]: next_8_input_bytes_as_u64
Data[2]: next_8_input_bytes_as_u64
Data[3]: last_8_input_bytes_as_u64

The same hint with the pass-through flag set (bit 31), forwarding pre-computed data directly to the sink without invoking the SHA-256 handler:

Header: 0x80000100_00000020

1.3.1 Stream Batching

The hints protocol supports chunking for individual hints that exceed the transport’s message size limit (currently 128 KB). Each message in the stream contains either a single complete hint or one chunk of a larger hint — hints are never combined in the same message.

When a hint exceeds the size limit, it must be split into multiple sequential chunks, each sent as a separate message. Each chunk includes a header specifying the total length of the complete hint, allowing the receiver to reassemble all chunks before processing. For example, a hint with a 300 KB payload would be split into three messages:

Message 1: Header (code + total length), Data[0..N] (first 128 KB chunk)
Message 2: Header (code + total length), Data[0..N] (second 128 KB chunk)
Message 3: Header (code + total length), Data[0..M] (final 44 KB chunk)

The receiver buffers incoming chunks and reassembles them based on the total length specified in the header before invoking the hint handler. This allows the system to handle arbitrarily large hints while respecting transport limitations.
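
The sender side of chunking can be sketched as follows. This assumes the 128 KB limit applies to the payload bytes of each message and that every chunk repeats the same header carrying the total length, as described above; chunk_hint and MAX_CHUNK_BYTES are hypothetical names, and the real transport framing may differ.

```rust
/// Assumed per-message payload limit (128 KB), taken from the text above.
const MAX_CHUNK_BYTES: usize = 128 * 1024;

/// Split one hint payload into (header, chunk) messages.
/// Every message repeats the header, which carries the total payload length.
fn chunk_hint(code: u32, payload: &[u8]) -> Vec<(u64, &[u8])> {
    assert!(payload.len() <= u32::MAX as usize, "length must fit in 32 bits");
    let header = ((code as u64) << 32) | payload.len() as u64;
    if payload.is_empty() {
        // A zero-length hint still needs one message for its header.
        return vec![(header, payload)];
    }
    payload
        .chunks(MAX_CHUNK_BYTES)
        .map(|chunk| (header, chunk))
        .collect()
}
```

Applied to the 300 KB example, this yields three messages of 128 KB, 128 KB, and 44 KB, each with the same header.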

1.3.2 Pass-Through Hints

When bit 31 of the hint code is set (e.g., 0x8000_0000 | actual_code), the hint is marked as pass-through:

  • The data payload is forwarded directly to the sink without invoking any handler.
  • No worker thread is spawned; the data is queued immediately in the reorder buffer.
  • This is useful for pre-computed results that don't need processing.
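
Setting and testing the pass-through flag is plain bit manipulation on the 32-bit hint code. A small sketch, with hypothetical constant and function names:

```rust
/// Bit 31 of the hint code marks a pass-through hint.
const PASS_THROUGH_FLAG: u32 = 0x8000_0000;

/// Mark a hint code as pass-through.
fn mark_pass_through(code: u32) -> u32 {
    code | PASS_THROUGH_FLAG
}

/// True when the pass-through flag is set.
fn is_pass_through(code: u32) -> bool {
    (code & PASS_THROUGH_FLAG) != 0
}

/// The hint type identifier in bits 0-30, with the flag cleared.
fn hint_type(code: u32) -> u32 {
    code & !PASS_THROUGH_FLAG
}
```

Marking the SHA-256 code 0x0100 gives 0x80000100, and clearing the flag recovers 0x0100.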

1.4. Hint Code Types

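| Category | Code Range    | Description                         |
|----------|---------------|-------------------------------------|
| Control  | 0x0000-0x000F | Stream lifecycle management         |
| Built-in | 0x0100-0x0800 | Cryptographic precompile operations |
| Input    | 0xF0000       | Input data hints                    |
| Custom   | User-defined  | Application-specific handlers       |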

Note: Custom hint codes can technically use any value not occupied by control or built-in codes. By convention, codes 0xA000-0xFFFF are recommended for custom use to avoid future conflicts as new built-in types are added. The processor does not enforce a range restriction — any unrecognized code is treated as custom.

1.4.1. Control Codes

Control codes manage the stream lifecycle and do not carry computational data:

| Code   | Name        | Description                                                                               |
|--------|-------------|-------------------------------------------------------------------------------------------|
| 0x0000 | CTRL_START  | Resets processor state. Must be the first hint in the first batch.                        |
| 0x0001 | CTRL_END    | Signals end of stream. Blocks until all pending hints complete. Must be the last hint.    |
| 0x0002 | CTRL_CANCEL | [Reserved for future use] Cancels the current stream. Sets error flag and stops processing. |
| 0x0003 | CTRL_ERROR  | [Reserved for future use] External error signal. Sets error flag and stops processing.    |

1.4.2. Built-in Hint Types

| Code   | Name                               | Description                                |
|--------|------------------------------------|--------------------------------------------|
| 0x0100 | Sha256                             | SHA-256 hash computation                   |
| 0x0200 | Bn254G1Add                         | BN254 G1 point addition                    |
| 0x0201 | Bn254G1Mul                         | BN254 G1 scalar multiplication             |
| 0x0205 | Bn254PairingCheck                  | BN254 pairing check                        |
| 0x0300 | Secp256k1EcdsaAddressRecover       | Secp256k1 ECDSA address recovery           |
| 0x0301 | Secp256k1EcdsaVerifyAddressRecover | Secp256k1 ECDSA verify + address recovery  |
| 0x0380 | Secp256r1EcdsaVerify               | Secp256r1 (P-256) ECDSA verification       |
| 0x0400 | Bls12_381G1Add                     | BLS12-381 G1 point addition                |
| 0x0401 | Bls12_381G1Msm                     | BLS12-381 G1 multi-scalar multiplication   |
| 0x0405 | Bls12_381G2Add                     | BLS12-381 G2 point addition                |
| 0x0406 | Bls12_381G2Msm                     | BLS12-381 G2 multi-scalar multiplication   |
| 0x040A | Bls12_381PairingCheck              | BLS12-381 pairing check                    |
| 0x0410 | Bls12_381FpToG1                    | BLS12-381 map field element to G1          |
| 0x0411 | Bls12_381Fp2ToG2                   | BLS12-381 map field element to G2          |
| 0x0500 | ModExp                             | Modular exponentiation                     |
| 0x0600 | VerifyKzgProof                     | KZG polynomial commitment proof verification |
| 0x0700 | Keccak256                          | Keccak-256 hash computation                |
| 0x0800 | Blake2bCompress                    | Blake2b compression function               |

1.4.3. Input Hint Type

Input hints allow passing data to the zkVM during execution. Unlike precompile hints that are processed by worker threads, input hints are forwarded directly to a separate inputs sink.

| Code    | Name  | Description             |
|---------|-------|-------------------------|
| 0xF0000 | Input | Input data for the zkVM |

The input hint payload format is:

  • First 8 bytes: Length of the input data (as u64 little-endian)
  • Remaining bytes: The actual input data, padded to 8-byte alignment

Input hints are not processed by the parallel worker pool; instead, they are immediately submitted to the inputs sink for consumption by the zkVM.
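
The payload layout for an input hint can be sketched as below. It assumes the padding bytes are zeros, which the format description does not specify, and input_hint_payload is a hypothetical helper rather than a ziskos function.

```rust
/// Build an input hint payload: an 8-byte little-endian length prefix,
/// then the input bytes, padded to the next 8-byte boundary.
/// Assumption: padding bytes are zero.
fn input_hint_payload(input: &[u8]) -> Vec<u8> {
    let mut payload = Vec::with_capacity(8 + input.len() + 7);
    payload.extend_from_slice(&(input.len() as u64).to_le_bytes());
    payload.extend_from_slice(input);
    // Round the total size up to a multiple of 8 bytes.
    let padded_len = (payload.len() + 7) / 8 * 8;
    payload.resize(padded_len, 0);
    payload
}
```

A 5-byte input therefore produces a 16-byte payload: 8 bytes of length, 5 bytes of data, and 3 bytes of padding.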

1.4.4. Custom Hint Types

Custom hint types allow users to define their own hint handlers for application-specific logic. Users can register custom handlers via the HintsProcessor builder API, providing a mapping from hint code to a processing function (see Custom Hint Handlers). By convention, codes in the range 0xA000-0xEFFFF are recommended for custom use to avoid conflicts with current and future built-in types. If a data hint is received with an unregistered code, the processor returns an error and stops processing immediately.

1.5. Stream Protocol

A valid hint stream follows this protocol:

CTRL_START                          ← Reset state, begin stream
  [Hint_1] [Hint_2] ... [Hint_N]   ← Data hints (precompile, input, or custom)
CTRL_END                            ← Wait for completion, end stream
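
As a rough illustration of the protocol, the sketch below walks a single stream of u64 words and checks that it opens with CTRL_START, closes with CTRL_END, and that every data hint carries its full payload. It is a simplified, hypothetical validator: it handles one stream only and does not treat the reserved control codes or the pass-through flag specially.

```rust
const CTRL_START: u32 = 0x0000;
const CTRL_END: u32 = 0x0001;

/// Walk a single hint stream and return the number of data hints it contains.
fn validate_stream(words: &[u64]) -> Result<usize, String> {
    let mut pos = 0;
    let mut started = false;
    let mut data_hints = 0;

    while pos < words.len() {
        // Header layout: hint code in the upper 32 bits, byte length in the lower 32.
        let code = (words[pos] >> 32) as u32;
        let length = (words[pos] & 0xFFFF_FFFF) as usize;
        pos += 1;

        if !started {
            if code != CTRL_START {
                return Err("stream must begin with CTRL_START".to_string());
            }
            started = true;
            continue;
        }
        match code {
            CTRL_START => return Err("unexpected CTRL_START inside a stream".to_string()),
            CTRL_END => {
                return if pos == words.len() {
                    Ok(data_hints)
                } else {
                    Err("words found after CTRL_END".to_string())
                };
            }
            _ => {
                // Skip the payload: ceil(length / 8) words.
                let payload_words = (length + 7) / 8;
                if pos + payload_words > words.len() {
                    return Err("truncated hint payload".to_string());
                }
                pos += payload_words;
                data_hints += 1;
            }
        }
    }
    Err("stream ended without CTRL_END".to_string())
}
```

A stream consisting of CTRL_START, one SHA-256 hint with a 32-byte payload, and CTRL_END validates with a count of one data hint.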

2. Consuming Hints

Once a guest program has produced a hints binary file (see Section 5), you can feed it to the prover either programmatically through the ZisK SDK or via the ZisK CLI.

Note: Hints are only supported with the Assembly executor. The emulator-based executor does not use the hints pipeline.

2.1 SDK

Load the file with ZiskHints::from_file and pass it to .hints(...) on the executor:

use anyhow::Result;
use zisk_sdk::{ExecutorKind, GuestProgram, ProverClient, ZiskStdin, ZiskHints};

#[tokio::main]
async fn main() -> Result<()> {
    let elf_path = "hints/example/zec-reth.elf";
    let program = GuestProgram::from_uri(elf_path)?;

    let hints_path = "hints/example/24654300_hints.bin";
    let hints = ZiskHints::from_file(hints_path)?;

    let client = ProverClient::embedded()
        .executor(ExecutorKind::Assembly)
        .build()?;

    client.upload(&program).run()?;
    client.setup(&program).with_hints().run()?.await?;

    let result = client
        .execute(&program, ZiskStdin::new())
        .hints(hints)
        .executor(ExecutorKind::Assembly)
        .run()?
        .await?;

    println!(
        "Program executed successfully: {} cycles in {:.2?} ms",
        result.get_execution_steps(),
        result.get_execution_time()
    );

    Ok(())
}

Notes:

  • Setup must be run with .with_hints() so the assembly ROM is generated with hint support enabled. Without it, the prover will not consume the hints stream.
  • ZiskHints::from_file loads the binary produced by the guest's hint generation. The returned value can be reused across multiple .execute(...) / .prove(...) calls.
  • The same pattern works for prove, verify-constraints, and stats operations exposed by ProverClient.

A complete runnable example is available at examples/hints/host/src/main.rs.

2.2 CLI

Four cargo-zisk commands accept a --hints flag pointing to the hints file: execute, prove, verify-constraints, and stats. Pass the path with the file:// scheme:

--hints file://path      → File stream reader

Example:

cargo-zisk prove --elf program.elf --hints file:///abs/path/hints.bin

--hints is mutually exclusive with --inputs (-i): if you provide hints, the inputs are recovered from the hint stream itself rather than from a separate input file.

3. Hints in Distributed Execution

In the distributed proving system, hints are received by the coordinator and broadcast to all workers via gRPC. The coordinator runs a relay that validates incoming hint messages, assigns sequence numbers for ordering, and dispatches them to workers asynchronously. Workers buffer incoming messages and reorder them by sequence number before processing. The processed hints are then submitted to the sink in the correct order. In an alternative mode, workers load hints from a local path/URI instead of streaming them from the coordinator, which is useful for debugging.

3.1. Architecture

flowchart TD
    A["Guest program<br/><small>Emits hints request</small>"] --> B

    subgraph H["Coordinator"]
        B["ZiskStream"]
        B --> C["Hints Relay<br/><small>Validates<br>Broadcast to all workers (async)</small>"]
    end

    C --> E["Worker 1<br/><small>Stream incoming hints + Reorder</small>"]
    C --> F["Worker 2<br/><small>Stream incoming hints + Reorder</small>"]
    C --> G["Worker N<br/><small>Stream incoming hints + Reorder</small>"]

    E --> E1["HintsProcessor<br/><small>Parallel engine</small>"]
    E1 --> E2["StreamSink<br/><small>ASM emulator/file output</small>"]

    F --> F1["HintsProcessor<br/><small>Parallel engine</small>"]
    F1 --> F2["StreamSink<br/><small>ASM emulator/file output</small>"]

    G --> G1["HintsProcessor<br/><small>Parallel engine</small>"]
    G1 --> G2["StreamSink<br/><small>ASM emulator/file output</small>"]

    style H fill:transparent,stroke-dasharray: 5 5

When the coordinator receives a hint request from the guest program, it parses the incoming u64 stream, validates control codes, assigns sequence numbers for ordering, and broadcasts the data to all workers.

Three message types are sent over gRPC to workers:

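| StreamMessageKind | When                | Payload                     |
|-------------------|---------------------|-----------------------------|
| Start             | On CTRL_START       | None                        |
| Data              | For each data batch | Sequence number + raw bytes |
| End               | On CTRL_END         | None                        |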

Each worker receives the stream of hints, buffers them if they arrive out of order, and sends them to the HintsProcessor for parallel processing. The HintsProcessor ensures that results are submitted to the sink in the original order.
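
The worker-side reordering can be pictured with a small buffer keyed by sequence number. This sketch assumes sequence numbers start at 0 and increase by 1 per Data message; the ReorderBuffer type is hypothetical and is not the worker's actual implementation.

```rust
use std::collections::BTreeMap;

/// Holds out-of-order messages until the next expected sequence number arrives.
/// Hypothetical type for illustration only.
struct ReorderBuffer {
    next_seq: u64,
    pending: BTreeMap<u64, Vec<u8>>,
}

impl ReorderBuffer {
    fn new() -> Self {
        Self { next_seq: 0, pending: BTreeMap::new() }
    }

    /// Accept one message and return every payload that is now deliverable, in order.
    fn push(&mut self, seq: u64, payload: Vec<u8>) -> Vec<Vec<u8>> {
        self.pending.insert(seq, payload);
        let mut ready = Vec::new();
        // Drain the run of consecutive sequence numbers starting at next_seq.
        while let Some(p) = self.pending.remove(&self.next_seq) {
            ready.push(p);
            self.next_seq += 1;
        }
        ready
    }
}
```

If message 1 arrives before message 0, the first push returns nothing and the second returns both payloads in order.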

3.2. Hints Mode Configuration

Submitting a job to the coordinator with .hints(...) tells the cluster to expect hints. The hints system can be configured in two ways:

  • Streaming mode: Workers receive hints from the coordinator via gRPC. This is the default and recommended mode for production, as it allows real-time processing of hints as they are generated.
  • Path mode: Workers load hints from a local path/URI. This is useful for debugging or when hints are pre-generated and stored in a file. In this mode, the coordinator does not send hints to workers; instead, each worker reads the hints directly from the specified path.

3.2.1 Coordinator Hints Streaming Mode

The transport for the live hints stream is chosen on the SDK side by constructing a ZiskStream and passing it as the hints source on the execute/prove call.

| Constructor                          | Transport                                                            |
|--------------------------------------|----------------------------------------------------------------------|
| ZiskStream::unix()                   | Unix domain socket at an auto-assigned path under /tmp/              |
| ZiskStream::unix_at("/path")         | Unix domain socket at an explicit path                               |
| ZiskStream::quic("quic://host:port") | QUIC transport (use quic://127.0.0.1:0 to let the OS pick a port)    |
| ZiskStream::grpc()                   | gRPC push transport (data pushed to the coordinator via PushJobInput) |

Example launching a prove job with hints streamed over a Unix socket:

use anyhow::Result;
use zisk_sdk::{ExecutorKind, GuestProgram, ProverClient, ZiskStdin, ZiskStream};

#[tokio::main]
async fn main() -> Result<()> {
    let program = GuestProgram::from_uri("hints/example/zec-reth.elf")?;

    let client = ProverClient::remote("http://127.0.0.1:7000").build()?;
    let hints = ZiskStream::unix();

    let prove_handle = client
        .prove(&program, ZiskStdin::new())
        .hints(hints.clone())
        .executor(ExecutorKind::Assembly)
        .run()?;

    let proof = prove_handle.await?;

    Ok(())
}

Switching transports is a one-line change at the call site — replace ZiskStream::unix() with ZiskStream::grpc() or ZiskStream::quic("quic://0.0.0.0:0").

3.2.2 Worker Hints Non-Streaming Mode

Non-streaming mode is also selected from the SDK call. Instead of constructing a ZiskStream, build a ZiskHints from a pre-generated file (or in-memory bytes) and pass it to .hints(...). The coordinator skips broadcasting in this case — each worker loads the hints directly from the URI baked into the ZiskHints value. This is useful for debugging or when hints are pre-generated.

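| Constructor                   | Source                                          |
|-------------------------------|-------------------------------------------------|
| ZiskHints::from_file("/path") | Hints binary on disk (file path or file:// URI) |
| ZiskHints::memory(bytes)      | Hints already loaded into memory                |
| ZiskHints::from(&value)       | Serializable Rust value (encoded with bincode)  |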

Example launching a prove job that loads hints from a file:

use anyhow::Result;
use zisk_sdk::{ExecutorKind, GuestProgram, ProverClient, ZiskStdin, ZiskHints};

#[tokio::main]
async fn main() -> Result<()> {
    let program = GuestProgram::from_uri("hints/example/zec-reth.elf")?;

    let client = ProverClient::remote("http://127.0.0.1:7000").build()?;
    let hints = ZiskHints::from_file("/var/lib/zisk/hints/24654300_hints.bin")?;

    let proof = client
        .prove(&program, ZiskStdin::new())
        .hints(hints)
        .executor(ExecutorKind::Assembly)
        .run()?
        .await?;

    Ok(())
}

The same ZiskHints value can be reused across multiple .execute(...) / .prove(...) calls. As with streaming mode, no coordinator or worker flags are required to switch between sources — the SDK call decides.

4. Custom Hint Handlers

Register custom handlers via the builder pattern:


let processor = HintsProcessor::builder(my_sink)
    .custom_hint(0xA000, |data: &[u64]| -> Result<Vec<u64>> {
        // Custom processing logic
        Ok(vec![data[0] * 2])
    })
    .custom_hint(0xA001, |data| {
        // Another custom handler
        Ok(transform(data))
    })
    .build()?;

Requirements:

  • Handler function must be Fn(&[u64]) -> Result<Vec<u64>> + Send + Sync + 'static.
  • Custom hint codes should not conflict with built-in codes (0x0000-0x0700). By convention, use codes in the range 0xA000-0xFFFF.
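
The two requirements above can be checked in isolation. The following is a minimal sketch using only the standard library: HintResult, is_custom_code, double_first and assert_handler_bounds are hypothetical names introduced for illustration, and the String error type is an assumption standing in for whatever Result alias the real HintsProcessor uses.

```rust
/// Hypothetical result alias; the real handler signature may use a different error type.
pub type HintResult = Result<Vec<u64>, String>;

/// Returns true when `code` lies in the conventional custom range 0xA000-0xFFFF.
pub fn is_custom_code(code: u32) -> bool {
    (0xA000..=0xFFFF).contains(&code)
}

/// Example handler: doubles the first word and rejects an empty payload
/// instead of panicking on an out-of-bounds index.
pub fn double_first(data: &[u64]) -> HintResult {
    let first = data
        .first()
        .ok_or_else(|| "empty hint payload".to_string())?;
    Ok(vec![first.wrapping_mul(2)])
}

/// Compile-time check that a handler satisfies the documented bounds
/// (Fn(&[u64]) -> Result<Vec<u64>> + Send + Sync + 'static).
pub fn assert_handler_bounds<F>(_handler: &F)
where
    F: Fn(&[u64]) -> HintResult + Send + Sync + 'static,
{
}
```

Calling assert_handler_bounds(&double_first) compiles only if the handler meets the bounds, which makes it a cheap guard to keep next to handler definitions.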

5. Generating Hints in Guest Programs

To generate hints from the guest program you need to follow these steps and requirements:

  1. Emit hint requests: Patch your code or dependent crates to call the external FFI Hints helper functions that generate the hints input data required later by the HintsProcessor. See FFI Hints Helper Functions for the list of available built-in FFI Hints helper functions, or Custom Hints Generation to learn how to generate custom hints from the guest program.
  2. Add the ziskos crate to your guest Cargo.toml.
  3. Initialize and finalize the hint stream: Call the hints init and close functions immediately before and after the section of code that executes precompile logic.
  4. Enable hints at compile time: Compile your guest program with RUSTFLAGS='--cfg zisk_hints' for the native target to activate hint code generation and FFI helper functions in the ziskos crate.
  5. Ensure deterministic execution: Verify that both the native execution that generates hints and the guest compiled for the zkvm/zisk target execute deterministically and produce/consume hints in the exact same order. See Deterministic Execution Requirement.

To illustrate these steps, consider the zec-reth guest program, which executes and verifies Ethereum Mainnet blocks using the ZisK zkVM:

https://github.com/0xPolygonHermez/zisk-eth-client/tree/main-reth/bin/guest

5.1 Emit Hint Requests

zec-reth relies on reth crates, which expose a Crypto trait that allows a guest program to override precompile implementations. This enables zkVM-optimized implementations while also emitting hints so the computation can be performed outside the zkVM.

For example, the BN254 elliptic curve addition (bn254_g1_add) implementation for the Crypto trait can be found here:

https://github.com/0xPolygonHermez/zisk-eth-client/blob/86b71b39d35efb9894696cab115a1177f3e47dbf/crates/guest-reth/src/crypto/impls.rs#L87

In that file, two target-specific implementations are provided: one for zkvm/zisk and one for native (non-zkVM) targets. When compiling with --cfg zisk_hints for the native target, the native build also emits a hint request through the following FFI helper:


#[cfg(zisk_hints)]
unsafe extern "C" {
    pub fn hint_bn254_g1_add(p1: *const u8, p2: *const u8);
}

This call generates the hint input data using the exact input values that will later be used by the ZisK zkVM when executing the zkvm/zisk target code. This hint input data is consumed later by the HintsProcessor, allowing the bn254_g1_add computation to be performed outside the zkVM while remaining fully verifiable inside the circuit.

After the hint generation, execution continues in the native target code to compute the bn254_g1_add result.

From the guest program, we generate hints containing the input data for the corresponding zisklib functions (in this example, the bn254_g1_add_c function). These zisklib functions may internally invoke one or more precompiles to produce the final result.

When the hints are processed by the HintsProcessor, it executes the same zisklib function using the implementation code for the zkvm/zisk target. This produces the exact precompile results expected when executing the guest ELF inside the zkVM.

As a result, for each zisklib function invocation, the HintsProcessor may generate one or more precompile hint results corresponding to the precompile inputs originally emitted by the guest.

5.2 Initialize/Finalize Hint Stream

When using the ziskos::entrypoint!(main) macro, hint generation is initialized and finalized automatically around your guest entry function. You only need to compile with --cfg zisk_hints (see 5.3) and, optionally, set environment variables to control the output paths.

The macro expands to roughly:

fn main() {
    zkvm_init();         // initialize hints
    super::ZISK_ENTRY();  // your guest entry function
    zkvm_deinit();       // closes hints
}

zkvm_init and zkvm_deinit are also exposed as extern "C" symbols so they can be called from C guest programs (see 5.7 Using Hints from C Guest Programs).

5.2.1 Environment Variables

Variable            Description                                            Default
------------------  -----------------------------------------------------  ----------------
ZISK_HINTS_OUTPUT   Path to the hints binary file written by zkvm_init.    ./tmp/hints.bin
ZISK_INPUT_FILE     Path to the input file consumed by read_input_slice.   build/input.bin

The ./tmp/ directory is created automatically if it does not exist.
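
Host-side tooling often needs to know where the hints file will land. The sketch below mirrors the documented default for ZISK_HINTS_OUTPUT using only the standard library. The function names are hypothetical and this is not the ziskos implementation, only an illustration of the lookup-with-default behavior described in the table.

```rust
use std::ffi::OsString;
use std::path::PathBuf;

/// Default documented for ZISK_HINTS_OUTPUT.
pub const DEFAULT_HINTS_OUTPUT: &str = "./tmp/hints.bin";

/// Pure helper: use the provided value if present, otherwise the documented default.
pub fn resolve_hints_output(value: Option<OsString>) -> PathBuf {
    value
        .map(PathBuf::from)
        .unwrap_or_else(|| PathBuf::from(DEFAULT_HINTS_OUTPUT))
}

/// Reads ZISK_HINTS_OUTPUT from the process environment.
pub fn hints_output_from_env() -> PathBuf {
    resolve_hints_output(std::env::var_os("ZISK_HINTS_OUTPUT"))
}
```

Keeping the environment lookup separate from the pure helper makes the default easy to test without mutating process-wide state.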

5.2.2 Manual API

If you need finer control (e.g., streaming hints over a Unix socket, configuring a debug file, providing a synchronization signal to the host), call the lower-level functions directly instead of relying on the entrypoint! macro.


pub fn init_hints_file(hints_file_path: PathBuf, ready: Option<oneshot::Sender<()>>) -> Result<()>

Stores the generated hints in the file specified by hints_file_path.


pub fn init_hints_socket(
    socket_path: PathBuf,
    debug_file: Option<PathBuf>,
    write_flush_threshold: Option<usize>,
    ready: Option<oneshot::Sender<()>>,
) -> Result<()>

Sends the hints through the Unix socket specified by socket_path.

  • The optional debug_file stores a copy of the hints sent through the socket, useful for later debugging.
  • The optional write_flush_threshold controls the buffered-write flush size; None uses the default.
  • The optional ready parameter can be used for synchronization with the host when the guest is executed in a separate thread to generate hints in parallel. It signals ready when the writer is ready to start sending hints over the socket.

To close hints generation:


pub fn close_hints() -> Result<()>

Place these calls under #[cfg(zisk_hints)] so they are only compiled into the native target used for hints generation:


#[cfg(zisk_hints)]
{
    // Initialization / finalization code
    ...
}

You can review how hints generation is initialized and finalized in the zec-reth guest here:

https://github.com/0xPolygonHermez/zisk-eth-client/blob/main-reth/bin/guest/src/main.rs

5.3 Enable Hints at Compile Time

Once the guest program is set up to generate hints for the native target, it must be compiled with the zisk_hints configuration flag enabled:

RUSTFLAGS='--cfg zisk_hints' cargo build --release

After compiling, executing the guest program will generate the hints. By default — when relying on the entrypoint! macro — the binary file is written to ./tmp/hints.bin; set ZISK_HINTS_OUTPUT to override the path. If you used the manual API instead, the file/socket location follows what was passed to init_hints_file/init_hints_socket.

If a hints file was generated, it can be consumed using the --hints flag in the cargo-zisk commands that support hints (as explained in Hints in CLI Execution).

If you want to display metrics in the console about the number of hints generated during native guest execution, you can additionally compile the guest with the --cfg zisk_hints_metrics flag.

To enable hint support when executing the guest inside the zkVM (ELF guest), you must pass the --hints flag when generating the assembly ROM using the cargo-zisk rom-setup command.

NOTE: Hint processing is not supported when executing the guest ELF file in emulation mode.

5.4 Deterministic Execution Requirement

An important requirement of the hints generation flow is that the native execution that generates the hints must be fully deterministic and always produce hints in the exact same order.

Furthermore, the order of hints generated during native execution must match the order in which the guest program compiled for the zkvm/zisk target expects to receive them. Since the zkVM execution is also deterministic, any divergence in hint ordering between native execution and zkVM execution will result in incorrect behavior.

To guarantee deterministic hint generation, the code paths that directly or indirectly generate hints must avoid:

  • The use of threads or parallel execution.
  • Data structures such as HashMap (or any structure based on randomized hash seeds) when iterated in loops that directly or indirectly call precompile/hint functions.

Using threads or iterating over non-deterministically ordered data structures may cause the hint generation order to vary between runs, breaking the required alignment between native and zkVM executions.
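
A minimal sketch of the ordering rule, using only the standard library. emit_hint is a hypothetical stand-in that records the order in which a real hint_* call would fire. The first function iterates a BTreeMap, whose iteration order is sorted by key on every run. The second shows one way to keep a HashMap while still emitting in a fixed order, by sorting the keys first.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical stand-in for a hint-emitting call; records the emission order.
fn emit_hint(log: &mut Vec<u64>, key: u64) {
    log.push(key);
}

/// BTreeMap iteration is ordered by key, so the emission order is stable across runs.
pub fn emit_from_btreemap(items: &BTreeMap<u64, Vec<u8>>) -> Vec<u64> {
    let mut log = Vec::new();
    for (key, _payload) in items {
        emit_hint(&mut log, *key);
    }
    log
}

/// If a HashMap is required elsewhere, sort its keys before any loop that emits hints.
pub fn emit_from_hashmap_sorted(items: &HashMap<u64, Vec<u8>>) -> Vec<u64> {
    let mut keys: Vec<u64> = items.keys().copied().collect();
    keys.sort_unstable();
    let mut log = Vec::new();
    for key in keys {
        emit_hint(&mut log, key);
    }
    log
}
```

Both functions produce the same order regardless of insertion order or hash seed, which is the property the native and zkVM executions need to share.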

5.5 FFI Hints Helper Functions

Code      Function
--------  ----------------------------------------------------------------------------------------
0x0100    fn hint_sha256(f_ptr: *const u8, f_len: usize);
0x0200    fn hint_bn254_g1_add(p1: *const u8, p2: *const u8);
0x0201    fn hint_bn254_g1_mul(point: *const u8, scalar: *const u8);
0x0205    fn hint_bn254_pairing_check(pairs: *const u8, num_pairs: usize);
0x0300    fn hint_secp256k1_ecdsa_address_recover(sig: *const u8, recid: *const u8, msg: *const u8);
0x0301    fn hint_secp256k1_ecdsa_verify_and_address_recover(sig: *const u8, msg: *const u8, pk: *const u8);
0x0380    fn hint_secp256r1_ecdsa_verify(msg: *const u8, sig: *const u8, pk: *const u8);
0x0400    fn hint_bls12_381_g1_add(a: *const u8, b: *const u8);
0x0401    fn hint_bls12_381_g1_msm(pairs: *const u8, num_pairs: usize);
0x0405    fn hint_bls12_381_g2_add(a: *const u8, b: *const u8);
0x0406    fn hint_bls12_381_g2_msm(pairs: *const u8, num_pairs: usize);
0x040A    fn hint_bls12_381_pairing_check(pairs: *const u8, num_pairs: usize);
0x0410    fn hint_bls12_381_fp_to_g1(fp: *const u8);
0x0411    fn hint_bls12_381_fp2_to_g2(fp2: *const u8);
0x0500    fn hint_modexp_bytes(base_ptr: *const u8, base_len: usize, exp_ptr: *const u8, exp_len: usize, modulus_ptr: *const u8, modulus_len: usize);
0x0600    fn hint_verify_kzg_proof(z: *const u8, y: *const u8, commitment: *const u8, proof: *const u8);
0x0700    fn hint_keccak256(input_ptr: *const u8, input_len: usize);
0x0800    fn hint_blake2b_compress(...);
0xF0000   fn hint_input_data(input_data_ptr: *const u8, input_data_len: usize);

5.6 Custom Hints Generation

To extend the built-in hints, you can generate custom hints for new operations. The first step is to register the new hint in the HintsProcessor, as explained in section Custom Hint Handlers. Once the hint is registered, you can generate hints for it from the guest program using the following FFI function:


fn hint_custom(hint_id: u32, data_ptr: *const u8, data_len: usize, is_result: u8);

and following the same guidelines described for the built-in FFI hint helper functions.
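
One way to keep call sites free of unsafe blocks and cfg attributes is a thin wrapper that is a no-op unless hints are enabled. The sketch below is an assumption-laden illustration: emit_custom_hint is a hypothetical helper, not part of ziskos. The extern declaration is copied from the signature above, and the is_result value is passed through unchanged because its semantics are not described here.

```rust
// Declaration copied from the documented signature; only compiled with --cfg zisk_hints,
// in which case the symbol must be provided by linking against ziskos.
#[cfg(zisk_hints)]
unsafe extern "C" {
    fn hint_custom(hint_id: u32, data_ptr: *const u8, data_len: usize, is_result: u8);
}

/// Hypothetical safe wrapper used when hints are enabled.
#[cfg(zisk_hints)]
pub fn emit_custom_hint(hint_id: u32, data: &[u8], is_result: u8) {
    // SAFETY: the pointer and length come from a valid slice that outlives the call.
    unsafe { hint_custom(hint_id, data.as_ptr(), data.len(), is_result) }
}

/// Without --cfg zisk_hints the wrapper compiles to nothing, so the same
/// guest code builds for targets that do not generate hints.
#[cfg(not(zisk_hints))]
pub fn emit_custom_hint(_hint_id: u32, _data: &[u8], _is_result: u8) {}
```

Because the call order must match between native and zkVM executions, such a wrapper should be invoked at exactly the point where the corresponding operation runs.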

5.7 Using Hints from C Guest Programs

The ziskos crate is published as both an rlib and a staticlib, so C guest programs can link against the resulting .a archive and call the hint lifecycle functions through the C ABI. Two symbols are exposed:

extern void zkvm_init(void);
extern void zkvm_deinit(void);

zkvm_init initializes the hint stream and zkvm_deinit finalizes it. They are no-ops when compiled without --cfg zisk_hints, so the same C code works for both native (hint generation) and zkVM target builds without modification.

A minimal C guest program looks like:

extern void zkvm_init(void);
extern void zkvm_deinit(void);

int main(void) {
    zkvm_init();

    // Guest logic, including any FFI hint calls
    // (hint_sha256, hint_keccak256, hint_input_data, ...)

    zkvm_deinit();
    return 0;
}

When linking the C guest against ziskos for native hint generation, the same environment variables described in 5.2.1 (ZISK_HINTS_OUTPUT, ZISK_INPUT_FILE) control the file paths used by zkvm_init and the input reader.

The FFI hint helper functions listed in 5.5 FFI Hints Helper Functions are all extern "C" and use the same signatures from C — declare them with extern in your C source and link against the same ziskos archive.

Ziskof

Riscof tests

The following test generates the riscof test files, converts the corresponding .elf files into ZisK ROMs, and executes them, writing the output to stdout for comparison against a reference RISC-V implementation. This process is not trivial and has been semi-automated.

First, compile the ZisK Emulator:

$ cargo clean
$ cargo build --release

Second, download and run a docker image from the riscof repository to generate and run the riscof tests:

$ docker run --rm -v ./target/release/ziskemu:/program -v ./riscof/:/workspace/output/ -ti  hermeznetwork/ziskof:latest

The test can take a few minutes to complete. Any errors are displayed in red.

Profiling Programs with ZiskEmu

ZiskEmu provides powerful profiling capabilities to analyze the cost and performance characteristics of your programs. This guide explains how to use these features to identify hotspots, optimize your code, and understand resource consumption.

What This Guide Covers

This guide walks you through ZiskEmu's profiling capabilities, progressing from high-level overviews to detailed analysis:

  1. Introduction: Understanding profiling costs vs. final costs, symbol-based analysis, and detecting optimization opportunities

  2. Basic Profiling: Global statistics showing cost distribution across major categories (base, main, opcodes, precompiles, memory)

  3. SDK Report Mode: Streamlined, compact output format ideal for CI/CD and quick checks, with selective section display options

  4. Function Name Display Options: Configure how long function names are displayed with compact and no-compact modes

  5. Profile Tags: Instrument your code to measure specific sections, with immediate or deferred reporting of steps and costs

  6. Firefox Profiler Integration: Export profiling data for advanced visualization and interactive analysis

  7. Function-Level Profiling: Identifying which functions consume the most resources with cumulative analysis

  8. Customizing ROI Display: Controlling how many functions to show and filtering by patterns

  9. Detailed Caller Analysis: In-depth breakdown showing which operations are expensive within each function and who calls them

  10. Tracking Function Calls: Logging individual call parameters to analyze usage patterns and optimize for common cases

  11. PC Histogram Analysis: Low-level view of the most frequently executed RISC-V instruction sequences

  12. Additional Options: Quick reference for other useful flags (steps, progress indicators, formatting)

  13. Practical Example: Real-world case study analyzing Ethereum opcode costs in a block validator

Introduction

Understanding Profiling Costs vs. Final Costs

When profiling a program in ZisK, it's important to understand the difference between profiling costs and final costs:

Profiling Costs

Profiling costs represent the individual operational cost accrued directly within a function's own instructions, based on the best-case cost model for each operation. These costs:

  • Exclude padding and aggregation costs
  • Reflect a direct cause-and-effect relationship between code changes and cost variations
  • Use the optimal cost for each operation type
  • Allow you to observe how small program modifications affect performance
  • Are ideal for optimization work because they show the direct impact of your code changes

For example, when you replace a function with a precompiled function or optimize a loop, the profiling cost will immediately reflect this improvement, making it easy to validate that your optimization is working as expected.

Final Costs

Final costs represent the real and exact cost of a specific execution, accounting for the actual resource consumption in the ZisK proving system. The key difference is that final costs measure cost at the instance granularity, not at the individual operation level.

In ZisK's architecture, multiple operations are grouped into instances (execution units in state machines), and the cost is determined by these instances:

  • Instance-based granularity: If you use 1 Keccak operation or 5,242 Keccak operations, you pay for one full Keccak instance. However, if you use 5,243 operations, you need a second instance, effectively doubling the cost for that single additional operation.

  • Planner strategies: The ZisK planner dynamically chooses execution strategies based on the operation mix. For example, depending on how many additions and binary operations you have, the planner might use a Binary state machine, a BinaryAdd state machine, or both. These decisions affect the final cost since each instance type has a different cost structure.

  • Aggregation across function calls: Final costs include both the function's own profiling cost and all costs from functions it calls, summed at the instance level.

Why use profiling costs for optimization? Because profiling costs provide a predictable and proportional metric directly tied to your code changes. When optimizing, you want to see the immediate effect of your changes at the operation level. Final costs, while representing the true execution cost, can show non-linear behavior due to instance boundaries and planning strategies. Once you've optimized based on profiling costs, the final costs will reflect the real resource savings in the proving system.

Example: Keccak Operations

Consider a program that performs Keccak hash operations:

Scenario 1: Using 1,000 Keccak operations

  • Profiling cost: Proportional to 1,000 operations
  • Final cost: 1 Keccak instance (fits within instance capacity)

Scenario 2: Using 5,000 Keccak operations

  • Profiling cost: 5× the cost of Scenario 1 (proportional to operations)
  • Final cost: Still 1 Keccak instance (if capacity is 5,242 operations)

Scenario 3: Using 5,243 Keccak operations

  • Profiling cost: ~5.24× the cost of Scenario 1 (proportional increase)
  • Final cost: 2 Keccak instances (crossed the instance boundary with just 1 extra operation!)

The profiling cost grows linearly with the number of operations, making it easy to predict the impact of adding or removing operations. The final cost, however, stays constant until you cross an instance boundary, then jumps significantly. This is why profiling costs are better for optimization: you can see the effect of every change, while final costs help you understand the actual proving cost in production.
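
The step behavior in the three scenarios is a ceiling division. The sketch below assumes the Keccak instance capacity of 5,242 operations quoted in the example. The function names are illustrative and the capacity may differ between ZisK versions.

```rust
/// Keccak operations that fit in one instance, as quoted in the example above.
pub const KECCAK_INSTANCE_CAPACITY: u64 = 5_242;

/// Number of instances needed for `operations`, rounding up.
/// `capacity` must be non-zero.
pub fn instances_needed(operations: u64, capacity: u64) -> u64 {
    operations.div_ceil(capacity)
}

/// Slots paid for but left unused in the last instance.
pub fn unused_slots(operations: u64, capacity: u64) -> u64 {
    instances_needed(operations, capacity) * capacity - operations
}
```

With these definitions, 1,000 and 5,000 operations both need one instance, while 5,243 operations need two and leave 5,241 slots of the second instance unused.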

Example: Comparing Optimization Alternatives

Suppose you have implemented two different optimizations for your program, and you need to decide which one is better. The difference between them is 1 million operations:

  • Option A: Uses 1M 64-bit ADD operations
  • Option B: Uses 1M 64-bit OR operations

In ZisK's architecture, there are specialized instances for 64-bit additions (BinaryAdd) that are much cheaper than the general binary instances (Binary) that can perform ADD, SUB, AND, OR, XOR, and other operations.

Analysis with Profiling Costs:

  • Option A (ADD): Lower profiling cost (uses efficient specialized instances)
  • Option B (OR): Higher profiling cost (requires general binary instances)
  • Clear winner: Option A is better ✓

Analysis with Final Costs (Small Program):

If your program is small and doesn't fill a Binary instance:

  • Both options may end up using the same Binary instance
  • Final cost: Same for both options (no clear winner)
  • Misleading conclusion: No difference between optimizations ✗

Analysis with Final Costs (Large Program):

If your program is larger and already uses separate instances:

  • Option A uses a dedicated BinaryAdd instance (cheaper)
  • Option B uses a Binary instance (more expensive)
  • Final cost: Option A is clearly cheaper ✓
  • Correct conclusion: Matches profiling cost analysis

Lesson: Profiling costs consistently show that Option A is better, regardless of program size. Final costs may give conflicting signals depending on whether instance boundaries are crossed. This is why profiling costs are the reliable metric for making optimization decisions—they provide a consistent signal that doesn't depend on the overall program context.
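
To put numbers on the comparison, the sketch below uses per-operation costs derived from the sample COST BY OPCODE table in the Basic Profiling section (cost divided by count gives 25 for add and 60 for or). These unit costs are an observation from that one sample report and may differ between ZisK versions. The constant and function names are illustrative.

```rust
/// Per-operation profiling cost of `add` in the sample report (177,160,275 / 7,086,411).
pub const ADD_UNIT_COST: u64 = 25;
/// Per-operation profiling cost of `or` in the sample report (448,936,380 / 7,482,273).
pub const OR_UNIT_COST: u64 = 60;

/// Profiling cost grows linearly with the number of operations.
pub fn profiling_cost(count: u64, unit_cost: u64) -> u64 {
    count * unit_cost
}
```

Under these assumptions, Option A costs 25,000,000 and Option B costs 60,000,000 for the 1M differing operations, independent of the rest of the program.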

Symbol-Based Analysis

One of ZiskEmu's key advantages is that profiling works on any ELF file without requiring special instrumentation or debug information. The profiler uses symbol information already present in the binary, which means:

  • Works with release builds (optimized binaries)
  • No need to recompile with special flags
  • No runtime overhead during execution
  • Analyzes production-ready binaries, provided they are not stripped of symbols

Detecting Optimization Opportunities

One of the most powerful uses of ZiskEmu's profiling is identifying where to apply patches and optimizations. The profiling costs help you answer critical questions:

Which crates/libraries are most performant for proof generation?

  • Compare different library implementations to see their effect on verification costs
  • Test alternative dependencies to find the most ZisK-efficient options
  • Evaluate different algorithm implementations (e.g., hash libraries, cryptographic crates, serialization libraries) to determine which performs best in the ZisK proving system
  • Make data-driven decisions when choosing between equivalent functionality from different crates

Validating optimizations:

  • After applying an optimization or patch, run the profiler again to confirm the profiling cost decreased
  • Compare before/after profiles to ensure the optimization is effective

Is patching being applied correctly?

  • Verify that precompiles are being used where expected
  • Detect cases or paths where generic code is running instead of optimized ZisK-specific implementations
  • Identify functions that should be patched but aren't

Where should you apply patches?

  • Find hotspot functions that would benefit most from ZisK precompiles
  • Identify expensive cryptographic operations (SHA-256, Keccak, etc.) that could use hardware acceleration
  • Locate arithmetic-heavy code that could leverage ZisK's optimized arithmetic operations

Example workflow:

  1. Profile your program to identify expensive functions
  2. Look for patterns that match available precompiles (hashing, big integer math, etc.)
  3. Patch the code to use:
    • ZisK-optimized implementations
    • Precompiles
    • Different operations or usage patterns, keeping in mind that you are optimizing for the ZisK architecture, not for hardware
  4. Re-profile to verify the profiling cost reduction

This iterative approach, guided by profiling costs, ensures your optimizations target the right areas and produce measurable improvements.

Basic Profiling (statistics)

The simplest way to profile your program is to use the -X (or --stats) flag. This provides an overview of execution statistics including total costs, memory operations, and opcode usage.

Command

ziskemu -e <elf> -i <input> -X

Output Explanation

REPORT                                  
----------------------------------------
STEPS                         92,875,129

COST DISTRIBUTION                   COST       %
------------------------------------------------
BASE                         293,601,280   2.57%
MAIN                       6,315,508,772  55.22%
OPCODES                    1,334,639,984  11.67%
PRECOMPILES                2,565,960,716  22.43%
MEMORY                       927,932,629   8.11%

TOTAL                     11,437,643,381 100.00%

FROPS                        963,440,253   8.42%
RAM USAGE                     18,465,008   3.47%

Understanding the Report:

STEPS: The number of processor cycles or instructions executed during program execution. This is an indicator of how long the program is—more steps mean a longer program execution.

COST DISTRIBUTION: This shows the profiling cost (see the Understanding Profiling Costs section for detailed explanation). Each operation is costed individually using the proof area as the metric, which is the best indicator of proof generation time—higher cost means longer proof generation.

The cost is broken down into these categories:

  • BASE: Cost of fixed components such as tables, range checks, and other constant overhead that exists regardless of program logic.

  • MAIN: Cost of the processor itself without operation costs. This is directly proportional to the steps count and represents the base cost of executing instructions.

  • OPCODES: Cost of simple operations performed by the processor (additions, subtractions, etc.) in the format a operation b = c, flag, where a, b, and c are 64-bit values. These are basic arithmetic and logical operations.

  • PRECOMPILES: Cost of complex operations whose parameters don't fit in 64 bits, requiring memory as an exchange system. Examples include:

    • 256-bit additions
    • Elliptic curve operations
    • Keccak hashing
    • DMA operations
  • MEMORY: Cost of direct memory operations (read, write) and the additional state machines required for non-aligned memory access. This includes cases where:

    • The address is not aligned to 8 bytes
    • Operations don't work with 8-byte chunks (e.g., reading a single byte)
  • TOTAL: Sum of all costs. Each category shows the percentage (%) it represents of the total cost.
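
The figures in the sample report can be cross-checked with a few lines of arithmetic. The sketch below copies the step count and category costs from the report above and recomputes the total and percentages. It also shows that, in this particular sample, MAIN equals exactly 68 times the step count, which is an observation from these numbers and not a documented constant.

```rust
/// Step count copied from the sample report.
pub const STEPS: u64 = 92_875_129;

/// Category costs copied from the sample report.
pub const CATEGORIES: [(&str, u64); 5] = [
    ("BASE", 293_601_280),
    ("MAIN", 6_315_508_772),
    ("OPCODES", 1_334_639_984),
    ("PRECOMPILES", 2_565_960_716),
    ("MEMORY", 927_932_629),
];

/// Sum of all category costs (the TOTAL line).
pub fn total_cost(categories: &[(&str, u64)]) -> u64 {
    categories.iter().map(|(_, cost)| *cost).sum()
}

/// Share of `part` in `total`, as a percentage.
pub fn percent(part: u64, total: u64) -> f64 {
    part as f64 * 100.0 / total as f64
}
```

Formatting percent(...) with two decimals reproduces the % column, for example 2.57 for BASE and 55.22 for MAIN.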

FROPS (FRequent OPerationS): These are operations that are very frequently used by the processor, such as:

  • Adding 1 to a relatively small number (common in loop counters)
  • Adding 8 to an address (typical for pointer arithmetic)
  • Working with values < 256

These frequent operations are analyzed, detected, and pre-calculated, becoming part of the BASE cost but representing significant savings. In this example, FROPS shows 8.42%: this is the cost that would be added to the program if these optimizations were not applied. The actual savings are already reflected in the lower costs of the affected operations.

RAM USAGE: The amount of memory used out of the total available. This information is only available with the default allocator (bump allocator), which:

  • Never frees memory - always allocates new memory
  • Avoids the CPU cycles needed to manage the entire heap (typically >10% overhead)
  • Is recommended as long as sufficient memory is available
  • Provides better performance by eliminating heap management costs

Detailed Opcode Breakdown:

Below the summary, you'll see a detailed breakdown of each operation:

COST BY OPCODE                     COUNT       %            COST       % RANK
-----------------------------------------------------------------------------
OP ltu                         1,767,360   1.90%     106,041,600   0.93%
OP lt                            389,360   0.42%      23,361,600   0.20%
OP eq                            543,251   0.58%      32,595,060   0.28%
OP add                         7,086,411   7.63%     177,160,275   1.55% #4
OP sub                           693,157   0.75%      41,589,420   0.36%
OP and                         3,740,044   4.03%     224,402,640   1.96% #3
OP or                          7,482,273   8.06%     448,936,380   3.93% #2
OP xor                         1,027,290   1.11%      61,637,400   0.54%
OP add_w                          15,804   0.02%         948,240   0.01%
OP sub_w                           4,085   0.00%         245,100   0.00%
OP sll                         1,551,879   1.67%      82,249,587   0.72%
OP srl                           611,361   0.66%      32,402,133   0.28%
OP sra                           807,976   0.87%      42,822,728   0.37%
OP srl_w                          84,289   0.09%       4,467,317   0.04%
OP sra_w                              62   0.00%           3,286   0.00%
OP signextend_b                  121,977   0.13%       6,464,781   0.06%
OP signextend_h                    1,684   0.00%          89,252   0.00%
OP signextend_w                   27,460   0.03%       1,455,380   0.01%
OP pubout                             32   0.00%               0   0.00%
OP muluh                          86,682   0.09%       8,234,790   0.07%
OP mul                           409,765   0.44%      38,927,675   0.34%
OP divu                            6,368   0.01%         604,960   0.01%
OP remu                                4   0.00%             380   0.00%
OP dma_memcpy                    302,551   0.33%      12,707,142   0.11%
OP dma_memcmp                     91,454   0.10%       3,841,068   0.03%
OP dma_inputcpy                       90   0.00%           3,780   0.00%
OP dma_xmemset                    32,381   0.03%       1,360,002   0.01%
OP _dma_pre                      140,043   0.15%      12,323,784   0.11%
OP _dma_post                     164,752   0.18%      14,498,176   0.13%
OP keccak                         32,650   0.04%   2,466,707,500  21.57% #1
OP arith256_mod                      714   0.00%       1,016,736   0.01%
OP secp256k1_add                  17,688   0.02%      25,187,712   0.22%
OP secp256k1_dbl                  19,884   0.02%      28,314,816   0.25%
OP fcall_param                       652   0.00%               0   0.00%
OP fcall                             172   0.00%               0   0.00%
OP fcall_get                         156   0.00%               0   0.00%

FROPS BY OPCODE                    COUNT    HIT            COST       % RANK
----------------------------------------------------------------------------
FROP ltu                         942,288  34.78%      56,537,280   0.49% #4
FROP lt                          641,963  62.25%      38,517,780   0.34%
FROP eq                        3,273,419  85.77%     196,405,140   1.72% #2
FROP add                       1,597,142  18.39%      39,928,550   0.35%
FROP sub                         357,871  34.05%      21,472,260   0.19%
FROP and                         471,898  11.20%      28,313,880   0.25%
FROP or                        1,303,629  14.84%      78,217,740   0.68% #3
FROP xor                         105,118   9.28%       6,307,080   0.06%
FROP add_w                        75,366  82.67%       4,521,960   0.04%
FROP sub_w                         2,177  34.77%         130,620   0.00%
FROP sll                       8,729,869  84.91%     462,683,057   4.05% #1
FROP srl                         376,620  38.12%      19,960,860   0.17%
FROP sra                           5,962   0.73%         315,986   0.00%
FROP srl_w                        66,935  44.26%       3,547,555   0.03%
FROP sra_w                            60  49.18%           3,180   0.00%
FROP muluh                        25,590  22.79%       2,431,050   0.02%
FROP mul                          43,603   9.62%       4,142,285   0.04%
FROP divu                             42   0.66%           3,990   0.00%

COST BY OPCODE Table:

This table shows detailed statistics for each operation or precompile executed:

  • COUNT: Number of times this operation was called
  • %: Percentage of steps (cycles) that use this operation
  • COST: Total profiling cost for all executions of this operation
  • %: Percentage of total cost that this operation represents
  • RANK: The top 4 most expensive operations are marked with #1, #2, #3, #4

Important: Operations are not sorted by cost. They maintain a consistent order across executions to facilitate comparison between different runs. Look for the #N markers to identify the most expensive operations.

For example, in this output, keccak was executed 32,650 times (0.04% of steps) but accounts for 21.57% of the total cost, making it the #1 most expensive operation. This indicates that Keccak operations dominate the cost despite being relatively infrequent.
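
Because the table is not sorted by cost, a small post-processing step helps when comparing runs. The sketch below copies a few rows from the sample table, ranks them by cost, and derives the per-operation cost as cost divided by count. OpStat and the function names are hypothetical, and parsing the actual ziskemu output is left out.

```rust
/// One row of the COST BY OPCODE table (hypothetical representation).
#[derive(Debug, Clone, PartialEq)]
pub struct OpStat {
    pub name: &'static str,
    pub count: u64,
    pub cost: u64,
}

/// A few rows copied from the sample output above, in table order.
pub fn sample_ops() -> Vec<OpStat> {
    vec![
        OpStat { name: "add", count: 7_086_411, cost: 177_160_275 },
        OpStat { name: "and", count: 3_740_044, cost: 224_402_640 },
        OpStat { name: "or", count: 7_482_273, cost: 448_936_380 },
        OpStat { name: "sll", count: 1_551_879, cost: 82_249_587 },
        OpStat { name: "keccak", count: 32_650, cost: 2_466_707_500 },
    ]
}

/// Names of the `n` most expensive operations, highest cost first.
pub fn top_by_cost(ops: &[OpStat], n: usize) -> Vec<&'static str> {
    let mut sorted: Vec<&OpStat> = ops.iter().collect();
    sorted.sort_by(|a, b| b.cost.cmp(&a.cost));
    sorted.into_iter().take(n).map(|op| op.name).collect()
}

/// Average cost per execution, or None if the operation never ran.
pub fn unit_cost(op: &OpStat) -> Option<u64> {
    if op.count == 0 {
        None
    } else {
        Some(op.cost / op.count)
    }
}
```

On these rows the ranking matches the #1 to #4 markers (keccak, or, and, add), and the per-operation cost of keccak comes out at 75,550 versus 25 for add, which explains why so few executions dominate the total.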

FROPS BY OPCODE Table:

FROPS (FRequent OPerationS) are highly common operations that have been analyzed and optimized through pre-calculation. These include operations like:

  • Incrementing by 1 (loop counters)
  • Adding 8 (pointer arithmetic)
  • Working with small values (< 256)

The table shows:

  • COUNT: Number of times the FROP variant was executed
  • HIT: Hit rate percentage - how often the frequent operation pattern was matched and the optimization applied
  • COST: Total cost with the optimization benefit already applied
  • %: Percentage of total cost
  • RANK: Top ranked FROPS by cost

High hit rates indicate that the program uses these common patterns frequently, benefiting from the pre-calculated optimizations. The FROPS total shown earlier (8.42% in this example) represents the cost that would be added if these optimizations were not available.

Key Insights from Statistics:

Use this information to:

  • Identify which operation types dominate your program's cost
  • Find operations with high count but disproportionate cost (optimization candidates)
  • Verify that precompiles are being used where expected
  • Understand the balance between computation (OPCODES), memory access (MEMORY), and complex operations (PRECOMPILES)

SDK Report Mode

For a cleaner, more compact output ideal for continuous integration or quick checks, use the --sdk flag. This provides a streamlined report with only the essential summary information.

Command

ziskemu -e <elf> -i <input> --sdk

Output Example

╔══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╗
║  ◆ REPORT SUMMARY                                                                                                    ║
╠══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╣
║  STEPS                                                                                                    92,875,129  ║
║  COST                                                                                              11,437,643,381  ║
║  RAM                                                                                                  17.61 MB /  64.00 MB  ║
╚══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╝

╔══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╗
║  ◆ COST DISTRIBUTION SUMMARY                                                                                         ║
╠══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╣
║  CATEGORY     ∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙          COST      %  ║
║  ┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄  ║
║  Base         ▎∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙     293,601,280   2.6%  ║
║  Main         ███████████████████████████████████████████████████████∙∙∙∙   6,315,508,772  55.2%  ║
║  Opcodes      ████████████▊∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙   1,334,639,984  11.7%  ║
║  Precompiles  █████████████████████████▊∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙   2,565,960,716  22.4%  ║
║  Memory       █████████▎∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙∙     927,932,629   8.1%  ║
║  ┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄  ║
║  Total                                                                       11,437,643,381 100.0%  ║
╚══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╝

The SDK report provides:

  • Clean visual layout with box-drawing characters
  • Progress bars showing the proportional cost of each category
  • Essential metrics only: steps, total cost, RAM usage, and cost distribution
  • No detailed breakdowns - ideal for automated testing or quick cost checks
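
For a CI gate, one option is to read the COST line from the summary box and compare it against a budget. The Rust sketch below assumes the report text has already been captured from ziskemu's standard output and that the summary lines keep the layout shown above. parse_metric, find_metric and the budget value are hypothetical names chosen for illustration and are not part of ZisK.

```rust
/// Extracts the number from a summary line such as
/// "║  COST          11,437,643,381  ║". Returns None for any other line.
fn parse_metric(line: &str, label: &str) -> Option<u64> {
    // Drop the box borders and surrounding padding.
    let inner = line.trim().trim_matches('║').trim();
    let rest = inner.strip_prefix(label)?;
    // The label must be followed by whitespace, so "COST" does not match "COSTS".
    if !rest.starts_with(char::is_whitespace) {
        return None;
    }
    // Remove thousands separators before parsing.
    rest.trim().replace(',', "").parse().ok()
}

/// Returns the first value found for `label` anywhere in the report.
fn find_metric(report: &str, label: &str) -> Option<u64> {
    report.lines().find_map(|line| parse_metric(line, label))
}

fn main() {
    // Shortened copy of the summary box; a real run would capture ziskemu's output.
    let report = "║  ◆ REPORT SUMMARY  ║\n\
                  ║  STEPS      92,875,129  ║\n\
                  ║  COST   11,437,643,381  ║";
    let budget: u64 = 12_000_000_000; // hypothetical budget for this example

    match find_metric(report, "COST") {
        Some(cost) if cost <= budget => println!("cost {cost} is within budget {budget}"),
        Some(cost) => {
            eprintln!("cost {cost} exceeds budget {budget}");
            std::process::exit(1);
        }
        None => {
            eprintln!("COST line not found in report");
            std::process::exit(2);
        }
    }
}
```

Parsing the rendered box is fragile if the layout changes, so a check like this is best treated as a quick guard rather than a stable interface.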

SDK Selective Sections

By default, the SDK report shows only the summary. You can selectively enable additional sections:

Show Opcode Details (--opcodes)

Adds a section showing the top 10 most expensive opcodes with their cost distribution and FROPS hit rates:

ziskemu -e <elf> -i <input> --sdk --opcodes

This adds a COST DISTRIBUTION BY OPCODE section comparing regular operations vs frequent operations (FROPS).

Show Top Functions (--top-functions)

Lists the functions with the highest cost. This section needs symbol information from the ELF file:

ziskemu -e <elf> -i <input> --sdk --top-functions -S

This adds a TOP COST FUNCTIONS section with automatic compacting of long function names.

Note: Using --top-functions automatically enables symbol reading (-S), so you can omit the -S flag if you only need it for this feature.

Show Profile Tags (--profile-tags)

Displays accumulated profile tag measurements from your code. Requires profile tags in your program (see Profile Tags section):

ziskemu -e <elf> -i <input> --sdk --profile-tags

This shows sections like STEPS PROFILE TAGS and COST PROFILE TAGS if you've instrumented your code with profile markers.

Combining Options

You can combine multiple flags to customize the report:

# Show summary + opcodes + top functions
ziskemu -e <elf> -i <input> --sdk --opcodes --top-functions -S

# Show all optional sections
ziskemu -e <elf> -i <input> --sdk --opcodes --top-functions --profile-tags -S

Behavior Note: If you specify any of the selective flags (--opcodes, --top-functions, --profile-tags), only the summary plus the explicitly requested sections will be shown. If you don't specify any selective flags, you get only the summary.

SDK Width Configuration

Control the width of the SDK report output with --sdk-width:

# Use wider report (150 characters)
ziskemu -e <elf> -i <input> --sdk --sdk-width=150

# Use narrower report (100 characters) 
ziskemu -e <elf> -i <input> --sdk --sdk-width=100

Default width: 120 characters. Wider reports provide more space for progress bars and function names, while narrower reports fit better in smaller terminals or log viewers.

Function Name Display Options

When displaying function-level profiling information with -S, function names can become very long, especially in Rust with its fully-qualified paths and generic parameters. ZiskEmu provides options to control how these names are displayed.

Compact Names (Default)

By default, long function names are automatically shortened to 160 characters using intelligent compacting:

# Default behavior - compact to 160 characters
ziskemu -e <elf> -i <input> -X -S

The compacting algorithm:

  1. Collapses nested generic parameters: <A<B<C>>> → <A<…>>
  2. Elides intermediate path segments: std::io::default_write_fmt::Adapter → std::..::Adapter
  3. Maintains readability while reducing length
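
To illustrate the first step only, here is a minimal Rust sketch that hides everything nested deeper than a chosen generic depth. It is not ZiskEmu's actual implementation: collapse_generics and max_depth are hypothetical names, and the sketch treats every < and > as a generic bracket, so it would mishandle symbols that contain operators such as ->.

```rust
/// Replaces the contents of generic brackets at depth `max_depth` with "…".
/// Depth 1 is the outermost pair of angle brackets.
fn collapse_generics(name: &str, max_depth: usize) -> String {
    let mut out = String::new();
    let mut depth = 0usize;
    for ch in name.chars() {
        match ch {
            '<' => {
                depth += 1;
                if depth <= max_depth {
                    out.push('<');
                }
                if depth == max_depth {
                    // Everything inside this bracket is summarised by one ellipsis.
                    out.push('…');
                }
            }
            '>' => {
                if depth <= max_depth {
                    out.push('>');
                }
                depth = depth.saturating_sub(1);
            }
            // Ordinary characters are kept only while above the collapsed level.
            _ if depth < max_depth => out.push(ch),
            _ => {}
        }
    }
    out
}

fn main() {
    println!("{}", collapse_generics("<A<B<C>>>", 2));
    println!("{}", collapse_generics("Vec<u8>", 2));
}
```

With max_depth set to 2, the first call reproduces the <A<…>> form from step 1, while a shallow name such as Vec<u8> passes through unchanged.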

Custom Compact Length

Specify a different maximum length:

# Compact to 80 characters
ziskemu -e <elf> -i <input> -X -S --compact-names=80

# Compact to 200 characters  
ziskemu -e <elf> -i <input> -X -S --compact-names=200

Disable Compacting

To see complete, uncompacted function names:

ziskemu -e <elf> -i <input> -X -S --no-compact-names

When to use each option:

  • Default (160 chars): Good balance for most terminal widths and readability
  • Shorter (80-100 chars): When viewing in narrow terminals or when you want very concise output
  • Longer (200+ chars): When you need more context from the function path
  • No compacting: When you need to see the complete, exact function signatures (e.g., for copy-pasting into code searches)

Profile Tags

Profile tags allow you to instrument your code to measure specific code sections, loops, or algorithms. This is useful when you want to:

  • Measure the cost or steps of a specific algorithm
  • Compare different implementation approaches
  • Track performance of critical sections across multiple calls
  • Identify hotspots within a single function

How Profile Tags Work

You add markers in your guest code using macros provided by ziskos. These markers:

  • Have zero overhead when not running in the ZiskEmu profiler
  • Work at the source code level - you decide what to measure
  • Can measure either steps (execution cycles) or cost (profiling cost)
  • Can either print immediately or accumulate for a summary report

Setting Up Profile Tags

In your guest code's Cargo.toml, add the ziskos dependency:

[dependencies]
ziskos = { path = "../../ziskos" }  # Adjust path as needed

In your guest source code:

use ziskos::{profile_start, profile_end};
use ziskos::{profile_report_start, profile_report_end};
use ziskos::{profile_steps_start, profile_steps_end};
use ziskos::{profile_report_steps_start, profile_report_steps_end};

fn main() {
    // Example usage in your code
    profile_start!(hash_computation);
    let result = expensive_hash_function(&data);
    profile_end!(hash_computation);
    
    // ... more code
}

Profile Tag Macros

There are 8 macros organized in 2 dimensions:

Dimension 1 - What to measure:

  • Cost macros (profile_start! / profile_end!): Measure profiling cost
  • Steps macros (profile_steps_start! / profile_steps_end!): Measure execution steps

Dimension 2 - When to report:

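  • Immediate (profile_start! / profile_end! and profile_steps_start! / profile_steps_end!): Print the result after each end! call
  • Report (profile_report_start! / profile_report_end! and profile_report_steps_start! / profile_report_steps_end!): Accumulate and show at program end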

Immediate Output Macros

Print the measurement immediately after the end! call:


// Measure and print COST after each execution
profile_start!(my_algorithm);
run_my_algorithm();
profile_end!(my_algorithm);
// Prints: [my_algorithm] 12345

// Measure and print STEPS after each execution
profile_steps_start!(my_loop);
for i in 0..1000 {
    expensive_operation(i);
}
profile_steps_end!(my_loop);
// Prints: [my_loop] 45678

Use case: When you want to track each individual execution, or when the measured section is called only once or a few times.

Report Macros

Accumulate measurements and show statistics at the end:


for batch in batches {
    profile_report_start!(process_batch);
    process_batch(&batch);
    profile_report_end!(process_batch);
}
// No output during execution

// At program end, you'll see accumulated statistics:
// Total, average, min, max for all executions

Use case: When measuring sections called many times (loops, repeated operations) and you want aggregate statistics rather than individual measurements.

Complete Example

use ziskos::{
    profile_start, profile_end,
    profile_report_start, profile_report_end,
    profile_steps_start, profile_steps_end,
    profile_report_steps_start, profile_report_steps_end
};

fn main() {
    // Measure total cost once
    profile_start!(total_execution);
    
    // Accumulate statistics for repeated calls
    for i in 0..100 {
        profile_report_steps_start!(loop_iteration);
        expensive_computation(i);
        profile_report_steps_end!(loop_iteration);
    }
    
    // Nested measurements
    profile_steps_start!(data_processing);
    
    profile_report_start!(hash_phase);
    for item in items {
        compute_hash(item);
    }
    profile_report_end!(hash_phase);
    
    profile_steps_end!(data_processing);
    
    profile_end!(total_execution);
}

Viewing Profile Tag Results

To see the accumulated profile tag statistics, add --profile-tags to your command:

# With standard report
ziskemu -e <elf> -i <input> -X --profile-tags

# With SDK report  
ziskemu -e <elf> -i <input> --sdk --profile-tags

The output shows aggregated statistics for every profile tag measured with the report variants (the figures below are illustrative):

PROFILE TAGS STEPS (STEPS, % STEPS, CALLS, AVG, MIN, MAX)
----------------------------------------------------------
     10,234,567  11.02%        100     102,345     98,123     125,678  loop_iteration
      3,456,789   3.72%         50      69,135     45,000      89,000  hash_phase

PROFILE TAGS COST (COST, % COST, CALLS, AVG, MIN, MAX)
-------------------------------------------------------
  1,234,567,890  10.79%        100  12,345,678  10,000,000  15,000,000  total_execution
    456,789,012   3.99%         50   9,135,780   5,000,000  12,000,000  hash_phase

Statistics shown:

  • STEPS / COST: Sum of all measurements for the tag
  • % STEPS / % COST: Percentage of the program's total steps or cost
  • CALLS: Number of times the tag was executed
  • AVG: Average per call
  • MIN: Minimum value observed
  • MAX: Maximum value observed
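
The columns above can be derived from a simple running accumulator per tag. The Rust sketch below shows one way to do that; TagStats is a hypothetical type written for this guide, the measurements are made up, and the integer average is an assumption rather than a statement about how ZiskEmu rounds.

```rust
/// Running statistics for one profile tag.
#[derive(Debug, Default)]
struct TagStats {
    total: u64,
    calls: u64,
    min: Option<u64>,
    max: u64,
}

impl TagStats {
    /// Adds one start/end measurement to the accumulated statistics.
    fn record(&mut self, value: u64) {
        self.total += value;
        self.calls += 1;
        self.min = Some(self.min.map_or(value, |m| m.min(value)));
        self.max = self.max.max(value);
    }

    /// Average per call, using integer division (0 when nothing was recorded).
    fn avg(&self) -> u64 {
        if self.calls == 0 { 0 } else { self.total / self.calls }
    }

    /// Share of the whole program's steps or cost, as a percentage.
    fn percent_of(&self, program_total: u64) -> f64 {
        100.0 * self.total as f64 / program_total as f64
    }
}

fn main() {
    let mut stats = TagStats::default();
    // Hypothetical per-call measurements for a single tag.
    for value in [120u64, 80, 100] {
        stats.record(value);
    }
    println!(
        "total={} pct={:.2}% calls={} avg={} min={:?} max={}",
        stats.total,
        stats.percent_of(1_200), // hypothetical program total
        stats.calls,
        stats.avg(),
        stats.min,
        stats.max
    );
}
```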

Best Practices

  1. Use descriptive tag names: hash_computation is better than tag1
  2. Choose report vs. immediate based on frequency:
    • Few calls (1-10): Use immediate variants
    • Many calls (100+): Use report variants
  3. Match start/end pairs: Always use matching macro pairs (same tag name, same variant)
  4. Don't nest same tag names: Each tag should represent a unique code section
  5. Combine with function profiling: Profile tags show "what", function profiling shows "where"

Firefox Profiler Integration

ZiskEmu can export profiling data to Firefox Profiler format, enabling advanced visualization and analysis of your program's execution.

Generating Profiler Data

Use --profiler-output to specify the output file:

# Generate compressed profiler data (recommended)
ziskemu -e <elf> -i <input> -X -S --profiler-output=profile.json.gz

# Generate uncompressed JSON
ziskemu -e <elf> -i <input> -X -S --profiler-output=profile.json

Requirements: The -S flag is required to load symbol information. The -X flag is recommended for complete profiling data.

Default: If you use -X -S without specifying --profiler-output, a file named profile.json.gz is created automatically.

Viewing in Firefox Profiler

  1. Go to https://profiler.firefox.com
  2. Click "Load a profile from file"
  3. Select your profile.json.gz file

The Firefox Profiler provides:

  • Call tree visualization showing the function call hierarchy
  • Flame graphs for identifying performance hotspots
  • Timeline view showing execution progress over time
  • Function details with cumulative costs
  • Search and filtering capabilities

Use Cases

Firefox Profiler is particularly useful when:

  • You need to visualize complex call graphs
  • Standard text reports are too verbose
  • You want to share profiling results with team members
  • You need to compare multiple profiling runs
  • You want interactive exploration of the call stack

File Format

The exported file follows the Firefox Profiler format specification, making it compatible with other tools that support this format.

Function-Level Profiling

To understand which functions contribute most to your program's cost, add the -S (or --read-symbols) flag to read symbol information from the ELF file.

Command

ziskemu -e target/elf/riscv64ima-zisk-zkvm-elf/release/guest -i input.bin -X -S

Output Explanation

When symbol reading is enabled, ZiskEmu simulates a call stack to evaluate functions cumulatively. This means it tracks not only the cycles and cost of each function's own code, but also all the calls made within that function. This cumulative analysis provides a complete picture of each function's contribution to the total execution cost.

Note: Initial calls to _start or _main are filtered out as they represent 100% of the program and don't provide useful optimization insights.

ZiskEmu provides two complementary analyses:

1. TOP STEP FUNCTIONS - Analysis by execution cycles:

TOP STEP FUNCTIONS (STEPS, % STEPS, CALLS, STEPS/CALL, FUNCTION)
----------------------------------------------------------------
     54,831,894  59.04%          1      54,831,894 <reth_evm::execute::BasicBlockExecutor<&reth_evm
     53,951,767  58.09%          1      53,951,767 <alloy_evm::eth::block::EthBlockExecutor<alloy_e
     52,133,363  56.13%         70         744,762 <revm_handler::mainnet_handler::MainnetHandler<r
     48,406,973  52.12%     41,793           1,158 <zeth_mpt::mpt::node::Node<zeth_mpt::mpt::memoiz
     26,004,168  28.00%          1      26,004,168 <zeth_mpt_state::SparseState as stateless::trie:
     21,389,831  23.03%     41,590             514 <zeth_mpt::mpt::node::Node<zeth_mpt::mpt::memoiz
     16,104,120  17.34%      1,039          15,499 <revm_context::journal::inner::JournalInner<revm
     15,999,662  17.23%        841          19,024 <revm_context::journal::inner::JournalInner<revm
     15,635,579  16.84%      1,239          12,619 <revm_database::states::state::State<stateless::
     15,498,490  16.69%        388          39,944 <&mut revm_database::states::state::State<statel
     15,014,347  16.17%        770          19,499 <revm_context::context::Context<revm_context::bl
     14,994,327  16.14%        770          19,473 <revm_context::journal::Journal<&mut revm_databa
     14,299,020  15.40%        618          23,137 revm_interpreter::instructions::contract::call_h
     14,253,493  15.35%        618          23,063 revm_interpreter::instructions::contract::call_h
     14,230,009  15.32%        618          23,025 revm_interpreter::instructions::contract::call_h
     13,714,388  14.77%     10,505           1,305 ziskos::zisklib::lib::keccak256::keccak256

Shows for each function:

  • STEPS: Total cumulative cycles used by the function (including all nested calls)
  • % STEPS: Percentage of total program cycles this function represents
  • CALLS: Number of times this function was called
  • STEPS/CALL: Average cycles per call to this function
  • FUNCTION: Function name from symbol table

2. TOP COST FUNCTIONS - Analysis by profiling cost:

TOP COST FUNCTIONS (COST, % COST, CALLS, COST/CALL, FUNCTION)
-------------------------------------------------------------
  5,255,204,123  45.95%          1   5,255,204,123 <reth_evm::execute::BasicBlockExecutor<&reth_evm
  5,172,696,823  45.23%          1   5,172,696,823 <alloy_evm::eth::block::EthBlockExecutor<alloy_e
  4,997,989,104  43.70%         70      71,399,844 <revm_handler::mainnet_handler::MainnetHandler<r
  4,530,507,470  39.61%     41,793         108,403 <zeth_mpt::mpt::node::Node<zeth_mpt::mpt::memoiz
  4,014,605,785  35.10%          1   4,014,605,785 <zeth_mpt_state::SparseState as stateless::trie:
  3,759,934,537  32.87%     10,505         357,918 ziskos::zisklib::lib::keccak256::keccak256

Shows for each function:

  • COST: Total cumulative profiling cost of the function (including all nested calls)
  • % COST: Percentage of total program cost this function represents
  • CALLS: Number of times this function was called
  • COST/CALL: Average profiling cost per call to this function
  • FUNCTION: Function name from symbol table

Key insights:

Both tables show cumulative metrics - each function includes the cost/cycles of everything it calls. This helps identify:

  • Which high-level functions consume the most resources
  • Whether optimization should focus on a function's implementation or the functions it calls
  • Functions with high cost per call that might benefit from caching or optimization
  • Functions called frequently that could benefit from batching or precompiles

Comparing the STEPS and COST analyses separates functions that use many cycles at relatively low cost (efficient operations) from functions with a high cost per cycle (expensive operations such as precompiles).

For example, ziskos::zisklib::lib::keccak256::keccak256 shows:

  • Called 10,505 times
  • 13,714,388 steps (14.77% of total) with ~1,305 steps/call
  • 3,759,934,537 cost (32.87% of total) with ~357,918 cost/call

This indicates that while Keccak uses 14.77% of cycles, it represents 32.87% of the total cost - showing it's an expensive operation relative to its cycle count, typical of precompile operations.
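
One way to make this comparison concrete is to divide cost by steps. The short Rust sketch below does that for keccak256 and for the whole program, using the totals reported in this guide (92,875,129 steps and 11,437,643,381 cost overall). cost_per_step is a hypothetical helper, not a ZiskEmu metric.

```rust
/// Average profiling cost charged per executed step.
fn cost_per_step(cost: u64, steps: u64) -> f64 {
    cost as f64 / steps as f64
}

fn main() {
    // Figures copied from the TOP STEP / TOP COST tables and the report summary.
    let keccak = cost_per_step(3_759_934_537, 13_714_388);
    let program = cost_per_step(11_437_643_381, 92_875_129);

    println!("keccak256: {keccak:.1} cost/step");
    println!("program:   {program:.1} cost/step");
    println!("ratio:     {:.2}x", keccak / program);
}
```

With these figures, keccak256 comes out at roughly 274 cost units per step against roughly 123 for the program as a whole, a little over twice the average.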

Customizing ROI Display

Showing More or Fewer Functions

Use the -T (or --top-roi) flag to control how many top functions are displayed:

# Show top 50 functions
ziskemu -e target/elf/riscv64ima-zisk-zkvm-elf/release/guest -i input.bin -X -S -T 50

# Show only top 10 functions
ziskemu -e target/elf/riscv64ima-zisk-zkvm-elf/release/guest -i input.bin -X -S -T 10

Specifying the Main Entry Point

If your program's entry point isn't named main, use the -M (or --main-name) flag:

ziskemu -e target/elf/riscv64ima-zisk-zkvm-elf/release/guest -i input.bin -X -S -M custom_entry

Filtering Functions by Pattern

For large programs, you may want to focus analysis on specific functions or modules. Use the --roi-filter flag with a regular expression pattern to mark functions of interest:

# Filter functions containing "sha256" in their name
ziskemu -e target/elf/riscv64ima-zisk-zkvm-elf/release/guest -i input.bin -X -S --roi-filter "sha256"

# Filter multiple patterns
ziskemu -e target/elf/riscv64ima-zisk-zkvm-elf/release/guest -i input.bin -X -S --roi-filter "hash|crypto|encode"

When combined with --top-roi-filter, the display will show only functions that match the specified pattern:

# Show only functions matching the filter pattern
ziskemu -e target/elf/riscv64ima-zisk-zkvm-elf/release/guest -i input.bin -X -S \
  --roi-filter "keccak" --top-roi-filter

This is useful when you want to:

  • Focus optimization efforts on a specific subsystem or module
  • Analyze only cryptographic functions
  • Compare different implementations of similar functionality
  • Filter out noise from unrelated code

Detailed Caller Analysis

The -D (or --top-roi-detail) flag provides an in-depth breakdown of each top function, showing exactly where costs come from and who calls the function. This detailed analysis helps pinpoint optimization opportunities at a granular level.

Command

ziskemu -e target/elf/riscv64ima-zisk-zkvm-elf/release/guest -i input.bin -X -S -D

What This Shows

For each top function, the detailed analysis provides:

  1. Overall metrics: Total steps and cost for the function
  2. Cost by opcode: Breakdown showing which operations (opcodes and precompiles) consume the most resources within this function, with ranking of the top 4 most expensive operations
  3. Top step callers: List of functions that call this function, showing:
    • Number of calls from each caller
    • Total steps attributed to calls from that caller
    • Percentage of this function's total steps coming from each caller

This information helps you understand:

  • What makes a function expensive (which operations dominate)
  • Who is responsible for calling it (caller distribution)
  • Where to focus optimization (expensive operations vs. frequent callers)

Output Explanation

DETAIL FUNCTION ziskos::zisklib::lib::keccak256::keccak256
----------------------------------------------------------
STEPS                         13,714,388  14.77%
COST                       3,759,934,537  32.87%

|    COST BY OPCODE                     COUNT            COST       % RANK
|    ---------------------------------------------------------------------
|    OP ltu                            28,516       1,710,960   0.05%
|    OP add                           169,207       4,230,175   0.11%
|    OP sub                             3,644         218,640   0.01%
|    OP and                            94,545       5,672,700   0.15%
|    OP or                          2,489,249     149,354,940   3.97% #2
|    OP xor                           492,192      29,531,520   0.79% #3
|    OP sll                           360,008      19,080,424   0.51% #4
|    OP dma_memcpy                     21,010         882,420   0.02%
|    OP dma_xmemset                    21,010         882,420   0.02%
|    OP _dma_pre                        2,346         206,448   0.01%
|    OP _dma_post                       9,863         867,944   0.02%
|    OP keccak                         32,650   2,466,707,500  65.61% #1

|    TOP STEP CALLERS (calls, steps)
|    -------------------------------
|              3,974       9,749,694  71.09% <zeth_mpt_state::SparseState as stateless::trie::State
|              2,332       2,778,890  20.26% <zeth_mpt::mpt::node::Node<zeth_mpt::mpt::memoize::Cac
|              1,284         217,150   1.58% revm_interpreter::instructions::system::keccak256::<re
|              1,266         188,634   1.38% <revm_database::states::state::State<stateless::witnes
|                720         107,280   0.78% <alloy_primitives::bits::bloom::Bloom>::accrue_log
|                429          63,921   0.47% <reth_trie_common::hashed_state::HashedPostState>::fro
|                202          30,098   0.22% <revm_database::states::state::State<stateless::witnes
|                144         350,053   2.55% <alloy_trie::hash_builder::HashBuilder>::update
|                 66         102,536   0.75% stateless::recover_block::verify_and_compute_sender
|                 58         110,681   0.81% alloy_primitives::utils::keccak256_impl

Understanding the detailed report:

Function Header:

DETAIL FUNCTION ziskos::zisklib::lib::keccak256::keccak256
----------------------------------------------------------
STEPS                         13,714,388  14.77%
COST                       3,759,934,537  32.87%

Shows the total cumulative steps and profiling cost for this function (including nested calls).

COST BY OPCODE section:

|    COST BY OPCODE                     COUNT            COST       % RANK
|    ---------------------------------------------------------------------
|    OP keccak                         32,650   2,466,707,500  65.61% #1
|    OP or                          2,489,249     149,354,940   3.97% #2
|    OP xor                           492,192      29,531,520   0.79% #3

Breaks down which operations consume resources within this function:

  • COUNT: Number of times each operation was executed
  • COST: Total profiling cost for all executions
  • %: Percentage of this function's total cost
  • RANK: Top 4 most expensive operations marked #1 through #4

This shows that the keccak precompile dominates this function's cost at 65.61%, making it the primary optimization target.

TOP STEP CALLERS section:

|    TOP STEP CALLERS (calls, steps)
|    -------------------------------
|              3,974       9,749,694  71.09% <zeth_mpt_state::SparseState...
|              2,332       2,778,890  20.26% <zeth_mpt::mpt::node::Node...

Shows which functions call this function and how steps are distributed:

  • First column: Number of calls from this caller
  • Second column: Total steps consumed when called from this caller
  • Percentage: How much of this function's total steps come from this caller
  • Function name: The calling function

This reveals that SparseState is responsible for 71% of this function's execution, making it the primary call path to analyze.

Controlling Detail Level

Use the -C (or --roi-callers) flag to control how many callers are shown in the detailed analysis for each function:

# Show top 20 callers for each function in the detailed report
ziskemu -e target/elf/riscv64ima-zisk-zkvm-elf/release/guest -i input.bin -X -S -D -C 20

# Show only top 5 callers for each function
ziskemu -e target/elf/riscv64ima-zisk-zkvm-elf/release/guest -i input.bin -X -S -D -C 5

The default value is 10 callers per function. Increasing this number provides more complete call path information but may make the output more verbose.

Tracking Function Calls

Sometimes you need to analyze each individual call to a function to understand:

  • Which parameter values are most frequently used
  • What patterns exist in the arguments
  • Which specific input values trigger expensive code paths

This information is valuable for optimization strategies. For example, if you discover that certain parameter values are very common, you could:

  • Add fast paths for those frequent values
  • Use lookup tables or caching for common inputs
  • Optimize the general case based on typical parameter distributions

How It Works

Use the --track-call-args feature combined with --roi-filter to log parameter values for each call to matching functions:

  • --roi-filter "pattern": Specifies which functions to track (using a regular expression)
  • --track-call-args N: Specifies how many parameters to log (up to 8, corresponding to RISC-V a0-a7 registers)

Important limitation: The tool logs the raw parameter values from registers. This means:

  • For scalar values (integers, booleans): You get the actual value
  • For pointers/addresses: You get only the address itself, not the data it points to
  • This makes tracking most useful for functions with scalar parameters or when you're interested in address patterns

Command

# Track calls to filtered functions, logging first 4 parameters
ziskemu -e target/elf/riscv64ima-zisk-zkvm-elf/release/guest -i input.bin -S \
  --roi-filter "hash_function" --track-call-args 4 --track-output-path ./traces

Options

  • --roi-filter "pattern": Regular expression to match function names you want to track (required)
  • --track-call-args N: Number of parameters to log (1-8, corresponding to RISC-V a0-a7 registers)
  • --track-separator "SEP": Character used to separate parameter values in output (default: ;)
  • --track-output-path PATH: Directory where tracking files will be written (default: current directory)

Output

For each matched function, a text file is created (<function_name>.txt) with one line per call:

# ROI: hash_function (PC: 0x00012a0-0x00012f8)
# Separator: ';'
# Parameters: a0-a3
0x7fff8200;0x00000100;0x7fff8400;0x00000000
0x7fff8300;0x00000040;0x7fff8400;0x00000001
0x7fff8450;0x00000080;0x7fff8400;0x00000002

Each line contains the parameter values (in hexadecimal) for one function call, separated by the chosen separator. You can then analyze this file to:

  • Find the most common parameter combinations
  • Identify patterns in memory addresses
  • Detect outliers or unusual parameter values
  • Build histograms of value distributions
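
As a starting point for this kind of analysis, the Rust sketch below counts how often each value appears in one parameter column. It assumes the file layout shown above (comment lines starting with #, hexadecimal values, a single-character separator); histogram is a hypothetical helper, and the sample text is the example output embedded as a string instead of being read from disk.

```rust
use std::collections::HashMap;

/// Counts how often each value appears in one parameter column of a tracking file.
/// `column` is zero-based: 0 is a0, 1 is a1, and so on.
fn histogram(contents: &str, separator: char, column: usize) -> HashMap<u64, usize> {
    let mut counts = HashMap::new();
    for line in contents.lines() {
        let line = line.trim();
        // Skip blank lines and the "# ..." header comments.
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        if let Some(field) = line.split(separator).nth(column) {
            let digits = field.trim().trim_start_matches("0x");
            if let Ok(value) = u64::from_str_radix(digits, 16) {
                *counts.entry(value).or_insert(0) += 1;
            }
        }
    }
    counts
}

fn main() {
    // Example tracking output from this guide.
    let sample = "# ROI: hash_function (PC: 0x00012a0-0x00012f8)\n\
                  # Separator: ';'\n\
                  # Parameters: a0-a3\n\
                  0x7fff8200;0x00000100;0x7fff8400;0x00000000\n\
                  0x7fff8300;0x00000040;0x7fff8400;0x00000001\n\
                  0x7fff8450;0x00000080;0x7fff8400;0x00000002\n";

    // Sort by descending frequency so the most common values come first.
    let mut rows: Vec<(u64, usize)> = histogram(sample, ';', 2).into_iter().collect();
    rows.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(&b.0)));
    for (value, count) in rows {
        println!("a2 = {value:#010x}: {count} call(s)");
    }
}
```

In the sample, a2 holds the same address in all three calls, which is the kind of pattern that may justify a fast path or a cached result.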

PC Histogram Analysis

The -H (or --histogram) flag provides a low-level view of the most frequently executed code positions in your program. Unlike function-level profiling, this analysis operates at the program counter (PC) level, showing you the exact assembly instructions that execute most often.

What This Shows

This analysis:

  • Identifies the most executed individual instructions by their program counter address
  • Groups consecutive instructions together automatically
  • Attributes these instruction groups to their parent function (when symbols are loaded with -S)
  • Helps identify hot loops, critical paths, and instruction-level bottlenecks

This is particularly useful for:

  • Understanding which specific code sequences dominate execution time
  • Identifying tight loops that could benefit from optimization
  • Verifying that optimizations are affecting the intended code paths
  • Finding unexpected hotspots at the instruction level

Command

# Show top 50 most executed instruction groups
ziskemu -e target/elf/riscv64ima-zisk-zkvm-elf/release/guest -i input.bin -X -S -H 50

The histogram requires -S to display function names. The number after -H controls how many instruction groups to display.

Output Explanation

TOP PC HISTOGRAM (EXECUTIONS, % EXECUTIONS, PC)
-----------------------------------------------
        796,670   0.86%  0x801230b8:   lbu r16, 0x0(r14)
        796,670   0.86%  0x801230bc:   beq r16, r12, 0xffffffd4
      1,593,340   1.72%  -----------   <revm_bytecode::legacy::raw::LegacyRawBytecode>::into_analyzed

        755,644   0.81%  0x801230c0:   slli r17, r16, 0x38
        755,644   0.81%  0x801230c4:   srai r17, r17, 0x38
        755,644   0.81%  0x801230c8:   bge r15, r17, 0x14
      2,266,932   2.44%  -----------   <revm_bytecode::legacy::raw::LegacyRawBytecode>::into_analyzed

        547,858   0.59%  0x801230dc:   addi r14, r14, 0x1
        547,858   0.59%  0x801230e0:   bltu r14, r10, 0xffffffd8
      1,095,716   1.18%  -----------   <revm_bytecode::legacy::raw::LegacyRawBytecode>::into_analyzed

        429,174   0.46%  0x800a38ec:   ld r10, 0x60(r21)
        429,174   0.46%  0x800a38f0:   lbu r11, 0x0(r10)
        429,174   0.46%  0x800a38f4:   addi r10, r10, 0x1
        429,174   0.46%  0x800a38f8:   sd r10, 0x60(r21)
        429,174   0.46%  0x800a38fc:   slli r10, r11, 0x4
        429,174   0.46%  0x800a3900:   add r10, r19, r10
        429,174   0.46%  0x800a3904:   ld r11, 0x8(r10)
        429,174   0.46%  0x800a3908:   ld r12, 0x180(r21)
        429,174   0.46%  0x800a390c:   sub r13, r12, r11
        429,174   0.46%  0x800a3910:   sd r13, 0x180(r21)
        429,174   0.46%  0x800a3914:   bltu r12, r11, 0x20
        429,174   0.46%  0x800a3918:   ld r12, 0x0(r10)
        429,174   0.46%  0x800a391c:   addi r10, r21, 0x0 => copyb
        429,174   0.46%  0x800a3920:   addi r11, r9, 0x0 => copyb
        429,174   0.46%  0x800a3924:   jalr r1, r12, 0x0
        429,174   0.46%  0x800a3928:   lbu r10, 0x68(r21)
        429,174   0.46%  0x800a392c:   bne r10, r0, 0xffffffc0
      7,295,958   7.86%  -----------   <revm_handler::mainnet_handler::MainnetHandler<revm_context::evm::Ev

Understanding the histogram:

The output is organized into instruction groups, where each group consists of:

  1. Individual instruction lines: Each shows:

    • EXECUTIONS: Number of times this specific instruction was executed
    • % EXECUTIONS: Percentage of total program steps
    • PC: Program counter address in hexadecimal
    • Instruction: The RISC-V assembly instruction at that address
  2. Group summary line (with dashes):

    • Total executions: Sum of all instructions in this group
    • % EXECUTIONS: Cumulative percentage for the entire group
    • Function name: The function to which these instructions belong
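
The group structure described above can be recovered from the text output with a short script. The following Python sketch is not part of ZisK. It assumes the line layout matches the sample shown in this section (count, percentage, then either a PC address or a row of dashes), and the names parse_histogram and LINE_RE are illustrative only.

```python
import re

# Sample lines copied from the histogram output shown in this section.
SAMPLE = """\
        796,670   0.86%  0x801230b8:   lbu r16, 0x0(r14)
        796,670   0.86%  0x801230bc:   beq r16, r12, 0xffffffd4
      1,593,340   1.72%  -----------   <revm_bytecode::legacy::raw::LegacyRawBytecode>::into_analyzed

        755,644   0.81%  0x801230c0:   slli r17, r16, 0x38
        755,644   0.81%  0x801230c4:   srai r17, r17, 0x38
        755,644   0.81%  0x801230c8:   bge r15, r17, 0x14
      2,266,932   2.44%  -----------   <revm_bytecode::legacy::raw::LegacyRawBytecode>::into_analyzed
"""

# Assumed layout: count, percentage, then "0x<pc>:" or a run of dashes, then text.
LINE_RE = re.compile(r"^\s*([\d,]+)\s+([\d.]+)%\s+(0x[0-9a-fA-F]+:|-+)\s+(.*)$")


def parse_histogram(text):
    """Split histogram text into groups of instructions plus their summary line."""
    groups, pending = [], []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # headers, separators and blank lines
        count = int(m.group(1).replace(",", ""))
        percent = float(m.group(2))
        marker, rest = m.group(3), m.group(4).strip()
        if marker.startswith("-"):
            # Summary line: closes the current group.
            groups.append({"function": rest, "total": count,
                           "percent": percent, "instructions": pending})
            pending = []
        else:
            pending.append({"pc": int(marker.rstrip(":"), 16),
                            "count": count, "asm": rest})
    return groups


groups = parse_histogram(SAMPLE)
for g in groups:
    # The summary total should equal the sum of its instruction counts.
    summed = sum(i["count"] for i in g["instructions"])
    print(len(g["instructions"]), g["total"], summed == g["total"])
```

Because the regular expression accepts digits with or without commas, the same sketch should also handle output produced with --no-thousands-sep, provided the column layout is otherwise unchanged.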

Key insights from the example:

The first group shows a simple loop checking bytes:

        796,670   0.86%  0x801230b8:   lbu r16, 0x0(r14)     # Load byte
        796,670   0.86%  0x801230bc:   beq r16, r12, 0xffffffd4  # Branch if equal
      1,593,340   1.72%  -----------   <revm_bytecode::legacy::raw::LegacyRawBytecode>::into_analyzed

Each of the two instructions in this tight sequence executed 796,670 times, so the pair accounts for 1,593,340 steps, or 1.72% of total execution.

The large group at the bottom represents a complex instruction dispatcher:

        429,174   0.46%  0x800a38ec:   ld r10, 0x60(r21)     # Load from context
        ...
        429,174   0.46%  0x800a392c:   bne r10, r0, 0xffffffc0   # Loop back
      7,295,958   7.86%  -----------   <revm_handler::mainnet_handler::MainnetHandler...

This 17-instruction sequence accounts for 7.86% of total execution, making it a prime optimization target.
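
The figures for this group can be checked with simple arithmetic. The short Python sketch below uses only numbers from the sample output. The implied total step count is an estimate derived from the rounded percentage, not a value reported by ziskemu.

```python
# Numbers taken from the 17-instruction group in the sample output.
per_instruction = 429_174
instructions_in_group = 17

# Every instruction in the group runs once per loop iteration,
# so the group total is the per-instruction count times the group size.
group_total = per_instruction * instructions_in_group
print(f"{group_total:,}")  # 7,295,958, matching the summary line

# The summary line reports 7.86%, which implies roughly this many total steps.
# The percentage is rounded, so this is only approximate.
implied_total_steps = group_total / 0.0786
print(f"{implied_total_steps / 1e6:.1f} million steps (approximate)")
```

Read the other way around: if one instruction could be removed from this loop without changing the iteration count, the saving would be about 429,174 steps, or roughly 0.46% of execution.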

When to use histogram analysis:

  • After function-level profiling: Once you identify expensive functions, use histograms to see which specific instruction sequences within those functions dominate
  • Validating compiler optimizations: Verify that loops are unrolled or optimized as expected
  • Finding unexpected hotspots: Sometimes a small instruction sequence accounts for disproportionate execution time
  • Comparing implementations: See how different code structures affect instruction-level execution patterns
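
When comparing implementations or validating an optimization, it can help to diff the per-function group totals from two runs. The Python sketch below shows one way to do that. The function compare_group_totals and all function names and step counts in it are hypothetical placeholders, not real measurements or part of ZisK.

```python
def compare_group_totals(before, after):
    """Return (function, before, after, delta) rows, largest absolute change first."""
    rows = []
    for name in sorted(set(before) | set(after)):
        b = before.get(name, 0)  # 0 if the function has no hot group in this run
        a = after.get(name, 0)
        rows.append((name, b, a, a - b))
    rows.sort(key=lambda r: abs(r[3]), reverse=True)
    return rows


# Hypothetical per-function step totals from two runs (illustrative values only).
before = {"parse_input": 1_200_000, "hash_block": 5_000_000, "main": 40_000}
after = {"parse_input": 1_200_000, "hash_block": 3_100_000, "main": 40_000}

rows = compare_group_totals(before, after)
for name, b, a, delta in rows:
    print(f"{name:<12} {b:>10,} {a:>10,} {delta:>+11,}")
```

A change that shows up only in the function you intended to optimize suggests the optimization hit the intended code path. Movement elsewhere is worth investigating.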

Additional Options

Show Steps Without Full Statistics

To check how many steps a program executes without generating full statistics, use the --steps flag:

ziskemu -e target/elf/riscv64ima-zisk-zkvm-elf/release/guest -i input.bin --steps

Progress Indicators

For long-running programs, show progress updates every 16M steps with --with-progress:

ziskemu -e target/elf/riscv64ima-zisk-zkvm-elf/release/guest -i input.bin --with-progress

Disable Thousands Separator

For machine-readable output, disable the thousands separator with --no-thousands-sep:

ziskemu -e target/elf/riscv64ima-zisk-zkvm-elf/release/guest -i input.bin -X --no-thousands-sep

Complete Example: Comprehensive Profiling

Here's a complete example that uses most profiling features together:

ziskemu -e target/elf/riscv64ima-zisk-zkvm-elf/release/guest \
  -i input.bin \
  -X \
  -S \
  -D \
  -T 30 \
  -C 15 \
  -H 50 \
  --roi-filter "sha256|hash" \
  --track-call-args 6 \
  --track-output-path ./profiling_data \
  -m

This command will:

  1. Generate full statistics (-X)
  2. Read and use symbol information (-S)
  3. Show detailed caller analysis (-D)
  4. Display top 30 functions by cost (-T 30)
  5. Show top 15 callers for each function (-C 15)
  6. Display the top 50 most executed instruction groups (-H 50)
  7. Filter to sha256/hash-related functions (--roi-filter)
  8. Track first 6 parameters of filtered function calls (--track-call-args)
  9. Save tracking data to the ./profiling_data directory (--track-output-path)
  10. Show performance metrics (-m)
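
If you run this kind of profiling repeatedly, for example from a benchmark script, the command line can be assembled programmatically. The Python sketch below only builds and prints the argument list. It uses exactly the flags shown above, while the helper name build_profile_command and its parameter names are illustrative assumptions rather than any ZisK API.

```python
import shlex


def build_profile_command(elf, input_file, top_functions=30, top_callers=15,
                          histogram=50, roi_filter=None, track_call_args=None,
                          track_output_path=None):
    """Assemble a ziskemu argument list using the flags described in this section."""
    cmd = ["ziskemu", "-e", elf, "-i", input_file,
           "-X", "-S", "-D",
           "-T", str(top_functions),
           "-C", str(top_callers),
           "-H", str(histogram)]
    if roi_filter is not None:
        cmd += ["--roi-filter", roi_filter]
    if track_call_args is not None:
        cmd += ["--track-call-args", str(track_call_args)]
    if track_output_path is not None:
        cmd += ["--track-output-path", track_output_path]
    cmd.append("-m")
    return cmd


cmd = build_profile_command(
    "target/elf/riscv64ima-zisk-zkvm-elf/release/guest",
    "input.bin",
    roi_filter="sha256|hash",
    track_call_args=6,
    track_output_path="./profiling_data",
)
# shlex.join quotes arguments such as the filter pattern for safe shell use.
print(shlex.join(cmd))
```

Passing the list directly to subprocess.run(cmd) avoids shell quoting issues with characters such as | in the filter pattern.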

Tips for Effective Profiling

Start Simple, Add Detail

Begin with basic statistics (-X) to get an overview, then progressively add more detailed analysis:

  1. Basic: ziskemu -e program.elf -i input.bin -X
  2. Functions: ziskemu -e program.elf -i input.bin -X -S
  3. Callers: ziskemu -e program.elf -i input.bin -X -S -D
  4. Detailed: Add -H as needed

Focus on High Impact

Use the final_cost percentage to identify functions with the highest impact. Optimizing a function that accounts for 50% of the total cost has far more effect than optimizing one that accounts for 1%.

Understand Profiling Cost vs. Final Cost

When a function has high final cost but low profiling cost, the optimization opportunity lies in the functions it calls, not in the function itself. Focus your optimization efforts where profiling costs are highest, as these represent direct computational work that can be improved through code changes or patching with precompiles.

Use Filtering for Large Codebases

In programs with hundreds of functions, use --roi-filter to focus on specific subsystems or modules of interest.

Track Representative Inputs

Profile with realistic, representative inputs. The cost distribution can vary significantly based on input characteristics.

Practical Example: Analyzing Ethereum Opcode Costs

This example demonstrates how to analyze the cost distribution of Ethereum opcodes in a real-world client implementation. By filtering for the EVM instruction interpreter functions, we can obtain a detailed breakdown of which Ethereum operations consume the most resources during block validation.

Scenario

You want to understand which Ethereum opcodes are most expensive in terms of ZisK proving costs when validating a specific block. This information helps you:

  • Identify which EVM operations would benefit most from optimization
  • Understand the cost profile of real-world Ethereum transactions
  • Guide decisions about which precompiles or patches to prioritize

Command

target/release/ziskemu \
  -S \
  -X \
  -e ../zisk-eth-client/bin/guests/stateless-validator-reth/target/riscv64ima-zisk-zkvm-elf/release/zec-reth \
  -i ../data/benchmark_inputs/24654304_30c8b8.bin \
  --roi-filter "revm_interpreter::instructions::" \
  --top-roi-filter \
  -T 200

What this does:

  • -S: Load symbol information from the ELF file
  • -X: Generate full statistics with cost breakdown
  • -e <path>: Path to the compiled Ethereum client (reth implementation)
  • -i <input>: Block data to validate (block 24,654,304)
  • --roi-filter "revm_interpreter::instructions::": Filter to show only functions in the EVM instruction interpreter namespace (where all Ethereum opcodes are implemented)
  • --top-roi-filter: Display only the filtered functions in the top ROI lists
  • -T 200: Show top 200 functions (to capture all EVM opcodes)

Expected Output

The output will show the TOP COST FUNCTIONS filtered to only include EVM instruction implementations, giving you a clear view of which Ethereum opcodes dominate the proving cost for this specific block:

TOP COST FUNCTIONS (COST, % COST, CALLS, COST/CALL, FUNCTION)
-------------------------------------------------------------
  9,433,353,231  10.32%      5,824       1,619,737 revm_interpreter::instructions::contract::call_helpers::load_acc_
  9,396,093,086  10.28%      5,824       1,613,340 revm_interpreter::instructions::contract::call_helpers::load_acco
  9,377,741,662  10.26%      5,824       1,610,189 revm_interpreter::instructions::contract::call_helpers::load_acco
  8,344,978,788   9.13%      1,695       4,923,291 revm_interpreter::instructions::contract::call::<revm_interpreter
  4,599,658,812   5.03%    342,951          13,412 revm_interpreter::instructions::stack::swap::<1, revm_interpreter
  2,772,734,752   3.03%    128,956          21,501 revm_interpreter::instructions::memory::mload::<revm_interpreter:
  2,580,388,569   2.82%     10,675         241,722 revm_interpreter::instructions::host::sload::<revm_interpreter::i
  1,726,257,923   1.89%    105,903          16,300 revm_interpreter::instructions::memory::mstore::<revm_interpreter
  1,599,904,068   1.75%    119,289          13,412 revm_interpreter::instructions::stack::swap::<2, revm_interpreter
  1,576,416,043   1.72%     13,627         115,683 revm_interpreter::instructions::arithmetic::mulmod::<revm_interpr
  1,499,796,900   1.64%    111,825          13,412 revm_interpreter::instructions::stack::swap::<3, revm_interpreter
  1,430,041,088   1.56%    106,624          13,412 revm_interpreter::instructions::stack::swap::<4, revm_interpreter
  1,045,628,445   1.14%      2,201         475,069 revm_interpreter::instructions::contract::static_call::<revm_inte
    896,353,301   0.98%    184,312           4,863 revm_interpreter::instructions::control::jumpi::<revm_interpreter
    812,869,552   0.89%    561,374           1,448 revm_interpreter::instructions::stack::push::<1, revm_interpreter
    806,652,474   0.88%    465,922           1,731 revm_interpreter::instructions::stack::push::<2, revm_interpreter
    763,874,190   0.84%      6,781         112,649 revm_interpreter::instructions::host::sstore::<revm_interpreter::
    691,435,073   0.76%      5,682         121,688 revm_interpreter::instructions::system::keccak256::<revm_interpre
    669,514,638   0.73%    245,798           2,723 revm_interpreter::instructions::arithmetic::add::<revm_interprete
    638,632,995   0.70%    102,549           6,227 revm_interpreter::instructions::arithmetic::mul::<revm_interprete
    620,675,903   0.68%    239,701           2,589 revm_interpreter::instructions::control::jump::<revm_interpreter:
    527,546,726   0.58%     83,391           6,326 revm_interpreter::instructions::bitwise::shr::<revm_interpreter::
    452,376,936   0.49%    302,391           1,496 revm_interpreter::instructions::stack::dup::<2, revm_interpreter:
    325,487,994   0.36%     41,683           7,808 revm_interpreter::instructions::bitwise::sar::<revm_interpreter::
    311,851,955   0.34%     25,502          12,228 revm_interpreter::instructions::system::codecopy::<revm_interpret
    289,141,110   0.32%    120,407           2,401 revm_interpreter::instructions::bitwise::iszero::<revm_interprete
    264,613,976   0.29%    176,881           1,496 revm_interpreter::instructions::stack::dup::<3, revm_interpreter:
    262,969,735   0.29%     18,608          14,132 revm_interpreter::instructions::system::calldataload::<revm_inter
    252,430,047   0.28%     41,031           6,152 revm_interpreter::instructions::bitwise::sgt::<revm_interpreter::
    248,940,076   0.27%      1,928         129,118 revm_interpreter::instructions::contract::delegate_call::<revm_in
    242,086,315   0.26%        192       1,260,866 revm_interpreter::instructions::host::extcodesize::<revm_interpre
    229,785,355   0.25%     10,852          21,174 revm_interpreter::instructions::stack::push::<32, revm_interprete

This filtered view allows you to quickly identify:

  • Most expensive opcodes: Which EVM operations have the highest total cost
  • Frequently called opcodes: Operations with many calls but lower individual cost
  • Optimization targets: Opcodes that would benefit most from ZisK-specific optimizations or precompiles

Important note: With this method, no modification to the ELF file is required. The profiling works directly on the compiled binary using existing symbol information. However, you do need to know the naming convention used for the functions that implement each opcode. In this case, the REVM interpreter uses the namespace revm_interpreter::instructions:: consistently, making it easy to filter all opcode implementations with a single pattern.
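
Because generic instantiations such as swap::<1, ...> and swap::<2, ...> appear as separate rows, it can be useful to aggregate them per opcode. The Python sketch below does this for a few rows copied from the output above. It is not part of ZisK: the column layout is assumed from the sample, and the names ROW_RE, opcode_key and aggregate are illustrative.

```python
import re
from collections import defaultdict

# A few rows copied from the TOP COST FUNCTIONS output above.
SAMPLE = """\
  4,599,658,812   5.03%    342,951          13,412 revm_interpreter::instructions::stack::swap::<1, revm_interpreter
  2,772,734,752   3.03%    128,956          21,501 revm_interpreter::instructions::memory::mload::<revm_interpreter:
  1,599,904,068   1.75%    119,289          13,412 revm_interpreter::instructions::stack::swap::<2, revm_interpreter
  1,499,796,900   1.64%    111,825          13,412 revm_interpreter::instructions::stack::swap::<3, revm_interpreter
  1,430,041,088   1.56%    106,624          13,412 revm_interpreter::instructions::stack::swap::<4, revm_interpreter
    812,869,552   0.89%    561,374           1,448 revm_interpreter::instructions::stack::push::<1, revm_interpreter
    806,652,474   0.88%    465,922           1,731 revm_interpreter::instructions::stack::push::<2, revm_interpreter
"""

# Assumed columns: COST, % COST, CALLS, COST/CALL, FUNCTION.
ROW_RE = re.compile(r"^\s*([\d,]+)\s+([\d.]+)%\s+([\d,]+)\s+([\d,]+)\s+(\S.*)$")
PREFIX = "revm_interpreter::instructions::"


def opcode_key(function_name):
    """Drop the namespace prefix and the generic parameters (everything from '::<')."""
    name = function_name.split("::<", 1)[0]
    return name[len(PREFIX):] if name.startswith(PREFIX) else name


def aggregate(text):
    totals = defaultdict(lambda: {"cost": 0, "calls": 0})
    for line in text.splitlines():
        m = ROW_RE.match(line)
        if not m:
            continue
        key = opcode_key(m.group(5))
        totals[key]["cost"] += int(m.group(1).replace(",", ""))
        totals[key]["calls"] += int(m.group(3).replace(",", ""))
    return dict(totals)


totals = aggregate(SAMPLE)
for key, t in sorted(totals.items(), key=lambda kv: kv[1]["cost"], reverse=True):
    print(f"{key:<16} {t['cost']:>15,} {t['calls']:>10,}")
```

For the rows shown, the four swap variants together add up to a cost of 9,129,400,868 over 680,689 calls (about 9.98% when adding the rounded percentages), which is comparable to the top individual entries. Summing is only meaningful for rows that do not contain one another: entries such as the nearly identical call_helpers rows may be nested calls, so their costs should not be added without checking.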

Conclusion

ZiskEmu's profiling capabilities provide deep insights into your program's resource consumption and performance characteristics. By understanding profiling and final costs, analyzing regions of interest, and using the various filtering and tracking options, you can effectively identify optimization opportunities and improve the efficiency of your ZisK programs.

Use profiling costs as your primary optimization metric, as they provide a direct cause-and-effect relationship with code changes. This makes them ideal for detecting where patches should be applied, validating that optimizations are working correctly, and ensuring that precompiles are being used where expected.

Remember that profiling works on any ELF file with symbols, including release builds, making it easy to analyze production-ready code without special compilation flags or instrumentation.