
Transpilation

This page explains how the transpiler bridges a standard RISC-V ELF and the ZisK ISA. It covers why transpilation runs at the start of every emulation rather than being cached, how ZisK's 8-bit instruction alignment leaves room to expand awkward RISC-V instructions, a worked layout example, and the address map the transpiled ROM, input data, and RAM live in at runtime.

The ISA & processor page described the shape in which the ZisK processor expects every instruction: c = op(a, b), with four specific-purpose registers. But the binary you compile with cargo-zisk build is still a standard RISC-V ELF (RV64IMA, the hardware-style ISA). Something has to bridge those two worlds, and that something is the transpiler.

Why transpile at runtime?

Transpilation runs at the start of every emulation, not once and cached. That sounds expensive, but in practice it's no slower than reading a previously transpiled ZisK binary from disk:

| Approach | What it costs |
| --- | --- |
| Cache the transpiled ZisK ROM | Disk I/O proportional to ROM size. Plus a versioning/invalidation story every time the transpiler changes. |
| Re-transpile from the RISC-V ELF | CPU time proportional to ELF size. Cheap enough to be unnoticeable. No cache invalidation, no stale-output footguns. |

Re-transpiling is the path of least friction: the RISC-V ELF is already the canonical artifact (the verification key is anchored to its transpiled form), so making the in-memory ZisK ROM a deterministic function of it removes a whole category of "which file is the real one?" questions.

The output of the transpiler is a ZisK ROM: a map from program counter to ZisK instruction, ready for the emulation loop in ISA & processor → How an instruction is emulated to walk through.
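To make the "map from program counter to ZisK instruction" idea concrete, here is a minimal Rust sketch. The `ZiskInst` and `Rom` types, the `next_pc_offset` field, and the `walk` function are hypothetical names invented for illustration; they are not taken from the ZisK codebase, and the real instruction carries far more state.

```rust
use std::collections::BTreeMap;

/// Hypothetical, heavily simplified stand-in for a ZisK instruction.
#[derive(Debug, Clone, PartialEq)]
struct ZiskInst {
    /// Placeholder operation name (assumption, not a real ZisK opcode).
    op: &'static str,
    /// Assumed field: how far the program counter advances after this instruction.
    next_pc_offset: u64,
}

/// The ROM modelled as a map from program counter to instruction.
struct Rom {
    insts: BTreeMap<u64, ZiskInst>,
}

impl Rom {
    /// Looks up the instruction stored at `pc`, if any.
    fn fetch(&self, pc: u64) -> Option<&ZiskInst> {
        self.insts.get(&pc)
    }
}

/// Walks the ROM from `entry`, returning the program counters visited.
/// Stops at the first address with no instruction, or after `max_steps`.
fn walk(rom: &Rom, entry: u64, max_steps: usize) -> Vec<u64> {
    let mut pc = entry;
    let mut visited = Vec::new();
    for _ in 0..max_steps {
        match rom.fetch(pc) {
            Some(inst) => {
                visited.push(pc);
                pc += inst.next_pc_offset;
            }
            None => break,
        }
    }
    visited
}
```

Because the map is rebuilt from the ELF on every run, nothing in this model needs to be persisted: the same ELF always yields the same `Rom`.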

Alignment

The translation between the two ISAs is mostly 1:1 — one RISC-V instruction becomes one ZisK instruction — but a handful of RISC-V instructions don't have a single-instruction ZisK equivalent, and the alignment scheme is designed to absorb the slack:

| ISA | Instruction width | Alignment |
| --- | --- | --- |
| RV64IMA | 32 bits | 32-bit |
| RV64IMAC (with the C extension) | 16 or 32 bits | 16-bit |
| ZisK | variable | 8-bit |

ZisK's 8-bit alignment is the key choice. It means every RISC-V instruction's address — 32-bit aligned, so a multiple of 4 — has room for four ZisK instructions before colliding with the next one. That headroom is exactly what the awkward cases need:

| RISC-V instruction class | How it transpiles |
| --- | --- |
| Most RV64IMA instructions | One ZisK instruction at the same address. The byte slots in between stay empty. |
| Atomic instructions | 2–4 ZisK instructions packed into the slots starting at the RISC-V instruction's address. |
| C-extension instructions | Usually one ZisK instruction; sometimes two when the 16-bit encoding implies more work than a single ZisK op can express. |
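The slot arithmetic behind this scheme is small enough to write down. The sketch below assumes one ZisK slot per byte address, so a RISC-V instruction that is `riscv_width_bytes` wide has exactly that many slots before the next RISC-V instruction begins. The function name `slot_addresses` is hypothetical and not part of any ZisK API.

```rust
/// Returns the addresses at which `zisk_count` ZisK instructions would be
/// placed for a RISC-V instruction at `riscv_addr`, or `None` if the
/// expansion does not fit before the next RISC-V instruction.
///
/// Assumption: one ZisK slot per byte address, so the number of available
/// slots equals the RISC-V encoding width in bytes (4, or 2 when compressed).
fn slot_addresses(riscv_addr: u64, zisk_count: u64, riscv_width_bytes: u64) -> Option<Vec<u64>> {
    if zisk_count == 0 || zisk_count > riscv_width_bytes {
        return None;
    }
    Some((riscv_addr..riscv_addr + zisk_count).collect())
}
```

Under this model a 32-bit atomic instruction can expand to at most four ZisK instructions and a 16-bit compressed instruction to at most two, which matches the upper bounds in the table.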

The same scheme also keeps control flow simple: ZisK addresses match RISC-V addresses everywhere a base RISC-V instruction lives, so RISC-V branch targets land on real ZisK instructions without any remapping table.

Example

Take a snippet of RISC-V code starting at address 0, with one instruction of each kind for illustration:

The transpiler walks each RISC-V instruction and emits one or more ZisK instructions at the same address, leaving the in-between slots empty:

ZisK instructions don't actually have a fixed bit length — the diagram represents them as 8-bit because that's the alignment granularity. The "no code" gaps are exactly the slack that absorbs the variable-width RISC-V layout.
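A layout of this kind can also be sketched in Rust. The snippet in `example` is an illustrative assumption (one ordinary 32-bit instruction, one atomic, two compressed) with made-up expansion counts; it is not the snippet from the original diagram, and `Riscv`, `layout`, and `example` are hypothetical names.

```rust
/// One RISC-V instruction in an illustrative snippet.
struct Riscv {
    /// RISC-V address of the instruction.
    addr: u64,
    /// Encoding width in bytes: 4, or 2 with the C extension.
    width: u64,
    /// How many ZisK instructions it expands to (assumed values).
    zisk_count: u64,
}

/// Returns one entry per byte address from 0 up to the end of the snippet:
/// `Some(i)` if a ZisK instruction emitted for input instruction `i` sits
/// at that address, `None` for a "no code" gap.
fn layout(insts: &[Riscv]) -> Vec<Option<usize>> {
    let end = insts.iter().map(|r| r.addr + r.width).max().unwrap_or(0);
    let mut slots = vec![None; end as usize];
    for (i, r) in insts.iter().enumerate() {
        // The expansion must fit before the next RISC-V instruction starts.
        assert!(r.zisk_count >= 1 && r.zisk_count <= r.width, "expansion must fit");
        for k in 0..r.zisk_count {
            slots[(r.addr + k) as usize] = Some(i);
        }
    }
    slots
}

/// Hypothetical snippet starting at address 0.
fn example() -> Vec<Riscv> {
    vec![
        Riscv { addr: 0, width: 4, zisk_count: 1 },  // ordinary 32-bit instruction
        Riscv { addr: 4, width: 4, zisk_count: 3 },  // atomic, expanded to three
        Riscv { addr: 8, width: 2, zisk_count: 1 },  // compressed, one ZisK instruction
        Riscv { addr: 10, width: 2, zisk_count: 2 }, // compressed, expanded to two
    ]
}
```

Calling `layout(&example())` yields twelve slots. Addresses 1 to 3, 7, and 9 come back as `None`, which are the "no code" gaps, while every RISC-V address (0, 4, 8, 10) holds the first ZisK instruction of its expansion.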

The address map

The transpiled ROM doesn't live alone in the address space. The ZisK runtime (ziskos) reserves a few regions for its own plumbing — entry/exit handlers, input data, RAM. Knowing where each region sits is useful when reading a guest program's disassembly or chasing down a memory issue.

High level

| From | To | Region |
| --- | --- | --- |
| 0x1000 | 0x10000000 | ziskos ROM |
| 0x40000000 | 0x7FFFFFFF | Input data |
| 0x80000000 | 0x88000000 | Program ROM |
| 0xA0000000 | 0xBFFFFFFF | RAM |

The four regions split cleanly by purpose: ziskos ROM is runtime support, the input region is where the host's input bytes are placed before execution begins, the program ROM is where the transpiled guest lives, and RAM is read/write working memory.
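When chasing a stray address, a small classifier over these ranges can help. The following is a sketch only: the `Region` enum and `classify` function are hypothetical names, and it assumes the "To" column is inclusive, which the table does not state.

```rust
/// The four high-level regions of the address map.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Region {
    ZiskosRom,
    InputData,
    ProgramRom,
    Ram,
}

/// Maps an address to its region, or `None` if it falls in an unmapped gap.
/// Assumption: the upper bound of each range is treated as inclusive.
fn classify(addr: u64) -> Option<Region> {
    match addr {
        0x1000..=0x1000_0000 => Some(Region::ZiskosRom),
        0x4000_0000..=0x7FFF_FFFF => Some(Region::InputData),
        0x8000_0000..=0x8800_0000 => Some(Region::ProgramRom),
        0xA000_0000..=0xBFFF_FFFF => Some(Region::Ram),
        _ => None,
    }
}
```

For example, `classify(0x8000_0000)` returns `Some(Region::ProgramRom)`, while an address such as `0x2000_0000` falls between regions and returns `None`.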

Inside the ROM region

| Address | Symbol | What's there |
| --- | --- | --- |
| 0x1000 | ROM_ENTRY | Where execution starts. Jumps to the program launcher. |
| 0x1004 | ROM_EXIT | Where execution ends. Returning here halts the program. |
| 0x1008 | FLOAT_HANDLER_ADDR | Soft-float trampoline. Saves the RISC-V register file, calls into the float library, restores the register file. Used whenever an F or D instruction is transpiled. |
| 0x1110 | Program launcher | Writes initial values for ROM and RAM globals, then jumps to the first guest instruction at 0x80000000. Also hosts the syscall trap handler, including the exit sequence that commits public outputs and jumps to ROM_EXIT. |
| 0x80000000 | Program code | The transpiled guest. Execution ends with a syscall that returns control to the launcher's exit path. |
| 0x87F00000 | Float soft-library | The code the soft-float trampoline jumps into. |

Inside the RAM region

| Address | Symbol | What's there |
| --- | --- | --- |
| 0xA0000000 | RAM_ADDR / SYS_ADDR | ziskos system RAM. |
| 0xA0010000 | OUTPUT_ADDR | The buffer the guest writes its public outputs into. After proving, the host reads from here. |
| 0xA0030000 | Program RAM | The guest's general-purpose heap and stack. |
| 0xBFFF0000 | Float soft-library RAM | Scratch space the soft-float code uses. |
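The start addresses above imply sub-region sizes if one assumes the sub-regions are contiguous, that is, each one ends where the next begins and the last one ends with the RAM region at 0xBFFFFFFF. That contiguity is an assumption, not something the table states. In the sketch below, `RAM_ADDR` and `OUTPUT_ADDR` are the symbols from the table, while `PROGRAM_RAM_ADDR`, `FLOAT_LIB_RAM_ADDR`, `RAM_END`, and `ram_subregions` are hypothetical names.

```rust
const RAM_ADDR: u64 = 0xA000_0000;
const OUTPUT_ADDR: u64 = 0xA001_0000;
/// Hypothetical name for the start of the guest's heap and stack.
const PROGRAM_RAM_ADDR: u64 = 0xA003_0000;
/// Hypothetical name for the start of the soft-float scratch space.
const FLOAT_LIB_RAM_ADDR: u64 = 0xBFFF_0000;
/// One past the last RAM address (0xBFFFFFFF).
const RAM_END: u64 = 0xC000_0000;

/// Returns (name, start, size in bytes) for each RAM sub-region,
/// assuming each sub-region runs up to the start of the next one.
fn ram_subregions() -> Vec<(&'static str, u64, u64)> {
    let starts = [
        ("system RAM", RAM_ADDR),
        ("public outputs", OUTPUT_ADDR),
        ("program RAM", PROGRAM_RAM_ADDR),
        ("float soft-library RAM", FLOAT_LIB_RAM_ADDR),
    ];
    starts
        .iter()
        .enumerate()
        .map(|(i, &(name, start))| {
            // The end is the next sub-region's start, or the end of RAM.
            let end = starts.get(i + 1).map_or(RAM_END, |&(_, next)| next);
            (name, start, end - start)
        })
        .collect()
}
```

Under that assumption, system RAM spans 0x10000 bytes (64 KiB), the output buffer 0x20000 bytes (128 KiB), and the soft-float scratch space 0x10000 bytes, with program RAM taking the remainder.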

Where this picks up

You now know what runs (the ISA) and what it runs on (the processor and the transpiled ROM). The next page, Arithmetization, opens up the layer beneath: how an execution becomes a system of polynomial constraints the prover can prove.