Education
Sep 4, 2024
A zkVM (Zero-Knowledge Virtual Machine) is a powerful type of virtual machine that utilizes zero-knowledge proofs (ZKPs) to guarantee the integrity and privacy of computations. Zero-knowledge proofs are a cryptographic technique that enables one party to prove the validity of a statement to another party without revealing any additional information. In simpler terms, it's like proving you know a secret without actually revealing the secret itself.
zkVM is driving the future of verifiable computation. This technology holds immense potential to revolutionize various sectors by offering robust solutions for data privacy, secure transactions, decentralized finance (DeFi), regulatory compliance, and much more. By enabling the verification of computations without revealing sensitive information, zkVM empowers a paradigm shift towards trustless systems.
This article delves deep into the world of zkVMs. We'll explore the fundamentals of ZKPs, unveil the inner workings of zkVMs, introduce the concept of zkEVMs (a specific type of zkVM), and equip you with a framework for evaluating different zkVM solutions.
This is part one of a three-part series:
“Within the next 5 years, we will be talking about applications of zero-knowledge protocols the way we talk about applications of blockchain protocols. The potential unlocked by the breakthroughs of the last few years is going to take the mainstream by storm”
- Jill Gunter from Espresso Systems, May 2021
Since 2021, the zero knowledge landscape(ZK) has evolved into a diverse ecosystem of primitives, networks, and applications spanning multiple sectors. However, most of ZK remains a mystery to its users and the crypto space at large, despite its development history and recent progress marked by the launches of ZK powered rollups such as Starknet and zkSync Era.
But times are changing. It is our view that zero knowledge cryptography is a powerful, universally adoptable tool for scaling and securing any software. Simply put, ZK is the bridge to mass adoption of cryptography. Quoting Jill once more, an enormous amount of value (both fundamental and speculative) will be created around anything that touches zero knowledge proofs, both in web2 and web3. The best minds in crypto are diligently iterating to make ZK economically feasible and production ready. Even so, the industry admittedly has some progress to make before our envisioned paradigm becomes a reality.
To draw parallels between ZK and Bitcoin adoption, one of the reasons Bitcoin has evolved from a fringe hobbyist forum internet currency to BlackRock approved “digital gold” is the proliferation of developer and community generated literature that cultivated interest. As it stands, ZK exists in a bubble within a bubble. Information is scattered and polarized; articles are either littered with jargon far too difficult to understand, or too layman to convey anything meaningful beyond recycled examples (e.g. Where’s Waldo). It seems as though everyone (experts and laymen) knows what zero knowledge is, but no one can describe how it actually works.
As one of the teams contributing to the zero knowledge paradigm, we want to demystify our work and help a broader audience build a canonical foundation for understanding and analyzing ZK systems and applications, in order to facilitate education and discussions between interested parties, and enable the proliferation of relevant information.
tldr; You can’t grow zero knowledge with zero knowledge.
If you have no prior knowledge of zero knowledge proofs (ZKP), this video from Wired explains the concept at five difficulty levels in an interactive manner with easily understandable examples and demonstrations. Highly recommended.
In simplest terms, ZKPs enable one party (prover) to prove to another party (the verifier) that they know something without revealing what that thing is or any additional information. More specifically, ZKPs prove knowledge of a piece of data, or knowledge of the result of a computation, without revealing the data or inputs. The process of creating a zero knowledge proof involves a series of mathematical models to transform the results of a computation into a piece of otherwise meaningless information that proves successful execution of code, which can later be verified.
In some cases, it takes less work to verify the proof, which is constructed after multiple rounds of algebraic conversions and cryptography, than it would take to run the computation. This unique combination of security and scalability is what makes zero knowledge cryptography such a powerful tool.
(Note: Succinct’s bridge uses SNARKs but SP1 is a STARK based protocol)
It is worth noting that all STARKs are SNARKs, but not all SNARKs are STARKs.
For a better general understanding of SNARKs and STARKs, we recommend reading this article series written by Yan Zhang and Yi Sun of Axiom, and this collection of articles in Ventali Tan’s github.
A virtual machine (VM) is a program that runs programs. In context, a zkVM is a virtual computer that is implemented as a system for generating zero knowledge proofs, or a universal circuit, or tool, for generating ZKPs for any program or computation.
zkVMs remove the need to learn complicated mathematics and cryptography for designing and coding ZK, and enables any developer to execute programs written in their preferred languages and generate ZKPs, making it far easier to integrate and interact with zero knowledge. Broadly speaking, most references to zkVMs implicitly include the compiler toolchains and proof systems appended to the virtual machine that executes the programs, and not just the virtual machine itself. Below, we summarize the main components of a zkVM and their functions:
The design and implementation of each component is governed by the choice of proof (SNARKs or STARKs) and the instruction set architecture (ISA) of the zkVM. Traditionally, an ISA specifies what a CPU is capable of (data types, registers, memory etc.) and the sequence of actions the CPU performs when it executes a program. In context, the ISA determines the machine code that is interpretable and executable by the VM. Choosing an ISA can yield radical differences in the accessibility and usability of the zkVM, as well as the speed and efficiency of the proof generation processes, and underpins the construction of any zkVM.
Below are some examples of zkVMs and their components for your reference.
For now, we will focus on the interactions between each component at a high level to provide a framework for understanding the algebraic and cryptographic processes as well as the design tradeoffs of a zkVM in a later article.
zkEVM (Zero-Knowledge Ethereum Virtual Machine) is a specialized type of zkVM tailored for the Ethereum blockchain. It is designed to execute Ethereum smart contracts using ZKPs, maintaining compatibility with the Ethereum Virtual Machine (EVM). The primary advantage of zkEVM is its ability to enhance Ethereum's scalability and privacy. By processing multiple transactions off-chain and then proving their validity on-chain with a single proof, zkEVM reduces the computational load on the Ethereum mainnet and ensures user privacy. This makes zkEVM an essential tool for improving the efficiency and security of Ethereum-based applications and transactions.
The relationship between zkVM and zkEVM lies in their shared use of zero-knowledge proofs to provide privacy and scalability. While zkVMs offer a general-purpose solution for various blockchains and applications, zkEVMs are specifically optimized for the Ethereum ecosystem. Both technologies play a crucial role in the future of blockchain, addressing the growing need for private, secure, and efficient trustless interactions.
The following diagram is an abstracted, generalized process flowchart of a zkVM, split and categorized between the format (inputs / outputs) of a program as it moves through the components of a zkVM. We will examine each process in depth in subsequent articles.
The process flow of a zkVM is generally as follows:
1. The compiler first compiles programs written in conventional languages (C, C++, Rust, Solidity) into machine code. The format of the machine code is dictated by the choice of the ISA.
2. The VM executes the machine code and generates an execution trace, which is the series of steps of the underlying program. The format of this is predetermined by the choice of arithmetization as well as the set of polynomial constraints. Common arithmetization schemes include R1CS as in Groth16, PLONKish arithmetization as in halo2, and AIR as in plonky2 and plonky3.
3. The prover receives the trace and represents it as a set of polynomials bound by a set of constraints, essentially translating the computation into algebra by mapping out the facts mathematically.
4. The prover commits to these polynomials using a Polynomial Commitment Scheme (PCS). A commitment scheme is a protocol which allows the prover to create a fingerprint of some data X, which is called a commitment to X, and later prove facts about X without revealing X, using the commitment to X. The PCS is the fingerprint; a “pre-processed'' succinct version of the constraints on the computation. This allows the prover to prove facts about the computation, now expressed in a polynomial equation, using random values proposed by the verifier in the following steps.
5. The prover runs a Polynomial Interactive Oracle Proof (PIOP) to show that the committed polynomials represent an execution trace which satisfies the given constraints. A PIOP is an interactive proof protocol consisting of a series of rounds in which the prover sends commitments to polynomials, the verifier responds with random field values, and the prover provides evaluations of the polynomial at these random values, akin to “solving” the polynomial equation using random values to probabilistically persuade the verifier.
6. Applying the Fiat-Shamir heuristic; the prover runs the PIOP in a non-interactive mode, where the verifier’s behavior is limited to sending pseudo-random challenge points. In cryptography, the Fiat–Shamir heuristic converts an interactive proof of knowledge into a digital signature for verification. This step encrypts the proof and makes it zero knowledge.
7. The prover must convince the verifier that the claimed polynomial evaluations are correct, with respect to the polynomial commitments it sent to the verifier. To do this, the prover produces an “evaluation” or “opening” proof, which is provided by the polynomial commitment scheme (fingerprint).
8. The verifier checks the proof by following the proof system’s verification protocol, either using the constraints or the commitment. The verifier either accepts or rejects the result according to the validity of the proof.
Summarily, a zkVM proof proves, for a given program, a given result, and given initial conditions, that there is some input which causes the program to produce the given result when executed from the given initial conditions. We can combine this statement with the process flow to arrive at the following description of a zkVM.
A zkVM proof proves, for a given VM program and a given output, that there is some input which causes the given program to produce the given output when executed on the VM.
What are the criteria by which we should evaluate zkVMs? In other words, when should we say that one zkVM is better than another? Truthfully, the answer depends on the use case.
Lita’s market research suggests that for most commercial use cases, the most important properties, out of speed, efficiency, and succinctness, are either speed or core-time efficiency, depending on the application. Some applications are price sensitive and want to optimize for low energy consumption and low use of capital in proving; for these, core-time efficiency is probably the most important metric to optimize. Other applications, particularly finance or trading related applications, are latency sensitive and want to optimize for speed.
Most publicized comparisons of performance focus exclusively on speed, which is important but not a holistic measurement of performance. There are also a few important properties that measure the reliability of a zkVM; most of which are not up to production ready standards, even for market leading incumbents.
We hereby propose that zkVMs should be evaluated on the following criteria, separated into two subsets:
When evaluating zkVMs for mission critical applications, correctness and security should be considered as the baseline. There needs to be sufficient reason to be confident about the correctness, and sufficiently strong claimed security. Also, the trust assumptions need to be sufficiently weak for the application.
Without these properties, the zkVM is potentially worse than useless for the application, as it may fail to perform as specified and expose users to hacking and exploits.
Correctness is comprised of three properties:
You can have completeness without soundness; if the proof system proves everything including falsity, obviously it is complete but not sound. Inversely, you can have soundness without completeness; if the proof system proves a program existed but does not cannot prove the computations, obviously it is sound (after all, it never proves any falsity), but not complete.
When zkVMs have trust assumptions, this usually takes the form of a trusted setup process. A setup process for a ZK proof system runs once, before the first proof is generated using the proof system, in order to generate some information called “the setup data.” In a trusted setup process, one or more individuals generate some randomness which gets incorporated into the setup data, and it needs to be assumed that at least one of those individuals deleted the randomness which they incorporated into the setup data.
There are two common trust assumption models in practice.
An honest majority trust assumption states that out of some set of N individuals, more than N/2 of those individuals exhibited integrity in some particular interaction(s) with the system, which is commonly used by blockchains
A “1 out of N” trust assumption states that out of some set of N individuals, at least one of those individuals exhibited integrity in some particular interaction(s) with the system, which is commonly used by MPC based tools and applications.
It is generally agreed that all else being equal, zkVMs with no trust assumptions are more secure than zkVMs which require trust assumptions.
In practice, all three correctness properties have non-zero tolerances. This implies all proofs are statistical probabilities of correctness, and not absolute certainties. A tolerance refers to the maximum tolerable probability that one property will fail. Zero tolerances are of course the ideal, but zkVMs do not achieve zero tolerances on all of these properties in practice. Perfect soundness and completeness appear to be incompatible with succinctness, and there are no known methods for achieving perfect zero knowledge. A common way of measuring security is in bits of security, where a tolerance of 1 / (2^n) is referred to as n bits of security. More bits of security is better.
If a zkVM is perfectly correct, that does not necessarily imply that it is reliable. Correctness only implies that the zkVM satisfies its security properties up to the claimed tolerances. It does not imply that the claimed tolerances are low enough to be market ready. Also, if a zkVM is sufficiently secure, that does not imply that it is correct; security refers to the claimed tolerances, not the tolerances that are actually achieved. Only when a zkVM is both perfectly correct and sufficiently secure can it be said that the zkVM is reliable up to the claimed tolerances.
When we talk about security here, it primarily relates to the design and evaluation metrics of a zkVM. However, zero-knowledge security also refers to systems and protocols that leverage zero-knowledge proofs to enhance privacy and security.
For example, zero-knowledge cloud storage is an application of zero-knowledge security. It uses ZKPs to ensure that service providers cannot access encryption keys or the stored data. This guarantees that only the data owner can decrypt and access their information, providing a high level of privacy and security. By applying zero-knowledge security principles, zero-knowledge cloud storage protects sensitive data from unauthorized access and breaches, ensuring robust data privacy and integrity.
Speed, efficiency, and succinctness are all sliding scale properties. All of these factors contribute to the end user cost of zkVM. How they should be weighted in an evaluation depends on the application. Often, the fastest solution is not the most efficient or the most succinct; the most succinct solution is not the fastest or the most efficient; and so on and so forth. Let’s first define each property before explaining their relationship
Speed should be defined and measured relative to specific test programs, inputs, and systems to ensure it can be quantitatively assessed. This metric is critical for latency-sensitive applications where prompt availability of proofs is essential, but comes with higher overhead and larger proof sizes
The prover consumes two kinds of resources: core-time and space. Efficiency can therefore be subdivided into core-time efficiency and space efficiency.
Core-Time Efficiency: Average amount of time the prover operates across all cores multiplied by number of cores running the prover. For a single-core prover, the core-time consumption and the speed are the same thing. For a multi-core capable prover running in multi-core mode on a multi-core system, core-time consumption and speed are not the same thing. If a program fully utilizes 5 cores or threads for 5 seconds, that would be 25 seconds of user time and 5 seconds of wall clock time.
Space Efficiency: Refers to the storage capacity used, such as RAM
User time is interesting as a proxy for the energy consumed by running a computation. In a situation where all cores are fully utilized almost all the time, the energy consumption of a CPU should remain relatively constant. In this situation, the user time spent by a CPU-bound, predominantly user-mode code execution should be roughly linearly proportional to the number of watt-hours (i.e., energy) consumed by that code execution.
Economizing on energy consumption, or the usage of computing resources, should be interesting from the point of view of any proving operations which have enough scale that their energy bill (or their cloud compute bill) for proving is a considerable operational cost. For these reasons, user time is an interesting metric. Lower costs of proving allow service providers to pass on lower proving prices to cost-sensitive customers.
Both kinds of efficiency are related to the energy consumption of the proving process and the amount of capital utilized by the proving process, which relates to the financial cost of proving. To operationalize the definition of efficiency for measurement, the definition has to be made relative to one or more test programs, one or more test inputs to each of those programs, and one or more test systems.
Succinctness is a composite of three distinct metrics, with the complexity of proof verification further subdivided:
Verification is typically a single core operation, thus speed and core-time efficiency are generally equivalent in this context. As with speed and efficiency, operationalizing the definition of succinctness requires specifying sets of test programs, test inputs, and test systems.
With each performance property defined, we now illustrate the dimunitive effects of optimizing one property over the others.
In general, optimizing for one quality means not optimizing for another quality, and therefore a multi-dimensional analysis is needed to pick an optimal solution on a case by case basis.
A good way of weighting these properties in an evaluation may be to define acceptable levels for each property and then determine which properties are most important. The most important properties should be optimized, subject to maintaining good enough levels on all other properties.
Below we summarize each property and their key considerations:
With the above table, we hereby conclude the first article of our series. In the next articles, we will build on the flow chart shown above to explain common arithmetic and cryptographic processes in zkVMs.
If you found this helpful, visit our website and gitbook to learn more about what we are building at Lita.
Also, follow us on X and Discord to stay updated so you don’t miss the rest of the series
________________________________________
Disclaimer:
The content of this blog, including text, graphics, and images, is protected by copyright © 2024 Lita.Foundation. All rights reserved. Unauthorized use and/or duplication of this material without express and written permission from this site’s author and/or owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Lita.Foundation with appropriate and specific direction to the original content.
For any questions or permissions, please contact Lita Foundation.