Optimizing the Ethereum Execution Engine for Succinct Proofs with Valida

Morgan Thomas

Apr 23, 2025

Lita proposes an extension of the Valida ISA as a new Ethereum execution environment, to replace the Ethereum Virtual Machine (EVM). As a result of these efforts, executions should happen faster on Ethereum, while costing less to prove, and the implementation of Ethereum should be more straightforward to verify. As the ISA for the Ethereum execution environment, an extension of Valida would offer the same advantages as an extension of RISC-V, but also the advantage of faster, more efficient succinct proving of executions of smart contracts.

1. Context

Let “the Ethereum execution environment” mean the environment in which smart contracts execute. Currently, the Ethereum execution environment is synonymous with the Ethereum Virtual Machine (EVM). “The EVM” refers either to:

the EVM instruction set architecture (ISA), which is the binary code format that is directly executable by Ethereum, or
any software implementation of the EVM ISA.

On 20 April, 2025, Vitalik Buterin proposed that the Ethereum execution environment should be decoupled from the EVM ISA, and redesigned based on the RISC-V ISA. Buterin writes:

"This post proposes a radical idea for the future of the Ethereum execution layer, one that is equally as ambitious as the beam chain effort is for the consensus layer. It aims to greatly improve the efficiency of the Ethereum execution layer, resolving one of the primary scaling bottlenecks, and can also greatly improve the execution layer’s simplicity - in fact, it is perhaps the only way to do so.

The idea: replace the EVM with RISC-V as the virtual machine language that smart contracts are written in."
‍
In response to this, Lev Soukhanov wrote:

"My recommendation is, instead, constructing a proof-friendly architecture with minimal MMU allowing to run contracts as separate executables; I don’t think it should be RISC-V; rather a separate ISA - ideally, aware of limitations dictated by SNARK protocols. Even ISA resembling some subset of opcodes of EVM will likely be better (+ as we are aware, the precompiles will be with us whether we want it or not, so RISC-V doesn’t give any simplification here)."

Source: Soukhanov, Lev. Comment on Buterin, Vitalik. “Long-term L1 execution layer proposal: replace the EVM with RISC-V.” Ethereum Magicians. Apr 20, 2025.

The proposals by Buterin and Soukhanov both aim to simplify the ISA of the Ethereum execution environment, in order to improve the verifiability of Ethereum, the efficiency of smart contract executions, and the efficiency of proving smart contract executions.

Soukhanov departs from Buterin in denying that a RISC-V Ethereum execution environment would obviate the need for precompiles. A comment by Ben Adams on the same post by Buterin also supports the need for precompiles.

Soukhanov also departs from Buterin on whether RISC-V in particular is a good choice of ISA for the Ethereum execution environment. Instead, Soukhanov recommends devising a special-purpose ISA for this project, designed with SNARK proving in mind.

2. Proposal

The Ethereum execution environment is in need of rearchitecture to support further scaling of Ethereum to higher transaction volumes and greater complexity of transactions. One of the major issues is that the EVM uses a 256-bit word size. This results in higher memory consumption and resulting loss of memory locality, as well as considerably more expensive arithmetic operations. All of this is an obstacle to efficient interpretation or JIT compilation of EVM code.
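To make the arithmetic cost concrete, here is a minimal Python sketch of how a machine with 64-bit words might add two 256-bit EVM words: four limb additions with carry propagation, where a native-width word would need a single add. The helper names (to_limbs, from_limbs, add256) are illustrative only and are not taken from any EVM implementation.

```python
MASK64 = (1 << 64) - 1


def to_limbs(x):
    """Split a 256-bit unsigned integer into four 64-bit limbs, least significant first."""
    return [(x >> (64 * i)) & MASK64 for i in range(4)]


def from_limbs(limbs):
    """Reassemble an integer from 64-bit limbs, least significant first."""
    return sum(limb << (64 * i) for i, limb in enumerate(limbs))


def add256(a, b):
    """Add two 256-bit words modulo 2**256, one 64-bit limb at a time."""
    out, carry = [], 0
    for x, y in zip(to_limbs(a), to_limbs(b)):
        s = x + y + carry          # one native add plus the incoming carry
        out.append(s & MASK64)     # keep the low 64 bits in this limb
        carry = s >> 64            # propagate the overflow to the next limb
    return from_limbs(out)         # the final carry is dropped: wraps modulo 2**256
```

Each 256-bit value also occupies four machine words rather than one, which is the memory consumption effect described above.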

A further issue with the EVM architecture is that its instructions have relatively complex semantics. This results in more complex implementations which have more opportunities for error, making auditing, testing, and formal verification efforts more challenging.

For purposes of simplicity and efficient execution using JIT compilation, Valida and the RISC-V core ISA (RV32IM) are both good starting points. Both support efficient code generation for mainstream programming languages such as C and Rust. Valida and RV32IM are about equally complex, and both are substantially simpler than alternatives such as WASM or the EVM.

For purposes of efficiently making succinct proofs of execution, Valida is a better starting point than RISC-V. The Valida ISA was designed specifically for making succinct proofs of execution. The main difference is that RISC-V has a bank of 31 general-purpose registers (x1 through x31, alongside the hardwired zero register x0), whereas Valida has no general-purpose registers; instead, most Valida opcodes directly address stack operands held in RAM.

In a CPU, a register is a memory location that is located relatively close to the control unit and arithmetic logic unit (ALU), offering relatively fast read-write latency. A register holds a relatively small amount of data: typically one word, a small number of words, or as little as one bit. A general-purpose register is typically used for holding inputs and outputs of arithmetic and logical operations.

Registers are the lowest-latency form of volatile memory, with the least storage capacity. The next lowest-latency form of volatile memory is L1 cache, followed by L2 cache, etc., and then RAM. As a rule, lower latency implies less storage capacity. This is explained by the fact that information travels no faster than the speed of light within computer hardware: memory which is closer to the point of processing is faster to access. This is known as the principle of memory locality.

The principle of memory locality has a very pronounced effect on the performance of code running on hardware, since memory access latency is often much higher than processing latency. In the context of SNARK proving, the principle of memory locality does not apply in the same way. SNARKs work with immutable, timeless mathematical relations, whereas hardware works with chains of cause and effect. Since information does not travel through space and time in the relations of SNARK proofs, there is no principle of memory locality for SNARKs in the physical sense having to do with the speed of information travel. There is still a general tendency for smaller memories to be cheaper to access, because updating a smaller memory can be done by committing to less information, but this tendency is less pronounced than in the case of CPUs.

In common with all modern CPU architectures, the architecture of RISC-V uses general-purpose registers to store inputs and outputs of logical and arithmetic operations. This results in a need to move data between RAM and registers, particularly at function call boundaries, where the contents of registers must be saved and restored by the caller and/or the callee (according to the calling convention). Compared to Valida, code generation for RISC-V will tend to emit more opcodes that deal with loading and storing data.
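As a rough illustration of the save and restore traffic at function call boundaries, the following Python sketch generates the prologue and epilogue that a non-leaf RV32 function would typically emit under the standard calling convention (16-byte stack alignment, ra plus any callee-saved registers it uses). The function names are hypothetical, and the sketch only counts instructions; it does not model Valida's encoding. The comparison it suggests rests on the claim above that locals addressed relative to the frame pointer already live in memory, so this particular traffic would not arise there.

```python
def riscv_prologue_epilogue(saved_regs):
    """Return (prologue, epilogue) for a non-leaf RV32 function.

    `saved_regs` lists the callee-saved registers the function uses,
    e.g. ["s0", "s1"]. The return address register ra is always saved.
    """
    regs = ["ra"] + list(saved_regs)
    # Round the frame up so that sp stays 16-byte aligned.
    frame = 16 * ((4 * len(regs) + 15) // 16)
    slots = {r: frame - 4 * (i + 1) for i, r in enumerate(regs)}
    prologue = [f"addi sp, sp, -{frame}"]
    prologue += [f"sw {r}, {slots[r]}(sp)" for r in regs]
    epilogue = [f"lw {r}, {slots[r]}(sp)" for r in regs]
    epilogue += [f"addi sp, sp, {frame}", "ret"]
    return prologue, epilogue


def memory_moves(instructions):
    """Count the instructions that only move data between registers and RAM."""
    return sum(1 for ins in instructions if ins.split()[0] in ("lw", "sw"))
```

For a function that uses s0 and s1, memory_moves counts six load and store instructions per call that perform no computation of their own.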

The use of general-purpose registers has a cost in terms of complexity of generated code. In the context of a CPU architecture, general-purpose registers have a benefit which outweighs the cost. On a typical program, the processor runs much faster than it would if it did not have general-purpose registers. In SNARK proving, on the other hand, general-purpose registers do not offer a benefit that outweighs their cost. At Lita, we believe that this is a major reason why, according to our testing and the feedback we receive from users, Valida offers faster proving compared to zk-VMs using RISC-V.

‍

3. Addressing Twist and Shout

The Twist and Shout paper by Srinath Setty and Justin Thaler (2025) introduces a novel memory argument which is more sensitive to memory locality, compared to its peers. Setty and Thaler argue that this innovation may obviate the need for ISAs designed for SNARK proving:

Source: Setty, Srinath and Thaler, Justin. “Twist and Shout: Faster memory checking arguments via one-hot addressing and increments.” Cryptology ePrint Archive, Paper 2025/105. P. 23.

Later in the same paper, Setty and Thaler provide a quantitative description of the sensitivity of the Twist memory argument costs to memory locality:

It is worth noting that in real terms, the benefits of memory locality are smaller in the context of the Twist memory argument than in the case of hardware. In the case of Twist, for proving a memory access, if you double the amount of time elapsed since the last access to the same address, then you increase the marginal cost of proving the memory access by one to three field multiplications. Another way of thinking about this is that if you double the median time between accesses to an address in a program execution, then you increase the median marginal cost of a memory access by one to three field multiplications.
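The doubling rule just described amounts to a cost that grows with the base-2 logarithm of the time since the last access. The Python sketch below encodes only that rule. The function name and the mults_per_doubling parameter are assumptions made for illustration, and the result is a cost relative to an access repeated after one step, since the baseline cost of an access is not stated here.

```python
import math


def twist_extra_multiplications(elapsed_steps, mults_per_doubling=1):
    """Toy model of the locality-dependent part of a Twist memory access cost.

    Each doubling of the time since the last access to the same address
    adds `mults_per_doubling` field multiplications (between 1 and 3
    according to the text above). The fixed baseline cost is omitted.
    """
    if elapsed_steps < 1:
        raise ValueError("elapsed_steps must be at least 1")
    if not 1 <= mults_per_doubling <= 3:
        raise ValueError("the text gives a range of 1 to 3 multiplications")
    return mults_per_doubling * math.log2(elapsed_steps)
```

Under this model, going from 1,024 to 2,048 steps since the last access adds between one and three field multiplications, and so does going from roughly one million to roughly two million steps.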

For comparison, here is a chart showing the memory access latencies for different types of memory. These numbers are as of 2012, and a test system is not specified, so they should be taken only as rough, order-of-magnitude, relative measures:

Source: Bonér, Jonas. “Latency Numbers Every Programmer Should Know.” GitHub, 2012.

This chart indicates that a reference to RAM has roughly 200x the latency of a reference to L1 cache. This is a larger difference, proportionally, than the biggest difference that is possible between memory access costs for a Twist memory with 32-bit or 64-bit capacity. Since this analysis did not include register access latencies, which are less than L1 cache access latencies, the actual difference between the smallest and the largest memory access latencies in a CPU can be expected to be greater than 200x.
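The arithmetic behind this comparison can be sketched as follows. The two latency constants are the approximate 2012 figures from the cited list (0.5 ns for an L1 cache reference, 100 ns for a main memory reference). The Twist side assumes, as a reading of the text rather than a result taken from the paper, that the time between two accesses to an address never exceeds 2^32 or 2^64 steps, and that each doubling adds at most three field multiplications. All names are hypothetical.

```python
# Approximate figures from the 2012 "Latency Numbers Every Programmer Should Know" list.
L1_CACHE_REFERENCE_NS = 0.5
MAIN_MEMORY_REFERENCE_NS = 100.0


def hardware_latency_ratio():
    """Ratio of main memory latency to L1 cache latency."""
    return MAIN_MEMORY_REFERENCE_NS / L1_CACHE_REFERENCE_NS


def twist_max_extra_multiplications(log2_max_elapsed, mults_per_doubling=3):
    """Upper bound on the extra field multiplications caused by poor locality.

    Assumes at most `log2_max_elapsed` doublings of the time since the
    last access, each adding at most `mults_per_doubling` multiplications.
    """
    return log2_max_elapsed * mults_per_doubling
```

Under these assumptions the worst case adds 96 field multiplications for a 32-bit bound and 192 for a 64-bit bound. These are absolute counts, not ratios; turning them into a proportional figure would require the baseline cost of a memory access, which is not stated here.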

Twist offers lower costs for accessing smaller memories, but also lower costs for accessing locations in larger memories which were more recently accessed. The fact that accessing smaller memories has lower costs may provide a reason to have general-purpose registers in an ISA for making proofs of execution based on Twist. However, it may be unnecessary to have general-purpose registers in such an ISA due to the fact that accessing recently accessed memory locations is also cheaper in Twist.

In Valida, the values that would otherwise be stored in general-purpose registers are instead stored on the stack. The stack in Valida is a downward growing region of memory. Each currently executing function call has its own stack frame, which is a portion of the stack that holds the values of its local variables. The frame pointer register holds a pointer to the beginning of the current stack frame. When a function call returns, the frame pointer register has its previous value restored to it.
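The frame discipline described above can be modeled in a few lines of Python. This is a toy, word-addressed sketch under assumed conventions (slot 0 sits just below the frame pointer, and saved frame pointers are kept in a Python list rather than in memory); the class and method names are hypothetical and it does not reproduce Valida's actual instruction encoding or frame layout.

```python
class FrameStack:
    """Toy model of a downward-growing stack addressed through a frame pointer."""

    def __init__(self, top, frame_size):
        self.memory = {}              # sparse word-addressed RAM
        self.fp = top                 # frame pointer of the running function
        self.frame_size = frame_size  # number of local slots in the current frame
        self.saved = []               # (fp, frame_size) of suspended callers

    def call(self, callee_frame_size):
        """Enter a callee whose frame starts just below the caller's frame."""
        self.saved.append((self.fp, self.frame_size))
        self.fp -= self.frame_size    # the stack grows downward
        self.frame_size = callee_frame_size

    def ret(self):
        """Return to the caller, restoring its frame pointer."""
        self.fp, self.frame_size = self.saved.pop()

    def addr(self, slot):
        """Absolute address of a local slot in the current frame."""
        if not 0 <= slot < self.frame_size:
            raise IndexError("slot outside the current frame")
        return self.fp - 1 - slot

    def store(self, slot, value):
        self.memory[self.addr(slot)] = value

    def load(self, slot):
        return self.memory[self.addr(slot)]
```

In this model a local written before a call is still readable at the same slot after the callee returns, without any explicit save or restore, because it never left memory.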

Due to the way the stack works, it is likely that over most of any part of any given program’s execution, a certain region of the stack remains “hot,” i.e., frequently accessed. This is the region around the median value of the frame pointer. If each part of the execution concentrates its memory accesses in a certain region of the stack, keeping that region hot, then proving the execution will benefit from lower memory access costs in the Twist memory argument.
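One way to quantify how "hot" a region is would be to measure, for every repeated access in an execution trace, how many steps have passed since the same address was last touched. The sketch below does this for a list of addresses. The function name and the example traces are made up for illustration and do not come from any Valida tooling.

```python
def reuse_distances(trace):
    """Return the steps elapsed since the previous access, for each repeated access.

    `trace` is a sequence of memory addresses in execution order.
    First-time accesses have no previous access and are skipped.
    """
    last_seen = {}
    distances = []
    for step, address in enumerate(trace):
        if address in last_seen:
            distances.append(step - last_seen[address])
        last_seen[address] = step
    return distances
```

A trace that cycles over three frame slots, such as [100, 101, 102] repeated four times, yields a distance of 3 for every repeated access, whereas a trace that revisits an address only after touching many others yields correspondingly large distances.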

The current argument does not take a position on whether the Twist memory argument is more efficient than the memory argument used in the current Valida prover. The argument is that either way, Valida is likely to provide a benefit over RISC-V for SNARK proving. Even with a Twist-based memory argument, it is likely that the savings on memory access proving costs from using general-purpose registers are wiped out by the increased number of instructions that must be emitted to move local variable values in and out of those registers. This hypothesis has not been tested, but it would be worthwhile to test it.

It cannot be ruled out that future innovations in memory consistency arguments will have costs that are more sensitive to memory sizes and locality. More generally, it cannot be ruled out that an ISA designed for execution proving using the currently known SNARKs will be unsuited to the best SNARKs of the future. What future innovations cannot change is that there is at least the potential to design ISAs better suited to SNARK proving by taking the constraints of SNARK proving into account. Given that Ethereum has the goal of supporting efficient succinct proving of smart contract executions, it makes sense to consider which ISA is best for SNARK proving based on the best current information. As of now, given the goals driving the proposed project of rearchitecting Ethereum’s execution environment, Valida is probably the best choice as a starting point for designing a new ISA for Ethereum. Compared to RV32IM, Valida has most of the same advantages, plus the advantage of being better suited to succinct execution proving.

‍

4. Compiler technology

A key consideration in choosing an ISA for Ethereum is the availability of compiler technology for that ISA. The current proposal envisions that a new ISA would be invented for Ethereum, which would be an extension of Valida. Of course, there is no compiler targeting an ISA that has not yet been invented. There is, already, a compiler toolchain targeting Valida. It is able to compile code written in C, Rust, or any language that can be compiled to WASM. It would be relatively easy to extend this compiler toolchain to support specialized opcodes for Ethereum. Lita is experienced in doing this kind of work.

Valida is, as far as Lita knows, the only ISA designed for succinct proving which has an actively developed prover and a compiler toolchain for general-purpose systems programming languages. Using general-purpose systems programming languages as source languages has major advantages, because it leverages widely available skill sets and mature compiler pipelines with sophisticated optimization passes. These advantages are not easily replicated in the context of special-purpose programming languages designed for succinct proving. This is a key difference between Valida and comparable toolchains such as StarkWare’s Cairo toolchain or Lurk Lab’s Lurk toolchain, which use special-purpose programming languages designed for succinct proving.

One argument that can be made for RISC-V over Valida is that the compiler technology for RISC-V is more mature. While this is true, Lita’s C and Rust compiler toolchain for Valida is fairly mature. It has an extensive test suite, including the relevant parts of the Rust standard library test suite. Also, most of the code in Lita’s compiler toolchain is not specific to Valida and was not created by Lita. Lita’s Valida compiler toolchain is built on open source code, specifically LLVM and the official Rust compiler toolchain. These code bases contain millions of lines of code. The Valida-specific part of the toolchain is very small in comparison to the part that is not Valida-specific. The high complexity of this toolchain creates a high potential for bugs in code generation. However, this issue is not specific to Valida; most of the same code is also used for compiling code to target RISC-V and other major processor architectures. Also, the open source code in question is extensively tested and widely used, and has many eyes on it, all of which mitigates the possibility of errors in code generation.

‍
