Oct 30, 2020

Ethereum Virtual Machine Runtime Memory

Run Time Environment

The EVM Execution Environment

The environment in which bytecode is executed is the collection of resources that the bytecode has access to.

Specifically, bytecode has access to:

I. Stack

II. Memory

III. Contract storage

IV. A finite amount of gas

V. Various fields of information about the running transaction, including calldata, the running contract, and its caller.

Three Worlds of Data:

The stack is not the only data location you have to work with. Specifically, you have access to:

Stack.

Push, pop, dup, and swap are the basic operations you have access to.

Memory.

A location to read/write data that will only last as long as a function call.

Storage.

A location to read/write data that will persist across function calls and transactions.

Function Call Environments

Generally speaking, each function call gets its own, isolated execution environment. Namely:

A new, empty stack.
New, empty memory.
Access to the running contract address’s storage.
A finite amount of gas to run.

What does it mean when a memory address is referred to as a "32-byte offset"?

In the EVM's context, a memory address described as a "32-byte offset" points to a location that is 32 bytes away from a base or starting memory location.

EVM memory is a linear, byte-addressable array. Offsets count bytes, not slots, but mload and mstore always read or write a full 32-byte (256-bit) word, so data is conventionally laid out in 32-byte words. A "32-byte offset" is therefore a displacement of exactly one word from a particular reference point in memory.

For instance, if the base memory address is 0x00, a "32-byte offset" refers to memory address 0x20 (32 in decimal), which is where the next 32-byte word begins.

Memory offsets are crucial when dealing with memory operations like mload and mstore. They determine the location where data is read from or written to within the contract's memory. Keeping track of offsets is essential for correct memory manipulation within Ethereum smart contracts.
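
To make the idea of byte offsets concrete, here is a minimal Python sketch of EVM-style memory. The `Memory` class and its method names are hypothetical and only model the behavior described above (byte addressing, 32-byte reads and writes, growth in 32-byte words); it is not how any client implements memory.

```python
class Memory:
    """Toy model of EVM memory: a zero-initialised, byte-addressable array."""

    def __init__(self):
        self.data = bytearray()

    def _touch(self, end):
        # Memory grows in 32-byte words when an access reaches past its end.
        if end > len(self.data):
            new_size = (end + 31) // 32 * 32
            self.data.extend(bytes(new_size - len(self.data)))

    def mstore(self, offset, value):
        # Always writes a full 32-byte big-endian word starting at `offset`.
        self._touch(offset + 32)
        self.data[offset:offset + 32] = value.to_bytes(32, "big")

    def mload(self, offset):
        # Always reads a full 32-byte word starting at `offset`.
        self._touch(offset + 32)
        return int.from_bytes(self.data[offset:offset + 32], "big")


mem = Memory()
mem.mstore(0x00, 0xAA)  # occupies bytes 0x00..0x1f
mem.mstore(0x20, 0xBB)  # a 32-byte offset from 0x00: bytes 0x20..0x3f

word_at_0x20 = mem.mload(0x20)
# Offsets need not be multiples of 32: this read straddles both words.
word_at_0x01 = mem.mload(0x01)
size_in_bytes = len(mem.data)
```

Reading at 0x01 returns a value assembled from the tail of the first word and the first byte of the second, which is why keeping offsets aligned to what was written matters.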

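What is the use of push in Yul or assembly?

The PUSH family of opcodes (PUSH1 through PUSH32) places a constant of 1 to 32 bytes, embedded in the bytecode, onto the stack. It does not allocate or grow memory: memory expands automatically when an instruction such as MSTORE accesses an offset beyond its current size. Yul has no explicit push instruction; the compiler emits PUSH opcodes for the literals you write.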

Push Usage in Ethereum Assembly

    6080604052

    60 80                       =   PUSH1 0x80
    60 40                       =   PUSH1 0x40
    52                          =   MSTORE
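
The decoding above can be reproduced mechanically. The following is a small Python sketch of a disassembler; the `disassemble` function and the three-entry `OPCODES` table are illustrative assumptions that cover only the opcodes used in this article. The sequence it decodes stores 0x80 at memory location 0x40, which is how Solidity-generated bytecode initializes the free memory pointer.

```python
# Only the non-PUSH opcodes this article uses; everything else is "UNKNOWN".
OPCODES = {0x51: "MLOAD", 0x52: "MSTORE", 0xF3: "RETURN"}


def disassemble(hex_code):
    """Turn a hex string of bytecode into a list of mnemonics."""
    code = bytes.fromhex(hex_code)
    listing = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        if 0x60 <= op <= 0x7F:  # PUSH1 (0x60) through PUSH32 (0x7f)
            n = op - 0x5F  # number of immediate bytes that follow
            immediate = code[pc + 1:pc + 1 + n]
            listing.append(f"PUSH{n} 0x{immediate.hex()}")
            pc += 1 + n  # skip the opcode and its immediate data
        else:
            listing.append(OPCODES.get(op, f"UNKNOWN(0x{op:02x})"))
            pc += 1
    return listing


listing = disassemble("6080604052")
for line in listing:
    print(line)
```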

Memory is not allocated with push; it is reserved by advancing the free memory pointer. Here's an example that allocates a bytes array of a given size this way:

        // Allocate a `bytes` array of `_size` bytes by advancing the free memory pointer
        function allocateMemory(uint _size) public pure returns (bytes memory) {
            bytes memory dynamicData;
            assembly {
                // The array starts at the current free memory pointer
                dynamicData := mload(0x40)

                // The first 32-byte word of a `bytes` array holds its length
                mstore(dynamicData, _size)

                // Reserve the length word plus the data, rounded up to a multiple of 32 bytes
                mstore(0x40, add(add(dynamicData, 0x20), and(add(_size, 0x1f), not(0x1f))))
            }
            // Note: the data area is not zeroed here and may contain stale bytes
            return dynamicData;
        }

    assembly {
        let ptr := mload(0x40) // Load the current free memory pointer
        
        mstore(ptr, 42) // Store the value 42 at the current memory pointer
        
        // Update the free memory pointer to allocate more space (if needed)
        mstore(0x40, add(ptr, 32))
    }

Here:
    - mload(0x40) retrieves the current free memory pointer, denoted by ptr.
    - mstore(ptr, 42) stores the value 42 at the location pointed to by ptr.
    - mstore(0x40, add(ptr, 32)) updates the free memory pointer (stored at 0x40), moving it ahead by 32 bytes to reserve the word that was just written.

This sequence showcases the coordination between mload, mstore, and the adjustment of the free memory pointer.
No explicit push appears here: the compiler emits PUSH opcodes for the literals 0x40, 42, and 32, and memory itself expands automatically the first time an offset beyond its current size is accessed.
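
The same bookkeeping can be modeled outside the EVM. Below is a minimal Python sketch of a bump allocator driven by a free memory pointer stored at 0x40. The helper names (`allocate`, `mload`, `mstore`) are hypothetical, and rounding each allocation up to a multiple of 32 bytes is an assumption that mirrors Solidity's convention rather than an EVM rule.

```python
FREE_PTR_SLOT = 0x40

memory = bytearray()


def _touch(end):
    # Memory expands in 32-byte words when an access goes past its end.
    if end > len(memory):
        memory.extend(bytes((end + 31) // 32 * 32 - len(memory)))


def mload(offset):
    _touch(offset + 32)
    return int.from_bytes(memory[offset:offset + 32], "big")


def mstore(offset, value):
    _touch(offset + 32)
    memory[offset:offset + 32] = value.to_bytes(32, "big")


def allocate(size):
    """Reserve `size` bytes and return the offset where they start."""
    ptr = mload(FREE_PTR_SLOT)
    rounded = (size + 31) // 32 * 32  # assumption: keep allocations word-aligned
    mstore(FREE_PTR_SLOT, ptr + rounded)
    return ptr


# Equivalent of the 6080604052 prologue: the first free byte is 0x80.
mstore(FREE_PTR_SLOT, 0x80)

first = allocate(32)
mstore(first, 42)  # same as mstore(ptr, 42) in the assembly block
second = allocate(5)  # still consumes a whole 32-byte word
third = allocate(32)
```

Because every allocation reads and then advances the pointer, two allocations never overlap, which is the property the free memory pointer exists to provide.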

Memory Use Cases in Ethereum's EVM

Memory in the Ethereum Virtual Machine (EVM) serves several critical purposes:

1. Returning Data

When a smart contract needs to return data to the caller, the data must first be written to memory; the RETURN opcode then returns a range of memory to the caller. Here's a simple Solidity function showcasing this:

    function getData() public pure returns (uint) {
        uint value = 42;
        return value;
    }

Here is an example of returning the value 7 using raw opcodes:

    PUSH1 0x07
    PUSH1 0x50
    MSTORE

    PUSH1 0x20
    PUSH1 0x50
    RETURN

In the above example:

- The MSTORE opcode takes two arguments: the memory location and the value to store.
    - The memory location we are using is 0x50. In practice, we’d want to choose a location that does not clash with other data in memory (note that in this rudimentary example, it doesn’t matter which value we pick, so we choose a unique one to make the code easier to read).
    - The value we are storing is 0x07. Note that MSTORE always writes 32 bytes of data.
- The RETURN opcode takes two arguments: the memory location where the data to return resides, and the length of that data in bytes.
    - The memory location we are using (0x50) is the same location we wrote to using MSTORE.
    - The length is 32 bytes (0x20) to return the full data we wrote using MSTORE.

Once the RETURN opcode executes, the EVM stops execution of the function call and passes the return data to the parent context. This parent context may be the end of the transaction, or another function call that used a CALL opcode.
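
To check the walkthrough by hand, here is a small Python sketch of an interpreter that understands only PUSH1, MSTORE, and RETURN. The `run` function is a hypothetical teaching aid; it ignores gas, stack limits, and every other opcode. The hex string is the six instructions above assembled with the standard opcode values (PUSH1 = 0x60, MSTORE = 0x52, RETURN = 0xf3).

```python
def run(hex_code):
    """Execute a tiny subset of EVM bytecode and return the RETURN data."""
    code = bytes.fromhex(hex_code)
    stack = []
    memory = bytearray()
    pc = 0

    def touch(end):
        # Expand memory in 32-byte words when an access goes past its end.
        if end > len(memory):
            memory.extend(bytes((end + 31) // 32 * 32 - len(memory)))

    while pc < len(code):
        op = code[pc]
        if op == 0x60:  # PUSH1: push the next byte onto the stack
            stack.append(code[pc + 1])
            pc += 2
        elif op == 0x52:  # MSTORE: top of stack is the offset, next is the value
            offset = stack.pop()
            value = stack.pop()
            touch(offset + 32)
            memory[offset:offset + 32] = value.to_bytes(32, "big")
            pc += 1
        elif op == 0xF3:  # RETURN: top of stack is the offset, next is the length
            offset = stack.pop()
            size = stack.pop()
            touch(offset + size)
            return bytes(memory[offset:offset + size])
        else:
            raise ValueError(f"unsupported opcode 0x{op:02x}")
    return b""


# PUSH1 0x07, PUSH1 0x50, MSTORE, PUSH1 0x20, PUSH1 0x50, RETURN
returned = run("600760505260206050f3")
print(returned.hex())
```

The returned bytes are the 32-byte word written by MSTORE, with the value 7 in the last byte because words are stored big-endian.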

2. Long-lived data

Memory gives a stable home to data that must survive across many opcodes or internal function calls within a contract. It simplifies logic, especially because DUP and SWAP can only reach roughly the top 16 items of the stack. Here's an example:

    function manipulateData(uint[] memory dataArray) public pure returns (uint) {
        // Work with the dataArray over multiple opcodes
        uint sum = 0;
        for (uint i = 0; i < dataArray.length; i++) {
            sum += dataArray[i];
        }
        return sum;
    }


Sometimes there will be data that you need to keep track of and continuously update throughout the lifespan of your function call, but not something you need to persist across function calls. Memory is the perfect use for this.

Arguably, the most common example is Solidity’s “free memory pointer”. This value exists at memory location 0x40. Semantically, it represents “the next location in memory that hasn’t been used yet” (read more at 💠 How Solidity Manages Memory (coming soon)).

This is useful not just for Solidity, but also for any bytecode architecture that allocates memory on a dynamic basis. It allows you to always have a handle on where to put new data, without conflicting with or overwriting existing data.

3. Large data

For data larger than the EVM's 32-byte word size, memory serves as a workspace. Here's an example of handling larger data:

    function processLargeData() public pure returns (bytes memory) {
        // Generate a large data set
        bytes memory largeData = new bytes(1000);
        for (uint i = 0; i < largeData.length; i++) {
            // uint8(i) keeps only the low 8 bits, so values wrap around after 255
            largeData[i] = bytes1(uint8(i));
        }
        return largeData;
    }

Stack items are always 32 bytes in length. Memory is not limited in this respect.

A common example is contract deployment. During the contract deployment process, your bytecode needs to return the final state of the new contract’s bytecode.

This means your deployment bytecode is returning runtime bytecode, as a value in memory. This runtime bytecode will surely be larger than 32 bytes – up to 24 kilobytes on Layer 1. This is much larger than what can fit onto the stack.

In summary, memory usage in Ethereum's EVM is crucial for returning data, maintaining data across opcodes or function calls, and handling larger data sizes that exceed the 32-byte word size limit of the EVM.

Questions to Ponder:

When a CALL reserves a memory area of a specified length for return data, what happens if the called contract returns more than the specified length?
- All memory up to the specified length is overwritten with return data.
- Extra data is truncated (it is not copied into that area).
What happens if the called contract returns less than the specified length?
- All returned data is written to memory.
- Extra memory in the reserved area remains unchanged.
When an address is needed on the stack, how does loading it directly from storage compare with loading it from memory?
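
The first two questions can be explored with a short Python sketch of how return data is copied into the area a CALL reserves for it. The function `copy_return_data` and its parameter names are hypothetical; it models only the copy rule stated in the answers above, under the assumption that the reserved area already lies inside allocated memory.

```python
def copy_return_data(memory, ret_offset, ret_size, return_data):
    """Copy at most `ret_size` bytes of `return_data` into `memory`."""
    n = min(ret_size, len(return_data))
    # Only the first n bytes of the reserved area are overwritten.
    memory[ret_offset:ret_offset + n] = return_data[:n]
    return memory


# Eight bytes of 0xff stand in for memory that already holds other data.
# The caller reserves 4 bytes at offset 2 for the return data.
longer = copy_return_data(bytearray(b"\xff" * 8), 2, 4, b"\x01\x02\x03\x04\x05\x06")
shorter = copy_return_data(bytearray(b"\xff" * 8), 2, 4, b"\x01\x02")

print(longer.hex())
print(shorter.hex())
```

In the first case the fifth and sixth returned bytes never reach the reserved area; in the second, the last two bytes of the reserved area keep their previous contents. On the real EVM the complete return data also stays available to the caller through RETURNDATASIZE and RETURNDATACOPY.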