Internals of Solidity

TL;DR

For reference types, a state variable declaration is the only place where the data location can be omitted.

Data Location

Currently, reference types comprise structs, arrays and mappings.

Fixed-size byte arrays (bytes1 through bytes32) are value types.

Every reference type has an additional annotation, the “data location”, about where it is stored. If you use a reference type, you always have to explicitly provide the data area where the type is stored:

memory: the lifetime is limited to an external function call.
storage: the location where state variables are stored; the lifetime is limited to the lifetime of the contract.
calldata: a non-modifiable, non-persistent area where function arguments are stored; it behaves mostly like memory.
- Try to use calldata as data location if possible because it will avoid copies and also makes sure that the data cannot be modified.
- Arrays and structs with calldata data location can also be returned from functions, but it is not possible to allocate such types.

A function parameter in Solidity can be stored either in memory or in calldata. If the function is an entry point to the contract, called directly by a user (using a transaction) or by a different contract, then the parameter's value can be taken directly from the call data. If the function is called internally, the parameters are typically passed in memory. From the perspective of the called contract, calldata is read-only.

With scalar types such as uint or address the compiler chooses the location for us, but for arrays, which are longer and more expensive to copy, we have to specify the data location ourselves. Return values of reference type are usually returned in memory.
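The difference between a calldata and a memory parameter can be sketched as follows. The contract and function names are hypothetical and chosen only for illustration; the sketch assumes a 0.8.x compiler.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract; all names are illustrative.
contract DataLocationSketch {
    // External entry point: the array is read straight from the call data,
    // so no copy is made and the elements cannot be modified.
    function sum(uint256[] calldata xs) external pure returns (uint256 total) {
        for (uint256 i = 0; i < xs.length; i++) {
            total += xs[i];
        }
        // xs[0] = 1; // would not compile: calldata is read-only
    }

    // A memory parameter is a mutable copy that lives for the duration of the call.
    function incrementFirst(uint256[] memory xs) public pure returns (uint256[] memory) {
        require(xs.length > 0, "empty array");
        xs[0] += 1;
        return xs;
    }
}
```

Calling sum with [1, 2] returns 3 without copying the array, while incrementFirst works on its own copy and returns the modified array.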

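| Case | Data location |
| --- | --- |
| Local variable | `memory` or `storage` |
| State variable | only `storage` |
| Parameter of internal function | `memory` or `storage` |
| Parameter of external function | `calldata` |

The table reflects the rules before Solidity 0.6.9; since that version, `memory` and `calldata` are allowed in all functions regardless of their visibility.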

Before version 0.5.0, Solidity by default put complex data types, such as structs, in storage when they were declared as local variables. Since 0.5.0 the data location has to be stated explicitly.

An assignment or type conversion that changes the data location will always incur an automatic copy operation, while assignments inside the same data location only copy in some cases for storage types.

Assignments between storage and memory (or from calldata) always create an independent copy.
Assignments from memory to memory only create references.
- This means that changes to one memory variable are also visible in all other memory variables that refer to the same data.
Assignments from storage to a local storage variable also only assign a reference.
All other assignments to storage always copy.
- Examples for this case are assignments to state variables or to members of local variables of storage struct type, even if the local variable itself is just a reference.

```
pragma solidity >=0.5.0 <0.9.0;

contract C {
    // The data location of x (a state variable) is storage.
    // This is the only place where the data location can be omitted.
    uint[] x;

    function f(uint[] memory memoryArray) public {
        x = memoryArray;        // works, copies the whole array to storage
        uint[] storage y = x;   // works, assigns a pointer; the data location of y (a local variable) is storage
        y[7];                   // fine, returns the 8th element
        y.pop();                // fine, modifies x through y
        delete x;               // fine, clears the array, also modifies y

        // The following does not compile; it would need to create a new temporary
        // unnamed (local) array in storage, but storage is "statically" allocated:
        // y = memoryArray;

        // Similarly, "delete y" is not valid, as assignments to local variables
        // referencing storage objects can only be made from existing storage objects.
        // It would "reset" the pointer, but there is no sensible location it could point to.
        // For more details see the documentation of the "delete" operator.
        // delete y;

        g(x); // calls g, handing over a reference to x
        h(x); // calls h and creates an independent, temporary copy in memory
    }

    function g(uint[] storage) internal pure {}
    function h(uint[] memory) public pure {}
}
```

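y refers to storage, whose lifetime is the lifetime of the whole contract, while the lifetime of memoryArray is a single function call. For y = memoryArray to work, memoryArray would have to be copied to a temporary location in storage that y could then reference, but "assignments to local variables referencing storage objects can only be made from existing storage objects".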
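The copy-versus-reference rules for assignments can also be observed at run time. The following is a minimal sketch with hypothetical names, assuming a 0.8.x compiler.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract; all names are illustrative.
contract AssignmentSketch {
    uint256[] stored;

    // Memory to memory: b is a second reference to the same array.
    function memoryAlias() external pure returns (uint256) {
        uint256[] memory a = new uint256[](1);
        uint256[] memory b = a;
        b[0] = 42;
        return a[0]; // the write through b is visible through a
    }

    // Storage to memory copies; storage to a local storage variable only references.
    function storageCopy() external returns (uint256 inStorage, uint256 inMemory) {
        delete stored;
        stored.push(1);

        uint256[] memory copy = stored; // independent copy in memory
        copy[0] = 99;                   // does not touch storage

        uint256[] storage ref = stored; // reference to the state variable
        ref[0] = 7;                     // modifies stored

        return (stored[0], copy[0]);
    }
}
```

memoryAlias returns 42, and storageCopy returns (7, 99): the memory copy keeps its own value while the write through the storage reference lands in the state variable.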

Layout of State Variables in Storage

Except for dynamically-sized arrays and mappings (see below), data is stored contiguously item after item starting with the first state variable, which is stored in slot 0.

For each variable, a size in bytes is determined according to its type.
State variables of contracts are stored in storage in a compact way such that multiple values sometimes use the same storage slot.

Multiple, contiguous items that need less than 32 bytes are packed into a single storage slot if possible, according to the following rules:

The first item in a storage slot is stored lower-order aligned, that is, it occupies the least significant bytes of the slot (comparable to little-endian ordering).
- Endian and endianness (or “byte-order”) describe how computers organize the bytes that make up numbers.
  - Little-endian means storing bytes in order of least-to-most-significant (where the least significant byte takes the first or lowest address), comparable to a common European way of writing dates (e.g., 31 December 2050).
    - It’s used on all Intel processors.
  - Big-endian is the opposite order, comparable to an ISO date (2050-12-31).
    - Big-endian is also often called “network byte order” because Internet standards usually require data to be stored big-endian, starting at the standard UNIX socket level and going all the way up to standardized Web binary data structures.
    - Older Mac computers using 68000-series and PowerPC microprocessors formerly used big-endian.
```
number: 0x01234567
memory address      0x100   0x101   0x102   0x103
little endian       67      45      23      01
big endian          01      23      45      67
```
Value types use only as many bytes as are necessary to store them.
If a value type does not fit the remaining part of a storage slot, it is stored in the next storage slot.
Structs and array data always start a new slot and their items are packed tightly according to these rules.
- The elements of structs and arrays are stored after each other, just as if they were given as individual values.
Items following struct or array data always start a new storage slot.
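The packing rules above can be checked by asking the compiler for the slot and byte offset of each state variable. This is a sketch with hypothetical names; it assumes Solidity 0.7.0 or later, where inline assembly exposes these values as `.slot` and `.offset`.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract; all names are illustrative.
contract PackingSketch {
    uint128 a;   // slot 0, offset 0 (the 16 lowest-order bytes)
    uint64 b;    // slot 0, offset 16
    uint64 c;    // slot 0, offset 24 (slot 0 is now full)
    uint8 d;     // slot 1, offset 0
    uint256 e;   // slot 2 (does not fit into the remaining 31 bytes of slot 1)
    uint8[3] f;  // slot 3 (array data always starts a new slot)
    uint8 g;     // slot 4 (items following an array start a new slot)

    // Returns the positions the compiler assigned to some of the variables.
    function positions()
        external
        view
        returns (
            uint256 cSlot,
            uint256 cOffset,
            uint256 dSlot,
            uint256 eSlot,
            uint256 fSlot,
            uint256 gSlot
        )
    {
        assembly {
            cSlot := c.slot
            cOffset := c.offset
            dSlot := d.slot
            eSlot := e.slot
            fSlot := f.slot
            gSlot := g.slot
        }
    }
}
```

Following the rules, positions() yields (0, 24, 1, 2, 3, 4).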

For contracts that use inheritance, the ordering of state variables is determined by the C3-linearized order of contracts starting with the most base-ward contract. If allowed by the above rules, state variables from different contracts do share the same storage slot.

Some considerations:

When using elements that are smaller than 32 bytes, your contract’s gas usage may be higher. This is because the EVM operates on 32 bytes at a time. Therefore, if the element is smaller than that, the EVM must use more operations in order to reduce the size of the element from 32 bytes to the desired size.
It might be beneficial to use reduced-size types if you are dealing with storage values because the compiler will pack multiple elements into one storage slot, and thus, combine multiple reads or writes into a single operation.
- If you are not reading or writing all the values in a slot at the same time, this can have the opposite effect, though: When one value is written to a multi-value storage slot, the storage slot has to be read first and then combined with the new value such that other data in the same slot is not destroyed.
When dealing with function arguments or memory values, there is no inherent benefit because the compiler does not pack these values.
In order to allow the EVM to optimize for this, ensure that you try to order your storage variables and struct members such that they can be packed tightly.
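As an illustration of the last point, the two hypothetical structs below hold the same members in a different order. The marker variables declared after each struct reveal how many slots the struct occupies; the names are assumptions made for this sketch.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract; all names are illustrative.
contract OrderingSketch {
    struct Tight { uint128 a; uint128 b; uint256 c; } // a and b share one slot: 2 slots
    struct Loose { uint128 a; uint256 b; uint128 c; } // nothing can be packed: 3 slots

    Tight t;            // slots 0 and 1
    uint256 afterTight; // slot 2
    Loose l;            // slots 3, 4 and 5
    uint256 afterLoose; // slot 6

    // Returns the first slot of each struct and of the variable that follows it.
    function slots()
        external
        view
        returns (uint256 tSlot, uint256 afterTightSlot, uint256 lSlot, uint256 afterLooseSlot)
    {
        assembly {
            tSlot := t.slot
            afterTightSlot := afterTight.slot
            lSlot := l.slot
            afterLooseSlot := afterLoose.slot
        }
    }
}
```

slots() returns (0, 2, 3, 6), so the reordered struct saves one storage slot.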

The layout of state variables in storage is considered to be part of the external interface of Solidity due to the fact that storage pointers can be passed to libraries.

Mappings and Dynamic Arrays

Due to their unpredictable size, mappings and dynamically-sized array types cannot be stored “in between” the state variables preceding and following them. Instead, they are considered to occupy only 32 bytes with regards to the rules above and the elements they contain are stored starting at a different storage slot that is computed using a Keccak-256 hash.

Assume the storage location of the mapping or array ends up being a slot p after applying the storage layout rules.

For dynamic arrays, this slot stores the number of elements in the array (byte arrays and strings are an exception).
- Array data is located starting at keccak256(p) and it is laid out in the same way as statically-sized array data would: One element after the other, potentially sharing storage slots if the elements are not longer than 16 bytes.
- Dynamic arrays of dynamic arrays apply this rule recursively.
  - The location of element x[i][j], where the type of x is uint24[][], is computed as follows (again, assuming x itself is stored at slot p):
    - The slot is keccak256(keccak256(p) + i) + floor(j / floor(256 / 24)).
    - The element can be obtained from the slot data v using (v >> ((j % floor(256 / 24)) * 24)) & type(uint24).max.
    - A slot holds 256 bits, so it can store several elements: floor(256 / 24) = 10 elements of type uint24 fit into one slot.
For mappings, the slot stays empty, but it is still needed to ensure that even if there are two mappings next to each other, their content ends up at different storage locations.
- The value corresponding to a mapping key k is located at keccak256(h(k) . p) where . is concatenation and h is a function that is applied to the key depending on its type.
  - for value types, h pads the value to 32 bytes in the same way as when storing the value in memory.
  - for strings and byte arrays, h(k) is just the unpadded data.
- If the mapping value is a non-value type, the computed slot marks the start of the data.
- If the value is of struct type, you have to add an offset corresponding to the struct member to reach the member.
  ```
  // compute the storage location of data[4][9].c
  struct S { uint16 a; uint16 b; uint256 c; }
  uint x;
  mapping(uint => mapping(uint => S)) data;
  ```
  - The position of the mapping itself is 1 (the variable x with 32 bytes precedes it).
  - data[4] (also a map) is stored at keccak256(uint256(4) . uint256(1)).
  - The data for data[4][9] starts at slot keccak256(uint256(9) . keccak256(uint256(4) . uint256(1))).
  - The slot offset of the member c inside the struct S is 1 because a and b are packed in a single slot.
  - The slot for data[4][9].c is keccak256(uint256(9) . keccak256(uint256(4) . uint256(1))) + 1.
    - The type of the value is uint256, so it uses a single slot.
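Both slot formulas above (nested dynamic arrays and nested mappings) can be reproduced in Solidity and compared with ordinary element access. The sketch below reuses the struct S and the variables x and data from the example; the contract name, the nested array and the helper functions are assumptions added for illustration.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract; helper names are illustrative.
contract SlotMathSketch {
    struct S { uint16 a; uint16 b; uint256 c; }

    uint256 x;                                      // slot 0
    mapping(uint256 => mapping(uint256 => S)) data; // slot 1
    uint24[][] nested;                              // slot 2

    function setUp() external {
        data[4][9].c = 1234;
        nested.push(); // creates an empty nested[0]
        for (uint24 j = 0; j < 12; j++) {
            nested[0].push(j + 100);
        }
    }

    // data[4][9].c lives at keccak256(9 . keccak256(4 . p)) + 1.
    function readC() external view returns (uint256 viaSlot, uint256 viaSolidity) {
        uint256 p;
        assembly { p := data.slot }
        bytes32 inner = keccak256(abi.encode(uint256(4), p));
        uint256 target;
        unchecked {
            target = uint256(keccak256(abi.encode(uint256(9), inner))) + 1;
        }
        assembly { viaSlot := sload(target) }
        viaSolidity = data[4][9].c;
    }

    // nested[i][j] lives in slot keccak256(keccak256(p) + i) + floor(j / 10).
    function readNested(uint256 i, uint256 j)
        external
        view
        returns (uint24 viaSlot, uint24 viaSolidity)
    {
        uint256 p;
        assembly { p := nested.slot }
        uint256 perSlot = 256 / 24; // 10 elements of 24 bits per slot
        uint256 target;
        unchecked {
            uint256 outer = uint256(keccak256(abi.encode(p))) + i;
            target = uint256(keccak256(abi.encode(outer))) + j / perSlot;
        }
        uint256 v;
        assembly { v := sload(target) }
        viaSlot = uint24((v >> ((j % perSlot) * 24)) & type(uint24).max);
        viaSolidity = nested[i][j];
    }
}
```

After setUp(), both return values of readC() and of readNested(0, 11) should agree, since the manual computation follows the same layout rules the compiler uses.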

bytes and string are encoded identically. In general, the encoding is similar to bytes1[], in the sense that there is a slot for the array itself and a data area that is computed using a keccak256 hash of that slot’s position.

For byte arrays that store data which is 32 or more bytes long, the main slot p stores length * 2 + 1 and the data is stored as usual in keccak256(p).
- Since length * 2 + 1 is odd, the lowest bit of the main slot is always 1.
For short values (shorter than 32 bytes) the array elements are stored together with the length in the same slot.
- If the data is at most 31 bytes long, the elements are stored in the higher-order bytes (left aligned) and the lowest-order byte stores the value length * 2.
- Since length * 2 is even and is stored in the lowest-order byte, the lowest bit of the slot is always 0.
You can distinguish a short array from a long array by checking if the lowest bit is set: short (not set) and long (set).
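The short/long distinction can be decoded by hand from the main slot. The following sketch uses hypothetical names and assumes Solidity 0.7.0 or later for the `.slot` syntax.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract; all names are illustrative.
contract BytesLayoutSketch {
    bytes data; // main slot p = 0

    function set(bytes calldata value) external {
        data = value;
    }

    // Decodes the main slot according to the rules above.
    function inspect() external view returns (bool isLong, uint256 length) {
        uint256 word;
        assembly { word := sload(data.slot) }
        isLong = (word & 1) == 1;
        if (isLong) {
            length = (word - 1) / 2;    // the slot stores length * 2 + 1
        } else {
            length = (word & 0xff) / 2; // the lowest byte stores length * 2
        }
    }
}
```

Storing 31 bytes gives (false, 31); storing 32 bytes switches to the long encoding and gives (true, 32).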

JSON Output

Layout in Memory

Solidity reserves four 32-byte slots, with specific byte ranges being used as follows:

0x00 - 0x3f (64 bytes): scratch space for hashing methods
- Scratch space can be used between statements (i.e. within inline assembly).
0x40 - 0x5f (32 bytes): currently allocated memory size (aka. free memory pointer)
0x60 - 0x7f (32 bytes): zero slot
- The zero slot is used as initial value for dynamic memory arrays and should never be written to (the free memory pointer points to 0x80 initially).

Solidity always places new objects at the free memory pointer and memory is never freed (this might change in the future).

Elements in memory arrays in Solidity always occupy multiples of 32 bytes.

This is even true for bytes1[], but not for bytes and string.

There are some operations in Solidity that need a temporary memory area larger than 64 bytes and therefore will not fit into the scratch space. They will be placed where the free memory pointer points to, but given their short lifetime, the pointer is not updated.

The memory may or may not be zeroed out. Because of this, one should not expect the free memory to point to zeroed out memory.

Differences to layout in storage:

uint8[4] a; occupies 32 bytes (1 slot) in storage, but 128 bytes (4 items with 32 bytes each) in memory.
The following struct occupies 96 bytes (3 slots of 32 bytes) in storage, but 128 bytes (4 items with 32 bytes each) in memory.

```
struct S {
    uint a;
    uint b;
    uint8 c;
    uint8 d;
}
```
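The uint8[4] case can be probed with a rough sketch like the one below. All names are hypothetical, and the memory measurement relies on the assumption that the compiler allocates the local array at its declaration and moves the free memory pointer by exactly the array size; this may differ between compiler versions and optimizer settings.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract; all names are illustrative.
contract MemoryLayoutSketch {
    uint8[4] packed; // slot 0: four elements packed into one slot
    uint256 next;    // slot 1: shows that packed needs a single slot

    function storageSlots() external view returns (uint256 packedSlot, uint256 nextSlot) {
        assembly {
            packedSlot := packed.slot
            nextSlot := next.slot
        }
    }

    // Measures how far the free memory pointer moves when a uint8[4] is allocated.
    function memoryBytes() external pure returns (uint256 used) {
        uint256 ptrBefore;
        assembly { ptrBefore := mload(0x40) }

        uint8[4] memory a; // assumed to allocate 4 elements of 32 bytes each
        a[0] = 1;

        uint256 ptrAfter;
        assembly { ptrAfter := mload(0x40) }

        require(a[0] == 1, "keeps the array in use");
        used = ptrAfter - ptrBefore;
    }
}
```

storageSlots() returns (0, 1); under the stated assumption, memoryBytes() is expected to report 128.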

Layout of Call Data

The input data for a function call is assumed to be in the format defined by the ABI specification.

The ABI specification requires arguments to be padded to multiples of 32 bytes.
- The internal function calls use a different convention.
Arguments for the constructor of a contract are directly appended at the end of the contract’s code, also in ABI encoding.
- The constructor will access them through a hard-coded offset, and not by using the codesize opcode, since this of course changes when appending data to the code.
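The padding rule is visible in the length of the call data: a 4-byte function selector followed by one 32-byte word per argument, even for small types. A minimal sketch with hypothetical names:

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract; all names are illustrative.
contract CallDataSketch {
    // Returns the size of the call data this function was invoked with.
    function f(uint8, bool) external pure returns (uint256) {
        return msg.data.length;
    }

    // Builds the call data for f(1, true) by hand.
    function encodedCall() external pure returns (bytes memory) {
        return abi.encodeWithSignature("f(uint8,bool)", uint8(1), true);
    }
}
```

Both f(1, true) and encodedCall().length give 68, that is 4 + 2 * 32 bytes, although uint8 and bool need only one byte each.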

Cleaning Up Variables

When a value is shorter than 256 bits, in some cases the remaining bits must be cleaned.

The Solidity compiler is designed to clean such remaining bits before any operations that might be adversely affected by the potential garbage in the remaining bits.
- Before writing a value to memory, the remaining bits need to be cleared because the memory contents can be used for computing hashes or sent as the data of a message call.
- Before storing a value in the storage, the remaining bits need to be cleaned because otherwise the garbled value can be observed.
- Access via inline assembly is not considered such an operation: If you use inline assembly to access Solidity variables shorter than 256 bits, the compiler does not guarantee that the value is properly cleaned up.
The compiler does not clean the bits if the immediately following operation is not affected.
The Solidity compiler cleans input data when it is loaded onto the stack.
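The following sketch provokes dirty higher-order bits through inline assembly. The names are hypothetical, and the exact behaviour may depend on the compiler version and optimizer settings.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract; all names are illustrative.
contract CleanupSketch {
    function dirty() external pure returns (uint256 raw, uint8 cleaned) {
        uint8 x;
        assembly {
            x := 0x1ff // deliberately sets a bit outside the 8-bit range
            raw := x   // inline assembly sees the full stack word
        }
        // The value is written to memory when the return data is ABI-encoded,
        // which is one of the operations before which the compiler cleans.
        cleaned = x;
    }
}
```

With a typical 0.8.x compiler, raw is expected to be 0x1ff while cleaned is 0xff, which shows why values written from inline assembly should be masked manually.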

The Optimizer

Contract ABI Specification

An application binary interface is an interface between two program modules; often, between the operating system and user programs.

An ABI defines how data structures and functions are accessed in machine code. It’s the primary way of encoding and decoding data into and out of machine code.
- An application programming interface (API), by contrast, defines this access in high-level, often human-readable formats as source code.

In Ethereum, the ABI is used to encode contract calls for the EVM and to read data out of transactions.

The purpose of an ABI is to define the functions in the contract that can be invoked and describe how each function will accept arguments and return its result.
A contract’s ABI is specified as a JSON array of function descriptions and events.
- A function description is a JSON object with fields type, name, inputs, outputs, and stateMutability (compiler versions before 0.6.0 also emitted constant and payable).
- An event description object has fields type, name, inputs, and anonymous.
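How a function description turns into call data can be sketched in Solidity itself: the selector is the first four bytes of the Keccak-256 hash of the function signature, and the arguments follow in ABI encoding. The interface and contract names below are assumptions made for illustration.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical interface; only the signature matters here.
interface ITokenSketch {
    function transfer(address to, uint256 amount) external returns (bool);
}

// Hypothetical contract; all names are illustrative.
contract AbiSketch {
    // Selector computed by hand from the canonical signature.
    function selectorFromSignature() public pure returns (bytes4) {
        return bytes4(keccak256("transfer(address,uint256)"));
    }

    // The same selector as provided by the compiler.
    function selectorFromInterface() public pure returns (bytes4) {
        return ITokenSketch.transfer.selector;
    }

    // Encodes a call: selector followed by the ABI-encoded arguments.
    function encodeCall(address to, uint256 amount) external pure returns (bytes memory) {
        return abi.encodeWithSelector(ITokenSketch.transfer.selector, to, amount);
    }

    // Decodes the arguments again, skipping the 4-byte selector.
    function decodeArgs(bytes calldata callData)
        external
        pure
        returns (address to, uint256 amount)
    {
        (to, amount) = abi.decode(callData[4:], (address, uint256));
    }
}
```

Both selector functions return 0xa9059cbb, and decodeArgs(encodeCall(to, amount)) gives back the original arguments.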

