TL;DR
the only place where the data location can be omitted
Data Location
Currently, reference types comprise structs, arrays and mappings.
- Fixed-size byte array is a value type.
Every reference type has an additional annotation, the “data location”, about where it is stored. If you use a reference type, you always have to explicitly provide the data area where the type is stored:
- `memory`: its lifetime is limited to an external function call;
- `storage`: the location where the state variables are stored, where the lifetime is limited to the lifetime of a contract;
- `calldata`: a non-modifiable, non-persistent area where function arguments are stored, and behaves mostly like memory.
  - Try to use `calldata` as data location if possible because it will avoid copies and also makes sure that the data cannot be modified.
  - Arrays and structs with `calldata` data location can also be returned from functions, but it is not possible to allocate such types.
A function parameter in Solidity can be stored either in memory or in calldata. If the function is an entry point to the contract, called directly by a user (through a transaction) or by a different contract, then the parameter's value can be taken directly from the call data. If the function is called internally, then the parameters have to be stored in memory. From the perspective of the called contract, calldata is read-only.
With value types such as `uint` or `address` the compiler chooses the location for us, but for arrays, which are larger and more expensive to copy, we specify the data location explicitly. Return values of reference type are usually returned in memory.
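To make the difference between `calldata` and `memory` parameters concrete, here is a minimal sketch. The contract and function names are hypothetical, and it assumes a 0.8.x compiler.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract, for illustration only.
contract CalldataSketch {
    // `values` is read straight from the call data: no copy, and not modifiable.
    function sum(uint256[] calldata values) external pure returns (uint256 total) {
        for (uint256 i = 0; i < values.length; i++) {
            total += values[i];
        }
    }

    // A memory parameter is copied out of the call data first,
    // which makes it writable inside the function.
    function doubled(uint256[] memory values) external pure returns (uint256[] memory) {
        for (uint256 i = 0; i < values.length; i++) {
            values[i] *= 2;
        }
        return values;
    }

    // Calldata arrays can be returned (here: a slice), but never allocated.
    function tail(uint256[] calldata values) internal pure returns (uint256[] calldata) {
        return values[1:]; // reverts if `values` is empty
    }

    function secondElement(uint256[] calldata values) external pure returns (uint256) {
        return tail(values)[0];
    }
}
```

Writing `values[i] *= 2` inside `sum` would be rejected by the compiler, because calldata is read-only.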
| case | data location |
|---|---|
| local variable | memory, storage, or calldata |
| state variable | only storage |
| parameter of internal function | memory, storage, or calldata (calldata allowed since 0.6.9) |
| parameter of external function | calldata or memory (memory allowed since 0.6.9) |
Before Solidity 0.5.0, complex data types such as structs were placed in storage by default when declared as local variables. Since 0.5.0 the data location of such variables has to be stated explicitly.
An assignment or type conversion that changes the data location will always incur an automatic copy operation, while assignments inside the same data location only copy in some cases for storage types.
- Assignments between `storage` and `memory` (or from `calldata`) always create an independent copy.
- Assignments from `memory` to `memory` only create references.
  - This means that changes to one memory variable are also visible in all other memory variables that refer to the same data.
- Assignments from `storage` to a local storage variable also only assign a reference.
- All other assignments to `storage` always copy.
  - Examples for this case are assignments to state variables or to members of local variables of `storage` struct type, even if the local variable itself is just a reference.
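`y` lives for the whole lifetime of the contract, whereas `memoryArray` only lives for a single function call. For `y = memoryArray` to work, a temporary copy of `memoryArray` would have to be created in storage for `y` to reference, but "assignments to local variables referencing storage objects can only be made from existing storage objects".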
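The copy and reference rules above can be sketched as follows. This is an illustrative contract, not the original example: the names `x`, `y` and `memoryArray` follow the note above, while the contract name, `other` and the chosen values are assumptions.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract, for illustration only.
contract AssignmentSketch {
    uint256[] x; // state variable, always in storage

    function demo(uint256[] memory memoryArray)
        external
        returns (uint256 viaX, uint256 viaMemory)
    {
        require(memoryArray.length > 0, "need at least one element");

        x = memoryArray;          // memory -> storage: independent copy
        uint256[] storage y = x;  // storage -> local storage variable: reference
        y[0] = 111;               // writes through to x

        uint256[] memory other = memoryArray; // memory -> memory: reference
        other[0] = 222;                       // also visible through memoryArray

        // y = memoryArray; // rejected by the compiler: a local storage
        //                  // variable can only point at existing storage.

        viaX = x[0];                // 111, set through y
        viaMemory = memoryArray[0]; // 222, set through other
    }
}
```

Note that `x` keeps the value written through `y` even though `memoryArray` is modified afterwards, because the first assignment made an independent copy.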
Layout of State Variables in Storage
Except for dynamically-sized arrays and mappings (see below), data is stored contiguously item after item starting with the first state variable, which is stored in slot 0.
- For each variable, a size in bytes is determined according to its type.
- State variables of contracts are stored in storage in a compact way such that multiple values sometimes use the same storage slot.
Multiple, contiguous items that need less than 32 bytes are packed into a single storage slot if possible, according to the following rules:
The first item in a storage slot is stored lower-order aligned, i.e. it occupies the least-significant bytes of the slot (loosely analogous to little-endian ordering).
Endian and endianness (or “byte-order”) describe how computers organize the bytes that make up numbers.
- Little-endian means storing bytes in order of least-to-most-significant (where the least significant byte takes the first or lowest address), comparable to a common European way of writing dates (e.g., 31 December 2050).
- It’s used on all Intel processors.
- Big-endian is the opposite order, comparable to an ISO date (2050-12-31).
- Big-endian is also often called “network byte order” because Internet standards usually require data to be stored big-endian, starting at the standard UNIX socket level and going all the way up to standardized Web binary data structures.
- Older Mac computers using 68000-series and PowerPC microprocessors formerly used big-endian.
number: `0x01234567`

| memory address | 0x100 | 0x101 | 0x102 | 0x103 |
|---|---|---|---|---|
| little endian | 67 | 45 | 23 | 01 |
| big endian | 01 | 23 | 45 | 67 |
Value types use only as many bytes as are necessary to store them.
If a value type does not fit the remaining part of a storage slot, it is stored in the next storage slot.
Structs and array data always start a new slot and their items are packed tightly according to these rules.
- The elements of structs and arrays are stored after each other, just as if they were given as individual values.
Items following struct or array data always start a new storage slot.
For contracts that use inheritance, the ordering of state variables is determined by the C3-linearized order of contracts starting with the most base-ward contract. If allowed by the above rules, state variables from different contracts do share the same storage slot.
Some considerations:
- When using elements that are smaller than 32 bytes, your contract’s gas usage may be higher. This is because the EVM operates on 32 bytes at a time. Therefore, if the element is smaller than that, the EVM must use more operations in order to reduce the size of the element from 32 bytes to the desired size.
- It might be beneficial to use reduced-size types if you are dealing with storage values because the compiler will pack multiple elements into one storage slot, and thus, combine multiple reads or writes into a single operation.
- If you are not reading or writing all the values in a slot at the same time, this can have the opposite effect, though: When one value is written to a multi-value storage slot, the storage slot has to be read first and then combined with the new value such that other data in the same slot is not destroyed.
- When dealing with function arguments or memory values, there is no inherent benefit because the compiler does not pack these values.
- In order to allow the EVM to optimize for this, ensure that you try to order your storage variables and struct members such that they can be packed tightly.
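As a rough illustration of why declaration order matters, the sketch below declares the same three types in two different orders and exposes the resulting slot numbers through inline assembly. The contract and variable names are made up for this example.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract, for illustration only.
contract PackingSketch {
    // Tightly ordered: a and b (16 bytes each) share one slot.
    uint128 a; // slot 0, offset 0
    uint128 b; // slot 0, offset 16
    uint256 c; // slot 1

    // Loosely ordered: e needs a full slot, so d and f cannot share one.
    uint128 d; // slot 2
    uint256 e; // slot 3
    uint128 f; // slot 4

    function positions()
        external
        view
        returns (
            uint256 slotA,
            uint256 slotB,
            uint256 offsetB,
            uint256 slotC,
            uint256 slotD,
            uint256 slotF
        )
    {
        assembly {
            slotA := a.slot
            slotB := b.slot
            offsetB := b.offset
            slotC := c.slot
            slotD := d.slot
            slotF := f.slot
        }
    }
}
```

The first group of three variables fits in two slots, while the second group, with the same types in a different order, needs three.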
The layout of state variables in storage is considered to be part of the external interface of Solidity due to the fact that storage pointers can be passed to libraries.
Mappings and Dynamic Arrays
Due to their unpredictable size, mappings and dynamically-sized array types cannot be stored “in between” the state variables preceding and following them. Instead, they are considered to occupy only 32 bytes with regards to the rules above and the elements they contain are stored starting at a different storage slot that is computed using a Keccak-256 hash.
Assume the storage location of the mapping or array ends up being a slot p after applying the storage layout rules.
- For dynamic arrays, this slot stores the number of elements in the array (byte arrays and strings are an exception).
  - Array data is located starting at `keccak256(p)` and it is laid out in the same way as statically-sized array data would: One element after the other, potentially sharing storage slots if the elements are not longer than 16 bytes.
  - Dynamic arrays of dynamic arrays apply this rule recursively.
  - The location of element `x[i][j]`, where the type of `x` is `uint24[][]`, is computed as follows (again, assuming `x` itself is stored at slot `p`):
    - The slot is `keccak256(keccak256(p) + i) + floor(j / floor(256 / 24))`.
    - The element can be obtained from the slot data `v` using `(v >> ((j % floor(256 / 24)) * 24)) & type(uint24).max`.
    - A slot has 256 bits, so it can hold several elements.
- For mappings, the slot stays empty, but it is still needed to ensure that even if there are two mappings next to each other, their content ends up at different storage locations.
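The `x[i][j]` computation for `uint24[][]` described above can be sketched like this. The contract is hypothetical; `readRaw` follows the formula step by step and `readChecked` is the ordinary access to compare against.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract, for illustration only.
contract NestedArraySketch {
    uint24[][] x; // only state variable, so p = 0

    constructor() {
        x.push(); // x[0] starts as an empty inner array
        for (uint24 k = 0; k < 12; k++) {
            x[0].push(100 + k); // 12 elements: spills into a second slot
        }
    }

    // No bounds check: indices outside the array read whatever is in storage.
    function readRaw(uint256 i, uint256 j) external view returns (uint24) {
        uint256 p;
        assembly {
            p := x.slot
        }

        uint256 perSlot = uint256(256) / 24; // floor(256 / 24) = 10
        uint256 target;
        unchecked {
            // Slot arithmetic wraps modulo 2**256.
            uint256 inner = uint256(keccak256(abi.encode(p))) + i;
            target = uint256(keccak256(abi.encode(inner))) + j / perSlot;
        }

        uint256 v;
        assembly {
            v := sload(target)
        }
        return uint24((v >> ((j % perSlot) * 24)) & type(uint24).max);
    }

    function readChecked(uint256 i, uint256 j) external view returns (uint24) {
        return x[i][j];
    }
}
```

For in-range indices both functions should agree, for example `readRaw(0, 11)` and `readChecked(0, 11)`, where element 11 lives in the second slot of the inner array.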
The value corresponding to a mapping key `k` is located at `keccak256(h(k) . p)` where `.` is concatenation and `h` is a function that is applied to the key depending on its type.
- for value types, `h` pads the value to 32 bytes in the same way as when storing the value in memory.
- for strings and byte arrays, `h(k)` is just the unpadded data.
If the mapping value is a non-value type, the computed slot marks the start of the data.
If the value is of struct type, you have to add an offset corresponding to the struct member to reach the member.
```solidity
// compute the storage location of data[4][9].c
struct S { uint16 a; uint16 b; uint256 c; }
uint x;
mapping(uint => mapping(uint => S)) data;
```

- The position of the mapping itself is `1` (the variable `x` with 32 bytes precedes it).
- `data[4]` (also a map) is stored at `keccak256(uint256(4) . uint256(1))`.
- The data for `data[4][9]` starts at slot `keccak256(uint256(9) . keccak256(uint256(4) . uint256(1)))`.
- The slot offset of the member `c` inside the struct `S` is `1` because `a` and `b` are packed in a single slot.
- The slot for `data[4][9].c` is `keccak256(uint256(9) . keccak256(uint256(4) . uint256(1))) + 1`.
  - The type of the value is `uint256`, so it uses a single slot.
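The same walk-through can be checked on chain with a small sketch. The struct and variables mirror the example above. The contract name, the helper functions and the stored value `42` are assumptions made for illustration.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract, for illustration only.
contract MappingSlotSketch {
    struct S { uint16 a; uint16 b; uint256 c; }

    uint256 x;                                      // slot 0
    mapping(uint256 => mapping(uint256 => S)) data; // slot 1

    constructor() {
        data[4][9].c = 42; // arbitrary marker value
    }

    // keccak256(uint256(9) . keccak256(uint256(4) . uint256(1))) + 1
    function slotOfC() public pure returns (bytes32) {
        bytes32 inner = keccak256(abi.encode(uint256(4), uint256(1)));
        bytes32 start = keccak256(abi.encode(uint256(9), inner));
        return bytes32(uint256(start) + 1); // member c sits one slot after a and b
    }

    // Reads the computed slot directly, bypassing the mapping syntax.
    function readC() external view returns (uint256 v) {
        bytes32 s = slotOfC();
        assembly {
            v := sload(s)
        }
    }
}
```

If the layout rules are applied as described, `readC()` returns the marker value written in the constructor.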
`bytes` and `string` are encoded identically. In general, the encoding is similar to `bytes1[]`, in the sense that there is a slot for the array itself and a data area that is computed using a `keccak256` hash of that slot's position.
- For byte arrays that store data which is `32` or more bytes long, the main slot `p` stores `length * 2 + 1` and the data is stored as usual in `keccak256(p)`.
  - `length * 2 + 1` is odd, so the lowest bit of the slot value is always 1.
- For short values (shorter than `32` bytes) the array elements are stored together with the length in the same slot.
  - If the data is at most `31` bytes long, the elements are stored in the higher-order bytes (left aligned) and the lowest-order byte stores the value `length * 2`.
  - `length * 2` is even, so the lowest bit of the lowest-order byte is always 0.
- You can distinguish a short array from a long array by checking if the lowest bit is set: short (not set) and long (set).
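A minimal sketch of the short/long distinction, reading the main slot of two `bytes` variables directly. The contract, the variable names and the sample data are hypothetical.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract, for illustration only.
contract BytesLayoutSketch {
    bytes shortData = hex"aabbcc"; // 3 bytes: stored in the main slot itself
    bytes longData;                // filled with 40 bytes in the constructor

    constructor() {
        longData = new bytes(40);  // 32 or more bytes: data lives at keccak256(p)
    }

    function inspect(bool useLong) external view returns (bool isLong, uint256 length) {
        uint256 v;
        if (useLong) {
            assembly {
                v := sload(longData.slot)
            }
        } else {
            assembly {
                v := sload(shortData.slot)
            }
        }
        isLong = (v & 1) == 1; // lowest bit set means long encoding
        // long: slot holds length * 2 + 1; short: lowest byte holds length * 2
        length = isLong ? (v - 1) / 2 : (v & 0xff) / 2;
    }
}
```

With the data above, `inspect(false)` yields `(false, 3)` and `inspect(true)` yields `(true, 40)`.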
JSON Output
Layout in Memory
Solidity reserves four 32-byte slots, with specific byte ranges being used as follows:
- `0x00` - `0x3f` (64 bytes): scratch space for hashing methods
  - Scratch space can be used between statements (i.e. within inline assembly).
- `0x40` - `0x5f` (32 bytes): currently allocated memory size (aka. free memory pointer)
- `0x60` - `0x7f` (32 bytes): zero slot
  - The zero slot is used as initial value for dynamic memory arrays and should never be written to (the free memory pointer points to `0x80` initially).
Solidity always places new objects at the free memory pointer and memory is never freed (this might change in the future).
Elements in memory arrays in Solidity always occupy multiples of 32 bytes.
- This is even true for `bytes1[]`, but not for `bytes` and `string`.
There are some operations in Solidity that need a temporary memory area larger than 64 bytes and therefore will not fit into the scratch space. They will be placed where the free memory pointer points to, but given their short lifetime, the pointer is not updated.
- The memory may or may not be zeroed out. Because of this, one should not expect the free memory pointer to point to zeroed-out memory.
Differences to layout in storage:
- `uint8[4] a;` occupies 32 bytes (1 slot) in storage, but 128 bytes (4 items with 32 bytes each) in memory.
- The following struct occupies 96 bytes (3 slots of 32 bytes) in storage, but 128 bytes (4 items with 32 bytes each) in memory.
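The struct itself did not survive in these notes. A struct that matches the stated sizes could look like the sketch below. Its members are an assumption chosen to fit the numbers (two full-width values plus two `uint8` values that pack into a third slot), and `memoryBytes` estimates the memory footprint from the free memory pointer.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract, for illustration only.
contract MemoryVsStorageSketch {
    // Assumed struct: 3 storage slots, 4 memory words.
    struct S {
        uint256 a; // storage slot 0
        uint256 b; // storage slot 1
        uint8 c;   // storage slot 2, packed with d
        uint8 d;
    }

    S stored; // occupies 96 bytes of storage

    // Measures how far the free memory pointer moves when one S is allocated.
    function memoryBytes() external pure returns (uint256 used, uint8 c) {
        uint256 ptrBefore;
        assembly {
            ptrBefore := mload(0x40)
        }

        S memory m = S(1, 2, 3, 4); // each member takes a full 32-byte word

        uint256 ptrAfter;
        assembly {
            ptrAfter := mload(0x40)
        }
        used = ptrAfter - ptrBefore; // typically 128
        c = m.c;
    }
}
```

The measured value depends on what the compiler allocates between the two reads, so treat it as an indication rather than a guarantee.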
Layout of Call Data
The input data for a function call is assumed to be in the format defined by the ABI specification.
- The ABI specification requires arguments to be padded to multiples of 32 bytes.
- The internal function calls use a different convention.
- Arguments for the constructor of a contract are directly appended at the end of the contract’s code, also in ABI encoding.
  - The constructor will access them through a hard-coded offset, and not by using the `codesize` opcode, since this of course changes when appending data to the code.
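The 32-byte padding rule is easy to observe by encoding a call with a small argument. In the sketch below the signature `f(uint8)` and the contract name are made up. `abi.encodeWithSignature` is the built-in encoder.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract, for illustration only.
contract AbiPaddingSketch {
    // Encodes a call to a made-up function `f(uint8)` with the argument 1.
    function encoded() external pure returns (bytes memory data, uint256 length) {
        data = abi.encodeWithSignature("f(uint8)", uint8(1));
        // 4-byte selector + one argument padded to a 32-byte word = 36 bytes
        length = data.length;
    }
}
```

Although `uint8` needs a single byte, the argument occupies a full 32-byte word after the 4-byte function selector.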
Cleaning Up Variables
When a value is shorter than 256 bits, in some cases the remaining bits must be cleaned.
- The Solidity compiler is designed to clean such remaining bits before any operations that might be adversely affected by the potential garbage in the remaining bits.
- Before writing a value to memory, the remaining bits need to be cleared because the memory contents can be used for computing hashes or sent as the data of a message call.
- Before storing a value in the storage, the remaining bits need to be cleaned because otherwise the garbled value can be observed.
- Access via inline assembly is not considered such an operation: If you use inline assembly to access Solidity variables shorter than 256 bits, the compiler does not guarantee that the value is properly cleaned up.
- We do not clean the bits if the immediately following operation is not affected.
- The Solidity compiler cleans input data when it is loaded onto the stack.
The Optimizer
Contract ABI Specification
An application binary interface is an interface between two program modules; often, between the operating system and user programs.
- An ABI defines how data structures and functions are accessed in machine code. It’s the primary way of encoding and decoding data into and out of machine code.
- An application programming interface (API) defines this access in high-level, often human-readable formats as source code.
In Ethereum, the ABI is used to encode contract calls for the EVM and to read data out of transactions.
- The purpose of an ABI is to define the functions in the contract that can be invoked and describe how each function will accept arguments and return its result.
- A contract’s ABI is specified as a JSON array of function descriptions and events.
  - A function description is a JSON object with fields `type`, `name`, `inputs`, `outputs`, `constant`, and `payable` (newer compiler versions replace `constant` and `payable` with `stateMutability`).
  - An event description object has fields `type`, `name`, `inputs`, and `anonymous`.