Introduction

As a language designed for smart contracts, Solidity both differs from and resembles other classic programming languages.

On the one hand, because it serves the blockchain, Solidity differs from other languages. For example, contract deployment and invocation must be confirmed by the blockchain network, and execution costs must be tightly controlled to prevent malicious code from consuming node resources.

On the other hand, Solidity is still a programming language, and its implementation follows the classic designs. For example, it has stack- and heap-like storage areas, and its bytecode is executed by a stack-based virtual machine.

The previous articles in this series showed how to develop a Solidity application. To help you understand not only how but also why, this article introduces the inner workings of Solidity, focusing on the lifecycle of a Solidity application and on how the EVM works.

Solidity lifecycle

Solidity code goes through four lifecycle phases: compilation, deployment, execution, and destruction. The chart below illustrates the full lifecycle of a Solidity application:

Compiling a Solidity file produces bytecode, similar in spirit to JVM bytecode. At deployment, the bytecode and the constructor arguments are packed into a transaction. The transaction is included in a block and goes through network consensus. Finally, the contract is created on every blockchain node and its address is returned to the user.

When a user calls a state-changing function on the contract, the request likewise goes through the transaction, block, and consensus process, and is finally executed by the EVM on each node.

Below is an example program whose lifecycle we will explore in Remix.

pragma solidity ^0.4.25;

contract Demo{
    uint private _state;

    constructor(uint state){
        _state = state;
    }

    function set(uint state) public {
        _state = state;
    }
}

Compilation

After the source code is compiled, you can get its binary by using the Bytecode button:

608060405234801561001057600080fd5b506040516020806100ed83398101806040528101908080519060200190929190505050806000819055505060a4806100496000396000f300608060405260043610603f576000357c0100000000000000000000000000000000000000000000000000000000900463ffffffff16806360fe47b1146044575b600080fd5b348015604f57600080fd5b50606c60048036038101908080359060200190929190505050606e565b005b80600081905550505600a165627a7a723058204ed906444cc4c9aabd183c52b2d486dfc5dea9801260c337185dad20e11f811b0029

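You can also view the corresponding opcodes. Everything from LOG1 PUSH6 0x627A7A723058 to the end is not executable code. It is the metadata hash that the compiler appends to the contract, and the disassembler decodes it as if it were instructions: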

PUSH1 0x80 PUSH1 0x40 MSTORE CALLVALUE DUP1 ISZERO PUSH2 0x10 JUMPI PUSH1 0x0 DUP1 REVERT JUMPDEST POP PUSH1 0x40 MLOAD PUSH1 0x20 DUP1 PUSH2 0xED DUP4 CODECOPY DUP2 ADD DUP1 PUSH1 0x40 MSTORE DUP2 ADD SWAP1 DUP1 DUP1 MLOAD SWAP1 PUSH1 0x20  ADD SWAP1 SWAP3 SWAP2 SWAP1 POP POP POP DUP1 PUSH1 0x0 DUP2 SWAP1 SSTORE POP POP PUSH1 0xA4 DUP1 PUSH2 0x49 PUSH1 0x0 CODECOPY PUSH1 0x0 RETURN STOP PUSH1 0x80 PUSH1 0x40 MSTORE PUSH1 0x4 CALLDATASIZE LT PUSH1 0x3F JUMPI PUSH1 0x0 CALLDATALOAD PUSH29 0x100000000000000000000000000000000000000000000000000000000 SWAP1 DIV PUSH4 0xFFFFFFFF AND DUP1 PUSH4 0x60FE47B1 EQ PUSH1 0x44 JUMPI JUMPDEST PUSH1 0x0 DUP1 REVERT JUMPDEST CALLVALUE DUP1 ISZERO PUSH1 0x4F JUMPI PUSH1 0x0 DUP1 REVERT JUMPDEST POP PUSH1 0x6C PUSH1 0x4 DUP1 CALLDATASIZE SUB DUP2 ADD SWAP1 DUP1 DUP1 CALLDATALOAD SWAP1 PUSH1 0x20 ADD SWAP1 SWAP3 SWAP2 SWAP1 POP POP POP PUSH1 0x6E JUMP JUMPDEST STOP JUMPDEST DUP1 PUSH1 0x0 DUP2 SWAP1 SSTORE POP POP JUMP STOP LOG1 PUSH6 0x627A7A723058 KECCAK256 0x4e 0xd9 MOD DIFFICULTY 0x4c 0xc4 0xc9 0xaa 0xbd XOR  EXTCODECOPY MSTORE 0xb2 0xd4 DUP7 0xdf 0xc5 0xde 0xa9 DUP1 SLT PUSH1 0xC3 CALLDATACOPY XOR 0x5d 0xad KECCAK256 0xe1 0x1f DUP2 SHL STOP 0x29

The following instructions make up the body of the set function. We will come back to them when explaining how set runs:

JUMPDEST DUP1 PUSH1 0x0 DUP2 SWAP1 SSTORE POP POP JUMP STOP
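To see how the opcode listing maps onto the raw bytes, here is a minimal Python sketch. Python serves only as a modeling language here and is not part of the Solidity toolchain. The sketch splits the Demo bytecode using operands visible in the listing. The init code copies 0xA4 bytes of runtime code from offset 0x49 (PUSH1 0xA4 DUP1 PUSH2 0x49 PUSH1 0x0 CODECOPY), and the last two bytes of the runtime code give the length of the appended metadata. The names BYTECODE, NAMES and disassemble are hypothetical, and the opcode table covers only the opcodes this contract uses.

```python
# Creation bytecode of Demo as shown above. The PUSH29 operand
# (2**224: one 0x01 byte followed by 28 zero bytes) is spelled "00" * 28.
BYTECODE = (
    "608060405234801561001057600080fd5b506040516020806100ed833981018060405281019080805190602001909291905050508060008190555050"
    "60a4806100496000396000f300608060405260043610603f576000357c01" + "00" * 28 + "9004"
    "63ffffffff16806360fe47b1146044575b600080fd5b348015604f57600080fd5b50606c60048036038101908080359060200190929190505050606e"
    "565b005b80600081905550505600a165627a7a723058204ed906444cc4c9aabd183c52b2d486dfc5dea9801260c337185dad20e11f811b0029"
)

# Only the single-byte opcodes that appear in Demo's code.
NAMES = {
    0x00: "STOP", 0x01: "ADD", 0x03: "SUB", 0x04: "DIV", 0x10: "LT", 0x14: "EQ",
    0x15: "ISZERO", 0x16: "AND", 0x34: "CALLVALUE", 0x35: "CALLDATALOAD",
    0x36: "CALLDATASIZE", 0x39: "CODECOPY", 0x50: "POP", 0x51: "MLOAD",
    0x52: "MSTORE", 0x55: "SSTORE", 0x56: "JUMP", 0x57: "JUMPI",
    0x5B: "JUMPDEST", 0xF3: "RETURN", 0xFD: "REVERT",
}

def disassemble(code: bytes):
    """Return a list of (offset, instruction text) pairs."""
    out, pc = [], 0
    while pc < len(code):
        op = code[pc]
        if 0x60 <= op <= 0x7F:                 # PUSH1..PUSH32 carry 1..32 operand bytes
            n = op - 0x5F
            value = int.from_bytes(code[pc + 1:pc + 1 + n], "big")
            out.append((pc, f"PUSH{n} 0x{value:X}"))
            pc += 1 + n
            continue
        if 0x80 <= op <= 0x8F:
            text = f"DUP{op - 0x7F}"
        elif 0x90 <= op <= 0x9F:
            text = f"SWAP{op - 0x8F}"
        else:
            text = NAMES.get(op, f"0x{op:02x}")  # unknown byte: print it raw
        out.append((pc, text))
        pc += 1
    return out

full = bytes.fromhex(BYTECODE)
INIT_LEN, RUNTIME_LEN = 0x49, 0xA4              # operands of the init code's CODECOPY
init = full[:INIT_LEN]
runtime = full[INIT_LEN:INIT_LEN + RUNTIME_LEN]
meta_len = int.from_bytes(runtime[-2:], "big")  # trailing 2 bytes: metadata length
code = runtime[:-(meta_len + 2)]                # executable part of the runtime code

set_body = " ".join(text for pc, text in disassemble(code) if pc >= 0x6E)
print(hex(len(full)), hex(meta_len))
print(set_body)
```

The set body starts at runtime offset 0x6E, which is the target of the dispatcher's PUSH1 0x6E JUMP.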

Deployment

Once the code is compiled, you can deploy it in Remix, passing 0x123 as the constructor argument:

After successful deployment, you will receive a transaction receipt:



Click Input to view the transaction's input data:



The yellow part of this data is exactly the contract binary shown above. The purple part is the constructor argument 0x123, encoded as a 32-byte word.
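The layout of this input data can also be sketched in Python. The creation bytecode is 0xED bytes long: 0x49 bytes of init code plus 0xA4 bytes of runtime code. During contract creation, CODECOPY reads from the transaction's input data. As a result, the init code's PUSH1 0x20 DUP1 PUSH2 0xED DUP4 CODECOPY copies exactly the 32 bytes that follow the bytecode, which hold the constructor argument. The helper names are hypothetical, and a zero-filled placeholder stands in for the real bytecode.

```python
def abi_encode_uint(value: int) -> bytes:
    """Encode an unsigned integer as one 32-byte big-endian word."""
    if not 0 <= value < 2**256:
        raise ValueError("value does not fit in uint256")
    return value.to_bytes(32, "big")

def deployment_input(bytecode: bytes, *args: int) -> bytes:
    """Deployment input data: creation bytecode followed by the encoded arguments."""
    return bytecode + b"".join(abi_encode_uint(a) for a in args)

def constructor_arg(input_data: bytes, code_len: int) -> int:
    """Mimic CODECOPY(mem, code_len, 0x20): read the word right after the bytecode."""
    return int.from_bytes(input_data[code_len:code_len + 32], "big")

placeholder_bytecode = bytes(0xED)          # stand-in for the 237-byte Demo bytecode
data = deployment_input(placeholder_bytecode, 0x123)
print(len(data), hex(constructor_arg(data, 0xED)))
```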

All of this shows that contract deployment uses a transaction as its vehicle. Combined with what we know about blockchain transactions, we can reconstruct the entire deployment process:

  • The client builds a transaction whose input data is the deployment request (the contract binary plus the constructor arguments).
  • The transaction is RLP-encoded and then signed with the sender's private key.
  • The signed transaction is pushed to nodes on the blockchain.
  • After a node verifies the transaction, it places the transaction in its transaction pool.
  • When it is the node’s turn to produce a block, the node packages the transaction into a block and broadcasts the block to other nodes.
  • The other nodes validate the block and reach consensus. Different blockchains may use different consensus algorithms. FISCO BCOS uses PBFT, which proceeds in three phases (pre-prepare, prepare, and commit).
  • Each node executes the transaction. As a result, the contract Demo is created, and storage for the state variable _state is allocated and initialized to 0x123.
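Step 2 above mentions RLP (Recursive Length Prefix), the serialization used for Ethereum-style transactions. The following Python sketch implements the core RLP rules for byte strings and nested lists. It is a minimal illustration, not a full transaction encoder. The field list in the usage lines is hypothetical and simplified, because real transactions have chain-specific fields (FISCO BCOS defines its own). Signing is omitted.

```python
def rlp_encode(item) -> bytes:
    """Minimal RLP encoder: item is bytes or a (possibly nested) list of items."""
    if isinstance(item, (bytes, bytearray)):
        if len(item) == 1 and item[0] < 0x80:
            return bytes(item)                  # a single low byte encodes as itself
        return _length_prefix(len(item), 0x80) + bytes(item)
    if isinstance(item, list):
        payload = b"".join(rlp_encode(x) for x in item)
        return _length_prefix(len(payload), 0xC0) + payload
    raise TypeError("RLP items are bytes or lists")

def _length_prefix(length: int, offset: int) -> bytes:
    """Short form for payloads under 56 bytes, long form with explicit length bytes otherwise."""
    if length < 56:
        return bytes([offset + length])
    len_bytes = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([offset + 55 + len(len_bytes)]) + len_bytes

def int_to_rlp_bytes(n: int) -> bytes:
    """Integers are encoded as big-endian bytes without leading zeros (0 -> empty)."""
    return n.to_bytes((n.bit_length() + 7) // 8, "big")

# Hypothetical, simplified deployment transaction: [nonce, to, data].
# In Ethereum-style transactions an empty "to" field marks contract creation.
unsigned_tx = [int_to_rlp_bytes(0), b"", bytes.fromhex("6080") + (0x123).to_bytes(32, "big")]
print(rlp_encode(unsigned_tx).hex())
```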

Execution

Depending on whether a function carries the view modifier, invocations fall into two categories: calls and transactions. For a call, it is determined at compile time that the function will not change contract state. A node can therefore answer the call directly as a query, without confirming it with other blockchain nodes. A transaction may change state, so it must be confirmed across the network.

Let’s see how this works, assuming the user calls set(0xA).

First, set is not marked view or pure, which means it may change the contract state. The call is therefore wrapped in a transaction. The transaction goes through encoding, signing, pushing to nodes, caching in the transaction pool, packaging into a block, and network consensus. Finally, it is executed by the EVM on each node.
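Inside the transaction, the call itself is represented as call data. Under the standard Solidity ABI, call data is the 4-byte function selector followed by the arguments as 32-byte words. For set, the selector 0x60FE47B1 appears in the runtime code above (PUSH4 0x60FE47B1). The Python sketch below builds the call data for set(0xA). It then mirrors the dispatcher's selector extraction: CALLDATALOAD(0), DIV by 2**224, AND 0xFFFFFFFF, then EQ. The function names are hypothetical.

```python
SET_SELECTOR = 0x60FE47B1   # from PUSH4 0x60FE47B1 in Demo's runtime code

def encode_call(selector: int, *args: int) -> bytes:
    """Call data: 4-byte selector followed by 32-byte big-endian arguments."""
    return selector.to_bytes(4, "big") + b"".join(a.to_bytes(32, "big") for a in args)

def dispatch_selector(calldata: bytes) -> int:
    """Mirror the dispatcher: load the first word (zero-padded like CALLDATALOAD),
    shift the top 4 bytes down by dividing by 2**224, then mask to 32 bits."""
    if len(calldata) < 4:                       # PUSH1 0x4 CALLDATASIZE LT ... JUMPI -> REVERT
        raise ValueError("call data too short: the contract would revert")
    word = int.from_bytes(calldata[:32].ljust(32, b"\x00"), "big")
    return (word // 2**224) & 0xFFFFFFFF

calldata = encode_call(SET_SELECTOR, 0xA)
matched = dispatch_selector(calldata) == SET_SELECTOR   # EQ true: jump to set's code
arg = int.from_bytes(calldata[4:36], "big")             # CALLDATALOAD(4) reads the argument
print(calldata.hex(), matched, hex(arg))
```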

In the EVM, the SSTORE instruction writes the argument 0xA into the state variable _state. SSTORE takes two operands from the stack, the storage slot of _state (slot 0x0) and the new value 0xA, and then performs the actual write.
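The stack movement inside set can be traced with a small Python model of the set body disassembled earlier (JUMPDEST DUP1 PUSH1 0x0 DUP2 SWAP1 SSTORE POP POP JUMP). Based on our reading of the dispatcher code, the model assumes the stack on entry holds the selector, the return address 0x6C and the argument 0xA, with the top of the stack as the last list element. Storage is modeled as a plain dict from slot to value. Only the opcodes in this body are implemented, and the names are hypothetical.

```python
SET_BODY = [("JUMPDEST", None), ("DUP1", None), ("PUSH1", 0x0), ("DUP2", None),
            ("SWAP1", None), ("SSTORE", None), ("POP", None), ("POP", None), ("JUMP", None)]

def run(program, stack, storage):
    """Execute (opcode, operand) pairs; return the destination popped by JUMP, if any."""
    for op, operand in program:
        if op == "JUMPDEST":
            pass                                   # marks a jump target, no effect
        elif op == "PUSH1":
            stack.append(operand)
        elif op.startswith("DUP"):
            stack.append(stack[-int(op[3:])])      # DUPn copies the n-th item from the top
        elif op.startswith("SWAP"):
            n = int(op[4:])                        # SWAPn exchanges the top and the (n+1)-th item
            stack[-1], stack[-1 - n] = stack[-1 - n], stack[-1]
        elif op == "SSTORE":
            slot = stack.pop()                     # top of stack: storage slot
            value = stack.pop()                    # next item: value to store
            storage[slot] = value
        elif op == "POP":
            stack.pop()
        elif op == "JUMP":
            return stack.pop()                     # return address pushed by the dispatcher
        else:
            raise NotImplementedError(op)
        print(f"{op:<8}", [hex(v) for v in stack])
    return None

stack = [0x60FE47B1, 0x6C, 0xA]    # assumed entry state: selector, return address, argument
storage = {}
dest = run(SET_BODY, stack, storage)
print(storage, hex(dest))
```

The DUP2/SWAP1 pair arranges the stack so that the slot 0x0 sits on top of the value, which is the order SSTORE expects.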

The following diagram shows the process in action:

This is only a rough outline of how set(0xA) runs. The next section expands on how the EVM works and how data is stored.

Destruction

Because a contract cannot be tampered with once it is on chain, it lives until the underlying blockchain is shut down completely. A contract can also be destroyed manually with selfdestruct, which compiles to the SELFDESTRUCT instruction. Destroying a contract also requires transaction confirmation, which we will not discuss here.

Introduction to the EVM

The previous section explained how a Solidity application runs: once a transaction is confirmed, its bytecode is executed by the EVM. The EVM was only mentioned in passing there. This section describes how it works in more detail.

Execution model

The EVM is a stack-based virtual machine. Its core feature is that all instruction operands are kept on the stack. Let’s see how this works with a simple Solidity snippet:

uint a = 1;
uint b = 2;
uint c = a + b;

After compilation, the resulting bytecode is essentially as follows:

PUSH1 0x1
PUSH1 0x2
ADD

To make the concept easier to grasp, the bytecode has been condensed into the three instructions above. The actual output is longer and interspersed with instructions such as SWAP and DUP.

The code above uses two instructions, PUSH1 and ADD:

  • PUSH1: pushes a one-byte value onto the top of the stack.
  • ADD: pops the top two stack elements, adds them, and pushes the sum back onto the top of the stack.

The execution is illustrated step by step below. In the figures, SP is the stack pointer (pointing at the top of the stack) and PC is the program counter. After PUSH1 0x1, both PC and SP move down:

Similarly, after PUSH1 0x2, PC and SP are as follows:

Finally, ADD pops the two operands at the top of the stack as its inputs and pushes their sum back onto the stack:
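The same three steps can be replayed on a toy stack machine. This Python sketch is a simplified model, not the EVM itself. It supports only PUSH1, ADD and STOP, and reports sp as the number of items on the stack. ADD wraps modulo 2**256, as the EVM does for its 256-bit words.

```python
WORD = 2**256  # EVM words are 256 bits; ADD wraps around modulo 2**256

def execute(code: bytes):
    """Run PUSH1/ADD/STOP bytecode, printing pc, sp and the stack after each step."""
    stack, pc = [], 0
    while pc < len(code):
        op = code[pc]
        if op == 0x60:                  # PUSH1: the next byte is the operand
            stack.append(code[pc + 1])
            pc += 2
        elif op == 0x01:                # ADD: pop two operands, push their sum
            a, b = stack.pop(), stack.pop()
            stack.append((a + b) % WORD)
            pc += 1
        elif op == 0x00:                # STOP
            break
        else:
            raise NotImplementedError(hex(op))
        print(f"pc={pc} sp={len(stack)} stack={stack}")
    return stack

execute(bytes([0x60, 0x01, 0x60, 0x02, 0x01]))   # PUSH1 0x1 PUSH1 0x2 ADD
```

The three printed lines correspond to the three figures: the stack grows to [1] and then [1, 2], and ADD leaves [3].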

Exploring storage

During development we often run into the confusing memory modifier. When reading open-source code, you will also see inline assembly that operates directly on memory. Developers who are unfamiliar with the storage mechanism find these situations confusing, so this section explores how the EVM stores data.

In the earlier article “Solidity Basic Features of Smart Contract Writing”, we saw that a piece of Solidity code usually involves local variables and contract state variables.

These variables are stored in different places. The following code annotates each kind of variable with where it is stored:

contract Demo{
    // state storage
    uint private _state;

    function set(uint state) public {
        // stack
        int i = 0;
        // memory
        string memory str = "aaa";
    }
}

The stack

The stack holds the operands of bytecode instructions. In Solidity, local variables of types such as integers and fixed-size byte arrays are pushed onto and popped off the stack as instructions run.

For example, in the following simple statement, the value 1 is pushed onto the top of the stack by a PUSH instruction:

uint i = 1;

You cannot change where these variables are stored. If you place a memory modifier in front of them, the compiler reports an error.

Memory

Memory is similar to the heap in Java and is used to store “objects”. In Solidity, a local variable of a type such as a dynamic byte array, string, or struct is usually declared with the memory modifier to indicate that it lives in memory.

In this section, we’ll look at how memory stores these objects, using strings as an example.

1. Object storage structure

We’ll use inline assembly to analyze how complex objects are stored.

An assembly block lets you invoke bytecode operations directly. Here we use mload: mload(p) reads 32 bytes of data starting at address p. A memory object variable can be passed to mload directly, because inside assembly it evaluates to the object’s address, that is, a pointer.

In the following code, after the mload call, the data variable holds the first 32 bytes of the string str in memory.

string memory str = "aaa";
bytes32 data;
assembly{
    data := mload(str)
}  

With mload in hand, we can analyze how string variables are stored. The following code reveals the layout of string data:

function strStorage() public view returns(bytes32, bytes32){
    string memory str = "你好";
    bytes32 data;
    bytes32 data2;
    assembly{
        data := mload(str)
        data2 := mload(add(str, 0x20))
    }
    return (data, data2);
}

The variable data holds bytes 0 to 31 of str, and data2 holds bytes 32 to 63. Running strStorage returns:

0: bytes32: 0x0000000000000000000000000000000000000000000000000000000000000006
1: bytes32: 0xe4bda0e5a5bd0000000000000000000000000000000000000000000000000000

As you can see, the first data word holds 6, which is exactly the number of bytes in the UTF-8 encoding of the string “你好” (“hello”). The second data word holds that UTF-8 encoding itself. In other words, a string is stored in memory as a 32-byte length word followed by its content.
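The length-plus-content layout can be reproduced in Python. The sketch below models memory as a zero-filled bytearray and writes the string at an assumed address of 0x80. It then reads the two words back with a hypothetical mload helper. The results match the two values returned by strStorage.

```python
def store_string(memory: bytearray, ptr: int, text: str) -> int:
    """Write text at ptr as a 32-byte length word followed by the UTF-8 bytes,
    zero-padded to a multiple of 32. Returns the number of bytes used."""
    body = text.encode("utf-8")
    padded_len = (len(body) + 31) // 32 * 32
    memory[ptr:ptr + 32] = len(body).to_bytes(32, "big")
    memory[ptr + 32:ptr + 32 + padded_len] = body.ljust(padded_len, b"\x00")
    return 32 + padded_len

def mload(memory: bytearray, p: int) -> bytes:
    """mload(p): the 32-byte word starting at address p."""
    return bytes(memory[p:p + 32])

memory = bytearray(0x200)          # zero-initialized, like fresh EVM memory
str_ptr = 0x80                     # assumed address of str
used = store_string(memory, str_ptr, "你好")
print("0x" + mload(memory, str_ptr).hex())          # data: the length word
print("0x" + mload(memory, str_ptr + 0x20).hex())   # data2: the UTF-8 bytes
```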

Once you understand how strings are stored, you can use assembly to modify, copy, and concatenate them. Readers can look at open-source Solidity string libraries to see how string concatenation is implemented.
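As a rough model of what such a library does, the following Python sketch concatenates two length-prefixed strings. It reads each length word, copies both bodies into a new string at the free address, and writes the new length. The sketch rests on simplifying assumptions: memory is a bytearray, the free address is passed in explicitly, and the helper names are hypothetical. It is not a port of any particular library.

```python
def write_string(memory: bytearray, ptr: int, body: bytes) -> int:
    """Store body as a length word plus zero-padded data; return the next free address."""
    padded = body.ljust((len(body) + 31) // 32 * 32, b"\x00")
    memory[ptr:ptr + 32] = len(body).to_bytes(32, "big")
    memory[ptr + 32:ptr + 32 + len(padded)] = padded
    return ptr + 32 + len(padded)

def read_string(memory: bytearray, ptr: int) -> bytes:
    """Read the length word at ptr, then that many bytes of content."""
    length = int.from_bytes(memory[ptr:ptr + 32], "big")
    return bytes(memory[ptr + 32:ptr + 32 + length])

def concat(memory: bytearray, free_ptr: int, a_ptr: int, b_ptr: int):
    """Copy both bodies into a new string at free_ptr.
    Returns (address of the new string, new free address)."""
    body = read_string(memory, a_ptr) + read_string(memory, b_ptr)
    return free_ptr, write_string(memory, free_ptr, body)

memory = bytearray(0x200)
free = write_string(memory, 0x80, b"aaa")      # "aaa" at 0x80
free = write_string(memory, free, b"bbb")      # "bbb" at 0xc0
joined, free = concat(memory, free, 0x80, 0xC0)
print(hex(joined), read_string(memory, joined), hex(free))
```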

2. Memory allocation

Since memory is used to store objects, there must be a rule for how it is allocated.

The rule is simple: memory is allocated sequentially. Let’s allocate two objects and look at their addresses:

function memAlloc() public view returns(bytes32, bytes32){
    string memory str = "aaa";
    string memory str2 = "bbb";
    bytes32 p1;
    bytes32 p2;
    assembly{
        p1 := str
        p2 := str2
    }   
    return (p1, p2);
}

When you run this function, the return result will contain two data words:

0: bytes32: 0x0000000000000000000000000000000000000000000000000000000000000080
1: bytes32: 0x00000000000000000000000000000000000000000000000000000000000000c0

This means that the first string str starts at address 0x80 and the second string str2 starts at 0xc0. The 64 bytes between them are exactly the space occupied by str: a 32-byte length word plus one 32-byte word holding aaa. The memory layout at this point is as follows. Each cell is one 32-byte data word, because the EVM's word size is 32 bytes, not 4:

  • 0x40~0x60: the free memory pointer, which holds the next available address. Here it is 0x100, meaning new objects will be allocated from 0x100. mload(0x40) returns the address at which the next object will be allocated.
  • 0x80~0xc0: the start of the object area. The string aaa is allocated here.
  • 0xc0~0x100: the string bbb.
  • 0x100~…: because allocation is sequential, new objects will be allocated here.
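Sequential allocation amounts to a bump allocator built around the free memory pointer. The Python sketch below models this. The pointer lives at 0x40 and starts at 0x80, matching the PUSH1 0x80 PUSH1 0x40 MSTORE at the start of the Demo bytecode. Solidity reserves 0x00 to 0x3f as scratch space and 0x60 to 0x7f as a zero slot. The class and method names are hypothetical, and the model handles nothing except string allocation.

```python
FREE_PTR_SLOT = 0x40   # where the free memory pointer is kept
FIRST_FREE = 0x80      # first address after the reserved area

class Memory:
    def __init__(self, size=0x400):
        self.data = bytearray(size)
        self.mstore(FREE_PTR_SLOT, FIRST_FREE)   # like the PUSH1 0x80 PUSH1 0x40 MSTORE prologue

    def mstore(self, p, value):
        self.data[p:p + 32] = value.to_bytes(32, "big")

    def mload(self, p):
        return int.from_bytes(self.data[p:p + 32], "big")

    def alloc_string(self, text):
        """Place the string at the free pointer, then advance the pointer
        past its length word and its zero-padded body."""
        body = text.encode("utf-8")
        ptr = self.mload(FREE_PTR_SLOT)
        size = 32 + (len(body) + 31) // 32 * 32
        self.mstore(ptr, len(body))
        self.data[ptr + 32:ptr + size] = body.ljust(size - 32, b"\x00")
        self.mstore(FREE_PTR_SLOT, ptr + size)
        return ptr

mem = Memory()
p1 = mem.alloc_string("aaa")
p2 = mem.alloc_string("bbb")
print(hex(p1), hex(p2), hex(mem.mload(FREE_PTR_SLOT)))
```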

State storage

As the name implies, state storage holds a contract’s state variables.

Conceptually, storage consists of many 32-byte storage slots. In the set function of the Demo contract introduced earlier, 0x0 is the storage slot of the state variable _state. Fixed-size state variables are placed in these slots in declaration order, and several small variables may be packed into a single slot.
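The slot assignment for fixed-size variables can be sketched as follows. This Python model covers only elementary types, each given by its byte size. It follows the packing rule from the Solidity documentation: a variable shares the current slot if it fits in the remaining bytes, and otherwise starts a new slot. Structs and arrays, which always start a new slot, are not modeled. The function name is hypothetical.

```python
def assign_slots(variables):
    """Assign (slot, byte offset) to fixed-size state variables in declaration order.
    variables is a list of (name, size_in_bytes); the offset counts bytes from the
    low-order (right-hand) end of the 32-byte slot."""
    layout, slot, used = {}, 0, 0
    for name, size in variables:
        if not 1 <= size <= 32:
            raise ValueError(f"{name}: fixed-size types take 1..32 bytes")
        if used + size > 32:          # does not fit: move to a fresh slot
            slot, used = slot + 1, 0
        layout[name] = (slot, used)
        used += size
    return layout

print(assign_slots([("_state", 32)]))          # Demo: _state lands in slot 0x0
print(assign_slots([("a", 32), ("b", 16), ("c", 16), ("d", 1), ("e", 32)]))
```

In the second call, b and c (16 bytes each) share slot 1. The 32-byte e cannot fit beside d, so it moves on to slot 3.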

Mappings and dynamic arrays are more complex. The variable itself occupies one slot p, and its data occupies other slots computed by fixed rules. For a mapping, for example, the slot holding the value for key k is computed with Keccak-256 from k and p. For value-type keys, this is the hash of k and p, each padded to 32 bytes, concatenated.
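The hash-based slot rules can be computed directly. Python's hashlib.sha3_256 implements the standardized SHA3-256, whose padding differs from the original Keccak-256 used by the EVM. This sketch therefore includes a compact pure-Python Keccak-256. mapping_slot covers value-type keys. array_element_slot covers dynamic arrays of 32-byte elements, whose data starts at keccak256(p). Both helper names are hypothetical. As a sanity check, the first four bytes of keccak256("set(uint256)") should reproduce the selector 0x60FE47B1 seen in the Demo bytecode.

```python
MASK = (1 << 64) - 1

def _rol(v, n):
    """Rotate a 64-bit lane left by n bits."""
    return ((v << n) | (v >> (64 - n))) & MASK

def _rc_bit(t):
    """Output bit of the Keccak LFSR used to derive the round constants."""
    r = 1
    for _ in range(t % 255):
        r <<= 1
        if r & 0x100:
            r ^= 0x171
    return r & 1

RC = [sum(_rc_bit(j + 7 * i) << (2 ** j - 1) for j in range(7)) for i in range(24)]

ROT = [0] * 25                       # rho rotation offsets, indexed x + 5*y
_x, _y = 1, 0
for _t in range(24):
    ROT[_x + 5 * _y] = ((_t + 1) * (_t + 2) // 2) % 64
    _x, _y = _y, (2 * _x + 3 * _y) % 5

def _keccak_f(a):
    """The Keccak-f[1600] permutation on 25 lanes of 64 bits."""
    for rc in RC:
        c = [a[x] ^ a[x + 5] ^ a[x + 10] ^ a[x + 15] ^ a[x + 20] for x in range(5)]
        d = [c[(x - 1) % 5] ^ _rol(c[(x + 1) % 5], 1) for x in range(5)]
        a = [a[i] ^ d[i % 5] for i in range(25)]                          # theta
        b = [0] * 25
        for x in range(5):
            for y in range(5):                                             # rho + pi
                b[y + 5 * ((2 * x + 3 * y) % 5)] = _rol(a[x + 5 * y], ROT[x + 5 * y])
        a = [b[i] ^ (~b[(i + 1) % 5 + 5 * (i // 5)] & b[(i + 2) % 5 + 5 * (i // 5)])
             for i in range(25)]                                           # chi
        a[0] ^= rc                                                         # iota
    return a

def keccak256(data: bytes) -> bytes:
    """Original Keccak-256 (pad byte 0x01) as used by the EVM, not NIST SHA3-256."""
    rate = 136
    msg = bytearray(data) + b"\x01"
    msg += b"\x00" * (-len(msg) % rate)
    msg[-1] |= 0x80
    state = [0] * 25
    for off in range(0, len(msg), rate):
        for i in range(rate // 8):
            state[i] ^= int.from_bytes(msg[off + 8 * i:off + 8 * i + 8], "little")
        state = _keccak_f(state)
    return b"".join(lane.to_bytes(8, "little") for lane in state[:4])

def mapping_slot(key: int, p: int) -> int:
    """Slot of m[key] for a mapping declared at slot p (value-type key)."""
    return int.from_bytes(keccak256(key.to_bytes(32, "big") + p.to_bytes(32, "big")), "big")

def array_element_slot(p: int, i: int) -> int:
    """Slot of element i of a dynamic array at slot p (32-byte elements; slot p holds the length)."""
    return int.from_bytes(keccak256(p.to_bytes(32, "big")), "big") + i

print(keccak256(b"set(uint256)")[:4].hex())
print(hex(mapping_slot(1, 0)))
```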

Different chains implement this storage model in different ways. The classic implementation is the Merkle Patricia Trie (MPT) used by Ethereum. Because of the MPT's performance and scalability limitations, FISCO BCOS dropped this structure in favor of distributed storage, which keeps state data in RocksDB or MySQL to improve storage performance and scalability.

Conclusion

This article introduced the operating principles of Solidity, which can be summarized as follows. Solidity source code is compiled into bytecode. At deployment, the bytecode is confirmed across the network in the form of a transaction, and the contract is created on every node. A contract function call that is a transaction is confirmed by the network and finally executed by the EVM. The EVM is a stack virtual machine that reads and executes the contract’s bytecode, interacting with the stack, memory, and contract storage as it runs. The stack holds ordinary local variables, which serve as bytecode operands. Memory holds objects in a length + body layout and is allocated sequentially. State storage holds state variables. Understanding how Solidity works and the principles behind it is essential to becoming an expert Solidity programmer.