Dead knock Ethereum source code analysis of the MPT tree – below

Articles and information can be found at github.com/blockchainG…

The previous chapter mainly introduced the principle of MPT tree in Ethereum, this will mainly be involved in the source code of MPT tree disassembly analysis. The triE module mainly has the following files:

|-encoding.goThe main code conversion between | - hasher.goRealized from a node subtree hash function | - node.goDefines a Trie tree all the node types and parsing code in | - sync.goImplements SyncTrie object definition and all methods | - iterator.goDefines all the enumeration interface and implementation related | - secure_trie.goImplements the SecureTrie object | - proof.goAs the key to construct a merkle prove | - trie.goTrie tree to add and delete. | - database.goReference counts trie tree nodes in memoryCopy the code

To realize the overview

encoding.go

KEYBYTES Encoding, HEX Encoding, COMPACT Encoding, KEYBYTES encoding, HEX Encoding, COMPACT Encoding, KEYBYTES encoding, HEX Encoding

  1. Hex code converted to Compact code:hexToCompact()
  2. Compact code converted to HEX code:compactToHex()
  3. Convert keybytes encoding to Hex encoding:keybytesToHex()
  4. Hex encoding converted to keybytes encoding:hexToKeybytes()
  5. Gets the length of the common prefix for a two-byte array:prefixLen()
func hexToCompact(hex []byte) []byte {
    terminator := byte(0)
    if hasTerm(hex) { // Check whether any end is 0x10 => 16
        terminator = 1 // End 16 indicates a leaf node
        hex = hex[:len(hex)- 1] // remove the tail tag
    }
    buf := make([]byte.len(hex)/2+1) // Array of bytes
    
    buf[0] = terminator << 5 // The byte is 00000000 or 00100000
    // If the length is odd, add the odd bit flag 1 and place the first nibble byte in the lower four bits of buf[0]
    if len(hex)&1= =1 {
        buf[0] | =1 << 4 // The odd flag is 00110000
        buf[0] |= hex[0] // The first nibble is contained in the first byte 0011xxxx
        hex = hex[1:]}// Combine two nibble bytes into one byte
    decodeNibbles(hex, buf[1:)return buf
  
Copy the code
// Compact code is converted to Hex code
func compactToHex(compact []byte) []byte {
    base := keybytesToHex(compact)
    base = base[:len(base)- 1]
     // apply terminator flag
    // Base [0] includes four cases
    // 00000000 Indicates the even bit of the expansion node
    // 00000001 Odd bits of the extension node
    // 00000010 Even bit of leaf node
    // 00000011 Odd bits of leaves

    // apply terminator flag
    if base[0] > =2 {
       // If it is a leaf node, add 16 to the end of the Hex flag
        base = append(base, 16)}// apply odd flag
    Chop is equal to 2 if it is an even digit, otherwise it is equal to 1
    chop := 2 - base[0] &1
    return base[chop:]
}
Copy the code
// Compact code is converted to Hex code
func compactToHex(compact []byte) []byte {
    base := keybytesToHex(compact)
    base = base[:len(base)- 1]
     // apply terminator flag
    // Base [0] includes four cases
    // 00000000 Indicates the even bit of the expansion node
    // 00000001 Odd bits of the extension node
    // 00000010 Even bit of leaf node
    // 00000011 Odd bits of leaves

    // apply terminator flag
    if base[0] > =2 {
       // If it is a leaf node, add 16 to the end of the Hex flag
        base = append(base, 16)}// apply odd flag
    Chop is equal to 2 if it is an even digit, otherwise it is equal to 1
    chop := 2 - base[0] &1
    return base[chop:]
}
Copy the code
// Convert hexadecimal bibbles to key bytes, which can only be used for even length keys
func hexToKeybytes(hex []byte) []byte {
    if hasTerm(hex) {
        hex = hex[:len(hex)- 1]}if len(hex)&1! =0 {
        panic("can't convert hex key of odd length")
    }
    key := make([]byte, (len(hex)+1) /2)
    decodeNibbles(hex, key)
    return key
}


Copy the code
// Return the length of the common prefix for a and b
func prefixLen(a, b []byte) int {
    var i, length = 0.len(a)
    if len(b) < length {
        length = len(b)
    }
    for ; i < length; i++ {
        ifa[i] ! = b[i] {break}}return i
}


Copy the code

node.go

Four types of nodes

The Node interface has four implementations: fullNode, shortNode, valueNode, and hashNode. Only fullNode and shortNode can have child nodes.

type (
	fullNode struct {
		Children [17]node // Branch node
		flags    nodeFlag
	}
	shortNode struct { // Extend the node
		Key   []byte
		Val   node // Can point to leaf nodes or branch nodes.
		flags nodeFlag
	}
	hashNode  []byte
	valueNode []byte // The value of the leaf node, which will eventually be wrapped in shortNode
)
Copy the code

trie.go

The Trie object implements all the functions of the MPT tree, including adding, deleting, changing and checking (key, value) pairs, calculating merkle hashes, and writing the entire tree to the database.

iterator.go

NodeIterator provides the ability to traverse all nodes within a tree. The structure is as follows: This structure is defined in trie.go

type nodeIterator struct {
	trie.NodeIterator
	t   *odrTrie
	err error
}
Copy the code

NodeIterator contains an interface called NodeIterator, whose implementation is provided by iterator.go.

func (it *nodeIterator) Next(descend bool) bool 
func (it *nodeIterator) Hash(a) common.Hash 
func (it *nodeIterator) Parent(a) common.Hash 
func (it *nodeIterator) Leaf(a) bool 
func (it *nodeIterator) LeafKey(a) []byte 
func (it *nodeIterator) LeafBlob(a) []byte 
func (it *nodeIterator) LeafProof(a)[] []byte 
func (it *nodeIterator) Path(a) []byte {}
func (it *nodeIterator) seek(prefix []byte) error 
func (it *nodeIterator) peek(descend bool) (*nodeIteratorState, *intAnd []byte, error) 
func (it *nodeIterator) nextChild(parent *nodeIteratorState, ancestor common.Hash) (*nodeIteratorState, []byte.bool) 
func (it *nodeIterator) push(state *nodeIteratorState, parentIndex *int, path []byte) 
func (it *nodeIterator) pop(a) 
Copy the code

At the heart of the NodeIterator is the Next method. Each time this method is called, the current node represented by the NodeIterator object is updated to the Next node. When all nodes are iterated, the Next method returns false.

There are three ways to generate a NodeIterator interface:

(1) : Trie. NodeIterator (start byte [])

The start argument specifies which path to traverse from, or from start to finish if nil.

② : NewDifferenceIterator(a, b NodeIterator)

When NewDifferenceIterator(a, b NodeIterator) is called, the resulting NodeIterator traverses only nodes that exist in B but not in A.

(3) : NewUnionIterator (iters [] NodeIterator)

When NewUnionIterator(its []NodeIterator) is called, the node traversed by the generated NodeIterator is the set of all passed nodes.

database.go

Database is the cache layer of the TRIE module to the real Database. Its purpose is to count the references of the cached nodes, so as to achieve the prune function of the block. The main methods are as follows:

func NewDatabase(diskdb ethdb.KeyValueStore) *Database
func NewDatabaseWithCache(diskdb ethdb.KeyValueStore, cache int) *Database 
func (db *Database) DiskDB(a) ethdb.KeyValueReader
func (db *Database) InsertBlob(hash common.Hash, blob []byte)
func (db *Database) insert(hash common.Hash, blob []byte, node node)
func (db *Database) insertPreimage(hash common.Hash, preimage []byte)
func (db *Database) node(hash common.Hash) node
func (db *Database) Node(hash common.Hash) ([]byte, error)
func (db *Database) preimage(hash common.Hash) ([]byte, error)
func (db *Database) secureKey(key []byte) []byte
func (db *Database) Nodes(a) []common.Hash
func (db *Database) Reference(child common.Hash, parent common.Hash)
func (db *Database) Dereference(root common.Hash)
func (db *Database) dereference(child common.Hash, parent common.Hash)
func (db *Database) Cap(limit common.StorageSize) error
func (db *Database) Commit(node common.Hash, report bool) error
Copy the code

security_trie.go

As an encrypted implementation of trie, ecurity_trie wraps the trie tree and all keys are converted to the hash values calculated by kECCAK256 algorithm. At the same time, the database stores the original key corresponding to the hash value. However, the official comment in the code is that the code is unstable and is not used elsewhere except for test cases.

proof.go

  • Prove() : According to a givenkeyIn thetrieIn, will be satisfiedkeyThe nodes on the path of the maximum length prefix are added toproofDbEach element in the queue satisfies: unencoded hash and correspondingrlpEncoded node)
  • VerifyProof () : validationproffDbIs there any that satisfies the inputhash, and the node corresponding to the key. If so, returnrlpThe node after decoding.

Implementation details

Add, delete, change and check Trie objects

① : Trie tree initialization

If root is not empty, the entire Trie tree is loaded by resolveHash, and if it is empty, a new Trie tree is created.

func New(root common.Hash, db *Database) (*Trie, error) {
	if db == nil {
		panic("trie.New called without a database")
	}
	trie := &Trie{
		db: db,
	}
	ifroot ! = (common.Hash{}) && root ! = emptyRoot { rootnode, err := trie.resolveHash(root[:],nil)
		iferr ! =nil {
			return nil, err
		}
		trie.root = rootnode
	}
	return trie, nil
}
Copy the code

② : Trie tree insertion

First, Trie tree insertion is a recursive call that starts at the root and continues to find the right place to insert.

func (t *Trie) insert(n node, prefix, key []byte, value node) (bool, node, error)
Copy the code

Parameter Description:

  • N: Node to be inserted
  • Prefix: the key that has been processed (the prefix shared by nodes)
  • Key: the unfinished partkeyThe fullkey = prefix + key
  • Value: indicates the value to be inserted

Return value description:

  • Bool does the operation change the Trie tree (dirty)
  • Node: root Node of the subtree after insertion

The following is an illustration of shortNode, fullNode, hashNode and nil.

2.1: The node is nil

The empty tree directly returns shortNode. In this case, the root of the entire tree contains one shortNode.

case nil:
		return true, &shortNode{key, value, t.newFlag()}, nil
Copy the code

2.2: The node is shortNode

  • If the public prefix is equal to key, then the two keys are the same. If value is the same (dirty == false), then an error is returned.

  • If there are no errors, update the value of shortNode and return

  • If the common prefixes do not match exactly, then the common prefixes need to be extracted to form a separate node (extension node) with a branch node behind the extension node and two short nodes behind the branch node as appropriate.

  • First build a branch node (branch := &fullNode{flags: t.newflag ()}) and then insert the remaining two short nodes by calling T.insert from the Children position of the branch node

matchlen := prefixLen(key, n.Key)
		if matchlen == len(n.Key) {
			dirty, nn, err := t.insert(n.Val, append(prefix, key[:matchlen]...) , key[matchlen:], value)if! dirty || err ! =nil {
				return false, n, err
			}
			return true, &shortNode{n.Key, nn, t.newFlag()}, nil
		}
		branch := &fullNode{flags: t.newFlag()}
		var err error
		_, branch.Children[n.Key[matchlen]], err = t.insert(nil.append(prefix, n.Key[:matchlen+1]...). , n.Key[matchlen+1:], n.Val)
		iferr ! =nil {
			return false.nil, err
		}
		_, branch.Children[key[matchlen]], err = t.insert(nil.append(prefix, key[:matchlen+1]...). , key[matchlen+1:], value)
		iferr ! =nil {
			return false.nil, err
		}
		if matchlen == 0 {
			return true, branch, nil
    }
		return true, &shortNode{key[:matchlen], branch, t.newFlag()}, nil
Copy the code

2.3: The node is fullNode

If the node is fullNode(i.e., branch node), call insert directly to the corresponding child node and point that child node to the newly generated node.

dirty, nn, err := t.insert(n.Children[key[0]], append(prefix, key[0]), key[1:], value)
		if! dirty || err ! =nil {
			return false, n, err
		}
		n = n.copy()
		n.flags = t.newFlag()
		n.Children[key[0]] = nn
		return true, n, nil
Copy the code

2.4: The node is hashNode

Nodes that are temporarily in the database are loaded into memory by calling t.resolveHash(n, prefix) and then by calling insert.

rn, err := t.resolveHash(n, prefix)
		iferr ! =nil {
			return false.nil, err
		}
		dirty, nn, err := t.insert(rn, prefix, key, value)
		if! dirty || err ! =nil {
			return false, rn, err
		}
		return true, nn, nil
Copy the code

③ : Trie tree query value

In fact, it is based on the input hash, find the corresponding leaf node data. Focus on the TryGet method.

Parameters:

  • origNode: Indicates the start of the current searchnodelocation
  • key: Enter the data to search forhash
  • pos: the currenthashWhich digit is matched
func (t *Trie) tryGet(origNode node, key []byte, pos int) (value []byte, newnode node, didResolve bool, err error) {
	switch n := (origNode).(type) {
	case nil: // Indicates that trie is an empty tree
		return nil.nil.false.nil
	case valueNode: //// This is the data corresponding to the leaf node we are looking for
		return n, n, false.nil
	case *shortNode: //// matches on leaf nodes or extension nodes
		if len(key)-pos < len(n.Key) || ! bytes.Equal(n.Key, key[pos:pos+len(n.Key)]) {
			return nil, n, false.nil
		}
		value, newnode, didResolve, err = t.tryGet(n.Val, key, pos+len(n.Key))
		if err == nil && didResolve {
			n = n.copy()
			n.Val = newnode
		}
		return value, n, didResolve, err
	case *fullNode:// Match at the branch node
		value, newnode, didResolve, err = t.tryGet(n.Children[key[pos]], key, pos+1)
		if err == nil && didResolve {
			n = n.copy()
			n.Children[key[pos]] = newnode
		}
		return value, n, didResolve, err
	case hashNode: // The current node is a light node and needs to be obtained from db
		child, err := t.resolveHash(n, key[:pos])
		iferr ! =nil {
			return nil, n, true, err
		}
		value, newnode, _, err := t.tryGet(child, key, pos)
		return value, newnode, true, err
...
}
Copy the code

DidResolve is used to determine if the trie tree will change. TryGet () is just used to get data, and when a hashNode goes to db to get the node’s value, the existing trie needs to be updated. DidResolve will change. The other is the basic recursive look-up tree operation.

④ : Trie tree update value

To update the value, you call the INSERT method.

To this Trie tree add, delete, change, check on the explanation of almost.

Writes the node to Trie’s in-memory database

If you want to write a node to an in-memory database and need serialization, you can first understand the Rlp encoding of Ethereum. This is done by trie.mit (). When trie.mit (nil), serialization and caching are performed. After serialization, Compact Encoding is used to save space.

func (t *Trie) Commit(onleaf LeafCallback) (root common.Hash, err error) {
	if t.db == nil {
		panic("commit called on trie with nil database")
	}
	hash, cached, err := t.hashRoot(t.db, onleaf)
	iferr ! =nil {
		return common.Hash{}, err
	}
	t.root = cached
	return common.BytesToHash(hash.(hashNode)), nil
}
Copy the code

The above code says something like this:

  • Each timeCommit()The triecachegenWill be add 1
  • Commit()The method returnstrie.rootPoints to thenodethehash(Uncoded)
  • One of thehashRoot()The purpose of the method isReturns the hash of the node pointed to by trie.rootAs well asEach node has the root of the trie tree for its own hash.
// Generate a hash for each node
func (t *Trie) hashRoot(db *Database, onleaf LeafCallback) (node, node, error) {
	if t.root == nil {
		return hashNode(emptyRoot.Bytes()), nil.nil
	}
	h := newHasher(onleaf)
	defer returnHasherToPool(h)
	return h.hash(t.root, db, true) // Generate an unencoded hash for each node
}
Copy the code

The core method of hashRoot is h.hash, which returns the hash of the head node and the hash of the head node for each child node (trie. root points to it), doing roughly the following:

① : If we do not store nodes, but just hash, we get the data from the cache

ifhash, dirty := n.cache(); hash ! =nil {
		if db == nil {
			return hash, n, nil
		}
		if! dirty {switch n.(type) {
			case *fullNode, *shortNode:
				return hash, hash, nil
			default:
				return hash, n, nil}}}Copy the code

② : Recursively call H.Hashchildren to calculate the hash value of all the child nodes, and then replace the original child node with the hash value of the current child node

2.1: If the node isshortNode

The collapsed Key is replaced from Hex Encoding to Compact Encoding, and the hash method is recursively called to calculate the hash and cache of the child nodes, so that the child node is replaced with the hash value of the child node

collapsed, cached := n.copy(), n.copy()
		collapsed.Key = hexToCompact(n.Key)
		cached.Key = common.CopyBytes(n.Key)

		if_, ok := n.Val.(valueNode); ! ok { collapsed.Val, cached.Val, err = h.hash(n.Val, db,false)
			iferr ! =nil {
				return original, original, err
			}
		}
		return collapsed, cached, nil
Copy the code

2.2: The node is fullNode

Iterate over each child node, replace the child node with the Hash value of the child node, otherwise the node has no children. Return directly.

		collapsed, cached := n.copy(), n.copy(a)for i := 0; i < 16; i++ {
			ifn.Children[i] ! =nil {
				collapsed.Children[i], cached.Children[i], err = h.hash(n.Children[i], db, false)
				iferr ! =nil {
					return original, original, err
				}
			}
		}
		cached.Children[16] = n.Children[16]
		return collapsed, cached, nil
Copy the code

③ : Hash value of storage node N, if we specify the storage layer, it will write the corresponding key/value pair

The store() method does two main things:

  • rlpserializationcollapsedNode and insert it into db disk
  • Of the current nodehash
  • Hash the node intodb

3.1: empty data or hashNode is not processed

if _, isHash := n.(hashNode); n == nil || isHash {
		return n, nil
	}
Copy the code

3.2: Generates the RLP code of the node

h.tmp.Reset()                                 // Cache initialization
	iferr := rlp.Encode(&h.tmp, n); err ! =nil { Serialize the current node
		panic("encode error: " + err.Error())
	}
	if len(h.tmp) < 32 && !force {
		return n, nil // Nodes smaller than 32 bytes are stored inside their parent the encoded node length is less than 32. If force is true, it ensures that all Nodes are encoded
	}
// If the length is too long, it will be replaced by the newly computed hash
	hash, _ := n.cache() // Retrieves the hash of the current node
	if hash == nil {
		hash = h.makeHashNode(h.tmp) // Generate a hash node
	}
Copy the code

3.3: Merge Trie nodes into the intermediate memory cache

hash := common.BytesToHash(hash)
		db.lock.Lock()
		db.insert(hash, h.tmp, n)
		db.lock.Unlock()
		// Track external references from account->storage trie
		// Trace accounts -> Store external references in Trie
		ifh.onleaf ! =nil {
			switch n := n.(type) {
			case *shortNode:
				if child, ok := n.Val.(valueNode); ok {  // points to branch nodes
					h.onleaf(child, hash) // Collects statistics about the current node, such as the number of child nodes and the number of valid nodes
				}
			case *fullNode:
				for i := 0; i < 16; i++ {
					if child, ok := n.Children[i].(valueNode); ok {
						h.onleaf(child, hash)
					}
				}
			}
		}
Copy the code

At this point, writing the node to the Trie’s in-memory database is complete.

If you think the article is good, you can pay attention to the public number:Blockchain technology stack, all the detailed source analysis of Ethereum article content and code information are in it.

Trie tree caching mechanism

The Trie tree structure has two parameters: cacheGen and cachelimit. These two parameters are the parameters controlled by cache. Each time the Trie tree calls Commit, the current CacheGen increases by 1.

func (t *Trie) Commit(onleaf LeafCallback) (root common.Hash, err error){... t.cachegen++ ... }Copy the code

The current CacheGen is then placed in the node when the Trie tree is inserted.

func (t *Trie) insert(n node, prefix, key []byte, value node) (bool, node, error){...return true, &shortNode{n.Key, nn, t.newFlag()}, nil
}
func (t *Trie) newFlag(a) nodeFlag {
    return nodeFlag{dirty: true, gen: t.cachegen}
}

Copy the code

If trie.cachegen – node.cachegen > cachelimit, the node can be removed from memory. That is to say, the node is killed from memory after several Commit attempts without modification. Whenever a node is added or removed from the trie path, all nodes in the path need to be re-instantiated, that is, nodeFlags in the node are initialized. All need to be re-updated to db disk.

Removing the node is done in the hasher.hash method, which is called at commit time. If the method’s canUnload method call returns true, then the node is removed, and if only the Hash node is returned instead of the Node node, then the node is not referenced and will soon be cleared by GC. When a node is removed, it and its children are represented by a hashNode node. Methods can be used to load the node into memory if needed later.

func (h *hasher) hash(n node, db *Database, force bool) (node, node, error){...// Unload the node from the cache. All of its children will have lower or equal cache generation numbers.
       cacheUnloadCounter.Inc(1)... }Copy the code

Reference & Summary

This part of the important content is also described above, mainly focused on the Trie, if there is something wrong, can be corrected in time oh.

mindcarver.cn/about/

Github.com/blockchainG…