Blockchain state data storage method
Short answer questions
Briefly describe the concept of accounts in blockchain and their classification in Ethereum.
Explain the role of Merkle Patricia Tree (MPT) in blockchain data storage.
Explain the three types of data nodes in MPT tree and their differences.
What is "content addressing" and how is it implemented in MPT tree?
Explain the concept of node reuse in MPT tree and its advantages and disadvantages.
Why do we need to distinguish between the current Merkle state tree and the historical Merkle state tree?
Explain the role of read-write sets in the execution of blockchain transactions.
How to generate modification sets for the current and historical Merkle state trees based on read-write sets?
Briefly describe the characteristics of B+ tree and LSM tree, and their applicable scenarios when storing Merkle state trees.
How does the LevelDB database distinguish the data nodes of the current and historical state trees when storing Merkle state trees?
Short answer questions
Accounts are the basic units used to record transaction information in blockchain. In Ethereum, accounts are divided into external accounts and contract accounts. External accounts are directly controlled by users, while contract accounts contain smart contract codes and are created by external accounts.
Merkle Patricia Tree (MPT) is an improved Merkle tree variant that combines the advantages of Merkle tree and Trie dictionary tree, and is used to organize and manage important data such as account status and transaction information in the blockchain.
The MPT tree contains three types of data nodes: leaf nodes, extension nodes, and branch nodes. Leaf nodes store account addresses and corresponding status data; extension nodes store partial account addresses and hash pointers to the next node; branch nodes contain 16 elements, corresponding to 16 possible hexadecimal characters, which are used to index different account addresses.
"Content addressing" refers to finding data based on the hash value of the data content. In the MPT tree, the Key of a data node is the hash value of its data content, and the Value is the data content itself. The hash value can be used to quickly locate the required data node.
Node reuse means that in the MPT tree, if the account status data in different blocks is the same, the corresponding MPT tree node in the previous block can be directly reused. Node reuse can save storage space, but it will cause the latest account status data to be mixed with a large amount of historical data, reducing access performance.
In order to improve access performance, it is necessary to distinguish between the current Merkle state tree and the historical Merkle state tree. The current Merkle state tree only maintains the latest status of each blockchain account, while the historical Merkle state tree maintains the historical status data of each account.
The read-write set is used to record the changes in account status before and after the transaction is executed. During the transaction execution process, the node device will generate a read-write set to record the status data of the relevant account before and after the transaction is executed.
The node device can generate two modification sets based on the read-write set: one for the current Merkle state tree, containing the data nodes that need to be written to the current state tree; the other for the historical Merkle state tree, containing the data nodes that need to be written to the historical state tree.
B+ tree has high read and modify performance and is suitable for storing the current Merkle state tree that needs to be modified frequently; LSM tree has high write performance and is suitable for storing historical Merkle state trees that mainly perform write operations.
LevelDB database uses Key-Value pairs to store Merkle state tree data nodes. For the current state tree, the key of the data node is its node ID; for the historical state tree, the key of the data node is the hash value of its data content.
Glossary
Term DefinitionBlockchainA decentralized distributed ledger technology used to record and verify transaction data. AccountThe basic unit used to record transaction information in the blockchain, which can be an external account or a contract account. Merkle treeA tree data structure used to efficiently verify the integrity of data. Merkle Patricia Tree (MPT)An improved Merkle tree variant that combines the advantages of Merkle tree and Trie dictionary tree. Content addressing searches for data based on the hash value of the data content. Node reuseReuse the corresponding MPT tree node in the previous block in the MPT tree. The current Merkle state tree maintains the Merkle tree of the latest state of each blockchain account. The historical Merkle state tree maintains the Merkle tree of the historical state data of each blockchain account. The read-write set is a data set that records the changes in account status before and after the transaction is executed. B+ tree is a balanced multi-way search tree with high read and modify performance. LSM tree is a tree data structure based on log structure merging, with high write performance. LevelDB is a Key-Value database that uses a multi-level data storage structure. RocksDB is a Key-Value database based on the LevelDB architecture. Trie dictionary tree is a tree data structure used to efficiently store and retrieve strings.