Blockchain State Data Synchronization: Study Guide
Short answer questions
What is a Merkle state tree? What role does it play in the blockchain?
Please explain the difference between the current Merkle state tree and the historical Merkle state tree, and the type of data they store.
Why is it more efficient to synchronize only the current Merkle state tree rather than the entire historical Merkle state tree?
After synchronizing Merkle state tree data from other node devices, how should a node handle the case where its local latest block height lags behind the actual latest block height?
What is a read-write set, and what role does it play in the update process of the Merkle state tree?
Please explain the difference between the update data node and the historical data node, and how they are generated respectively.
Why are the current Merkle state tree and the historical Merkle state tree stored in different data structures in the database?
Please explain the characteristics of the B+ tree and the LSM tree, and what type of Merkle state tree data they are suitable for storing respectively.
When a Key-Value database is used to store Merkle state tree data, what serves as the key for data nodes of the current Merkle state tree and of the historical Merkle state tree?
Please explain the difference and role of node ID and hash value in storing Merkle state tree data.
Answers to short answer questions
The Merkle state tree is a data structure used to store and organize account state data in the blockchain. It links all account state data with hash pointers to form a tree, which makes it possible to verify the integrity and authenticity of the data quickly.
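To make the role of hash pointers concrete, here is a minimal sketch of computing a root hash over account states. It assumes a plain binary Merkle tree, SHA-256, and an "account:balance" leaf encoding, all of which are illustrative choices rather than the layout of any particular blockchain (many chains use a Merkle Patricia tree instead). The names `merkle_root` and `encode` are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    """SHA-256 digest; the hash function is an assumption of this sketch."""
    return hashlib.sha256(data).digest()


def merkle_root(leaves):
    """Fold a list of leaf byte strings into a single root hash."""
    if not leaves:
        return h(b"")
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last hash on odd levels
        # Each parent is the hash of its two children (the "hash pointers").
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def encode(accounts):
    """Hypothetical leaf encoding: sorted 'account:balance' strings."""
    return [f"{name}:{balance}".encode() for name, balance in sorted(accounts.items())]


accounts = {"alice": 100, "bob": 50, "carol": 75}
root_before = merkle_root(encode(accounts))

accounts["bob"] = 49  # change a single account state
root_after = merkle_root(encode(accounts))

# Any change to any account state changes the root hash.
print(root_before.hex() != root_after.hex())
```

Because the root depends on every leaf, comparing a single root hash is enough to detect whether two nodes hold the same account state data.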
The current Merkle state tree stores the latest state data of each blockchain account, while the historical Merkle state tree stores the historical state data of each blockchain account. Because the current Merkle state tree contains only the latest data, it occupies little storage space; the historical Merkle state tree contains all historical data and occupies much more.
Since the current Merkle state tree only stores the latest data, its data volume is much smaller than the historical Merkle state tree containing all historical data. Therefore, synchronizing only the current Merkle state tree can significantly reduce the amount of data transmission and increase the synchronization speed.
When the local latest block height lags behind the actual latest block height, the node must re-execute the transactions in all blocks from the local latest block up to the actual latest block, and update the synchronized current and historical Merkle state tree data according to those transactions, so that the local data is consistent with the blockchain network.
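The catch-up step can be sketched roughly as follows. This is a simplified illustration, not a real client: plain dictionaries stand in for the current and historical Merkle state trees, transactions are simple balance transfers, and the sketch assumes the synchronized state already reflects the block at the local height, so replay starts at the next block. `Tx`, `Block`, and `catch_up` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Tx:
    sender: str
    receiver: str
    amount: int


@dataclass
class Block:
    height: int
    txs: list


def catch_up(current_state, history, local_height, blocks):
    """Replay blocks above local_height, updating both data sets.

    current_state: account -> latest balance (stand-in for the current tree)
    history: (account, height) -> balance (stand-in for the historical tree)
    """
    for block in sorted(blocks, key=lambda b: b.height):
        if block.height <= local_height:
            continue  # already reflected in the synchronized state
        if block.height != local_height + 1:
            raise ValueError(f"missing block at height {local_height + 1}")
        touched = set()
        for tx in block.txs:
            current_state[tx.sender] = current_state.get(tx.sender, 0) - tx.amount
            current_state[tx.receiver] = current_state.get(tx.receiver, 0) + tx.amount
            touched.update((tx.sender, tx.receiver))
        for account in touched:
            # Record a new historical entry instead of overwriting old ones.
            history[(account, block.height)] = current_state[account]
        local_height = block.height
    return local_height


state = {"alice": 100, "bob": 50}
history = {}
blocks = [
    Block(12, [Tx("bob", "alice", 5)]),
    Block(11, [Tx("alice", "bob", 10)]),
]
new_height = catch_up(state, history, local_height=10, blocks=blocks)
print(new_height, state)
```

The current state is overwritten in place at each block, while the history only grows, which mirrors the different write patterns of the two trees.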
The read-write set records the state data of the relevant accounts before and after the transactions in the target block are executed. During the update of the Merkle state tree, the read-write set is used to generate a write set, which describes the Merkle state tree data nodes that need to be written for the target block.
An update data node is generated by modifying an existing data node in the current Merkle state tree and is used to update the current Merkle state tree. A historical data node is newly created from the historical account state and is added to build the historical Merkle state tree.
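As a rough illustration of how a read-write set could be turned into the two kinds of data nodes, the sketch below treats the read-write set as a mapping from account to a (state before, state after) pair. That shape, the use of the account name as the node ID, and the function names `node_hash` and `build_write_set` are assumptions made for the example, not a described format.

```python
import hashlib
import json


def node_hash(account, state):
    """Content hash of a node; JSON with sorted keys keeps it deterministic."""
    payload = json.dumps({"account": account, "state": state}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def build_write_set(read_write_set):
    """Derive update data nodes and historical data nodes from a read-write set.

    read_write_set: account -> (state_before, state_after)
    Returns (update_nodes, historical_nodes).
    """
    update_nodes = {}      # node ID -> new value, overwrites the existing node
    historical_nodes = []  # brand-new nodes, appended and never modified
    for account, (before, after) in read_write_set.items():
        if before == after:
            continue  # account was only read; nothing to write
        update_nodes[account] = after
        historical_nodes.append(
            {"hash": node_hash(account, after), "account": account, "state": after}
        )
    return update_nodes, historical_nodes


rw_set = {
    "alice": ({"balance": 100}, {"balance": 90}),
    "bob": ({"balance": 50}, {"balance": 60}),
    "carol": ({"balance": 7}, {"balance": 7}),  # read but not changed
}
updates, historical = build_write_set(rw_set)
print(sorted(updates), len(historical))
```

Accounts that were only read produce no nodes, so the write set contains exactly what must change in the current tree and what must be appended to the historical tree.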
The values of data nodes in the current Merkle state tree are modified frequently, whereas the historical Merkle state tree mainly receives writes of new historical data nodes. To optimize read and write performance for these two access patterns, the trees are stored in different data structures in the database.
The B+ tree is a balanced tree with high read and write performance, suitable for storing current Merkle state tree data, whose data node values are modified frequently. The LSM tree is a log-structured merge tree with high write performance, suitable for storing historical Merkle state tree data, to which new historical data nodes are written frequently.
The key of a current Merkle state tree data node is its node ID, while the key of a historical Merkle state tree data node is the hash value of its data content.
The node ID is the unique identifier of a data node in the Merkle state tree and is used to locate and modify that data node quickly. The hash value is used to verify the integrity of the data content and, through "content addressing", to allow data nodes to be reused.
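The two keying schemes can be illustrated with a small sketch. Two in-memory dictionaries stand in for the Key-Value database, SHA-256 is assumed as the hash function, and the node ID format "node/alice" is invented for the example.

```python
import hashlib

current_store = {}  # node ID -> node content (overwritten in place)
history_store = {}  # content hash -> node content (append-only)


def put_current(node_id, content):
    """Store a current-tree node under its node ID, replacing any old value."""
    current_store[node_id] = content


def put_historical(content):
    """Store a historical node under the hash of its content.

    Identical content always maps to the same key, so it is stored once
    and reused (content addressing).
    """
    key = hashlib.sha256(content).hexdigest()
    history_store.setdefault(key, content)
    return key


# The same node ID is updated twice: only the latest value is kept.
put_current("node/alice", b"balance=100")
put_current("node/alice", b"balance=90")

# Three historical writes, two of them with identical content.
k1 = put_historical(b"balance=100")
k2 = put_historical(b"balance=90")
k3 = put_historical(b"balance=100")

print(len(current_store), len(history_store), k1 == k3)

# Integrity check: recompute the hash of the stored content and compare.
intact = hashlib.sha256(history_store[k1]).hexdigest() == k1
```

Keying by node ID gives a stable address for in-place updates, while keying by hash gives deduplication and a built-in integrity check at the cost of never being able to update a value under the same key.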
Key terms
Term: Definition
Merkle state tree: A tree data structure used to store and organize account state data in a blockchain.
Current Merkle state tree: The Merkle state tree that stores the latest state data of each blockchain account.
Historical Merkle state tree: The Merkle state tree that stores the historical state data of each blockchain account.
Data node: The basic unit of the Merkle state tree, storing account state data or hash pointers.
Node ID: The unique identifier of a data node in the Merkle state tree.
Hash value: A digest of data computed by a hash function, used to verify data integrity.
Read-write set: A data set that records the state of the relevant accounts before and after a transaction is executed.
B+ tree: A balanced tree with high read and write performance.
LSM tree: A log-structured merge tree with high write performance.
Key-Value database: A database that stores data as key-value pairs.
LevelDB database: A Key-Value database that uses a multi-level storage structure.
RocksDB database: A Key-Value database based on the LevelDB architecture.
MPT tree: Merkle Patricia tree, a data structure that combines the advantages of the Merkle tree and the trie.