Difference between revisions of "Replicated state structure"
Latest revision as of 10:59, 27 February 2023
This Page is Still Work in Progress
Each node of the Internet Computer maintains a state. The state includes the data related to canisters, the messages processed by the node, the responses generated after processing those messages, and so on. A portion of the state is individual to each node (e.g., messages received in the peer-to-peer layer, cryptographic key material). Another portion of the state is identical for all the honest nodes in the subnet. This portion is called the replicated state of the subnet. This article describes the structure of the replicated state stored on the Internet Computer.
Per-round Certified State
In each consensus round, one of the nodes in the subnet proposes a new block. The nodes in the subnet execute the consensus protocol to finalize one block per consensus round. Finalized blocks are passed on to the message routing layer. The message routing layer routes each message in the block into the appropriate input queue of the target canister.
For each canister C running on a subnet, there are several input queues. There is one input queue specifically for ingress messages (sent by external users) to C. For each other canister C′ with which C communicates, there is one input queue. This input queue is used to store cross-subnet messages received from canister C′. In each round, the execution layer consumes some of the inputs in these queues, updates the replicated state of the relevant canisters, and places outputs in the ingress history and various output queues. The ingress history is used to store the responses generated by the canister when processing ingress messages. For each canister C running on a subnet, there are also several output queues. For each other canister C′ with which C communicates, there is one output queue. This output queue is used to store cross-subnet messages to be sent from C to C′.
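The queue layout described above can be sketched as follows. This is a minimal illustration, not the Internet Computer's implementation: the class `CanisterQueues`, the function `execute_round`, the `budget` parameter, and the toy handler are all hypothetical names introduced here, and the processing order (ingress first, then senders in sorted order) is an assumption made only to keep the sketch deterministic.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class CanisterQueues:
    """Hypothetical queue layout for one canister C (names are illustrative)."""
    ingress: deque = field(default_factory=deque)   # (message_id, payload) from external users
    inputs: dict = field(default_factory=dict)      # sender canister C' -> deque of payloads
    outputs: dict = field(default_factory=dict)     # receiver canister C' -> deque of payloads


def execute_round(queues, ingress_history, handle, budget):
    """Consume up to `budget` inputs; `handle(payload)` returns (reply, [(receiver, payload), ...])."""
    processed = 0

    def emit(outgoing):
        # Place each outgoing message in the output queue for its receiver.
        for receiver, payload in outgoing:
            queues.outputs.setdefault(receiver, deque()).append(payload)

    while processed < budget and queues.ingress:
        message_id, payload = queues.ingress.popleft()
        reply, outgoing = handle(payload)
        ingress_history[message_id] = reply   # responses to ingress messages
        emit(outgoing)
        processed += 1

    for sender in sorted(queues.inputs):      # fixed order keeps the sketch deterministic
        queue = queues.inputs[sender]
        while processed < budget and queue:
            _, outgoing = handle(queue.popleft())
            emit(outgoing)
            processed += 1
    return processed


def echo_to_b(payload):
    """Toy canister logic: reply with the upper-cased payload and forward it to canister "B"."""
    return payload.upper(), [("B", payload)]


queues = CanisterQueues()
queues.ingress.append(("req-1", "hello"))
queues.inputs["A"] = deque(["from-a-1", "from-a-2"])
history = {}
count = execute_round(queues, history, echo_to_b, budget=2)
```

With a budget of two messages, the round consumes the ingress message and the first message from canister A; the second message from A stays in its input queue for a later round.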
After the round, each node in the subnet stores some relevant information related to the round execution in a tree structure as shown below. The execution layer is designed to execute all the canisters deterministically. Therefore, each node in the subnet should independently create the same tree. Each node then creates a Merkle tree by hashing the tree below, signs the root hash using its threshold signing key, and broadcasts its signature share to the rest of the nodes in the subnet. On receiving the signature shares from its peers, each node combines the signature shares to create a signature of the root hash. The tree is now considered to be certified by the nodes in the subnet. The state tree below, along with its signature, is called the per-round certified state. The process of certificate creation is described in the interface spec (https://internetcomputer.org/docs/current/references/ic-interface-spec/#certificate). The per-round state tree is also known as the system state tree (https://internetcomputer.org/docs/current/references/ic-interface-spec/#state-tree).
In the above per-round state tree, the internal nodes store a label and are represented by rectangles. The leaves contain data and are represented with rounded boxes. In the per-round state tree, the following information is stored:
- Time - Height of the blockchain for which the tree is generated.
- Metadata - Metadata of the subnet.
- Canister - The certified state, module hash, controllers, and metadata are stored for each canister on the subnet.
  - Controllers - Each canister can have a (possibly empty) list of controllers. A controller of a canister has the ability to upgrade, stop, or delete the canister.
  - Module hash - Each canister runs a WebAssembly (Wasm) module. The module hash is the hash of that module.
  - Metadata - Metadata contains a set of key-value pairs. When a Wasm module is installed into the canister, the blockchain nodes look into the "Custom Section" of the Wasm module and store that data as the metadata of the canister. The most common use cases of metadata are to store the Candid interface of the canister and the Git repository link of the canister code.
  - Certified state - When a user sends a query call to the Internet Computer, the message is processed by only one (possibly malicious) blockchain node of the subnet. The response to a query call does not have a certificate and therefore cannot be trusted. To improve trust in query call responses, the Internet Computer has the notion of certified variables. In a nutshell, a canister can choose in advance to create a certificate for some information and store it in the replicated state. When a user later makes a query call for that information, the canister can directly respond with the information along with its certificate from the replicated state. These certified variables are stored as certified state in the per-round state tree. Note that this certified state is not the entire state of the canister. A canister explicitly chooses a part of its state to be stored as certified state. More details can be found in the interface spec (https://internetcomputer.org/docs/current/references/ic-interface-spec/#system-api-certified-data).
- Subnet - Some relevant information related to all the other subnets is stored.
- Request data - The request data contains the ingress history generated by the execution layer during the round. For each ingress message processed by the execution layer, the request data stores the status, reply, reject code, reject message, and error code. More details can be found in the interface spec (https://internetcomputer.org/docs/current/references/ic-interface-spec/#state-tree-request-status).
- Streams - As canisters are run during the round, a canister C may send a message to another canister C′. Canister C places all its outgoing inter-canister messages in its output queues. After the round, the message routing layer takes the messages in these output queues and places them into subnet-to-subnet streams to be processed by a crossnet transfer protocol, whose job is to actually transport these messages to other subnets. For each other subnet, the cross-subnet messages to be sent to that subnet are stored.
In a nutshell, the per-round state tree contains only the information that someone would be interested in querying after the execution of a round. The total state of the canisters and of the message routing layer is not included in the per-round state.
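To make the shape of the per-round state tree concrete, the sketch below models it as nested dictionaries with a path lookup. The labels are loosely derived from the list above and the values are placeholders; the exact labels, encodings, and the helper `lookup` are assumptions for illustration and do not reproduce the paths defined in the interface spec.

```python
def lookup(tree, path):
    """Follow a list of labels from the root; return None if the path is absent."""
    node = tree
    for label in path:
        if not isinstance(node, dict) or label not in node:
            return None
        node = node[label]
    return node


# Placeholder values; real leaves hold encoded data.
state_tree = {
    "time": b"<time>",
    "metadata": b"<subnet metadata>",
    "canister": {
        "canister-a": {
            "controllers": b"<controller list>",
            "module_hash": b"<hash of the Wasm module>",
            "metadata": {"candid:service": b"<candid interface>"},
            "certified_state": b"<data chosen by the canister>",
        },
    },
    "subnet": {},
    "request_data": {
        "request-1": {"status": b"replied", "reply": b"<reply bytes>"},
    },
    "streams": {"subnet-x": [b"<message for subnet-x>"]},
}

status = lookup(state_tree, ["request_data", "request-1", "status"])
module_hash = lookup(state_tree, ["canister", "canister-a", "module_hash"])
missing = lookup(state_tree, ["canister", "canister-b", "module_hash"])
```

A client interested in the outcome of an ingress message would look up only the request entry, which reflects the point made above: the tree holds what is worth querying after a round, not the full canister state.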
Replicated State
The replicated state of a subnet is the portion of the state of the blockchain nodes that is replicated (identical) across all the nodes in the subnet. The replicated state of a subnet is often hundreds of gigabytes or even terabytes in size. The replicated state is divided into chunks, and each chunk is stored as a file. To create a certificate for the replicated state, the blockchain nodes have to compute the hash of the replicated state and sign that hash. Because the replicated state is often hundreds of gigabytes in size, it is not certified in every round.
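The cost argument can be illustrated with a small sketch of chunk-wise hashing. The chunk size, the use of SHA-256, and the function names `chunk_hashes` and `state_hash` are assumptions chosen for illustration; the actual chunking and manifest format are not described in this article.

```python
import hashlib


def chunk_hashes(state: bytes, chunk_size: int) -> list:
    """Hash each fixed-size chunk of the state independently."""
    return [
        hashlib.sha256(state[i:i + chunk_size]).digest()
        for i in range(0, len(state), chunk_size)
    ]


def state_hash(hashes: list) -> bytes:
    """Combine the chunk hashes into one hash that could then be signed."""
    hasher = hashlib.sha256()
    for chunk_hash in hashes:
        hasher.update(chunk_hash)
    return hasher.digest()


state = bytes(range(256)) * 4                 # 1024 bytes standing in for the full state
before = chunk_hashes(state, 256)

# Change a single byte that lies inside the second chunk.
modified = state[:300] + b"\xff" + state[301:]
after = chunk_hashes(modified, 256)
changed = [i for i, (a, b) in enumerate(zip(before, after)) if a != b]
```

Hashing every chunk touches every byte of the state, which is why doing so in each round is impractical at hundreds of gigabytes. In this toy example only the chunk at index 1 changes, yet the combined hash still differs.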
The replicated state consists of the canister states, the metadata of the subnet, the subnet queues, the consensus queue, and the Bitcoin state.
- Canister States - The entire state of each canister in the subnet. For each canister in the subnet, its system state, execution state, and scheduler state are stored.
  - System state - State that is controlled and owned by the system (IC). The system state contains the information needed for running and maintaining the canister. This state cannot be modified directly by the Wasm module in the canister, but it can be modified indirectly via the SystemApi interface. The system state includes:
    - Controllers - Each canister can have a (possibly empty) list of controllers. A controller of a canister has the ability to upgrade, stop, or delete the canister.
    - Canister id - Each canister has an identifier, which is a principal.
    - Queues
    - Memory allocation
    - Freeze threshold
    - Status
    - Certified data
    - Canister metrics
    - Cycles balance
    - Cycles debit
    - Task queue
    - Global timer
  - Execution state - The part of the canister state that can be accessed during execution. Note that the execution state is used to track ephemeral information. In particular, `ExecutionState` as processed by the runtime is reconstructed at certain points to represent the execution state in the sandbox process. Therefore, there is no guarantee that the equality and ordering of a value are preserved. That is, the execution state value might differ even if nothing has been executed.
    - Canister root - The path where the canister memory is located. It needs to be stored in ExecutionState in order to perform the exec system call.
    - Session nonce - If occupied, the runtime is already processing this execution state. This is used to refer to the mutated "MappedState" and globals that reside in the sandbox execution process (and not necessarily in memory) and to enable continuations.
    - Wasm binary - The Wasm executable associated with this state.
    - Wasm memory - The persistent heap of the module.
    - Stable memory - The canister's stable memory, which is persisted across canister upgrades.
    - Exported globals - The state of exported global variables. Internal global variables are not accessible.
    - Exports - The set of functions that a Wasm module exports. These exported functions can be called by external users or other canisters.
    - Metadata - A Wasm module can optionally have a "Custom Section" to store some metadata. This metadata is extracted from the Wasm module and stored here.
    - Last execution round - The round number at which the canister last executed an update-type operation.
  - Scheduler state - State maintained by the scheduler for scheduling tasks for the canister.
    - Last full execution round - The last full round in which the canister got the chance to execute. This means that the canister was given the first pulse in the round or consumed its input queue.
    - Compute allocation - The canister's compute allocation. A higher compute allocation corresponds to a higher priority in scheduling.
    - Accumulated priority - Keeps the current priority of this canister, accumulated during the past rounds. In the scheduler analysis documentation, this value is the entry in the vector d that corresponds to this canister.
    - Priority credit - Keeps the current priority credit of this canister, accumulated during a long execution. During the long execution, the canister is temporarily credited with priority to slightly boost the priority of the long execution. Only when the long execution is done is the "accumulated_priority" decreased by the "priority_credit".
    - Long execution mode - Opportunistic (default) or Prioritized.
    - Heap delta debit - The amount of heap delta debit. The canister skips execution of update messages if this value is non-zero.
    - Install code debit - The amount of install_code instruction debit. The canister rejects install_code messages if this value is non-zero.
    - Time of last allocation charge - The last time the canister was charged for its resource allocations. Charging for compute and storage is done periodically, so this is needed to calculate how much time should be considered when charging occurs.
- Metadata - Metadata of the entire subnet. Metadata stores the architecture of the Internet Computer, routing tables to communicate with other subnets, etc. This is used for inter-canister messaging and history queues.
  - Ingress history - History of ingress messages as they traversed through the system.
  - Streams - For each other subnet, the state of the corresponding crossnet (XNet) stream to that subnet is stored.
  - Canister allocation ranges - The canister ID ranges from which this subnet generates canister IDs.
  - Last generated canister id - The last generated canister ID (or "None" if this subnet has not generated any canister IDs yet). This canister ID must be within the ranges specified in "canister allocation ranges".
  - Previous state hash - The hash of the previous partial canonical state. The initial state does not have a previous state.
  - Batch time - The Consensus-determined time at which this batch was created. This time is monotonically non-decreasing (that is, not strictly increasing).
  - Network topology
  - Own subnet id
  - Own subnet type
  - Own subnet features
  - Subnet call context manager - Asynchronously handled subnet messages.
  - State sync version - The version of the StateSync protocol that should be used to compute the manifest of this state.
  - Certificate version - The version of the certification procedure that should be used for this state.
  - Heap delta estimate
  - Time of last allocation charge - The last time canisters were charged for compute and storage allocation. Charging for compute and storage is done periodically, so this is needed to calculate how much time should be charged for when charging does occur.
  - Subnet metrics
  - Expected compiled Wasms - The set of Wasm modules expected to be present in the `Hypervisor`'s compilation cache. This makes it possible to decide deterministically when a compilation is expected to be fast, and to ignore the compilation cost when considering the round instruction limit. Each time a canister is installed, its Wasm module is inserted into the set, and the set is cleared at each checkpoint.
  - Bitcoin - Responses to "BitcoinGetSuccessors" can be larger than the maximum inter-canister response limit. To work around this limitation, large responses are paginated and stored here temporarily until they are fetched by the calling canister.
- Subnet Queues - Messages received from, and to be sent to each subnet.
- Consensus Queue - A list of responses to be sent to the consensus layer.
- Bitcoin State - The state of the Bitcoin network. With the introduction of the Bitcoin integration feature (https://wiki.internetcomputer.org/wiki/Bitcoin_integration), canisters on the Internet Computer can store Bitcoin, query the state of the Bitcoin network, and post transactions to the Bitcoin network. Each Internet Computer blockchain node regularly syncs the Bitcoin state from the Bitcoin network. The replicated state of the subnet stores this Bitcoin state.
  - UTXO set - The UTXO set of the Bitcoin network.
  - Unstable blocks - The recently added Bitcoin blocks that are not yet "confirmed".
  - Stable height - The height of the Bitcoin block that has a sufficient number of confirmations.
  - Fee percentiles cache - When a canister posts a transaction to the Bitcoin network, it needs to specify a transaction fee. The fee percentiles cache stores statistical information on the transaction fees recently observed in the Bitcoin network.
  - Adapter queues - When a canister sends a transaction to the Bitcoin network, the transaction is first added to the adapter queue. The "adapter" process later picks up the transactions in this queue and sends them to the Bitcoin network.
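The nesting described in the list above can be summarized with a small data-structure sketch. It is written in Python for brevity, includes only a handful of the fields listed, and uses assumed types (strings for IDs, plain lists and dictionaries for queues and maps); it is not a transcription of the actual implementation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SystemState:
    canister_id: str
    controllers: list = field(default_factory=list)
    cycles_balance: int = 0
    certified_data: bytes = b""


@dataclass
class ExecutionState:
    wasm_binary: bytes = b""
    wasm_memory: bytes = b""
    stable_memory: bytes = b""


@dataclass
class SchedulerState:
    compute_allocation: int = 0
    accumulated_priority: int = 0
    priority_credit: int = 0


@dataclass
class CanisterState:
    system_state: SystemState
    execution_state: ExecutionState = field(default_factory=ExecutionState)
    scheduler_state: SchedulerState = field(default_factory=SchedulerState)


@dataclass
class SubnetMetadata:
    own_subnet_id: str
    ingress_history: dict = field(default_factory=dict)   # message id -> status
    streams: dict = field(default_factory=dict)           # remote subnet id -> messages
    last_generated_canister_id: Optional[str] = None


@dataclass
class ReplicatedState:
    metadata: SubnetMetadata
    canister_states: dict = field(default_factory=dict)   # canister id -> CanisterState
    subnet_queues: list = field(default_factory=list)
    consensus_queue: list = field(default_factory=list)
    bitcoin_state: dict = field(default_factory=dict)


state = ReplicatedState(metadata=SubnetMetadata(own_subnet_id="subnet-1"))
state.canister_states["canister-a"] = CanisterState(
    system_state=SystemState(canister_id="canister-a", controllers=["user-1"])
)
```

Each top-level bullet of the list corresponds to one field of `ReplicatedState`, and the three per-canister parts correspond to the three fields of `CanisterState`.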
The implementation of the replicated state can be found on GitHub (https://github.com/dfinity/ic/blob/master/rs/replicated_state/src/replicated_state.rs).
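As one small example of how the scheduler fields described above interact, the sketch below models the relationship between the accumulated priority and the priority credit. The class `PriorityTracker` and its method names are hypothetical, and how the credit influences scheduling order during the long execution is deliberately left out; only the rule that the credit is subtracted once the long execution finishes is taken from the description above.

```python
from dataclasses import dataclass


@dataclass
class PriorityTracker:
    """Hypothetical holder for two of the scheduler-state fields."""
    accumulated_priority: int = 0
    priority_credit: int = 0

    def credit_during_long_execution(self, amount: int) -> None:
        # While the long execution runs, only the credit grows.
        self.priority_credit += amount

    def finish_long_execution(self) -> None:
        # Only now is the accumulated priority decreased by the credit.
        self.accumulated_priority -= self.priority_credit
        self.priority_credit = 0


tracker = PriorityTracker(accumulated_priority=100)
tracker.credit_during_long_execution(10)
tracker.credit_during_long_execution(5)
during = (tracker.accumulated_priority, tracker.priority_credit)
tracker.finish_long_execution()
final = (tracker.accumulated_priority, tracker.priority_credit)
```

In this example the accumulated priority stays at 100 while the credit grows to 15, and it drops to 85 only after the long execution completes.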
See Also
- The Internet Computer project website (hosted on the IC): internetcomputer.org