Replicated state structure

From Internet Computer Wiki
Revision as of 19:17, 15 November 2022

This Page is Still Work in Progress

Each node of the Internet Computer maintains a state. The state includes the data related to canisters, the messages processed by the node, the responses generated after processing those messages, etc. A portion of the state is specific to each node (e.g., messages received in the peer-to-peer layer, cryptographic key material). Another portion of the state is identical across all honest nodes in the subnet; this portion is called the replicated state of the subnet. In this article, we describe the structure of the replicated state stored on the Internet Computer.

Per-round Certified State

In each consensus round, one of the nodes in the subnet proposes a new block, and the nodes in the subnet execute the consensus protocol to finalize one block per round. Finalized blocks are passed on to the message routing layer, which routes each message in the block into the appropriate input queue of the target canister.

For each canister C running on a subnet, there are several input queues: one input queue for ingress messages (messages sent to C by external users), and one input queue for each other canister C′ with which C communicates, which stores the inter-canister messages received from C′. In each round, the execution layer consumes some of the inputs in these queues, updates the replicated state of the relevant canisters, and places outputs in the ingress history and in various output queues. The ingress history stores the responses generated by canisters when processing ingress messages. Similarly, each canister C has one output queue for each other canister C′ with which it communicates; this queue stores the messages to be sent from C to C′.
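The queue layout described above can be illustrated with a toy model. This is a minimal Python sketch under stated assumptions, not the replica's implementation: the `Message` and `CanisterQueues` types, the `route` and `execute_round` functions, and the echo/ack handler logic are all hypothetical. It shows one ingress queue per canister, one input queue and one output queue per peer canister, and an ingress history keyed by message id.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field

@dataclass
class Message:
    sender: str            # a user for ingress, otherwise a canister id
    receiver: str          # target canister id
    payload: str
    is_ingress: bool = False
    message_id: str = ""   # only meaningful for ingress messages

@dataclass
class CanisterQueues:
    ingress_queue: deque = field(default_factory=deque)
    # One input queue and one output queue per peer canister, keyed by its id.
    input_queues: dict = field(default_factory=lambda: defaultdict(deque))
    output_queues: dict = field(default_factory=lambda: defaultdict(deque))

def route(block, queues):
    """Message routing: put each message of a finalized block into the
    appropriate input queue of its target canister."""
    for msg in block:
        q = queues[msg.receiver]
        if msg.is_ingress:
            q.ingress_queue.append(msg)
        else:
            q.input_queues[msg.sender].append(msg)

def execute_round(canister_id, queues, ingress_history):
    """Toy execution: reply to every ingress message and answer every
    inter-canister message via the output queue towards its sender."""
    q = queues[canister_id]
    while q.ingress_queue:
        msg = q.ingress_queue.popleft()
        ingress_history[msg.message_id] = ("replied", "echo:" + msg.payload)
    for peer, inbox in q.input_queues.items():
        while inbox:
            msg = inbox.popleft()
            q.output_queues[peer].append(Message(canister_id, peer, "ack:" + msg.payload))

# Usage: one ingress message from a user and one message from canister D to C.
queues = defaultdict(CanisterQueues)
ingress_history = {}
route([Message("user-1", "C", "hi", True, "m1"), Message("D", "C", "ping")], queues)
execute_round("C", queues, ingress_history)
print(ingress_history)  # {'m1': ('replied', 'echo:hi')}
print(list(queues["C"].output_queues["D"]))
```

In this model the reply to canister D sits in C's output queue; moving it onward (into a stream, for a canister on another subnet) is the job of message routing, as described in the Streams entry below.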

After the round, each node in the subnet stores some relevant information about the round's execution in a tree structure, shown below. The execution layer is designed to execute all canisters deterministically, so each honest node in the subnet should independently arrive at the same tree. Each node then builds a Merkle tree by hashing this tree, signs the root hash using its share of the subnet's threshold signing key, and broadcasts its signature share to the rest of the nodes in the subnet. On receiving enough signature shares from its peers, each node combines them into a signature on the root hash, and the tree is then considered certified by the subnet. This state tree, together with its signature, is called the per-round certified state. The process of certificate creation is described in the interface spec. The per-round state tree is also known as the system state tree.

[Figure: Per-round certified state]

In the above per-round state tree, the internal nodes store a label and are represented by rectangles. The leaves contain data and are represented with rounded boxes. In the per-round state tree, we store the following information.

  • Time - The time of the state for which the tree is generated (a timestamp in nanoseconds since 1970-01-01).
  • Metadata - Metadata of the subnet.
  • Canister - For each canister running on the subnet, we store its certified state, module hash, controllers and metadata.
    • Controllers - Each canister can have a (possibly empty) list of controllers. A controller of a canister has the ability to upgrade, stop or delete the canister.
    • Module hash - Each canister runs a WebAssembly (Wasm) module. The module hash is the SHA-256 hash of that module.
    • Metadata - The metadata is a set of key-value pairs. When a Wasm module is installed on the canister, the nodes read the module's custom sections and store that data as the canister's metadata. The most common uses of metadata are to store the canister's Candid interface and a link to the git repository of the canister's code.
    • Certified state - When a user sends a query call to the Internet Computer, the message is processed by only one (possibly malicious) node of the subnet. Query responses do not carry a certificate and therefore cannot be trusted on their own. To improve trust in query responses, the Internet Computer has the notion of certified variables. In a nutshell, a canister can choose in advance to have some information certified by storing it in the replicated state as certified data, which then becomes part of the certified per-round state tree. When a user later makes a query call for that information, the canister can respond with the information together with its certificate. These certified variables are stored as the certified state in the per-round state tree. Note that this certified state is not the entire state of the canister: a canister explicitly chooses a small part of its state (at most 32 bytes, typically a hash of a larger data structure) to be stored as certified state. More details can be found in the interface spec.
  • Subnet - We store some relevant information related to all the other subnets. Specifically,
  • Request data - The request data contains the ingress history generated by the execution layer during the round. For each ingress message processed by the execution layer, the request data stores status, reply, reject code, reject message and error code. More details can be found in the interface spec.
  • Streams - As canisters run during the round, a canister C may send a message to another canister C′. Canister C places all of its outgoing inter-canister messages in its output queues. After the round, the message routing layer takes the messages from these output queues and places them into subnet-to-subnet streams, which are processed by the cross-net (XNet) transfer protocol that transports the messages to other subnets. For each other subnet, we store the cross-subnet messages to be sent to that subnet.
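The certification step described earlier in this section (every node hashes the same tree, signs the root, and shares are combined) can be sketched in a few lines. This is a minimal Python sketch, modeled on the domain-separated SHA-256 hashing of leaves, labeled nodes, forks and empty nodes used for hash trees in the interface spec; the balanced fork shape, the label strings, the placeholder leaf values and the `try_certify` helper are hypothetical simplifications. Real signature shares and their combination are not modeled: `try_certify` only checks that enough nodes signed the same root hash, which is the precondition for their shares to combine into one signature.

```python
import hashlib

def domain_sep(tag: str) -> bytes:
    # Length byte followed by the tag bytes.
    raw = tag.encode()
    return bytes([len(raw)]) + raw

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def _forks(hashes: list) -> bytes:
    # Combine sibling hashes into a balanced binary tree of forks
    # (simplifying assumption about the fork shape).
    if len(hashes) == 1:
        return hashes[0]
    mid = len(hashes) // 2
    return sha256(domain_sep("ic-hashtree-fork") + _forks(hashes[:mid]) + _forks(hashes[mid:]))

def root_hash(tree) -> bytes:
    """Root hash of a labeled tree: bytes values are leaves, dicts map labels to subtrees."""
    if isinstance(tree, bytes):
        return sha256(domain_sep("ic-hashtree-leaf") + tree)
    if not tree:
        return sha256(domain_sep("ic-hashtree-empty"))
    children = sorted(tree.items(), key=lambda kv: kv[0].encode())  # deterministic order
    labeled = [sha256(domain_sep("ic-hashtree-labeled") + label.encode() + root_hash(sub))
               for label, sub in children]
    return _forks(labeled)

def try_certify(signed_roots: dict, threshold: int):
    """signed_roots maps a node id to the root hash it signed a share for.
    Returns a root once at least `threshold` nodes agree on it, else None."""
    votes = {}
    for root in signed_roots.values():
        votes[root] = votes.get(root, 0) + 1
    for root, count in votes.items():
        if count >= threshold:
            return root
    return None

# Usage: three honest nodes build the same tree, one faulty node does not.
tree = {"time": b"\x01", "canister": {"C": {"module_hash": b"\xab" * 32, "certified_data": b"abc"}}}
root = root_hash(tree)
shares = {"n1": root, "n2": root, "n3": root, "n4": root_hash({"time": b"\x02"})}
print(try_certify(shares, threshold=3) == root)  # True
```

Because children are hashed in sorted label order, honest nodes that hold the same tree obtain the same root regardless of insertion order, while a single differing leaf yields a different root whose shares cannot be combined with the others.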

In a nutshell, the per-round state tree contains only the information that someone might want to query after a round has executed. The full state of the canisters and of the message routing layer is not included in the per-round state.
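To make the certified-variables mechanism from the Canister entry concrete, here is a minimal Python sketch of how a client could check a query reply against the certified data in the per-round tree. It assumes the canister stores the SHA-256 digest of the served value as its certified data and that the tree's signature has already been verified; the nested-dict tree, the `lookup` and `verify_query_response` helpers and the sample value are hypothetical, and the pruning of the tree into a certificate is not modeled.

```python
import hashlib

def lookup(tree: dict, path: list):
    """Follow `path` through a nested-dict state tree; return None if absent."""
    node = tree
    for label in path:
        if not isinstance(node, dict) or label not in node:
            return None
        node = node[label]
    return node

# Update call: canister C sets its certified data to a digest of the value it serves.
served_value = b"alice=100"
certified_data = hashlib.sha256(served_value).digest()  # 32 bytes

# Per-round state tree after the round (placeholder contents, other branches omitted).
state_tree = {"time": b"\x01", "canister": {"C": {"certified_data": certified_data}}}

def verify_query_response(value: bytes, canister_id: str, tree: dict) -> bool:
    """Client side: accept a query reply only if its digest matches the certified
    data in the tree. Assumes the tree's signature was already verified."""
    cd = lookup(tree, ["canister", canister_id, "certified_data"])
    return cd is not None and hashlib.sha256(value).digest() == cd

print(verify_query_response(b"alice=100", "C", state_tree))      # True
print(verify_query_response(b"alice=1000000", "C", state_tree))  # False (tampered reply)
```

A single malicious node answering the query cannot forge a different value without also forging the subnet's signature over the tree that contains the digest.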

Replicated State

The replicated state of a subnet is the portion of the nodes' state that is replicated (identical) across all nodes in the subnet. It is often hundreds of gigabytes, or even terabytes, in size. The replicated state is divided into chunks, and each chunk is stored as a file. To certify the replicated state, the nodes would have to compute a hash of it and sign that hash; because the state is so large, it is not certified in every round.
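Splitting the state into chunks is what makes hashing it tractable: if a node knows which chunks changed, it only needs to rehash those. The following is a minimal Python sketch under stated assumptions, not the IC's actual on-disk or manifest format: the tiny `CHUNK_SIZE`, the per-chunk SHA-256 digests, the flat hash over all chunk digests and the helper names are all hypothetical, and the set of dirty chunks is assumed to be known (for example from write tracking).

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical; tiny for readability, real chunks are much larger

def split_chunks(state: bytes, size: int = CHUNK_SIZE) -> list:
    return [state[i:i + size] for i in range(0, len(state), size)]

def chunk_digests(chunks: list) -> list:
    # Full pass: hash every chunk.
    return [hashlib.sha256(c).digest() for c in chunks]

def rehash_dirty(old_digests: list, chunks: list, dirty: set) -> list:
    # Incremental pass: rehash only the chunks known to have changed.
    # Assumes the number of chunks did not change.
    digests = list(old_digests)
    for i in dirty:
        digests[i] = hashlib.sha256(chunks[i]).digest()
    return digests

def state_hash(digests: list) -> bytes:
    # One 32-byte value summarizing the whole state, suitable for signing.
    return hashlib.sha256(b"".join(digests)).digest()

before = split_chunks(b"abcdefghijkl")   # three 4-byte chunks
d0 = chunk_digests(before)
after = [before[0], b"EFGH", before[2]]  # only chunk 1 was modified
d1 = rehash_dirty(d0, after, {1})
print(d1 == chunk_digests(after))        # True
print(state_hash(d1) != state_hash(d0))  # True
```

Even with incremental rehashing, computing and signing a hash over hundreds of gigabytes involves far more work than the small per-round tree, which is consistent with the replicated state not being certified every round.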

The replicated state consists of the following components.

  • Canister States - The entire state of each canister in the subnet. For each canister in the subnet, we store the following:
    • System state - State that is controlled and owned by the system (the IC). It contains the information needed to run and maintain the canister. The Wasm module in the canister cannot modify this state directly, but can modify it indirectly through the SystemApi interface. The system state includes:
      • Controllers - Each canister can have a (possibly empty) list of controllers. A controller of a canister has the ability to upgrade, stop or delete the canister.
      • Canister id - Each canister has an identifier, which is a principal.
      • Queues
      • Memory allocation
      • Freeze threshold
      • Status
      • Certified data
      • Canister metrics
      • Cycles balance
      • Cycles debit
      • Task queue
      • Global timer
    • Execution state - The part of the canister state that can be accessed during execution. Note that the execution state is used to track ephemeral information. In particular, `ExecutionState` as processed by the runtime is reconstructed at certain points to represent the execution state in the sandbox process. Therefore, there is no guarantee that equality and ordering of a value are preserved; that is, the execution state value might differ even if nothing has been executed.
    • Scheduler state - State maintained by the scheduler for scheduling tasks for the canister.
      • Last full execution round - The last full round in which the canister got the chance to execute, i.e., the canister was given the first pulse in the round or consumed its input queue.
      • Compute allocation - The canister's compute allocation. A higher compute allocation corresponds to a higher priority in scheduling.
      • Accumulated priority - The current priority of this canister, accumulated over past rounds. In the scheduler analysis documentation, this value is the entry in the vector d that corresponds to this canister.
      • Priority credit - The current priority credit of this canister, accumulated during a long execution. While the long execution is in progress, the canister is temporarily credited with priority to slightly boost the priority of the long execution. Only when the long execution is done is `accumulated_priority` decreased by `priority_credit`.
      • Long execution mode - Opportunistic (default) or Prioritized.
      • Heap delta debit - The amount of heap delta debit. The canister skips execution of update messages if this value is non-zero.
      • Install code debit - The amount of `install_code` instruction debit. The canister rejects `install_code` messages if this value is non-zero.
      • Time of last allocation charge - The last time the canister was charged for its resource allocations. Charging for compute and storage is done periodically, so this value is needed to calculate how much time to account for when the next charge occurs.
  • Metadata - Metadata of the entire subnet.
  • Subnet Queues - Messages received from, and to be sent to, each subnet.
  • Consensus Queue - Blocks received from consensus layer.
  • Bitcoin State - The state of the Bitcoin network.
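The scheduler-state fields listed above, and the rules attached to them, can be summarized in a small model. This is a minimal Python sketch, not the replica's data structure: the `SchedulerState` and `LongExecutionMode` types, the method names, the integer units and the `cost_per_unit_time` parameter are hypothetical, and the way the priority credit is accumulated during a long execution is an assumed bookkeeping scheme that only follows the rule stated above (the accumulated priority is reduced by the credit once the long execution is done).

```python
from dataclasses import dataclass
from enum import Enum

class LongExecutionMode(Enum):
    OPPORTUNISTIC = "opportunistic"  # default
    PRIORITIZED = "prioritized"

@dataclass
class SchedulerState:
    last_full_execution_round: int = 0
    compute_allocation: int = 0
    accumulated_priority: int = 0
    priority_credit: int = 0
    long_execution_mode: LongExecutionMode = LongExecutionMode.OPPORTUNISTIC
    heap_delta_debit: int = 0
    install_code_debit: int = 0
    time_of_last_allocation_charge: int = 0  # hypothetical time unit

    def can_execute_update(self) -> bool:
        # Update messages are skipped while there is heap delta debit.
        return self.heap_delta_debit == 0

    def accepts_install_code(self) -> bool:
        # install_code messages are rejected while there is install code debit.
        return self.install_code_debit == 0

    def add_priority_credit(self, amount: int) -> None:
        # During a long execution, record the credit instead of lowering
        # accumulated_priority right away (assumed bookkeeping).
        self.priority_credit += amount

    def finish_long_execution(self) -> None:
        # Only now is accumulated_priority decreased by priority_credit.
        self.accumulated_priority -= self.priority_credit
        self.priority_credit = 0

    def charge_allocations(self, now: int, cost_per_unit_time: int) -> int:
        # Charge for the time elapsed since the last allocation charge.
        elapsed = now - self.time_of_last_allocation_charge
        self.time_of_last_allocation_charge = now
        return elapsed * cost_per_unit_time

s = SchedulerState(accumulated_priority=10, time_of_last_allocation_charge=100)
s.add_priority_credit(3)
s.add_priority_credit(2)
print(s.accumulated_priority)        # 10: unchanged during the long execution
s.finish_long_execution()
print(s.accumulated_priority)        # 5
print(s.charge_allocations(150, 2))  # 100
```

Storing `time_of_last_allocation_charge` lets a periodic charge cover exactly the interval since the previous one, however irregularly charging happens to run.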

The implementation of the replicated state can be found on GitHub.