Replicated state structure

From Internet Computer Wiki

This Page is Still Work in Progress

Each node of the Internet Computer maintains a state. The state includes the data related to canisters, the messages processed by the node, the responses generated after processing those messages, and so on. A portion of the state is individual to each node (e.g., messages received in the peer-to-peer layer, cryptographic key material). Another portion is identical across all honest nodes in the subnet; this portion is called the replicated state of the subnet. In this article, we describe the structure of the replicated state stored on the Internet Computer.

Per-round Certified State

In each consensus round, one of the nodes in the subnet proposes a new block. The nodes in the subnet execute the consensus protocol to finalize one block per consensus round. Finalized blocks are passed on to the message routing layer. The message routing layer routes each message in the block into the appropriate input queue of the target canister.

For each canister C running on a subnet, there are several input queues. There is one input queue specifically for ingress messages (sent by external users) to C. For each other canister C′ with which C communicates, there is one input queue, which stores the cross-subnet messages received from C′. In each round, the execution layer consumes some of the inputs in these queues, updates the replicated state of the relevant canisters, and places outputs in the ingress history and in various output queues. The ingress history is used to store the responses generated by the canister when processing ingress messages. Each canister C also has several output queues: for each other canister C′ with which C communicates, there is one output queue, which stores the cross-subnet messages to be sent from C to C′.
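The queue layout described above can be sketched as a small data structure. This is only an illustrative model: the class name `CanisterQueues`, its methods, and the rule for choosing which queue to serve next are assumptions made for the example, not the actual execution-layer logic.

```python
from collections import defaultdict, deque


class CanisterQueues:
    """Hypothetical model of the message queues kept for one canister C."""

    def __init__(self):
        self.ingress = deque()             # messages from external users
        self.inputs = defaultdict(deque)   # one input queue per sender canister
        self.outputs = defaultdict(deque)  # one output queue per receiver canister

    def push_ingress(self, message):
        self.ingress.append(message)

    def push_input(self, sender, message):
        self.inputs[sender].append(message)

    def push_output(self, receiver, message):
        self.outputs[receiver].append(message)

    def pop_next_input(self):
        """Consume one pending input, or return None if all queues are empty.

        Ingress first, then senders in sorted order: an arbitrary but
        deterministic choice made for this sketch.
        """
        if self.ingress:
            return ("ingress", self.ingress.popleft())
        for sender in sorted(self.inputs):
            if self.inputs[sender]:
                return (sender, self.inputs[sender].popleft())
        return None


queues = CanisterQueues()
queues.push_input("canister-b", "transfer")
queues.push_ingress("greet")
first = queues.pop_next_input()
second = queues.pop_next_input()
queues.push_output("canister-b", "ack")
```

The point of the deterministic ordering is that every honest node consuming the same queues ends up with the same state.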

After the round, each node in the subnet stores some relevant information related to the round execution in a tree structure as shown below. The execution layer is designed to execute all the canisters deterministically. Therefore, each node in the subnet should independently arrive at the same tree. Each node then creates a Merkle tree by hashing this tree, signs the root hash using its threshold signing key share, and broadcasts its signature share to the rest of the nodes in the subnet. On receiving the signature shares from its peers, each node combines the shares to create a signature on the root hash. We now consider the tree as certified by the nodes in the subnet. The state tree together with its signature is called the per-round certified state. The process of certificate creation can be found in the interface spec. The per-round state tree is also known as the system state tree.
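As a minimal sketch of why deterministic execution leads to identical root hashes, the following computes a Merkle-style root over a labeled tree. The encoding used here (nested dictionaries, the `leaf`/`labeled`/`node` tags, length-prefixing) is a simplification assumed for the example; the actual hash-tree encoding is the one defined in the interface spec, and threshold signing of the root is omitted entirely.

```python
import hashlib


def _h(*parts: bytes) -> bytes:
    """SHA-256 over length-prefixed parts, so concatenation is unambiguous."""
    hasher = hashlib.sha256()
    for part in parts:
        hasher.update(len(part).to_bytes(8, "big"))
        hasher.update(part)
    return hasher.digest()


def root_hash(tree) -> bytes:
    """Hash a tree given as nested dicts (inner nodes) with bytes leaves."""
    if isinstance(tree, bytes):
        return _h(b"leaf", tree)
    # Sort labels so that every node hashes the children in the same order.
    children = [
        _h(b"labeled", label.encode(), root_hash(subtree))
        for label, subtree in sorted(tree.items())
    ]
    return _h(b"node", *children)


# Two nodes build the same tree, inserting entries in a different order.
node_a = {"canister": {"c1": {"module_hash": b"\x01"}}, "streams": {"s2": b"m1"}}
node_b = {"streams": {"s2": b"m1"}, "canister": {"c1": {"module_hash": b"\x01"}}}
# A node whose execution diverged ends up with a different leaf.
diverged = {"canister": {"c1": {"module_hash": b"\x02"}}, "streams": {"s2": b"m1"}}

same_root = root_hash(node_a) == root_hash(node_b)
different_root = root_hash(node_a) != root_hash(diverged)
```

Because only the root hash is signed, signature shares from different nodes can be combined only if those nodes computed the same tree.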

[Figure: Per-round certified state]

In the above per-round state tree, the internal nodes store a label and are represented by rectangles. The leaves contain data and are represented with rounded boxes. In the per-round state tree, we store the following information.

  • Time - The timestamp of the state for which the tree is generated.
  • Metadata - Metadata of the subnet.
  • Canister - For each canister running on the subnet, we store its certified state, module hash, controllers and metadata.
    • Controllers - Each canister can have a (possibly empty) list of controllers. A controller of a canister has the ability to upgrade, stop or delete the canister.
    • Module hash - Each canister runs a WebAssembly (Wasm) module. The module hash is the hash of that module.
    • Metadata - Metadata contains a set of key-value pairs. When a Wasm module is installed into the canister, the blockchain nodes read the "Custom Section" of the Wasm module and store that data as the metadata of the canister. The most common uses of metadata are to store the Candid interface of the canister and a link to the git repository of the canister code.
    • Certified state - When a user sends a query call to the Internet Computer, the message is processed by only one (possibly malicious) blockchain node of the subnet. The response to a query call does not carry a certificate and therefore cannot be trusted on its own. To improve trust in query call responses, the Internet Computer has the notion of certified variables. In a nutshell, a canister can choose in advance to have some information certified and stored in the replicated state. When a user later makes a query call for that information, the canister can respond directly with the information along with its certificate from the replicated state. These certified variables are stored as certified state in the per-round state tree. Note that this certified state is not the entire state of the canister; a canister explicitly chooses which part of its state is stored as certified state. More details can be found in the interface spec.
  • Subnet - We store some relevant information related to all the other subnets. Specifically,
  • Request data - The request data contains the ingress history generated by the execution layer during the round. For each ingress message processed by the execution layer, the request data stores status, reply, reject code, reject message and error code. More details can be found in the interface spec.
  • Streams - As we run the canisters during the round, a canister C may send a message to another canister C'. Canister C places all its outgoing inter-canister messages in its output queues. After the round, the message routing layer will take the messages in these output queues and place them into subnet-to-subnet streams to be processed by a crossnet transfer protocol, whose job it is to actually transport these messages to other subnets. For each other subnet, we store the cross-subnet messages to be sent to the subnet.
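The hand-off from output queues to subnet-to-subnet streams described in the Streams item can be sketched as follows. The function name, the shape of `output_queues`, and the `routing_table` mapping are hypothetical; the sketch only shows the grouping of outgoing messages by destination subnet, not the actual message routing or crossnet transfer implementation.

```python
from collections import defaultdict, deque


def route_outputs_to_streams(output_queues, routing_table):
    """Drain per-canister output queues into per-subnet streams.

    output_queues: {(sender, receiver): deque of messages}
    routing_table: {receiver canister: destination subnet}
    """
    streams = defaultdict(list)
    # Iterate in sorted key order so every node builds identical streams.
    for (sender, receiver), queue in sorted(output_queues.items()):
        destination = routing_table[receiver]
        while queue:
            streams[destination].append((sender, receiver, queue.popleft()))
    return dict(streams)


output_queues = {
    ("c1", "c3"): deque(["m1", "m2"]),
    ("c2", "c4"): deque(["m3"]),
    ("c1", "c4"): deque(["m4"]),
}
routing_table = {"c3": "subnet-x", "c4": "subnet-y"}
streams = route_outputs_to_streams(output_queues, routing_table)
```

After the call, the output queues are empty and each stream holds the messages bound for one subnet, in a deterministic order.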

In a nutshell, the per-round state tree contains only the information that someone would be interested in querying after the execution of a round. The full state of the canisters and of the message routing layer is not included in the per-round state.
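To make the idea of querying the per-round state tree concrete, here is a small sketch of a path lookup over a tree shaped like the one described in this section. The label spellings (`canister`, `module_hash`, `request_data`, and so on) and the sample values are placeholders chosen for the example; the exact paths are defined in the interface spec.

```python
def lookup(tree, path):
    """Follow a sequence of labels from the root; return None if absent."""
    node = tree
    for label in path:
        if not isinstance(node, dict) or label not in node:
            return None
        node = node[label]
    return node


# Hypothetical per-round state tree with placeholder values.
state_tree = {
    "canister": {
        "c1": {
            "controllers": ["alice"],
            "module_hash": "ab12",
            "certified_data": "cd34",
        },
    },
    "request_data": {
        "req-1": {"status": "replied", "reply": "hello"},
    },
}

module_hash = lookup(state_tree, ["canister", "c1", "module_hash"])
request_status = lookup(state_tree, ["request_data", "req-1", "status"])
# The canister's full memory is not part of the per-round tree.
wasm_memory = lookup(state_tree, ["canister", "c1", "wasm_memory"])
```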

Replicated State

The replicated state of a subnet is the portion of the state of the blockchain nodes that is identical across all the nodes in the subnet. The replicated state of a subnet is often hundreds of gigabytes or even terabytes in size. It is divided into chunks, and each chunk is stored as a file. To create a certificate for the replicated state, the blockchain nodes have to compute the hash of the replicated state and sign that hash. Because hashing this much data is expensive, the replicated state is not certified in every round.
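The following sketch illustrates hashing a state that is split into chunks. The chunk size, the use of SHA-256, and the way chunk hashes are combined into a single state hash are assumptions made for the example and are not taken from the actual implementation.

```python
import hashlib


def chunk_hashes(data: bytes, chunk_size: int):
    """Hash each fixed-size chunk of the state independently."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).digest()
        for i in range(0, len(data), chunk_size)
    ]


def state_hash(hashes) -> bytes:
    """Combine the chunk hashes into one hash that could then be signed."""
    hasher = hashlib.sha256()
    for chunk_hash in hashes:
        hasher.update(chunk_hash)
    return hasher.digest()


CHUNK_SIZE = 256  # tiny, for illustration only
state = bytes(range(256)) * 4  # 1024 bytes, i.e. 4 chunks

modified = bytearray(state)
modified[600] ^= 0xFF  # flip one byte, which lies in chunk index 2

before = chunk_hashes(state, CHUNK_SIZE)
after = chunk_hashes(bytes(modified), CHUNK_SIZE)
changed_chunks = [i for i, (a, b) in enumerate(zip(before, after)) if a != b]
```

In this model every byte of the state has to be read to produce the state hash, which is why the cost grows with the state size, while a local change affects only the hash of the chunk that contains it.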

The replicated state consists of the canister states, the metadata of the subnet, the subnet queues, the consensus queue, and the Bitcoin state.

  • Canister States - The entire state of each canister in the subnet. For each canister in the subnet, we store its system state, execution state and scheduler state.
    • System state - State that is controlled and owned by the system (IC). The system state contains information needed for running and maintaining the canister. This state cannot be modified directly by the Wasm module in the canister, but it can be modified indirectly via the SystemApi interface. The system state includes:
      • Controllers - Each canister can have a (possibly empty) list of controllers. A controller of a canister has the ability to upgrade, stop or delete the canister.
      • Canister id - Each canister has an identifier, which is a principal.
      • queues
      • memory allocation
      • freeze threshold
      • status
      • certified data
      • canister metrics
      • cycles balance
      • cycles debit
      • task queue
      • global timer
    • Execution state - The part of the canister state that can be accessed during execution. Note that the execution state is used to track ephemeral information. In particular, `ExecutionState` as processed by the runtime is reconstructed at certain points to represent the execution state in the sandbox process. Therefore, there is no guarantee that equality and ordering of a value are preserved. That is, the execution state value might differ even if nothing has been executed.
      • Canister root - The path where Canister memory is located. Needs to be stored in ExecutionState in order to perform the exec system call.
      • Session nonce - If occupied, the runtime is already processing this execution state. It is used to refer to the mutated `MappedState` and globals that reside in the sandbox execution process (and not necessarily in memory) and to enable continuations.
      • Wasm binary - The Wasm executable associated with this state.
      • Wasm memory - The persistent heap of the module.
      • Stable memory - The canister stable memory which is persisted across canister upgrades.
      • Exported globals - The state of exported global variables. Internal global variables are not accessible.
      • Exports - A set of the functions that a Wasm module exports. These exported functions can be called by external users or other canisters.
      • Metadata - A Wasm module can optionally have a 'Custom Section' to store some metadata. This metadata is extracted from the Wasm module and stored here.
      • Last execution round - The round number in which the canister last executed an update-type operation.
    • Scheduler state - State maintained by the scheduler for scheduling tasks for the canister.
      • Last full execution round - The last full round that a canister got the chance to execute. This means that the canister was given the first pulse in the round or consumed its input queue.
      • Compute allocation - A canister's compute allocation. A higher compute allocation corresponds to higher priority in scheduling.
      • Accumulated priority - Keeps the current priority of this canister, accumulated during the past rounds. In the scheduler analysis documentation, this value is the entry in the vector d that corresponds to this canister.
      • Priority credit - Keeps the current priority credit of this canister, accumulated during a long execution. During the long execution, the canister is temporarily credited with priority to slightly boost the priority of the long execution. Only once the long execution is done is the "accumulated_priority" decreased by the "priority_credit".
      • Long execution mode - Opportunistic (default) or Prioritized
      • Heap delta debit - The amount of heap delta debit. The canister skips execution of update messages if this value is non-zero.
      • Install code debit - The amount of install_code instruction debit. The canister rejects install_code messages if this value is non-zero.
      • Time of last allocation charge - The last time when the canister was charged for the resource allocations. Charging for compute and storage is done periodically, so this is needed to calculate how much time should be considered when charging occurs.
  • Metadata - Metadata of the entire subnet. Metadata stores the architecture of the Internet Computer, routing tables to communicate with other subnets, etc. This is used for inter-canister messaging and history queues.
    • Ingress history - History of ingress messages as they traversed through the system.
    • Streams - For each other subnet, we store the state of the corresponding crossnet (XNet) stream to that subnet.
    • Canister allocation ranges - The canister ID ranges from which this subnet generates canister IDs.
    • Last generated canister id - The last generated canister ID (or "None" if this subnet has not generated any canister IDs yet). This canister id must be within the range specified in "canister allocation ranges".
    • Previous state hash - The hash of the previous partial canonical state. The initial state doesn't have any previous state.
    • Batch time - The Consensus-determined time at which this batch was created. This time is monotonically non-decreasing (it is not strictly increasing).
    • Network topology -
    • Own subnet id -
    • Own subnet type -
    • Own subnet features -
    • Subnet call context manager - Asynchronously handled subnet messages
    • State sync version - The version of StateSync protocol that should be used to compute manifest of this state.
    • Certificate version - The version of certification procedure that should be used for this state.
    • Heap delta estimate
    • Time of last allocation charge - The last time when canisters were charged for compute and storage allocation. Charging for compute and storage is done periodically, so this is needed to calculate how much time should be charged for when charging does occur.
    • Subnet metrics
    • Expected compiled Wasms - The set of Wasm modules we expect to be present in the `Hypervisor`'s compilation cache. This allows us to deterministically decide when we expect a compilation to be fast and to ignore the compilation cost when considering the round instruction limit. Each time a canister is installed, its Wasm module is inserted into the set; the set is cleared at each checkpoint.
    • Bitcoin - Responses to "BitcoinGetSuccessors" can be larger than the max inter-canister response limit. To work around this limitation, large responses are paginated and are stored here temporarily until they're fetched by the calling canister.
  • Subnet Queues - Messages received from, and to be sent to each subnet.
  • Consensus Queue - A list of responses to be sent to the consensus layer.
  • Bitcoin State - The state of the Bitcoin network. With the introduction of the Bitcoin integration feature, canisters on the Internet Computer can store Bitcoin, query the state of the Bitcoin network, and post transactions to the Bitcoin network. Each Internet Computer blockchain node regularly syncs the Bitcoin state from the Bitcoin network. The replicated state of the subnet stores this Bitcoin state.
    • UTXO Set - UTXO set of the Bitcoin network.
    • Unstable blocks - The recently added Bitcoin blocks that are not yet "confirmed".
    • Stable height - The height of the latest Bitcoin block that has a sufficient number of confirmations.
    • Fee percentiles cache - When a canister posts a transaction to the Bitcoin network, it would like to specify a transaction fee. The fee percentiles cache stores statistical information on the transaction fees observed recently in the Bitcoin network.
    • Adapter queues - When a canister sends a transaction to the Bitcoin network, the transaction is first added to the adapter queue. The "adapter" process later picks up the transactions in this queue and sends them to the Bitcoin network.
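The overall shape of the replicated state listed above can be outlined with a few record types. This is a rough structural sketch only: the class and field names are simplified stand-ins chosen for the example (most fields from the list are left out), canister IDs are modelled as plain integers, and the helper `last_generated_id_in_range` is a hypothetical check of the stated rule that the last generated canister ID lies within the canister allocation ranges.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SystemState:
    canister_id: int
    controllers: list = field(default_factory=list)
    cycles_balance: int = 0
    certified_data: bytes = b""


@dataclass
class CanisterState:
    system_state: SystemState
    execution_state: dict = field(default_factory=dict)  # Wasm binary, memories, exports
    scheduler_state: dict = field(default_factory=dict)  # priorities, debits


@dataclass
class SubnetMetadata:
    own_subnet_id: str
    canister_allocation_ranges: list  # inclusive (first, last) pairs
    last_generated_canister_id: Optional[int] = None

    def last_generated_id_in_range(self) -> bool:
        """True if no ID was generated yet or the last one lies in a range."""
        if self.last_generated_canister_id is None:
            return True
        return any(
            first <= self.last_generated_canister_id <= last
            for first, last in self.canister_allocation_ranges
        )


@dataclass
class ReplicatedState:
    canister_states: dict  # canister id -> CanisterState
    metadata: SubnetMetadata
    subnet_queues: list = field(default_factory=list)
    consensus_queue: list = field(default_factory=list)
    bitcoin_state: dict = field(default_factory=dict)


state = ReplicatedState(
    canister_states={7: CanisterState(SystemState(canister_id=7, controllers=["alice"]))},
    metadata=SubnetMetadata("subnet-x", [(0, 99)], last_generated_canister_id=7),
)
```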

The implementation of the replicated state can be found on GitHub.
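As a small worked example of the scheduler state fields described earlier, the sketch below models three of the stated rules: the priority credit is applied to the accumulated priority only when a long execution finishes, update messages are skipped while the heap delta debit is non-zero, and install_code messages are rejected while the install code debit is non-zero. The class name, method names, and numeric values are hypothetical and do not mirror the real scheduler.

```python
from dataclasses import dataclass


@dataclass
class SchedulerSketch:
    accumulated_priority: int = 0
    priority_credit: int = 0
    heap_delta_debit: int = 0
    install_code_debit: int = 0

    def credit_long_execution(self, amount: int) -> None:
        """Record priority used by a long execution without applying it yet."""
        self.priority_credit += amount

    def finish_long_execution(self) -> None:
        """Apply the outstanding credit once the long execution is done."""
        self.accumulated_priority -= self.priority_credit
        self.priority_credit = 0

    def can_execute_update(self) -> bool:
        return self.heap_delta_debit == 0

    def accepts_install_code(self) -> bool:
        return self.install_code_debit == 0


sched = SchedulerSketch(accumulated_priority=100)
sched.credit_long_execution(30)
sched.credit_long_execution(20)
priority_during = sched.accumulated_priority  # unchanged while executing
sched.finish_long_execution()
priority_after = sched.accumulated_priority   # credit applied at the end

sched.heap_delta_debit = 5
```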
