IC state manager
Overview
The Internet Computer (IC) achieves its security and fault tolerance by replicating computation across node machines located in various independent data centers across the world. For scalability reasons, the Internet Computing Protocol (ICP) composes the IC of multiple independent subnets. Each subnet can be viewed as an independent replicated state machine that replicates its state over a subset of all the available nodes.
Roughly speaking, replication is achieved by having the two lower ICP layers (P2P & Consensus) agree on blocks containing batches of messages to be executed, and then having the two upper ICP layers (Message Routing & Execution) execute them. Blocks are organized as a chain, where each block builds on the previous block. Each Block has an associated height in the chain and one can look at execution of a batch of messages corresponding to the agreed upon Block at height [math]\displaystyle{ x }[/math] by the upper layers as taking the replicated state of version [math]\displaystyle{ x-1 }[/math], and "applying" the batch to it to obtain replicated state of version [math]\displaystyle{ x }[/math].
The state manager is the component that maintains a versioned repository of replicated state and takes care of its certification so that other stakeholders can verify the authenticity of (parts of) the state. This page presents an abstract model of the state manager and the replicated state, and formally defines how the abstract model and the implementation need to relate to each other. We also rely on this model to define the behavior of other components that modify the replicated state, e.g., message routing.
The state manager also provides capabilities to sync state with other replicas in the same subnet so that replicas that have fallen behind can catch up. The latter is not yet covered in this page.
Remarks and Required Prior Knowledge
- The goal of this document is to provide the next level of detail compared to the material available for message routing in the "How it works" section of internetcomputer.org. So it is recommended to study the material available there first.
- We do not give a comprehensive type definition of the replicated state in this document, but leave the definition of the relevant parts to the components that modify the state, e.g., message routing. Sometimes we may rely on definitions made there.
- The documentation provided in this page may slightly deviate from the current implementation in terms of API as well as naming of functions, variables, etc. However, it still conveys the high-level ideas required to understand how the component itself works and how it interacts with other components.
Versioning of State
The state is versioned in a way that is in line with the batch numbers of the batches processed by the deterministic state machine. In particular this means that a state labeled with height [math]\displaystyle{ h }[/math] is exactly the state resulting from a processing cycle of the deterministic state machine processing a batch h using the replicated state labeled with Height [math]\displaystyle{ h - 1 }[/math]. For an overview of such a processing cycle, we refer the reader to the page on the message routing layer.
Abstract State Representation
An implementation of the state manager distinguishes multiple representations of replicated state. In this section we give an abstract description of the various representations and discuss the requirements we need to impose on their relations.
ReplicatedState
, which is the implementation’s internal representation of replicated state.
(Partial)CanonicalState
which is a representation of (parts of) replicated state, universally understandable across implementations.
StateHashTree
which is a structured digest of (parts of) canonical state, universally understandable across implementations.
The relation between the different representations is subsumed below. The space of instances of type ReplicatedState
can be partitioned into multiple equivalence classes. The criterion for whether two ReplicatedState
instances belong to the same equivalence class is whether or not they map to the same CanonicalState
. More formally, we use the equivalence relation [math]\displaystyle{ R }[/math] defined as follows:
[math]\displaystyle{ (x, y) \in R \Longleftrightarrow \texttt{to_canonical}(x) = \texttt{to_canonical}(y). }[/math]
This equivalence relation defines a type ReplicatedState_R
that contains all the equivalence classes [x]
over ReplicatedState
with respect to [math]\displaystyle{ R }[/math], i.e.,
[math]\displaystyle{ [x] = \{ y~|~y ∈ \texttt{ReplicatedState}~∧~(x, y) ∈ R \}. }[/math]
An object of type ReplicatedState
, i.e., a representative of one of the equivalence classes in ReplicatedState_R
, is created whenever the deterministic state machine implemented by the message routing and the execution layers calls the commit_and_certify
function on the state manager. This state can then be internally converted to CanonicalState
to be provided to other replicas within the state sync protocol or to be further converted to PartialCanonicalState
and, e.g., made available to block makers on other subnets who need to include the subnet-to-subnet streams in blocks.
To have a compact and easy to transfer representation that is useful to verify the integrity or compare different versions of (Partial)CanonicalState
, it can be converted into a StateHashTree
. The conversion function is a mapping that needs to be collision resistant, meaning that, for some cs1 : (Partial)CanonicalState
corresponding to some h : StateHashTree
it is intractable to find a different cs2 : (Partial)CanonicalState
that hashes to the same h
. Intuitively, this means that a replica that gets hold of some h : StateHashTree
can verify whether a certain piece of cs : (Partial)CanonicalState
actually corresponds to the respective h
. If h
is authentic, one can use this mechanism to verify the authenticity of c
.
From a StateHashTree
, one can derive a Digest
, representing a StateHashTree
even more compactly, as well as a Witness
. Deriving the latter requires additional information on the (Partial)CanonicalState
the witness should be valid for, i.e., so that one can use the Witness
and the (Partial)CanonicalState
to recompute the Digest
corresponding to the original StateHashTree
.
Batch processing
The figure below illustrates the operations of the state manager during batch processing, and, in doing so, also defines how processing a batch by the implementation should relate to the application of a batch to CanonicalState
as defined in the specification of the message routing and execution environment layers (the blue arrows in the figure). Usually, a replica keeps "applying" the batches passed by consensus to some representative of an equivalence class in ReplicatedState_R
to obtain the representative of an equivalence class in ReplicatedState_R
corresponding to the next state version. If a replica falls too far behind it is required to catch up with the help of other replicas because the batches it requires to move forward are no longer available. This is done by obtaining a certain version of CanonicalState
from other replicas via the state sync protocol.
Type definitions
CanonicalState
and PartialCanonicalState
can be abstractly viewed as trees and represented as partial maps. The figure below gives an example of a (Partial)CanonicalState
tree. Note that the labels used here are just exemplary and don't correspond to the actual state tree exposed by the implementation.
Formally, (Partial)CanonicalState is defined as:
LabeledTree = Content | Label ↦ LabeledTree CanonicalState = LabeledTree PartialCanonicalState = LabeledTree
Both, CanonicalState
and PartialCanonicalState
, are aliases for the same type. However, we name them differently to capture their slightly different semantic.
The type of Content
in the mapping is left abstract for now. It is a place holder for the concrete data placed in the state by the implementation.
Mappings between types
We require the following mappings between the different state representations. We do not give explicit algorithms for some of the mappings defined below but only provide abstract requirements on them. In security proofs and arguments we will explicitly state our assumptions on how the relevant functions are implemented and do the security proofs relative to them.
- A bijective function between ReplicatedState_R and CanonicalState that is efficiently computable in both directions. We model this as two efficiently computable, injective functions:
to_canonical : ReplicatedState_R → CanonicalState from_canonical : CanonicalState → ReplicatedState_R
where it holds that for all c : CanonicalState
we have that c = to_canonical(from_canonical(c))
. Note that these two functions will be implicitly defined by the implementation.
- A function
select
to derivePartialCanonicalState
from(Partial)CanonicalState
subject to someSelector
:
Selector = "Accept" | Label ↦ Selector select : LabeledTree × Selector → PartialCanonicalState
which behaves as follows
select(tree, "Accept") := tree select(tree, map_selector) := { (l ↦ select(subt, subs)) | (l ↦ subt) ∈ tree ∧ (l ↦ subs) ∈ map_selector }
- A function
into_hashtree
mapping(Partial)CanonicalState
toStateHashTree
:
into_hashtree : LabeledTree → StateHashTree
- A function digest to derive a Digest representing a given StateHashTree:
digest : StateHashTree → Digest
- A function witness to derive a Witness from a StateHashTree and a Selector:
witness : StateHashTree × Selector → Witness
- A function derive_witness to derive a Witness from a Witness and a PartialCanonicalState so that it is a valid witness for the given partial canonical state:
derive_witness : Witness × PartialCanonicalState → Witness | Error
- A function recompute_digest to compute a Digest from a PartialCanonicalState and a Witness:
recompute_digest : PartialCanonicalState × Witness → Digest | Error
Required Properties
- Correctness: We require that
∀ c ∈ CanonicalState, (1) ∀ s1, s2 ∈ Selector, (2) ∀ h = into_hashtree(c), (3) ∀ d = digest(h), (4) ∀ c1 = select(c, s1), (5) ∀ c2 = select(c1, s2), (6) ∀ w1 = witness(h, s1), (7) ∀ w2 = derive_witness(w1, c2), (8) it holds that d = recompute_digest(c1, w1) ∧ (9) d = recompute_digest(c2, w2) (10)
- Collision resistance: It is intractable to come up with
c0 ∈ CanonicalState | PartialCanonicalState
,c1 ∈ CanonicalState | PartialCanonicalState
andw ∈ Witness
where it holds that
∄ s ∈ Selector : c1 = select(c0, s) ∧ (1) h = into_hashtree(c0) ∧ (2) digest(h) = recompute_digest(c1, w). (3)
State Hash Tree
Conversion from LabeledTree
The conversion described in this section turns some t : LabeledTree
into a corresponding binary HashTree representing t
. Intuitively, the node types Fork
and Leaf
are used to enforce a binary tree structure, whereas the node type LabeledNode
is used to appropriately reflect the labels present in the LabeledTree
.
Recall that a labeled tree is represented as a partial map LabeledTree = Content | Label ↦ LabeledTree
which, informally speaking, allows to define an individual node in terms of its outgoing labeled edges pointing to other nodes of the tree, where a node may again be a LabeledTree
or it may be Content
in case it is a leaf. Concretely, a certain tree could look as follows:
T = { (label_1 ↦ subtree_1), (label_2 ↦ subtree_2), ..., (label_n ↦ subtree_n) }
Intermediate Representation 1: Ordering
First, we require an ordering function, which assigns an order to the elements in a partial map based on their labels. We let this order simply be lexicographic ordering of the labels. We write the tuples containing the edges as sequences as opposed to writing them as a set to indicate that they already went through ordering, i.e., whenever we use this notation, we implicitly assume that this conversion has already taken place.
We illustrate how ordering is supposed to work based on an example. Assume a tree T
defined as
{ ("c" ↦ "demo3"), ("a" ↦ "demo1"), ("d" ↦ "demo4"), ("b" ↦ { ("x" ↦ "demo2.1"), ("z" ↦ "demo2.3"), ("y" ↦ "demo2.2")} ), ("e" ↦ "demo5") }
After ordering, this tree would be represented as
( ("a" ↦ "demo1"), ("b" ↦ ( ("x" ↦ "demo2.1"), ("y" ↦ "demo2.2"), ("z" ↦ "demo2.3") ) ), ("c" ↦ "demo3"), ("d" ↦ "demo4"), ("e" ↦ "demo5") )
Intermediate Representation 2: Binary State Trees
To describe the conversion, we introduce an intermediate representation which has the same form as a hash tree but the inner nodes do not have associated hashes and the leaves contain Content as opposed to some Hash. In particular we have:
BinaryStateTree = LabeledNode | Fork | Leaf LabeledNode = Label × BinaryStateTree Fork = BinaryStateTree × BinaryStateTree Leaf = Content
The conversion from an ordered version of CanonicalState
to a BinaryStateTree
, is defined as follows:
// Convert LabeledTree into LabeledNodes convert(( (l_1 ↦ s_1), ..., (l_n ↦ s_n) )) := expand( LabeledNode(l_1, convert(s_1)), ..., LabeledNode(l_n, convert(s_n)) ) convert(l : Leaf) := l // Split in first k elements to the left, rest to the right, where k is highest power of two < n expand(c_1, c_2, ..., c_n) := Fork( expand(c_1, ..., c_k), expand(c_k+1,...,c_n) ) where k := 2^max{ x ∈ ℕ : 2^x < n } // Base case expand(c) := c
An alternative representation of the expand logic is to recursively group pairs of nodes in forks from left to right, starting at the leaves. In case there is an odd number of nodes one keeps the last node separate and includes it into a fork on the next possible level of the tree. We additionally include this algorithm (which behaves equivalently to the one above) because it may give pointers for efficient implementations of the conversion.
// Expand even number of child nodes expand(c_1, c_2, ..., c_2n-1, c_2n) := expand( Fork(c_1, c_2), ..., Fork(c_2n-1, c_2n) ) // Expand odd number of child nodes expand(c_1, c_2, ..., c_2n-1, c_2n, c_2n+1) := expand( Fork(c_1, c_2), ..., Fork(c_2n-1, c_2n), c_2n+1 ) // Base case expand(c) := c
Consider the ordered tree from the example above:
T = ( ("a" ↦ "demo1"), ("b" ↦ ( ("x" ↦ "demo2.1"), ("y" ↦ "demo2.2"), ("z" ↦ "demo2.3") ) ), ("c" ↦ "demo3"), ("d" ↦ "demo4"), ("e" ↦ "demo5") )