Difference between revisions of "IC state manager"

From Internet Computer Wiki
Jump to: navigation, search
 
(20 intermediate revisions by 2 users not shown)
Line 1: Line 1:
 
== Overview ==
 
== Overview ==
  
The Internet Computer (IC) achieves its security and fault tolerance by replicating computation across node machines located in various independent data centers across the world. For scalability reasons, the Internet Computing Protocol (ICP) composes the IC of multiple independent subnets. Each subnet can be viewed as an independent replicated state machine that replicates its state over a subset of all the available nodes.
+
The Internet Computer (IC) achieves its security and fault tolerance by replicating computation physical hardware devices run by independent node providers distributed across data centers globally. For scalability reasons, the Internet Computing Protocol (ICP) composes the IC of multiple independent subnets. Each subnet can be viewed as an independent replicated state machine that replicates its state over a subset of all the available nodes.
  
 
Roughly speaking, replication is achieved by having the two lower ICP layers (P2P & Consensus) agree on blocks containing batches of messages to be executed, and then having the two upper ICP layers (Message Routing & Execution) execute them. Blocks are organized as a chain, where each block builds on the previous block. Each Block has an associated height in the chain and one can look at execution of a batch of messages corresponding to the agreed upon Block at height <math>x</math> by the upper layers as taking the replicated state of version <math>x-1</math>, and "applying" the batch to it to obtain replicated state of version <math>x</math>.
 
Roughly speaking, replication is achieved by having the two lower ICP layers (P2P & Consensus) agree on blocks containing batches of messages to be executed, and then having the two upper ICP layers (Message Routing & Execution) execute them. Blocks are organized as a chain, where each block builds on the previous block. Each Block has an associated height in the chain and one can look at execution of a batch of messages corresponding to the agreed upon Block at height <math>x</math> by the upper layers as taking the replicated state of version <math>x-1</math>, and "applying" the batch to it to obtain replicated state of version <math>x</math>.
  
The state manager is the component that maintains a versioned repository of replicated state and takes care of its certification so that other stakeholders can verify the authenticity of (parts of) the state. This page presents an abstract model of the state manager and the replicated state, and formally defines how the abstract model and the implementation need to relate to each other. We also rely on this model to define the behavior of other components that modify the replicated state, e.g., [[IC message routing layer|message routing]].
+
The state manager is the component that maintains a versioned repository of replicated state and takes care of its certification so that other stakeholders can verify the authenticity of (parts of) the state. This page presents an abstract model of the state manager and the replicated state, and formally defines how the abstract model and the implementation need to relate to each other. This model is also needed to define the behavior of other components that modify the replicated state, e.g., [[IC message routing layer|message routing]].
  
 
The state manager also provides capabilities to sync state with other replicas in the same subnet so that replicas that have fallen behind can catch up. The latter is not yet covered in this page.
 
The state manager also provides capabilities to sync state with other replicas in the same subnet so that replicas that have fallen behind can catch up. The latter is not yet covered in this page.
Line 11: Line 11:
 
=== Remarks and Required Prior Knowledge ===
 
=== Remarks and Required Prior Knowledge ===
 
* The goal of this document is to provide the next level of detail compared to the material available for message routing in the [https://internetcomputer.org/how-it-works "How it works" section of internetcomputer.org]. So it is recommended to study the material available there first.
 
* The goal of this document is to provide the next level of detail compared to the material available for message routing in the [https://internetcomputer.org/how-it-works "How it works" section of internetcomputer.org]. So it is recommended to study the material available there first.
* We do not give a comprehensive type definition of the replicated state in this document, but leave the definition of the relevant parts to the components that modify the state, e.g., [[IC message routing layer|message routing]]. Sometimes we may rely on definitions made there.
+
* There is no comprehensive type definition of the replicated state in this document, but there is the definition of the relevant parts to the components that modify the state, e.g., [[IC message routing layer|message routing]].  
* The documentation provided in this page may slightly deviate from the current implementation in terms of API as well as naming of functions, variables, etc. However, it still conveys the high-level ideas required to understand how the component itself works and how it interacts with other components.
+
* Also see [https://mmapped.blog/posts/08-ic-xnet.html this] and [https://mmapped.blog/posts/02-ic-state-machine-replication.html this] blog post for some relevant and easier to digest background information.
 +
* The documentation provided in this page may slightly deviate from the current implementation in terms of API as well as naming of functions, variables, etc. However, it still conveys the high-level ideas required to understand how the component itself works and how it interacts with other components. The implementation also contains several optimizations which are, however, not important for the conceptual overview here and therefore skipped.
 +
* The notation used in this page is described [[Notation|here]].
  
 
=== Versioning of State ===
 
=== Versioning of State ===
The state is versioned in a way that is in line with the batch numbers of the batches processed by the deterministic state machine. In particular this means that a state labeled with height <math>h</math> is exactly the state resulting from a processing cycle of the deterministic state machine processing a batch h using the replicated state labeled with Height <math>h - 1</math>. For an overview of such a processing cycle, we refer the reader to the page on the [[IC message routing layer|message routing layer]].
+
The state is versioned in a way that is in line with the batch numbers of the batches processed by the deterministic state machine. In particular this means that a state labeled with height <math>h</math> is exactly the state resulting from a processing cycle of the deterministic state machine processing a batch h using the replicated state labeled with Height <math>h - 1</math>. For an overview of such a processing cycle, refer to the page on the [[IC message routing layer|message routing layer]].
 +
 
 +
== Component API ==
 +
 
 +
This sections lists the API functions of the State Manager together with informal descriptions of what they do. For a formal definition of the expected behavior of the API refer to the section Operation of the State Manager. They are split into three groups corresponding to the State Manger’s core functionalities.
 +
 
 +
=== Management of Replicated State ===
 +
 
 +
* <code>get_state() → ReplicatedState</code>: Returns the most recent available state.
 +
 
 +
* <code>get_state_at(height : Height) → ReplicatedState</code>: Returns the state at the specified Height.
 +
 
 +
* <code>latest_state_height() → Height</code>: Returns the height corresponding to the latest available state.
 +
 
 +
* <code>list_state_heights(cert_mask : CertificationMask) → Vec</code>: Returns a list of heights corresponding to states according to the selection. The CertificationMask type is defined as follows:
 +
<nowiki>CertificationMask : { CERT_CERTIFIED,  // Select states with completed certification
 +
                      CERT_UNCERTIFIED, // Select states with pending certification
 +
                      CERT_ANY          // Select both
 +
                    }</nowiki>
 +
 
 +
* <code>remove_state_at(height : Height)</code>: Remove state at the specified height (if any).
 +
 
 +
* <code>remove_states_below(height : Height)</code>: Remove all states having a height ≤ height.
 +
 
 +
* <code>commit_and_certify(s : ReplicatedState, height : Height, requires_full_state_hash : bool)</code>: Commits the the given state at the specified height and triggers certification of parts of it. Depending on whether requires_full_state_hash is set or not, it also triggers computation of a CryptoHashOfState representing the full state.
 +
 
 +
=== Certification of State ===
 +
 
 +
The API functions described in this section are generally used to certify state that is accessible by users and/or other subnets. The state manager is responsible for computing hashes representing the respective parts of the state whereas consensus will create the threshold signatures on the hash on behalf of the subnet and deliver the certifications once this has happened.
 +
 
 +
* <code>list_state_hashes_to_certify() → Vec<(Height, CryptoHashOfPartialState)></code>: Lists all state hashes which currently still need certification.
 +
 
 +
* <code>deliver_state_certification(certification : Certification)</code>: Delivers a certification corresponding to some state hash previously obtained from the function above.
 +
 
 +
* <code>subnets_with_certified_streams() → Set</code>: Returns the set of subnets with streams available in the most recent partially certified state.
 +
 
 +
* <code>derive_certified_stream(registry_version : RegistryVersion, own_subnet : SubnetId, remote_subnet : SubnetId, cs : CertifiedStream, begin : ℕ, end : ℕ, size_limit : ℕ) → CertifiedStreamSlice</code>: Derives a certified stream slice from a given certified stream slice. This function will always return an authentic certified stream slice or an error.
 +
 
 +
* <code>encode_certified_stream(remote_subnet : SubnetId, begin : StreamIndex, msg_limit : ℕ, size_limit : ℕ) → CertifiedStreamSlice</code>: Encodes a certified stream slice into its transport format.
 +
 
 +
* <code>decode_certified_stream(registry_version : RegistryVersion, own_subnet : SubnetId, remote_subnet : SubnetId, cs : CertifiedStreamSlice) → StreamSlice</code>: Decodes a certified stream slice from its transport format and strips off the signature related parts. This function fails if the given certified stream slice is not authentic.
 +
 
 +
* <code>decode_valid_certified_stream(own_subnet : SubnetId, cs : CertifiedStreamSlice) → StreamSlice</code>: Decodes a certified stream slice from its transport format and strips off the signature related parts.
 +
 
 +
=== State Integrity and Authentication for Nodes ===
 +
 
 +
As opposed to the certification in the previous section the API functions described in this section deal with hashes/certifications that are later going to be used by nodes to ensure they obtained an authentic copy of the state whenever they newly join or catch up after having fallen behind. Again state manager is responsible for computing the hashes while consensus is responsible for signing them on behalf of the subnet.
 +
 
 +
* <code>get_full_state_hash_at(height : Height) → CryptoHashOfState</code>: Returns the hash corresponding to the full state at the specified height, or nothing. This function is guaranteed to eventually return a hash in case requires_full_state_hash was set for the specified height upon commit_and_certify.
  
 
== Abstract State Representation ==
 
== Abstract State Representation ==
An implementation of the state manager distinguishes multiple representations of replicated state. In this section we give an abstract description of the various representations and discuss the requirements we need to impose on their relations.
+
An implementation of the state manager distinguishes multiple representations of replicated state. In this section This section gives an abstract description of the various representations and discuss the requirements needed to impose on their relations.
  
 
* <code>ReplicatedState</code>, which is the implementation’s internal representation of replicated state.
 
* <code>ReplicatedState</code>, which is the implementation’s internal representation of replicated state.
Line 26: Line 76:
 
* <code>StateHashTree</code> which is a structured digest of (parts of) canonical state, universally understandable across implementations.
 
* <code>StateHashTree</code> which is a structured digest of (parts of) canonical state, universally understandable across implementations.
  
The relation between the different representations is subsumed below. The space of instances of type <code>ReplicatedState</code> can be partitioned into multiple equivalence classes. The criterion for whether two <code>ReplicatedState</code> instances belong to the same equivalence class is whether or not they map to the same <code>CanonicalState</code>. More formally, we use the equivalence relation <math>R</math> defined as follows:
+
The relation between the different representations is subsumed below. The space of instances of type <code>ReplicatedState</code> can be partitioned into multiple equivalence classes. The criterion for whether two <code>ReplicatedState</code> instances belong to the same equivalence class is whether or not they map to the same <code>CanonicalState</code>. More formally, the equivalence relation <math>R</math> is used and defined as follows:
  
 
<math>
 
<math>
Line 45: Line 95:
  
 
From a <code>StateHashTree</code>, one can derive a <code>Digest</code>, representing a <code>StateHashTree</code> even more compactly, as well as a <code>Witness</code>. Deriving the latter requires additional information on the <code>(Partial)CanonicalState</code> the witness should be valid for, i.e., so that one can use the <code>Witness</code> and the <code>(Partial)CanonicalState</code> to recompute the <code>Digest</code> corresponding to the original <code>StateHashTree</code>.
 
From a <code>StateHashTree</code>, one can derive a <code>Digest</code>, representing a <code>StateHashTree</code> even more compactly, as well as a <code>Witness</code>. Deriving the latter requires additional information on the <code>(Partial)CanonicalState</code> the witness should be valid for, i.e., so that one can use the <code>Witness</code> and the <code>(Partial)CanonicalState</code> to recompute the <code>Digest</code> corresponding to the original <code>StateHashTree</code>.
 +
 +
=== Batch processing ===
 +
The figure below illustrates the operations of the state manager during batch processing, and, in doing so, also defines how processing a batch by the implementation should relate to the application of a batch to <code>CanonicalState</code> as defined in the specification of the message routing and execution environment layers (the blue arrows in the figure). Usually, a replica keeps "applying" the batches passed by consensus to some representative of an equivalence class in <code>ReplicatedState_R</code> to obtain the representative of an equivalence class in <code>ReplicatedState_R</code> corresponding to the next state version. If a replica falls too far behind it is required to catch up with the help of other replicas because the batches it requires to move forward are no longer available. This is done by obtaining a certain version of <code>CanonicalState</code> from other replicas via the state sync protocol.
 +
 +
[[File:Batch-processing.svg|thumb]]
 +
 +
=== Type definitions ===
 +
<code>CanonicalState</code> and <code>PartialCanonicalState</code> can be abstractly viewed as trees and represented as partial maps. The figure below gives an example of a <code>(Partial)CanonicalState</code> tree. Note that the labels used here are just exemplary and don't correspond to the actual state tree exposed by the implementation.
 +
 +
[[File:State-tree.svg|thumb]]
 +
 +
Formally, (Partial)CanonicalState is defined as:
 +
<nowiki>LabeledTree = Content | Label ↦ LabeledTree
 +
CanonicalState = LabeledTree
 +
PartialCanonicalState = LabeledTree</nowiki>
 +
 +
Both, <code>CanonicalState</code> and <code>PartialCanonicalState</code>, are aliases for the same type. However, they are named differently to capture their slightly different semantic.
 +
 +
The type of <code>Content</code> in the mapping is left abstract for now. It is a place holder for the concrete data placed in the state by the implementation.
 +
 +
=== Mappings between types ===
 +
The following mappings between the different state representations are required. ''No explicit algorithms are given for some of the mappings defined below but only provide abstract requirements on them. In security proofs and arguments, assumptions are explicitly stated on how the relevant functions are implemented and do the security proofs relative to them.''
 +
 +
* A bijective function between ReplicatedState_R and CanonicalState that is efficiently computable in both directions. This is modeled as two efficiently computable, injective functions:
 +
<nowiki>to_canonical  : ReplicatedState_R → CanonicalState
 +
from_canonical : CanonicalState → ReplicatedState_R</nowiki>
 +
where it holds that for all <code>c : CanonicalState</code> it is the case that <code>c = to_canonical(from_canonical(c))</code>. Note that these two functions will be implicitly defined by the implementation.
 +
 +
* A function <code>select</code> to derive <code>PartialCanonicalState</code> from <code>(Partial)CanonicalState</code> subject to some <code>Selector</code>:
 +
<nowiki>Selector = "Accept" | Label ↦ Selector
 +
select : LabeledTree × Selector → PartialCanonicalState
 +
</nowiki>
 +
which behaves as follows
 +
<nowiki>select(tree, "Accept") := tree
 +
select(tree, map_selector) := { (l ↦ select(subt, subs))
 +
                                    | (l ↦ subt) ∈ tree ∧
 +
                                      (l ↦ subs) ∈ map_selector
 +
                              }</nowiki>
 +
* A function <code>into_hashtree</code> mapping <code>(Partial)CanonicalState</code> to <code>StateHashTree</code>:
 +
<nowiki>
 +
into_hashtree : LabeledTree → StateHashTree</nowiki>
 +
 +
* A function digest to derive a Digest representing a given StateHashTree:
 +
<nowiki>digest : StateHashTree → Digest</nowiki>
 +
 +
* A function witness to derive a Witness from a StateHashTree and a Selector:
 +
<nowiki>witness : StateHashTree × Selector → Witness</nowiki>
 +
 +
* A function derive_witness to derive a Witness from a Witness and a PartialCanonicalState so that it is a valid witness for the given partial canonical state:
 +
<nowiki>derive_witness : Witness × PartialCanonicalState → Witness | Error</nowiki>
 +
 +
* A function recompute_digest to compute a Digest from a PartialCanonicalState and a Witness:
 +
<nowiki>recompute_digest : PartialCanonicalState × Witness → Digest | Error</nowiki>
 +
 +
==== Required Properties ====
 +
* '''Correctness''' requires that
 +
<nowiki>∀ c ∈ CanonicalState,                                              (1)
 +
∀ s1, s2 ∈ Selector,                                              (2)
 +
∀ h = into_hashtree(c),                                            (3)
 +
∀ d = digest(h),                                                  (4)
 +
∀ c1 = select(c, s1),                                              (5)
 +
∀ c2 = select(c1, s2),                                            (6)
 +
∀ w1 = witness(h, s1),                                            (7)
 +
∀ w2 = derive_witness(w1, c2),                                    (8)
 +
 +
it holds that
 +
 +
d = recompute_digest(c1, w1) ∧                                    (9)
 +
d = recompute_digest(c2, w2)                                      (10)</nowiki>
 +
 +
* '''Collision resistance''': It is intractable to come up with <code>c0 ∈ CanonicalState | PartialCanonicalState</code>, <code>c1 ∈ CanonicalState | PartialCanonicalState</code> and <code>w ∈ Witness</code> where it holds that
 +
<nowiki>∄ s ∈ Selector : c1 = select(c0, s) ∧                              (1)
 +
h = into_hashtree(c0) ∧                                            (2)
 +
digest(h) = recompute_digest(c1, w).                              (3)</nowiki>
 +
 +
=== State Hash Tree ===
 +
 +
==== Conversion from LabeledTree ====
 +
The conversion described in this section turns some <code>t : LabeledTree</code> into a corresponding binary HashTree representing <code>t</code>. Intuitively, the node types <code>Fork</code> and <code>Leaf</code> are used to enforce a binary tree structure, whereas the node type <code>LabeledNode</code> is used to appropriately reflect the labels present in the <code>LabeledTree</code>.
 +
 +
Recall that a labeled tree is represented as a partial map <code>LabeledTree = Content | Label ↦ LabeledTree</code> which, informally speaking, allows to define an individual node in terms of its outgoing labeled edges pointing to other nodes of the tree, where a node may again be a <code>LabeledTree</code> or it may be <code>Content</code> in case it is a leaf. Concretely, a certain tree could look as follows:
 +
[[File:Tree.svg|thumb|Graphical illustration of T]]
 +
<nowiki>T = { (label_1 ↦ subtree_1), (label_2 ↦ subtree_2), ..., (label_n ↦ subtree_n) }</nowiki>
 +
 +
===== Intermediate Representation 1: Ordering =====
 +
First, an ordering function is required, which assigns an order to the elements in a partial map based on their labels. This order can simply be lexicographic ordering of the labels. Tuples containing the edges as sequences are written as opposed to writing them as a set to indicate that they already went through ordering, i.e., whenever this notation is used, it is implicitly assumed that this conversion has already taken place.
 +
 +
Below is an illustration of how ordering is supposed to work based on an example. Assume a tree <code>T</code> defined as
 +
[[File:Tree-1.svg|thumb|Tree before ordering]]
 +
<nowiki>{ ("c" ↦ "demo3"),
 +
  ("a" ↦ "demo1"),
 +
  ("d" ↦ "demo4"),
 +
  ("b" ↦ { ("x" ↦ "demo2.1"), ("z" ↦ "demo2.3"), ("y" ↦ "demo2.2")} ),
 +
  ("e" ↦ "demo5")
 +
}</nowiki>
 +
 +
After ordering, this tree would be represented as
 +
[[File:Tree-2.svg|thumb|Tree after ordering]]
 +
<nowiki>( ("a" ↦ "demo1"),
 +
  ("b" ↦ ( ("x" ↦ "demo2.1"), ("y" ↦ "demo2.2"), ("z" ↦ "demo2.3") ) ),
 +
  ("c" ↦ "demo3"),
 +
  ("d" ↦ "demo4"),
 +
  ("e" ↦ "demo5")
 +
)</nowiki>
 +
 +
===== Intermediate Representation 2: Binary State Trees =====
 +
To describe the conversion, intermediate representation is introduced which has the same form as a hash tree but the inner nodes do not have associated hashes and the leaves contain Content as opposed to some Hash. In particular:
 +
<nowiki>BinaryStateTree = LabeledNode | Fork | Leaf
 +
 +
LabeledNode = Label × BinaryStateTree
 +
Fork = BinaryStateTree × BinaryStateTree
 +
Leaf = Content</nowiki>
 +
The conversion from an ordered version of <code>CanonicalState</code> to a <code>BinaryStateTree</code>, is defined as follows:
 +
<nowiki>// Convert LabeledTree into LabeledNodes
 +
convert(( (l_1 ↦ s_1), ..., (l_n ↦ s_n) )) :=
 +
    expand(
 +
          LabeledNode(l_1, convert(s_1)), ..., LabeledNode(l_n, convert(s_n))
 +
          )
 +
 +
convert(l : Leaf) := l
 +
 +
// Split in first k elements to the left, rest to the right, where k is highest power of two < n
 +
expand(c_1, c_2, ..., c_n) :=
 +
    Fork(
 +
        expand(c_1, ..., c_k),
 +
        expand(c_k+1,...,c_n)
 +
        )
 +
    where k := 2^max{ x ∈ ℕ : 2^x < n }
 +
 +
// Base case
 +
expand(c) := c</nowiki>
 +
 +
An alternative representation of the expand logic is to recursively group pairs of nodes in forks from left to right, starting at the leaves. In case there is an odd number of nodes one keeps the last node separate and includes it into a fork on the next possible level of the tree. This algorithm is additionally included (which behaves equivalently to the one above) because it may give pointers for efficient implementations of the conversion.
 +
<nowiki>// Expand even number of child nodes
 +
expand(c_1, c_2, ..., c_2n-1, c_2n) :=
 +
    expand(
 +
          Fork(c_1, c_2),
 +
          ...,
 +
          Fork(c_2n-1, c_2n)
 +
          )
 +
 +
// Expand odd number of child nodes
 +
expand(c_1, c_2, ..., c_2n-1, c_2n, c_2n+1) :=
 +
    expand(
 +
          Fork(c_1, c_2),
 +
          ...,
 +
          Fork(c_2n-1, c_2n),
 +
          c_2n+1
 +
          )
 +
 +
// Base case
 +
expand(c) := c</nowiki>
 +
 +
Consider the ordered tree from the example above:
 +
[[File:Tree-3.svg|thumb|Expanded tree <code>T' = expand(T)</code>]]
 +
<nowiki>T = ( ("a" ↦ "demo1"),
 +
      ("b" ↦ ( ("x" ↦ "demo2.1"), ("y" ↦ "demo2.2"), ("z" ↦ "demo2.3") ) ),
 +
      ("c" ↦ "demo3"),
 +
      ("d" ↦ "demo4"),
 +
      ("e" ↦ "demo5")
 +
    )
 +
</nowiki>
 +
 +
===== Conversion to HashTree =====
 +
 +
Formally, a state hash tree is defined as follows.
 +
<nowiki>HashTree = LabeledNode | Fork | Leaf
 +
 +
LabeledNode = Hash × Label × HashTree
 +
Fork = Hash × HashTree × HashTree
 +
Leaf = Hash</nowiki>
 +
 +
Based on a <code>BinaryStateTree</code> the conversion to a <code>HashTree</code> is specified relative to some hash function <code>H</code>:
 +
<nowiki>domain_sep(s) := byte(|s|) || s
 +
 +
convert : Leaf → Hash
 +
convert(l) := H(domain_sep("ic-hashtree-leaf") || l)
 +
 +
convert : Fork → Hash × HashTree × HashTree
 +
convert((t1, t2)) := (H(domain_sep("ic-hashtree-fork") || convert(t1).hash || convert(t2).hash),
 +
                      convert(t1),
 +
                      convert(t2)
 +
                    )
 +
 +
convert : LabeledNode → Hash × Label × HashTree
 +
convert((l, t)) := (H(domain_sep("ic-hashtree-labeled") || l || convert(t).hash),
 +
                    l,
 +
                    convert(t)
 +
                  )</nowiki>
 +
 +
== Operation of the State Manager ==
 +
 +
Abstractly the State Manager can be viewed as a repository maintaining multiple versions of state together with some metadata. In this document this repository is represented by a field called <code>states : Height ↦ StateContainer</code>, where StateContainer is a type with multiple named fields:
 +
<nowiki>StateContainer.state : CanonicalState
 +
StateContainer.checkpoint : bool
 +
StateContainer.certification : Certification</nowiki>
 +
 +
Note that the <code>CanonicalState</code> is used in the definition above. This is because the <code>ReplicatedState</code> is defined by the implementation. Using the <code>CanonicalState</code> allows us to define the behavior while not specifying how the state is supposed to look.
 +
 +
The field <code>states</code> is initialized whenever constructing a StateManager. Initially, <code>states</code> only contains an initial state with index 0. This state, henceforth called genesis_state is either a freshly constructed, empty state or is a state which is externally provided upon startup.
 +
<nowiki>new : () → Self
 +
new() :=
 +
  StateManager {
 +
      with
 +
        └─ states := { (0 ↦ genesis_state) }
 +
  }</nowiki>
 +
 +
=== API Behavior ===
 +
 +
==== Management of State ====
 +
The following section defines the behavior of the API responsible for replicated state management. Given that all of the API functions are methods of the state manager, the self parameters are made explicit.
 +
 +
* <code>get_state</code> returns the most recent available state.
 +
<nowiki>get_state : Self → CanonicalState
 +
get_state(self) := self.states[max(dom(self.states))].state</nowiki>
 +
 +
* <code>get_state_at</code> returns the state at the given height.
 +
<nowiki>get_state_at : Self × Height → CanonicalState
 +
get_state_at(self, height) := self.states[height].state</nowiki>
 +
 +
* <code>latest_state_height</code> returns the height corresponding to the latest state.
 +
<nowiki>latest_state_height : Self → Height
 +
latest_state_height(self) := max(dom(self.states))</nowiki>
 +
 +
 +
* <code>list_state_heights</code> returns a list of heights (in ascending order) corresponding to the states satisfying the specified criteria. The ordering is a total order, applied by some order function which is assumed to be implicitly defined by the Height type.
 +
<nowiki>list_state_heights : Self × CertificationMask → Vec<Height>
 +
list_state_heights(self, cert_mask) :=
 +
    order(
 +
            { h | (h ↦ s) ∈ self.states ∧
 +
                  s.certification ≠ ∅ ∧
 +
                  (
 +
                  cert_mask = CERT_CERTIFIED ∨
 +
                  cert_mask = CERT_ANY
 +
                  )
 +
            }
 +
          ∪ { h | (h ↦ s) ∈ self.states ∧
 +
                  s.certification = ∅ ∧
 +
                  (
 +
                  cert_mask = CERT_UNCERTIFIED ∨
 +
                  cert_mask = CERT_ANY
 +
                  )
 +
            }
 +
        )</nowiki>
 +
 +
* <code>remove_state_at</code> removes the state corresponding to the given height.
 +
<nowiki>% REQUIRE: height > 0
 +
%
 +
remove_state_at : Self × Height → Self
 +
remove_state_at(self, height) :=
 +
    self with
 +
            └─ states := { (h ↦ s) | (h ↦ s) ∈ self.states ∧
 +
                                      h ≠ height
 +
                        }</nowiki>
 +
 +
* <code>remove_states_below</code> removes all states corresponding to heights smaller than or equal to the given height.
 +
<nowiki>remove_states_below : Self × Height → Self
 +
remove_states_below(self, height) :=
 +
    self with
 +
            └─ states := { (h ↦ s) | (h ↦ s) ∈ self.states ∧
 +
                                    h ≤ height ∧
 +
                                    h > 0
 +
                        }</nowiki>
 +
 +
* <code>commit_and_certify</code> creates a new entry in states with state s, height height and also stores the decision regarding whether or not it should be possible to obtain a full state hash later on (requires_full_state_hash).
 +
<nowiki>commit_and_certify : Self × CanonicalState × Height × bool → Self
 +
commit_and_certify(self, s, height, requires_full_state_hash) :=
 +
    self with
 +
            └─ states := self.states ∪ { (height ↦ StateContainer {
 +
                                                      state : s,
 +
                                                      checkpoint : requires_full_state_hash,
 +
                                                      certification : ∅
 +
                                                    }
 +
                                        )
 +
                                      }</nowiki>
 +
 +
==== Certification of State ====
 +
This section defines the behavior of the API responsible for replicated state management.
 +
 +
* <code>list_state_hashes_to_certify</code> returns the root hashes (digests) corresponding to the states which have no certification yet, ordered by the height in an ascending manner. The ordering is applied by some order function which is not explicitly described here.
 +
<nowiki>list_state_hashes_to_certify : Self → Vec<(Height, CryptoHashOfPartialState)>
 +
list_state_hashes_to_certify(self) :=
 +
    order({ (h, hash) | (h ↦ s) ∈ self.states ∧
 +
                        s.certification = ∅ ∧
 +
                        partial_state_selector = { (streams ↦ "Accept"),
 +
                                                  (ingress_history ↦ "Accept")
 +
                                                }
 +
                        partial_state = select(s.state, partial_state_selector) ∧
 +
                        tree = into_hashtree(partial_state) ∧
 +
                        hash = digest(tree)
 +
        })</nowiki>
 +
 +
* <code>deliver_state_certification</code> stores the delivered certification in the state container pointed to by self.states[certification.height].
 +
<nowiki>deliver_state_certification : Self × Certification → Self
 +
deliver_state_certification(self, certification) :=
 +
    self with
 +
            └─ self.states[certification.height].certification := certification</nowiki>
 +
 +
* <code>subnets_with_certified_streams</code> returns the set of subnets to which there is an outgoing certified stream.
 +
<nowiki>subnets_with_certified_streams : Self → Set<SubnetId>
 +
subnets_with_certified_streams(self) :=
 +
    dom(self.states[max(h | (h ↦ s) ∈ self.states ∧ s.certification ≠ ∅)]
 +
                  ["streams"]
 +
      )</nowiki>
 +
 +
* <code>encode_certified_stream</code> encodes a stream slice which is part of the latest partially certified state to be served to remote_subnet by the XNetEndpoint. First define a function which does not take the size_limit parameter and then present a function with the size_limit parameter that makes use of this function.
 +
<nowiki>encode_certified_stream_helper : Self × SubnetId × ℕ → CertifiedStreamSlice
 +
encode_certified_stream_helper(self, remote_subnet, msg_limit) :=
 +
    CertifiedStreamSlice {
 +
        with
 +
          ├─ payload := select(state_container.state, selector)
 +
          ├─ witness := witness(into_hashtree(state_container.state), selector)
 +
          └─ certification := state_container.certification
 +
        where
 +
          ├─ state_container := self.states[max(dom(self.states))]
 +
          ├─ begin_index := min(dom(self.states[max(dom(self.states))]
 +
          │                                    ["streams"]
 +
          │                                    [remote_subnet]
 +
          │                                    ["messages"]
 +
          │                    ))
 +
          ├─ end_index := min({ begin_index + msg_limit,
 +
          │                    max(dom(self.states[max(dom(self.states))]
 +
          │                                        ["streams"]
 +
          │                                        [remote_subnet]
 +
          │                                        ["messages"]
 +
          │                        ))
 +
          │                  })
 +
          ├─ msg_selector := { (i ↦ "Accept") | i ≥ begin_index ∧ i ≤ end_index ∧ i ≠ ∅ }
 +
          └─ selector := { ("streams" ↦ { (remote_subnet ↦ { ("hdr" ↦ "Accept"),
 +
                                                              ("messages" ↦ msg_selector)
 +
                          } ) } ) }
 +
    }
 +
)</nowiki>
 +
The actual encoding function is defined relative to a size operator that may be defined to measure the absolute byte size of an encoding of the resulting CertifiedStreamSlice, but may also be defined to follow some different metric.
 +
<nowiki>encode_certified_stream : Self × SubnetId × ℕ × ℕ → CertifiedStreamSlice
 +
encode_certified_stream(self, remote_subnet, msg_limit, size_limit) :=
 +
    encode_certified_stream_helper(self, remote_subnet, i)
 +
        where
 +
          └─ i = max(i | i ≤ msg_limit ∧
 +
                          size(encode_certified_stream_helper(self, remote_subnet, i)) ≤ size_limit
 +
                    )</nowiki>
 +
 +
* <code>decode_valid_certified_stream</code> decodes a CertifiedStreamSlice that is known to be valid into a StreamSlice. It is used by (1) the block making logic to decode the CertifiedStreamSlices contained in past payloads, and (2) by message routing to decode the CertifiedStreamSlices received in a Batch for processing. To improve readability, a helper function is defined that is then used inside the description of the decoding function.
 +
<nowiki>new_queue: (StreamIndex ↦ Message) → Queue<StreamIndex> | None
 +
new_queue(stream_msgs) :=
 +
    if |stream_msgs| = 0 {
 +
        None
 +
    } else {
 +
        Queue<StreamIndex> {
 +
            with
 +
            ├─ elements := stream_msgs
 +
            └─ next_index := max(dom(stream_msgs)) + 1
 +
        }
 +
    }</nowiki>
 +
 +
* <code>decode_valid_certified_stream : SubnetId × CertifiedStreamSlice → StreamSlice</code>
 +
<nowiki>decode_valid_certified_stream(own_subnet, slice) :=
 +
    StreamSlice {
 +
        with
 +
          ├─ header := StreamHeader {
 +
          │                with
 +
          │                  ├─ begin := hdr_begin
 +
          │                  ├─ end := hdr_end
 +
          │                  └─ signals := stream_signals
 +
          │            }
 +
          └─ messages := new_queue(stream_msgs)
 +
        where
 +
          ├─ stream_msgs := slice.payload["streams"][own_subnet]["messages"]
 +
          ├─ stream_signals := slice.payload["streams"][own_subnet]["hdr"]["signals"]
 +
          ├─ hdr_begin := slice.payload["streams"][own_subnet]["hdr"]["begin"]
 +
          └─ hdr_end := slice.payload["streams"][own_subnet]["hdr"]["end"]
 +
    }</nowiki>
 +
 +
* <code>decode_certified_stream</code> decodes a CertifiedStreamSlice into a StreamSlice to be able to check whether an XNetPayload conforms with the block making rules.
 +
<nowiki>% FAIL IF:    |slice.payload["streams"]| ≠ 1
 +
%          ∨ |slice.payload["streams"][own_subnet]| < 1
 +
%          ∨ ∃ (i ↦ ⋅) ∈ slice.payload["streams"][own_subnet]["messages"] :
 +
%            ├─ (i + 1 ↦ ⋅) ∉ slice.payload["streams"][own_subnet]["messages"] ∧
 +
%            └─ i ∉ max(dom(slice.payload["streams"][own_subnet]["messages"]))
 +
%          ∨ Consensus.verifier.is_valid(remote_subnet,
 +
%                                        slice.certification,
 +
%                                        registry_version,
 +
%                                        recompute_digest(slice.payload, slice.witness),
 +
%                                        ) ≠ true
 +
%
 +
decode_certified_stream : RegistryVersion × SubnetId × SubnetId × CertifiedStreamSlice → StreamSlice
 +
decode_certified_stream(registry_version, own_subnet, remote_subnet, slice) :=
 +
    decode_valid_certified_stream(own_subnet, slice)</nowiki>
 +
 +
* <code>derive_certified_stream</code> is intended to be used by the Block Maker to derive a sub-slice from a slice previously obtained from the remote subnet. Similar to the encode_certified_stream method this function is defined in two steps.
 +
<nowiki>derive_certified_stream_helper : RegistryVersion ×
 +
                                SubnetId ×
 +
                                SubnetId ×
 +
                                CertifiedStreamSlice ×
 +
                                ℕ ×
 +
                                ℕ
 +
                                → CertifiedStreamSlice
 +
derive_certified_stream_helper(registry_version,
 +
                              own_subnet,
 +
                              remote_subnet,
 +
                              slice,
 +
                              begin,
 +
                              end)
 +
    := slice with
 +
                ├─ payload := select(slice.payload, selector)
 +
                └─ witness := derive(slice.witness, select(slice.payload, selector))
 +
            where
 +
                ├─ end_index := min({ end,
 +
                │                    max(dom(slice["streams"][own_subnet]["messages"]))
 +
                │                  })
 +
                ├─ msg_selector := { (i ↦ "Accept") | i ≥ begin ∧ i ≤ end_index }
 +
                └─ selector := { ("streams" ↦ { (own_subnet ↦ { ("hdr" ↦ "Accept"),
 +
                                                                ("messages" ↦ msg_selector)
 +
                              } ) } ) }</nowiki>
 +
The actual function to derive substreams is defined relative to a size operator that may be defined to measure the absolute byte size of an encoding of the resulting CertifiedStreamSlice, but may also be defined to follow some different metric.
 +
<nowiki>% FAIL IF:  |slice.payload["streams"]| ≠ 1 ∨
 +
%          |slice.payload["streams"][own_subnet]| < 1 ∨
 +
%          Consensus.verifier.is_valid(remote_subnet,
 +
%                                      slice.certification,
 +
%                                      registry_version,
 +
%                                      recompute_digest(slice.payload, slice.witness),
 +
%                                      ) ≠ true
 +
%
 +
% REQUIRE:  begin ≥ min(dom(slice.payload["streams"][own_subnet]["messages"]))
 +
%
 +
derive_certified_stream : RegistryVersion ×
 +
                          SubnetId ×
 +
                          SubnetId ×
 +
                          CertifiedStreamSlice ×
 +
                          ℕ ×
 +
                          ℕ ×
 +
                          ℕ
 +
                          → CertifiedStreamSlice
 +
derive_certified_stream(registry_version,
 +
                        own_subnet,
 +
                        remote_subnet,
 +
                        slice,
 +
                        begin,
 +
                        end,
 +
                        size_limit)
 +
    := derive_certified_stream_helper(registry_version,
 +
                                      own_subnet,
 +
                                      remote_subnet,
 +
                                      slice,
 +
                                      begin,
 +
                                      i)
 +
          where
 +
              └─ i = max(i | i ≤ end ∧
 +
                            size(derive_certified_stream_helper(registry_version,
 +
                                                                own_subnet,
 +
                                                                remote_subnet,
 +
                                                                slice,
 +
                                                                begin,
 +
                                                                i)
 +
                                ) ≤ size_limit
 +
                        )</nowiki>
 +
 +
==== State Integrity and Authentication for Nodes ====
 +
 +
* <code>get_full_state_hash_at</code> returns the root hash (digest) of the hash tree corresponding to the full state at height h if requires_full_state_hash was set upon the corresponding call to commit_and_certify.
 +
<nowiki>get_full_state_hash_at : Self × Height → CryptoHashOfState
 +
get_full_state_hash_at(self, height) := hash | (height ↦ s) ∈ self.states ∧
 +
                                              s.checkpoint = true ∧
 +
                                              tree = into_hashtree(s.state) ∧
 +
                                              hash = digest(tree)</nowiki>
 +
Note that from an abstract point of view it would not be required to decide early on (i.e., upon commit_and_certify) whether or not a hash of the full state will be required at some point. However, it is still written in this way as this early decision allows an optimized implementation to start the computation of the root hash already before get_full_state_hash_at is called.

Latest revision as of 20:08, 13 November 2023

Overview

The Internet Computer (IC) achieves its security and fault tolerance by replicating computation physical hardware devices run by independent node providers distributed across data centers globally. For scalability reasons, the Internet Computing Protocol (ICP) composes the IC of multiple independent subnets. Each subnet can be viewed as an independent replicated state machine that replicates its state over a subset of all the available nodes.

Roughly speaking, replication is achieved by having the two lower ICP layers (P2P & Consensus) agree on blocks containing batches of messages to be executed, and then having the two upper ICP layers (Message Routing & Execution) execute them. Blocks are organized as a chain, where each block builds on the previous block. Each Block has an associated height in the chain and one can look at execution of a batch of messages corresponding to the agreed upon Block at height [math]\displaystyle{ x }[/math] by the upper layers as taking the replicated state of version [math]\displaystyle{ x-1 }[/math], and "applying" the batch to it to obtain replicated state of version [math]\displaystyle{ x }[/math].

The state manager is the component that maintains a versioned repository of replicated state and takes care of its certification so that other stakeholders can verify the authenticity of (parts of) the state. This page presents an abstract model of the state manager and the replicated state, and formally defines how the abstract model and the implementation need to relate to each other. This model is also needed to define the behavior of other components that modify the replicated state, e.g., message routing.

The state manager also provides capabilities to sync state with other replicas in the same subnet so that replicas that have fallen behind can catch up. The latter is not yet covered in this page.

Remarks and Required Prior Knowledge

  • The goal of this document is to provide the next level of detail compared to the material available for message routing in the "How it works" section of internetcomputer.org. So it is recommended to study the material available there first.
  • There is no comprehensive type definition of the replicated state in this document, but there is the definition of the relevant parts to the components that modify the state, e.g., message routing.
  • Also see this and this blog post for some relevant and easier to digest background information.
  • The documentation provided in this page may slightly deviate from the current implementation in terms of API as well as naming of functions, variables, etc. However, it still conveys the high-level ideas required to understand how the component itself works and how it interacts with other components. The implementation also contains several optimizations which are, however, not important for the conceptual overview here and therefore skipped.
  • The notation used in this page is described here.

Versioning of State

The state is versioned in a way that is in line with the batch numbers of the batches processed by the deterministic state machine. In particular this means that a state labeled with height [math]\displaystyle{ h }[/math] is exactly the state resulting from a processing cycle of the deterministic state machine processing a batch h using the replicated state labeled with Height [math]\displaystyle{ h - 1 }[/math]. For an overview of such a processing cycle, refer to the page on the message routing layer.

Component API

This sections lists the API functions of the State Manager together with informal descriptions of what they do. For a formal definition of the expected behavior of the API refer to the section Operation of the State Manager. They are split into three groups corresponding to the State Manger’s core functionalities.

Management of Replicated State

  • get_state() → ReplicatedState: Returns the most recent available state.
  • get_state_at(height : Height) → ReplicatedState: Returns the state at the specified Height.
  • latest_state_height() → Height: Returns the height corresponding to the latest available state.
  • list_state_heights(cert_mask : CertificationMask) → Vec: Returns a list of heights corresponding to states according to the selection. The CertificationMask type is defined as follows:
CertificationMask : { CERT_CERTIFIED,   // Select states with completed certification
                      CERT_UNCERTIFIED, // Select states with pending certification
                      CERT_ANY          // Select both
                    }
  • remove_state_at(height : Height): Remove state at the specified height (if any).
  • remove_states_below(height : Height): Remove all states having a height ≤ height.
  • commit_and_certify(s : ReplicatedState, height : Height, requires_full_state_hash : bool): Commits the the given state at the specified height and triggers certification of parts of it. Depending on whether requires_full_state_hash is set or not, it also triggers computation of a CryptoHashOfState representing the full state.

Certification of State

The API functions described in this section are generally used to certify state that is accessible by users and/or other subnets. The state manager is responsible for computing hashes representing the respective parts of the state whereas consensus will create the threshold signatures on the hash on behalf of the subnet and deliver the certifications once this has happened.

  • list_state_hashes_to_certify() → Vec<(Height, CryptoHashOfPartialState)>: Lists all state hashes which currently still need certification.
  • deliver_state_certification(certification : Certification): Delivers a certification corresponding to some state hash previously obtained from the function above.
  • subnets_with_certified_streams() → Set: Returns the set of subnets with streams available in the most recent partially certified state.
  • derive_certified_stream(registry_version : RegistryVersion, own_subnet : SubnetId, remote_subnet : SubnetId, cs : CertifiedStream, begin : ℕ, end : ℕ, size_limit : ℕ) → CertifiedStreamSlice: Derives a certified stream slice from a given certified stream slice. This function will always return an authentic certified stream slice or an error.
  • encode_certified_stream(remote_subnet : SubnetId, begin : StreamIndex, msg_limit : ℕ, size_limit : ℕ) → CertifiedStreamSlice: Encodes a certified stream slice into its transport format.
  • decode_certified_stream(registry_version : RegistryVersion, own_subnet : SubnetId, remote_subnet : SubnetId, cs : CertifiedStreamSlice) → StreamSlice: Decodes a certified stream slice from its transport format and strips off the signature related parts. This function fails if the given certified stream slice is not authentic.
  • decode_valid_certified_stream(own_subnet : SubnetId, cs : CertifiedStreamSlice) → StreamSlice: Decodes a certified stream slice from its transport format and strips off the signature related parts.

State Integrity and Authentication for Nodes

As opposed to the certification in the previous section the API functions described in this section deal with hashes/certifications that are later going to be used by nodes to ensure they obtained an authentic copy of the state whenever they newly join or catch up after having fallen behind. Again state manager is responsible for computing the hashes while consensus is responsible for signing them on behalf of the subnet.

  • get_full_state_hash_at(height : Height) → CryptoHashOfState: Returns the hash corresponding to the full state at the specified height, or nothing. This function is guaranteed to eventually return a hash in case requires_full_state_hash was set for the specified height upon commit_and_certify.

Abstract State Representation

An implementation of the state manager distinguishes multiple representations of replicated state. In this section This section gives an abstract description of the various representations and discuss the requirements needed to impose on their relations.

  • ReplicatedState, which is the implementation’s internal representation of replicated state.
  • (Partial)CanonicalState which is a representation of (parts of) replicated state, universally understandable across implementations.
  • StateHashTree which is a structured digest of (parts of) canonical state, universally understandable across implementations.

The relation between the different representations is subsumed below. The space of instances of type ReplicatedState can be partitioned into multiple equivalence classes. The criterion for whether two ReplicatedState instances belong to the same equivalence class is whether or not they map to the same CanonicalState. More formally, the equivalence relation [math]\displaystyle{ R }[/math] is used and defined as follows:

[math]\displaystyle{ (x, y) \in R \Longleftrightarrow \texttt{to_canonical}(x) = \texttt{to_canonical}(y). }[/math]

This equivalence relation defines a type ReplicatedState_R that contains all the equivalence classes [x] over ReplicatedState with respect to [math]\displaystyle{ R }[/math], i.e.,

[math]\displaystyle{ [x] = \{ y~|~y ∈ \texttt{ReplicatedState}~∧~(x, y) ∈ R \}. }[/math]

State-types.svg

An object of type ReplicatedState, i.e., a representative of one of the equivalence classes in ReplicatedState_R, is created whenever the deterministic state machine implemented by the message routing and the execution layers calls the commit_and_certify function on the state manager. This state can then be internally converted to CanonicalState to be provided to other replicas within the state sync protocol or to be further converted to PartialCanonicalState and, e.g., made available to block makers on other subnets who need to include the subnet-to-subnet streams in blocks.

To have a compact and easy to transfer representation that is useful to verify the integrity or compare different versions of (Partial)CanonicalState, it can be converted into a StateHashTree. The conversion function is a mapping that needs to be collision resistant, meaning that, for some cs1 : (Partial)CanonicalState corresponding to some h : StateHashTree it is intractable to find a different cs2 : (Partial)CanonicalState that hashes to the same h. Intuitively, this means that a replica that gets hold of some h : StateHashTree can verify whether a certain piece of cs : (Partial)CanonicalState actually corresponds to the respective h. If h is authentic, one can use this mechanism to verify the authenticity of c.

From a StateHashTree, one can derive a Digest, representing a StateHashTree even more compactly, as well as a Witness. Deriving the latter requires additional information on the (Partial)CanonicalState the witness should be valid for, i.e., so that one can use the Witness and the (Partial)CanonicalState to recompute the Digest corresponding to the original StateHashTree.

Batch processing

The figure below illustrates the operations of the state manager during batch processing, and, in doing so, also defines how processing a batch by the implementation should relate to the application of a batch to CanonicalState as defined in the specification of the message routing and execution environment layers (the blue arrows in the figure). Usually, a replica keeps "applying" the batches passed by consensus to some representative of an equivalence class in ReplicatedState_R to obtain the representative of an equivalence class in ReplicatedState_R corresponding to the next state version. If a replica falls too far behind it is required to catch up with the help of other replicas because the batches it requires to move forward are no longer available. This is done by obtaining a certain version of CanonicalState from other replicas via the state sync protocol.

Batch-processing.svg

Type definitions

CanonicalState and PartialCanonicalState can be abstractly viewed as trees and represented as partial maps. The figure below gives an example of a (Partial)CanonicalState tree. Note that the labels used here are just exemplary and don't correspond to the actual state tree exposed by the implementation.

State-tree.svg

Formally, (Partial)CanonicalState is defined as:

LabeledTree = Content | Label ↦ LabeledTree
CanonicalState = LabeledTree
PartialCanonicalState = LabeledTree

Both, CanonicalState and PartialCanonicalState, are aliases for the same type. However, they are named differently to capture their slightly different semantic.

The type of Content in the mapping is left abstract for now. It is a place holder for the concrete data placed in the state by the implementation.

Mappings between types

The following mappings between the different state representations are required. No explicit algorithms are given for some of the mappings defined below but only provide abstract requirements on them. In security proofs and arguments, assumptions are explicitly stated on how the relevant functions are implemented and do the security proofs relative to them.

  • A bijective function between ReplicatedState_R and CanonicalState that is efficiently computable in both directions. This is modeled as two efficiently computable, injective functions:
to_canonical   : ReplicatedState_R → CanonicalState
from_canonical : CanonicalState → ReplicatedState_R

where it holds that for all c : CanonicalState it is the case that c = to_canonical(from_canonical(c)). Note that these two functions will be implicitly defined by the implementation.

  • A function select to derive PartialCanonicalState from (Partial)CanonicalState subject to some Selector:
Selector = "Accept" | Label ↦ Selector
select : LabeledTree × Selector → PartialCanonicalState

which behaves as follows

select(tree, "Accept") := tree
select(tree, map_selector) := { (l ↦ select(subt, subs))
                                    | (l ↦ subt) ∈ tree ∧
                                      (l ↦ subs) ∈ map_selector
                              }
  • A function into_hashtree mapping (Partial)CanonicalState to StateHashTree:
into_hashtree : LabeledTree → StateHashTree
  • A function digest to derive a Digest representing a given StateHashTree:
digest : StateHashTree → Digest
  • A function witness to derive a Witness from a StateHashTree and a Selector:
witness : StateHashTree × Selector → Witness
  • A function derive_witness to derive a Witness from a Witness and a PartialCanonicalState so that it is a valid witness for the given partial canonical state:
derive_witness : Witness × PartialCanonicalState → Witness | Error
  • A function recompute_digest to compute a Digest from a PartialCanonicalState and a Witness:
recompute_digest : PartialCanonicalState × Witness → Digest | Error

Required Properties

  • Correctness requires that
∀ c ∈ CanonicalState,                                              (1)
∀ s1, s2 ∈ Selector,                                               (2)
∀ h = into_hashtree(c),                                            (3)
∀ d = digest(h),                                                   (4)
∀ c1 = select(c, s1),                                              (5)
∀ c2 = select(c1, s2),                                             (6)
∀ w1 = witness(h, s1),                                             (7)
∀ w2 = derive_witness(w1, c2),                                     (8)

it holds that

d = recompute_digest(c1, w1) ∧                                     (9)
d = recompute_digest(c2, w2)                                      (10)
  • Collision resistance: It is intractable to come up with c0 ∈ CanonicalState | PartialCanonicalState, c1 ∈ CanonicalState | PartialCanonicalState and w ∈ Witness where it holds that
∄ s ∈ Selector : c1 = select(c0, s) ∧                              (1)
h = into_hashtree(c0) ∧                                            (2)
digest(h) = recompute_digest(c1, w).                               (3)

State Hash Tree

Conversion from LabeledTree

The conversion described in this section turns some t : LabeledTree into a corresponding binary HashTree representing t. Intuitively, the node types Fork and Leaf are used to enforce a binary tree structure, whereas the node type LabeledNode is used to appropriately reflect the labels present in the LabeledTree.

Recall that a labeled tree is represented as a partial map LabeledTree = Content | Label ↦ LabeledTree which, informally speaking, allows to define an individual node in terms of its outgoing labeled edges pointing to other nodes of the tree, where a node may again be a LabeledTree or it may be Content in case it is a leaf. Concretely, a certain tree could look as follows:

Graphical illustration of T
T = { (label_1 ↦ subtree_1), (label_2 ↦ subtree_2), ..., (label_n ↦ subtree_n) }
Intermediate Representation 1: Ordering

First, an ordering function is required, which assigns an order to the elements in a partial map based on their labels. This order can simply be lexicographic ordering of the labels. Tuples containing the edges as sequences are written as opposed to writing them as a set to indicate that they already went through ordering, i.e., whenever this notation is used, it is implicitly assumed that this conversion has already taken place.

Below is an illustration of how ordering is supposed to work based on an example. Assume a tree T defined as

Tree before ordering
{ ("c" ↦ "demo3"),
  ("a" ↦ "demo1"),
  ("d" ↦ "demo4"),
  ("b" ↦ { ("x" ↦ "demo2.1"), ("z" ↦ "demo2.3"), ("y" ↦ "demo2.2")} ),
  ("e" ↦ "demo5")
}

After ordering, this tree would be represented as

Tree after ordering
( ("a" ↦ "demo1"),
  ("b" ↦ ( ("x" ↦ "demo2.1"), ("y" ↦ "demo2.2"), ("z" ↦ "demo2.3") ) ),
  ("c" ↦ "demo3"),
  ("d" ↦ "demo4"),
  ("e" ↦ "demo5")
)
Intermediate Representation 2: Binary State Trees

To describe the conversion, intermediate representation is introduced which has the same form as a hash tree but the inner nodes do not have associated hashes and the leaves contain Content as opposed to some Hash. In particular:

BinaryStateTree = LabeledNode | Fork | Leaf

LabeledNode = Label × BinaryStateTree
Fork = BinaryStateTree × BinaryStateTree
Leaf = Content

The conversion from an ordered version of CanonicalState to a BinaryStateTree, is defined as follows:

// Convert LabeledTree into LabeledNodes
convert(( (l_1 ↦ s_1), ..., (l_n ↦ s_n) )) :=
    expand(
           LabeledNode(l_1, convert(s_1)), ..., LabeledNode(l_n, convert(s_n))
          )

convert(l : Leaf) := l

// Split in first k elements to the left, rest to the right, where k is highest power of two < n
expand(c_1, c_2, ..., c_n) :=
    Fork(
         expand(c_1, ..., c_k),
         expand(c_k+1,...,c_n)
        )
    where k := 2^max{ x ∈ ℕ : 2^x < n }

// Base case
expand(c) := c

An alternative representation of the expand logic is to recursively group pairs of nodes in forks from left to right, starting at the leaves. In case there is an odd number of nodes one keeps the last node separate and includes it into a fork on the next possible level of the tree. This algorithm is additionally included (which behaves equivalently to the one above) because it may give pointers for efficient implementations of the conversion.

// Expand even number of child nodes
expand(c_1, c_2, ..., c_2n-1, c_2n) :=
    expand(
           Fork(c_1, c_2),
           ...,
           Fork(c_2n-1, c_2n)
          )

// Expand odd number of child nodes
expand(c_1, c_2, ..., c_2n-1, c_2n, c_2n+1) :=
    expand(
           Fork(c_1, c_2),
           ...,
           Fork(c_2n-1, c_2n),
           c_2n+1
          )

// Base case
expand(c) := c

Consider the ordered tree from the example above:

Expanded tree T' = expand(T)
T = ( ("a" ↦ "demo1"),
      ("b" ↦ ( ("x" ↦ "demo2.1"), ("y" ↦ "demo2.2"), ("z" ↦ "demo2.3") ) ),
      ("c" ↦ "demo3"),
      ("d" ↦ "demo4"),
      ("e" ↦ "demo5")
    )

Conversion to HashTree

Formally, a state hash tree is defined as follows.

HashTree = LabeledNode | Fork | Leaf

LabeledNode = Hash × Label × HashTree
Fork = Hash × HashTree × HashTree
Leaf = Hash

Based on a BinaryStateTree the conversion to a HashTree is specified relative to some hash function H:

domain_sep(s) := byte(|s|) || s

convert : Leaf → Hash
convert(l) := H(domain_sep("ic-hashtree-leaf") || l)

convert : Fork → Hash × HashTree × HashTree
convert((t1, t2)) := (H(domain_sep("ic-hashtree-fork") || convert(t1).hash || convert(t2).hash),
                      convert(t1),
                      convert(t2)
                     )

convert : LabeledNode → Hash × Label × HashTree
convert((l, t)) := (H(domain_sep("ic-hashtree-labeled") || l || convert(t).hash),
                    l,
                    convert(t)
                   )

Operation of the State Manager

Abstractly the State Manager can be viewed as a repository maintaining multiple versions of state together with some metadata. In this document this repository is represented by a field called states : Height ↦ StateContainer, where StateContainer is a type with multiple named fields:

StateContainer.state : CanonicalState
StateContainer.checkpoint : bool
StateContainer.certification : Certification

Note that the CanonicalState is used in the definition above. This is because the ReplicatedState is defined by the implementation. Using the CanonicalState allows us to define the behavior while not specifying how the state is supposed to look.

The field states is initialized whenever constructing a StateManager. Initially, states only contains an initial state with index 0. This state, henceforth called genesis_state is either a freshly constructed, empty state or is a state which is externally provided upon startup.

new : () → Self
new() :=
  StateManager {
      with
         └─ states := { (0 ↦ genesis_state) }
  }

API Behavior

Management of State

The following section defines the behavior of the API responsible for replicated state management. Given that all of the API functions are methods of the state manager, the self parameters are made explicit.

  • get_state returns the most recent available state.
get_state : Self → CanonicalState
get_state(self) := self.states[max(dom(self.states))].state
  • get_state_at returns the state at the given height.
get_state_at : Self × Height → CanonicalState
get_state_at(self, height) := self.states[height].state
  • latest_state_height returns the height corresponding to the latest state.
latest_state_height : Self → Height
latest_state_height(self) := max(dom(self.states))


  • list_state_heights returns a list of heights (in ascending order) corresponding to the states satisfying the specified criteria. The ordering is a total order, applied by some order function which is assumed to be implicitly defined by the Height type.
list_state_heights : Self × CertificationMask → Vec<Height>
list_state_heights(self, cert_mask) :=
    order(
            { h | (h ↦ s) ∈ self.states ∧
                  s.certification ≠ ∅ ∧
                  (
                   cert_mask = CERT_CERTIFIED ∨
                   cert_mask = CERT_ANY
                  )
            }
          ∪ { h | (h ↦ s) ∈ self.states ∧
                  s.certification = ∅ ∧
                  (
                   cert_mask = CERT_UNCERTIFIED ∨
                   cert_mask = CERT_ANY
                  )
            }
        )
  • remove_state_at removes the state corresponding to the given height.
% REQUIRE: height > 0
%
remove_state_at : Self × Height → Self
remove_state_at(self, height) :=
    self with
            └─ states := { (h ↦ s) | (h ↦ s) ∈ self.states ∧
                                      h ≠ height
                         }
  • remove_states_below removes all states corresponding to heights smaller than or equal to the given height.
remove_states_below : Self × Height → Self
remove_states_below(self, height) :=
    self with
            └─ states := { (h ↦ s) | (h ↦ s) ∈ self.states ∧
                                     h ≤ height ∧
                                     h > 0
                         }
  • commit_and_certify creates a new entry in states with state s, height height and also stores the decision regarding whether or not it should be possible to obtain a full state hash later on (requires_full_state_hash).
commit_and_certify : Self × CanonicalState × Height × bool → Self
commit_and_certify(self, s, height, requires_full_state_hash) :=
    self with
            └─ states := self.states ∪ { (height ↦ StateContainer {
                                                       state : s,
                                                       checkpoint : requires_full_state_hash,
                                                       certification : ∅
                                                    }
                                         )
                                       }

Certification of State

This section defines the behavior of the API responsible for replicated state management.

  • list_state_hashes_to_certify returns the root hashes (digests) corresponding to the states which have no certification yet, ordered by the height in an ascending manner. The ordering is applied by some order function which is not explicitly described here.
list_state_hashes_to_certify : Self → Vec<(Height, CryptoHashOfPartialState)>
list_state_hashes_to_certify(self) :=
    order({ (h, hash) | (h ↦ s) ∈ self.states ∧
                        s.certification = ∅ ∧
                        partial_state_selector = { (streams ↦ "Accept"),
                                                   (ingress_history ↦ "Accept")
                                                 }
                        partial_state = select(s.state, partial_state_selector) ∧
                        tree = into_hashtree(partial_state) ∧
                        hash = digest(tree)
         })
  • deliver_state_certification stores the delivered certification in the state container pointed to by self.states[certification.height].
deliver_state_certification : Self × Certification → Self
deliver_state_certification(self, certification) :=
    self with
            └─ self.states[certification.height].certification := certification
  • subnets_with_certified_streams returns the set of subnets to which there is an outgoing certified stream.
subnets_with_certified_streams : Self → Set<SubnetId>
subnets_with_certified_streams(self) :=
    dom(self.states[max(h | (h ↦ s) ∈ self.states ∧ s.certification ≠ ∅)]
                   ["streams"]
       )
  • encode_certified_stream encodes a stream slice which is part of the latest partially certified state to be served to remote_subnet by the XNetEndpoint. First define a function which does not take the size_limit parameter and then present a function with the size_limit parameter that makes use of this function.
encode_certified_stream_helper : Self × SubnetId × ℕ → CertifiedStreamSlice
encode_certified_stream_helper(self, remote_subnet, msg_limit) :=
    CertifiedStreamSlice {
        with
           ├─ payload := select(state_container.state, selector)
           ├─ witness := witness(into_hashtree(state_container.state), selector)
           └─ certification := state_container.certification
        where
           ├─ state_container := self.states[max(dom(self.states))]
           ├─ begin_index := min(dom(self.states[max(dom(self.states))]
           │                                    ["streams"]
           │                                    [remote_subnet]
           │                                    ["messages"]
           │                    ))
           ├─ end_index := min({ begin_index + msg_limit,
           │                     max(dom(self.states[max(dom(self.states))]
           │                                        ["streams"]
           │                                        [remote_subnet]
           │                                        ["messages"]
           │                        ))
           │                  })
           ├─ msg_selector := { (i ↦ "Accept") | i ≥ begin_index ∧ i ≤ end_index ∧ i ≠ ∅ }
           └─ selector := { ("streams" ↦ { (remote_subnet ↦ { ("hdr" ↦ "Accept"),
                                                              ("messages" ↦ msg_selector)
                          } ) } ) }
    }
)

The actual encoding function is defined relative to a size operator that may be defined to measure the absolute byte size of an encoding of the resulting CertifiedStreamSlice, but may also be defined to follow some different metric.

encode_certified_stream : Self × SubnetId × ℕ × ℕ → CertifiedStreamSlice
encode_certified_stream(self, remote_subnet, msg_limit, size_limit) :=
    encode_certified_stream_helper(self, remote_subnet, i)
        where
           └─ i = max(i | i ≤ msg_limit ∧
                          size(encode_certified_stream_helper(self, remote_subnet, i)) ≤ size_limit
                     )
  • decode_valid_certified_stream decodes a CertifiedStreamSlice that is known to be valid into a StreamSlice. It is used by (1) the block making logic to decode the CertifiedStreamSlices contained in past payloads, and (2) by message routing to decode the CertifiedStreamSlices received in a Batch for processing. To improve readability, a helper function is defined that is then used inside the description of the decoding function.
new_queue: (StreamIndex ↦ Message) → Queue<StreamIndex> | None
new_queue(stream_msgs) :=
    if |stream_msgs| = 0 {
        None
    } else {
        Queue<StreamIndex> {
            with
             ├─ elements := stream_msgs
             └─ next_index := max(dom(stream_msgs)) + 1
        }
    }
  • decode_valid_certified_stream : SubnetId × CertifiedStreamSlice → StreamSlice
decode_valid_certified_stream(own_subnet, slice) :=
    StreamSlice {
        with
           ├─ header := StreamHeader {
           │                with
           │                   ├─ begin := hdr_begin
           │                   ├─ end := hdr_end
           │                   └─ signals := stream_signals
           │            }
           └─ messages := new_queue(stream_msgs)
        where
           ├─ stream_msgs := slice.payload["streams"][own_subnet]["messages"]
           ├─ stream_signals := slice.payload["streams"][own_subnet]["hdr"]["signals"]
           ├─ hdr_begin := slice.payload["streams"][own_subnet]["hdr"]["begin"]
           └─ hdr_end := slice.payload["streams"][own_subnet]["hdr"]["end"]
    }
  • decode_certified_stream decodes a CertifiedStreamSlice into a StreamSlice to be able to check whether an XNetPayload conforms with the block making rules.
% FAIL IF:    |slice.payload["streams"]| ≠ 1
%           ∨ |slice.payload["streams"][own_subnet]| < 1
%           ∨ ∃ (i ↦ ⋅) ∈ slice.payload["streams"][own_subnet]["messages"] :
%             ├─ (i + 1 ↦ ⋅) ∉ slice.payload["streams"][own_subnet]["messages"] ∧
%             └─ i ∉ max(dom(slice.payload["streams"][own_subnet]["messages"]))
%           ∨ Consensus.verifier.is_valid(remote_subnet,
%                                         slice.certification,
%                                         registry_version,
%                                         recompute_digest(slice.payload, slice.witness),
%                                        ) ≠ true
%
decode_certified_stream : RegistryVersion × SubnetId × SubnetId × CertifiedStreamSlice → StreamSlice
decode_certified_stream(registry_version, own_subnet, remote_subnet, slice) :=
    decode_valid_certified_stream(own_subnet, slice)
  • derive_certified_stream is intended to be used by the Block Maker to derive a sub-slice from a slice previously obtained from the remote subnet. Similar to the encode_certified_stream method this function is defined in two steps.
derive_certified_stream_helper : RegistryVersion ×
                                 SubnetId ×
                                 SubnetId ×
                                 CertifiedStreamSlice ×
                                 ℕ ×
                                 ℕ
                                 → CertifiedStreamSlice
derive_certified_stream_helper(registry_version,
                               own_subnet,
                               remote_subnet,
                               slice,
                               begin,
                               end)
    := slice with
                ├─ payload := select(slice.payload, selector)
                └─ witness := derive(slice.witness, select(slice.payload, selector))
             where
                ├─ end_index := min({ end,
                │                     max(dom(slice["streams"][own_subnet]["messages"]))
                │                  })
                ├─ msg_selector := { (i ↦ "Accept") | i ≥ begin ∧ i ≤ end_index }
                └─ selector := { ("streams" ↦ { (own_subnet ↦ { ("hdr" ↦ "Accept"),
                                                                ("messages" ↦ msg_selector)
                               } ) } ) }

The actual function to derive substreams is defined relative to a size operator that may be defined to measure the absolute byte size of an encoding of the resulting CertifiedStreamSlice, but may also be defined to follow some different metric.

% FAIL IF:  |slice.payload["streams"]| ≠ 1 ∨
%           |slice.payload["streams"][own_subnet]| < 1 ∨
%           Consensus.verifier.is_valid(remote_subnet,
%                                       slice.certification,
%                                       registry_version,
%                                       recompute_digest(slice.payload, slice.witness),
%                                      ) ≠ true
%
% REQUIRE:  begin ≥ min(dom(slice.payload["streams"][own_subnet]["messages"]))
%
derive_certified_stream : RegistryVersion ×
                          SubnetId ×
                          SubnetId ×
                          CertifiedStreamSlice ×
                          ℕ ×
                          ℕ ×
                          ℕ
                          → CertifiedStreamSlice
derive_certified_stream(registry_version,
                        own_subnet,
                        remote_subnet,
                        slice,
                        begin,
                        end,
                        size_limit)
    := derive_certified_stream_helper(registry_version,
                                      own_subnet,
                                      remote_subnet,
                                      slice,
                                      begin,
                                      i)
           where
              └─ i = max(i | i ≤ end ∧
                             size(derive_certified_stream_helper(registry_version,
                                                                 own_subnet,
                                                                 remote_subnet,
                                                                 slice,
                                                                 begin,
                                                                 i)
                                 ) ≤ size_limit
                        )

State Integrity and Authentication for Nodes

  • get_full_state_hash_at returns the root hash (digest) of the hash tree corresponding to the full state at height h if requires_full_state_hash was set upon the corresponding call to commit_and_certify.
get_full_state_hash_at : Self × Height → CryptoHashOfState
get_full_state_hash_at(self, height) := hash | (height ↦ s) ∈ self.states ∧
                                               s.checkpoint = true ∧
                                               tree = into_hashtree(s.state) ∧
                                               hash = digest(tree)

Note that from an abstract point of view it would not be required to decide early on (i.e., upon commit_and_certify) whether or not a hash of the full state will be required at some point. However, it is still written in this way as this early decision allows an optimized implementation to start the computation of the root hash already before get_full_state_hash_at is called.