IC message routing layer

Overview

The Internet Computer (IC) achieves its security and fault tolerance by replicating computation across node machines located in various independent data centers across the world. For scalability reasons, the Internet Computer Protocol (ICP) composes the IC of multiple independent subnets. Each subnet can be viewed as an independent replicated state machine that replicates its state over a subset of all the available nodes.

Roughly speaking, replication is achieved by having the two lower ICP layers (P2P & Consensus) agree on blocks containing batches of messages to be executed, and then having the two upper ICP layers (Message Routing & Execution) execute them. Blocks are organized as a chain, where each block builds on the previous one. Each block has an associated height in the chain, and one can view the execution of the batch of messages corresponding to the agreed-upon block at height x by the upper layers as taking the replicated state of version x - 1 and "applying" the batch to it to obtain the replicated state of version x.
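
To make this concrete, the following minimal Rust sketch renders the state-transition view; all names and types here are hypothetical placeholders rather than the actual replica API.

struct Batch;            // agreed-upon batch of messages from the block at height x
struct ReplicatedState;  // the subnet's replicated state, versioned by height

// Deterministically apply the batch agreed at height x to the replicated
// state of version x - 1, yielding the replicated state of version x; every
// honest replica computes the same result.
fn apply_batch(state_v_prev: ReplicatedState, _batch: Batch) -> ReplicatedState {
    // ... deterministic processing by Message Routing & Execution ...
    state_v_prev
}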

In this document we describe the role of the Message Routing layer in deterministic batch processing. Its responsibilities are:

  • Coordinating the deterministic processing of batches: fetching the right versions of the replicated state and the registry view to process the batch, triggering the deterministic processing, and committing the resulting replicated state (a sketch of this loop follows the list).
  • Deterministic processing of batches: processing a batch relative to some replicated state and some registry view, resulting in an updated replicated state.
  • Transferring message streams from one subnet to another.
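
A hedged Rust sketch of the coordination loop from the first bullet follows; the traits and method names (take_state, commit_state, view_at) are hypothetical stand-ins for the state manager and registry interfaces, which this page only treats conceptually.

struct ReplicatedState;
struct RegistryView;
struct Batch { height: u64, registry_version: u64 /* , messages, ... */ }

trait StateManager {
    fn take_state(&mut self, height: u64) -> ReplicatedState;
    fn commit_state(&mut self, height: u64, state: ReplicatedState);
}

trait Registry {
    fn view_at(&self, version: u64) -> RegistryView;
}

// Placeholder for the deterministic batch processing (second bullet).
fn process(state: ReplicatedState, _batch: &Batch, _view: &RegistryView) -> ReplicatedState {
    state
}

// Coordination: fetch the right state and registry view, process, commit.
fn on_batch(sm: &mut dyn StateManager, reg: &dyn Registry, batch: Batch) {
    let state = sm.take_state(batch.height - 1);    // state the batch builds on
    let view = reg.view_at(batch.registry_version); // registry view referenced by the batch
    let new_state = process(state, &batch, &view);  // deterministic processing
    sm.commit_state(batch.height, new_state);       // commit state version `height`
}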

Remarks and Required Prior Knowledge

  • The goal of this document is to provide the next level of detail compared to the material in the "How it works" section of internetcomputer.org, so it is recommended to study the material available there first.
  • This page builds upon definitions made in the page describing the state manager. Please refer to that page for missing definitions related to the replicated state, etc.
  • Also see this and this blog post for some relevant and easier-to-digest background information.
  • The documentation provided on this page may slightly deviate from the current implementation in terms of API as well as naming of functions, variables, etc. However, it still conveys the high-level ideas required to understand how the component works and how it interacts with other components. The implementation also contains several optimizations which are not important for this conceptual overview and are therefore skipped.
  • The notation used in this page is described here.

Replicated vs. Canonical State

While the external API functions defined in this document always take state in its implementation-specific representation, i.e., as ReplicatedState, we describe the operations the message routing component performs on the state based on its canonical representation, i.e., the CanonicalState. Given the relations between ReplicatedState and CanonicalState as defined in the specification of the state manager, this implicitly defines how an implementation needs to act on the respective parts of the ReplicatedState. We assume an implicit conversion from ReplicatedState to CanonicalState whenever we access some state passed to this component via an API function.
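
A minimal Rust-flavored sketch of this assumption follows; the types are empty placeholders, and the actual relation between the two representations is the one defined in the state manager specification.

struct ReplicatedState; // implementation-specific representation
struct CanonicalState;  // implementation-independent representation

impl From<ReplicatedState> for CanonicalState {
    // Deterministically map every part of the replicated state (queues,
    // streams, indices, ...) to its canonical encoding.
    fn from(_state: ReplicatedState) -> Self {
        CanonicalState
    }
}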

Guarantees Provided by Message Routing

Intuitively, the goal of the message routing layer is to enable transparent communication of canisters across subnets. This means that this layer formally does not add any guarantees to those the system already provides, but simply needs to make sure that the system invariants are preserved. Those system invariants include

  • guaranteed replies (each canister-to-canister request will eventually receive a reply),
  • canister-to-canister ordering (the order of canister-to-canister requests sent from one canister to another canister is preserved), and
  • authenticity (only messages that come from canisters on the IC are processed).

To ensure that the system invariants hold, message routing needs to provide the following guarantees (illustrated schematically after the list):

  • Canister-to-canister messages will eventually be passed to the execution layer at the subnet the destination canister lives on exactly once.
  • If a message cannot be delivered, a synthetic reject response must be produced.
  • If a canister A sends two messages m_1 and m_2 to a canister B, then, if neither of them gets synthetically rejected, it must be guaranteed that they are put into canister B's input queue from A in that order.
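
These guarantees can be pictured as a single per-message outcome; the following Rust type is purely illustrative and not part of any actual interface.

enum DeliveryOutcome {
    // The message was enqueued exactly once into the destination canister's
    // input queue, behind all earlier, non-rejected messages from the same
    // sender (ordering guarantee).
    Delivered,
    // The message could not be delivered (e.g., the destination canister
    // does not exist); message routing must produce a synthetic reject
    // response, which also upholds the guaranteed-replies invariant.
    SyntheticReject { reason: String },
}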

Preliminaries

Description of the Relevant Parts of the Registry

The registry can be viewed as a central store of configuration information of the IC that is maintained by the NNS DAO. The content of the registry is held by a canister on the NNS subnet, and, roughly speaking, its authenticity is guaranteed by obtaining a certification of the content on behalf of the NNS using the certification mechanism described in the state manager wiki page. Throughout this document we assume that the registry contents we work with are authentic.

The registry entries required by this component are the set subnets of all existing subnet IDs, as well as the canister-to-subnet mapping subnet_assignment. Note that the actual implementation may choose to represent the required fields differently as long as they are conceptually equivalent.

Registry {
    subnets           : Set<SubnetId>,
    subnet_assignment : CanisterId ↦ SubnetId,
    ...
}
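
For illustration, here is a hedged Rust sketch of how message routing might consult these registry fields when routing a cross-subnet message; the concrete types and the route function are hypothetical.

use std::collections::{BTreeMap, BTreeSet};

type SubnetId = u64;   // placeholder; real IDs are opaque identifiers
type CanisterId = u64; // placeholder

struct Registry {
    subnets: BTreeSet<SubnetId>,
    subnet_assignment: BTreeMap<CanisterId, SubnetId>,
}

impl Registry {
    // Resolve the subnet hosting `canister`, if any; message routing uses
    // this to pick the outgoing stream for a cross-subnet message.
    fn route(&self, canister: CanisterId) -> Option<SubnetId> {
        self.subnet_assignment.get(&canister).copied()
    }
}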

Description of the Relevant Canonical State

Below, we define the parts of the canonical state which are relevant for the description of this component, together with some constraints we impose on the replicated state. Abstractly, the CanonicalState is defined as a nested partial map. For easier readability we bundle together the entries of the outermost map in a data structure with multiple fields, where the names of the fields represent the keys in the respective partial map; e.g., for some s : CanonicalState we can use s.ingress_queues to access s[ingress_queues].

We start by defining the individual fields of the type CanonicalState which are relevant in the context of this document. After that we give more details about the datatypes of the individual fields. We distinguish between the parts which are exclusively visible to message routing, and the parts which are also visible to the execution layer.

Parts visible to message routing and execution

CanonicalState {
    ...
    ingress_queues  : IngressQueues,
    input_queues    : InputQueues,
    output_queues   : OutputQueues,
    ...
}

Parts visible to Message Routing only

CanonicalState {
    ...
    streams               : Streams,
    expected_xnet_indices : Set<(SubnetId × StreamIndex)>,
    ...
}

Even though there are parts of the state that are accessed by both message routing and execution, one can enforce a conceptual boundary between them. In particular, for input queues we have that message routing will only ever push messages to them, whereas for output queues we have that message routing will only ever pull messages from them. The opposite holds for the execution environment.
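
The boundary can be sketched in Rust as follows; the types are hypothetical and deliberately simplified (the real queues are keyed per canister and carry indices, as defined below).

use std::collections::VecDeque;

struct Queues<T> {
    input: VecDeque<T>,
    output: VecDeque<T>,
}

impl<T> Queues<T> {
    // Message routing's side of the boundary: push to input, pull from output.
    fn routing_push_input(&mut self, msg: T) { self.input.push_back(msg); }
    fn routing_pull_output(&mut self) -> Option<T> { self.output.pop_front() }

    // The execution environment's side: pull from input, push to output.
    fn execution_pull_input(&mut self) -> Option<T> { self.input.pop_front() }
    fn execution_push_output(&mut self, msg: T) { self.output.push_back(msg); }
}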

Abstract Queues

We define a generic queue type Queue<T> which has the following fields:

Queue<T> {
    next_index : ℕ,     // Rolling index; the index of the next message to be inserted
    elements   : ℕ ↦ T  // The elements currently in the queue
}

We define a new queue as new_queue : Queue<T> with new_queue.elements = ∅ and new_queue.next_index = 1. Furthermore, it has the following associated functions:

  • push takes a queue and a partial map from ℕ to T, and returns a new queue consisting of the old queue with the given values appended in the order of their keys, re-indexed to continue from next_index. It also updates the next_index field so that it points to the index after the last inserted message.
push : Self × (ℕ ↦ T) → Self
push(self, values) :=
    self with
            ├─ next_index := self.next_index + |values|
            └─ elements := self.elements
                           ∪ { (i - 1 + k ↦ t) | i = self.next_index ∧
                                                 (j ↦ t) ∈ values ∧
                                                 k = rank(j, dom(values)) }
  • delete removes the given elements from the queue, keeping the next_index unchanged.
// REQUIRE: values ⊆ self.elements
delete : Self × (ℕ ↦ T) → Self
delete(self, values) :=
    self with
            ├─ next_index := self.next_index
            └─ elements := self.elements
                           \ values
  • clear removes all elements from the queue, keeping the next_index unchanged.
clear : Self → Self
clear(self) :=
    self with
            ├─ next_index := self.next_index
            └─ elements := ∅

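The following hedged Rust sketch renders the abstract Queue<T> and its associated functions; a BTreeMap plays the role of the partial map, and, as a simplification, delete takes the indices of the elements to remove.

use std::collections::BTreeMap;

struct Queue<T> {
    next_index: u64,            // rolling index of the next message to insert
    elements: BTreeMap<u64, T>, // the elements currently in the queue
}

impl<T> Queue<T> {
    fn new_queue() -> Self {
        Queue { next_index: 1, elements: BTreeMap::new() }
    }

    // Append `values` in ascending key order, re-indexed so that the element
    // of rank k lands at index next_index - 1 + k, then advance `next_index`
    // by |values|.
    fn push(&mut self, values: BTreeMap<u64, T>) {
        let n = values.len() as u64;
        for (k, (_j, t)) in values.into_iter().enumerate() {
            self.elements.insert(self.next_index + k as u64, t);
        }
        self.next_index += n;
    }

    // Remove the entries at the given indices, keeping `next_index`.
    // REQUIRE: every index in `indices` is present in `elements`.
    fn delete(&mut self, indices: &[u64]) {
        for i in indices {
            self.elements.remove(i);
        }
    }

    // Remove all elements, keeping `next_index`.
    fn clear(&mut self) {
        self.elements.clear();
    }
}
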
We often work with partial maps of type SomeIdentifier ↦ Queue<T>, in which case we use the following shorthand notation. With q being a partial map of the aforementioned type, and v being a partial map of type (SomeIdentifier × ℕ) ↦ T, we define the following semantics for the functions f ∈ { push, delete } associated to Queue<T>:

f_map : (SomeIdentifier ↦ Queue<T>) × ((SomeIdentifier × ℕ) ↦ T) → (SomeIdentifier ↦ Queue<T>)
f_map(q, v) = { (id ↦ queue') | (id ↦ queue) ∈ q ∧
                                (id ↦ values) ∈ v ∧
                                queue' = f(queue, values)
              } ∪
              { (id ↦ queue') | (id ↦ values) ∈ v ∧
                                ∄ (id ↦ ·) ∈ q ∧
                                queue' = f(Queue<T>::new_queue, values)
              } ∪
              { (id ↦ queue) | (id ↦ queue) ∈ q ∧
                               ∄ (id ↦ ·) ∈ v
              }

For the functions f ∈ { clear } we use

f_map : (SomeIdentifier ↦ Queue<T>) → (SomeIdentifier ↦ Queue<T>)
f_map(q) = { (id ↦ queue') | (id ↦ queue) ∈ q ∧
                             queue' = f(queue)
           }

We will henceforth omit the _map suffix and simply use f whenever it is clear from the input type that the map variant of f is meant.
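
Reusing the Queue<T> sketch from the previous section, the map-lifted push could look as follows in Rust (again hypothetical): identifiers present in v but absent from q get a fresh new_queue, and queues not mentioned in v are left untouched.

use std::collections::BTreeMap;

fn push_map<Id: Ord, T>(
    queues: &mut BTreeMap<Id, Queue<T>>,
    values: BTreeMap<Id, BTreeMap<u64, T>>,
) {
    for (id, vs) in values {
        // Create the queue on first use, then delegate to Queue::push.
        queues.entry(id).or_insert_with(Queue::new_queue).push(vs);
    }
}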

Indices

We define an Index to be an arbitrary-length sequence, where every element in the sequence up to the last one can have an arbitrary type, and the last one is a natural number.

Index : X × ... × Y × ℕ

In addition we define the following semantics:

  • We define the prefix of an index i := (x, …, y, seq_nr) as prefix(i) := i[1…|i| - 1] = (x, …, y), i.e., it contains all elements of i except the last one.
  • We define the postfix of an index i := (x, …, y, seq_nr) as postfix(i) := i[|i|] = seq_nr, i.e., the last element of the index sequence. As already mentioned, we require the postfix of an index to be a natural number.
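
As a concrete example, the entries of expected_xnet_indices have type (SubnetId × StreamIndex); modeling such an index as a Rust pair (a hypothetical simplification), prefix and postfix come out as:

type SubnetId = u64;    // placeholder, as before
type StreamIndex = u64; // the trailing natural number of the index
type XNetIndex = (SubnetId, StreamIndex);

// All elements of the index except the last one.
fn prefix(i: XNetIndex) -> SubnetId { i.0 }

// The last element of the index sequence; required to be a natural number.
fn postfix(i: XNetIndex) -> StreamIndex { i.1 }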