IC message routing layer

Overview

The Internet Computer (IC) achieves its security and fault tolerance by replicating computation across node machines located in various independent data centers across the world. For scalability reasons, the Internet Computer Protocol (ICP) composes the IC of multiple independent subnets. Each subnet can be viewed as an independent replicated state machine that replicates its state over a subset of all the available nodes.

Roughly speaking, replication is achieved by having the two lower ICP layers (P2P & Consensus) agree on blocks containing batches of messages to be executed, and then having the two upper ICP layers (Message Routing & Execution) execute them. Blocks are organized as a chain, where each block builds on the previous block. Each block has an associated height in the chain and one can look at execution of a batch of messages corresponding to the agreed upon block at height [math]\displaystyle{ x }[/math] by the upper layers as taking the replicated state of version [math]\displaystyle{ x-1 }[/math], and "applying" the batch to it to obtain replicated state of version [math]\displaystyle{ x }[/math].

In this document we describe the role of the Message Routing layer in deterministic batch processing. Its responsibilities are:

  • Coordinating the deterministic processing of batches: Fetching the right versions of the replicated state and the registry view to process the batch, triggering the deterministic processing, and committing the resulting replicated state.
  • Deterministic processing of batches: Deterministic processing of batches relative to some replicated state and some registry view, resulting in an updated replicated state.
  • Transferring message streams from one subnet to another: Making certified stream slices available to other subnets so that their block makers can include them in blocks.

Remarks and Required Prior Knowledge

  • The goal of this document is to provide the next level of detail compared to the material in the "How it works" section of internetcomputer.org. So it is recommended to study the material available there first.
  • This page builds upon definitions made in the page describing the state manager. Please refer to this page for missing definitions related to the replicated state etc.
  • Also see this and this blog post for some relevant and easier to digest background information.
  • The documentation provided in this page may slightly deviate from the current implementation in terms of API as well as naming of functions, variables, etc. However, it still conveys the high-level ideas required to understand how the component itself works and how it interacts with other components. The implementation also contains several optimizations which are, however, not important for the conceptual overview here and therefore skipped.
  • The notation used in this page is described here.

Replicated vs. Canonical State

While the external API functions defined in this document always take state in its implementation-specific representation, i.e., as ReplicatedState, we describe the operations the message routing component performs on the state based on its canonical representation, i.e., the CanonicalState. Given the relation between ReplicatedState and CanonicalState as defined in the specification of the state manager, this implicitly defines how an implementation needs to act on the respective parts of the ReplicatedState. We assume an implicit conversion from ReplicatedState to CanonicalState whenever we access some state passed to this component via an API function.

Guarantees Provided by Message Routing

Intuitively, the goal of the message routing layer is to enable transparent communication of canisters across subnets. This means that, formally, this layer does not add any guarantees beyond those the system already provides, but simply needs to make sure that the system invariants are preserved. Those system invariants include

  • guaranteed replies (each canister-to-canister request will eventually receive a reply),
  • canister-to-canister ordering (the order of canister-to-canister requests sent from one canister to another canister is preserved), and
  • authenticity (only messages that come from canisters on the IC are processed).

To ensure that the system invariants hold, message routing needs to provide the following guarantees:

  • Canister-to-canister messages will eventually be passed to the execution layer on the subnet that the destination canister lives on, exactly once.
  • If a message cannot be delivered, a synthetic reject response must be produced.
  • If a canister [math]\displaystyle{ A }[/math] sends two messages [math]\displaystyle{ m_1 }[/math] and [math]\displaystyle{ m_2 }[/math] to a canister [math]\displaystyle{ B }[/math], then, if none of them gets synthetically rejected, it must be guaranteed that they are put in canister [math]\displaystyle{ B }[/math]'s input queue from [math]\displaystyle{ A }[/math] in that order.

Preliminaries

Description of the Relevant Parts of the Registry

The registry can be viewed as a central store of configuration information of the IC that is maintained by the NNS DAO. The content of the registry is held by a canister on the NNS subnet, and, roughly speaking, its authenticity is guaranteed by obtaining a certification on the content on behalf of the NNS using the certification mechanism as described in the state manager wiki page. Throughout this document we assume that the registry contents we work with are authentic.

The registry entries required by this component are the set of all existing subnet IDs, as well as a canister-to-subnet mapping subnet_assignment. Note that the actual implementation may choose to represent the required fields differently as long as they are conceptually equivalent.

Registry {
    subnets           : Set<SubnetId>,
    subnet_assignment : CanisterId ↦ SubnetId,
    ...
}
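
As an illustration, the following is a minimal Rust sketch of these registry entries. The flat integer IDs and the route helper are simplifications introduced here; the actual registry uses richer, principal-based identifier types.

use std::collections::{BTreeMap, BTreeSet};

// Hypothetical identifier types; the real registry uses principal-based IDs.
type SubnetId = u64;
type CanisterId = u64;

// The registry fields relevant to message routing: the set of existing
// subnets and the canister-to-subnet assignment.
struct Registry {
    subnets: BTreeSet<SubnetId>,
    subnet_assignment: BTreeMap<CanisterId, SubnetId>,
}

impl Registry {
    // Look up the subnet hosting a canister, if any.
    fn route(&self, canister: CanisterId) -> Option<SubnetId> {
        self.subnet_assignment.get(&canister).copied()
    }
}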

Description of the Relevant Canonical State

Below, we define the parts of the canonical state which are relevant for the description of this component together with some constraints we impose on the replicated state. Abstractly, the CanonicalState is defined as a nested partial map. For easier readability we bundle together the entries of the outermost map in a data structure with multiple fields, where the names of the fields represent the keys in the respective partial map, e.g., for some s : CanonicalState we can use s.ingress_queues to access s[ingress_queues].

We start by defining the individual fields of the type CanonicalState which are relevant in the context of this document. After that we give more details about the datatypes of the individual fields. We distinguish between the parts which are exclusively visible to message routing, and the parts which are also visible to the execution layer.

Parts visible to message routing and execution

CanonicalState {
    ...
    ingress_queues  : IngressQueues,
    input_queues    : InputQueues,
    output_queues   : OutputQueues,
    ...
}

Parts visible to Message Routing only

CanonicalState {
    ...
    streams               : Streams,
    expected_xnet_indices : Set<(SubnetId × StreamIndex)>
    ...
}

Even though there are parts of the state that are accessed by both message routing and execution, one can enforce a conceptual boundary between them. In particular, for input queues we have that message routing will only ever push messages to them, whereas for output queues we have that message routing will only ever pull messages from them. The opposite holds for the execution environment.

Abstract Queues

We define a generic queue type Queue<T> which has the following fields:

Queue<T> {
    next_index : ℕ,     // Rolling index; the index of the next message to be inserted
    elements   : ℕ ↦ T  // The elements currently in the queue
}

We define a new queue as new_queue : Queue<T> with new_queue.elements = ∅ and new_queue.next_index = 1. Furthermore, it has the following associated functions:

  • push takes a queue and a partial map of integers mapping to T, and returns a new queue consisting of the old queue with the given values appended. It also updates the next_index field so that it points to the index after the last inserted message.
push : Self × (ℕ ↦ T) → Self
push(self, values) :=
    self with
            ├─ next_index := self.next_index + |values|
            └─ elements := self.elements
                           ∪ { (i - 1 + k ↦ t) | i = self.next_index ∧
                                                 (j ↦ t) ∈ values ∧
                                                 k = rank(j, dom(values)) }
  • delete removes the given elements from the queue, keeping next_index unchanged
% REQUIRE: values ⊆ self.elements
delete : Self × (ℕ ↦ T) → Self
delete(self, values) :=
    self with
            ├─ next_index := self.next_index
            └─ elements := self.elements
                           \ values
  • clear removes all elements from the queue, keeping next_index unchanged
clear : Self → Self
clear(self) :=
    self with
            ├─ next_index := self.next_index
            └─ elements := ∅
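
The abstract queue above can be sketched in Rust as follows. This is a minimal sketch assuming the 1-based rolling index defined for new_queue; for simplicity, delete takes the indices of the elements to remove rather than a sub-map of the elements themselves.

use std::collections::BTreeMap;

// A rolling-index queue: indices keep growing even as elements are removed.
struct Queue<T> {
    next_index: u64,            // index of the next message to be inserted
    elements: BTreeMap<u64, T>, // the elements currently in the queue
}

impl<T> Queue<T> {
    fn new_queue() -> Self {
        Queue { next_index: 1, elements: BTreeMap::new() }
    }

    // Append the given values in the order of their source keys,
    // re-indexing them contiguously starting at next_index.
    fn push(&mut self, values: BTreeMap<u64, T>) {
        for (_, t) in values { // BTreeMap iterates in ascending key order
            self.elements.insert(self.next_index, t);
            self.next_index += 1;
        }
    }

    // Remove the elements at the given indices; next_index is kept.
    fn delete(&mut self, indices: &[u64]) {
        for i in indices {
            self.elements.remove(i);
        }
    }

    // Remove all elements, keeping next_index.
    fn clear(&mut self) {
        self.elements.clear();
    }
}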

We often work with partial maps of type SomeIdentifier ↦ Queue<T>, in which case we use the following shorthand notation. With q being a map of the aforementioned type, and v being a partial map of type (SomeIdentifier × ℕ) ↦ T, we define the following semantics for the functions f ∈ { push, delete } associated to Queue<T>:

f_map : (SomeIdentifier ↦ Queue<T>) × ((SomeIdentifier × ℕ) ↦ T) → (SomeIdentifier ↦ Queue<T>)
f_map(q, v) = { (id ↦ queue') | (id ↦ queue) ∈ q ∧
                                (id ↦ values) ∈ v ∧
                                queue' = f(queue, values)
              } ∪
              { (id ↦ queue') | (id ↦ values) ∈ v ∧
                                ∄ (id ↦ ·) ∈ q ∧
                                queue' = f(Queue<T>::new_queue, values)
              } ∪
              { (id ↦ queue) | (id ↦ queue) ∈ q ∧
                               ∄ (id ↦ ·) ∈ v
              }

For the functions f ∈ { clear } we use

f_map : (SomeIdentifier ↦ Queue<T>) → (SomeIdentifier ↦ Queue<T>)
f_map(q) = { (id ↦ queue') | (id ↦ queue) ∈ q ∧
                             queue' = f(queue)
           }

We will henceforth omit the map postfix in f_map and simply use f if it is clear from the input type that the map variant of f should be used.

Indices

We define an Index to be a sequence of arbitrary length, where every element in the sequence up to the last one can have an arbitrary type, and the last one is a natural number.

Index : X × ... × Y × ℕ

In addition we define the following semantic:

  • We define the prefix of an Index i := (x, …, y, seq_nr) as prefix(i) := i[1…|i| - 1] = (x, …, y), i.e., it contains all elements of i except the last one.
  • We define the postfix of an Index i := (x, …, y, seq_nr) as postfix(i) := i[|i|] = seq_nr, i.e., the last element of the index sequence. As already mentioned, we require the postfix of an index to be a natural number.
  • For an Index i, the operation [math]\displaystyle{ i + 1 }[/math] is defined as concatenate(prefix(i), postfix(i) + 1).
  • Two indices, Index i and Index j, are incomparable if prefix(i) ≠ prefix(j).
  • For two indices, Index i and Index j, we have that [math]\displaystyle{ i \leq j }[/math] if prefix(i) = prefix(j) and postfix(i) ≤ postfix(j).
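
These definitions are simple enough to state directly in code. Below is a small Rust sketch for a concrete two-part index, where succ models the i + 1 operation and leq returns None for incomparable indices; the tuple types are illustrative.

// Illustrative index with a fixed two-element prefix and a sequence number.
type Prefix = (u64, u64);
type Index = (Prefix, u64);

// i + 1: increment only the trailing sequence number.
fn succ(i: Index) -> Index {
    (i.0, i.1 + 1)
}

// Partial order: indices are comparable only if their prefixes agree.
fn leq(i: Index, j: Index) -> Option<bool> {
    if i.0 == j.0 { Some(i.1 <= j.1) } else { None } // None = incomparable
}

fn main() {
    let i = ((1, 2), 7);
    assert_eq!(succ(i), ((1, 2), 8));
    assert_eq!(leq(i, succ(i)), Some(true));
    assert_eq!(leq(i, ((3, 4), 9)), None); // different prefixes
}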

Queues

We distinguish three different types of queues in the replicated state: ingress queues, input queues, and output queues. Ingress queues contain the incoming messages from users (i.e., ingress messages). Input queues contain the incoming canister-to-canister messages. Output queues contain the outgoing canister-to-canister messages.

Ingress queues are organized on a per-destination basis. Messages in ingress queues are indexed by a concrete instance of Index called IngressIndex, which is a tuple consisting of the destination canister ID and a natural number, i.e.,

IngressIndex : CanisterId × ℕ

Input queues and output queues are organized on a per-source-and-destination basis. Messages in input- and output queues are indexed by a concrete instance of Index called QueueIndex, which is defined as follows:

QueueIndex : CanisterId × CanisterId × ℕ

The type representing all of the ingress queues is defined as follows:

IngressQueues : CanisterId ↦ Queue<Message>,

which means that IngressQueues.elements : IngressIndex ↦ Message.

The type representing all of the input queues is defined as follows:

InputQueues : (CanisterId × CanisterId) ↦ Queue<Message>,

which means that InputQueues.elements : QueueIndex ↦ Message.

The type representing all of the output queues is defined as follows:

OutputQueues : (CanisterId × CanisterId) ↦ Queue<Message>,

which means that OutputQueues.elements : QueueIndex ↦ Message.

Streams

Each individual Stream is scoped to a pair of subnets: the subnet a stream originates from and the subnet the stream is targeted at. An individual stream is organized in multiple substreams identified by a SubstreamId. The concrete definition of SubstreamId is up to the implementation. In the current implementation, SubstreamId is defined to be the unit type (), i.e., we have flat streams. Messages in streams are indexed by a concrete instance of Index called StreamIndex, which is defined as follows:

StreamIndex : SubstreamId × ℕ

A Stream consists of a sequence of signals, signals, and a sequence of canister-to-canister messages, msgs.

Stream {
    signals : StreamIndex ↦ {ACCEPT, REJECT},
    msgs    : SubstreamId ↦ Queue<Message>
}

which means that Stream.msgs.elements : StreamIndex ↦ Message.

While the subnet the stream originates from is implicitly determined, the target subnet needs to be made explicit. Hence, we define a data structure Streams holding all streams indexed by destination subnetwork:

Streams : SubnetId ↦ Stream

We may sometimes abuse the notation and directly access the fields defined for an individual Stream on the Streams type, in which case we obtain maps of the following type:

Streams.signals : SubnetId ↦ (StreamIndex ↦ {ACCEPT, REJECT})

Streams.msgs    : SubnetId ↦ (SubstreamId ↦ Queue<Message>)
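
A Rust sketch of these types for the flat-stream case might look as follows; as a simplification, the per-substream message queue is flattened into a single index-to-message map.

use std::collections::BTreeMap;

type SubnetId = u64;
type SubstreamId = ();                 // flat streams: unit substream ID
type StreamIndex = (SubstreamId, u64);

enum Signal { Accept, Reject }
struct Message;

// A stream carries signals (for traffic in the reverse direction) and
// the outgoing canister-to-canister messages themselves.
struct Stream {
    signals: BTreeMap<StreamIndex, Signal>,
    msgs: BTreeMap<StreamIndex, Message>, // sketch: queue flattened to a map
}

// All outgoing streams, keyed by destination subnet.
type Streams = BTreeMap<SubnetId, Stream>;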

(Certified) Stream Slices

StreamSlices and CertifiedStreamSlices, respectively, are used to transport streams from one subnet to another within XNetPayloads that are part of consensus blocks. Essentially, a StreamSlice is a slice of a stream which retains the begin and end indices of the original stream. A StreamSlice is wrapped in a CertifiedStreamSlice for transport so that authenticity can be guaranteed. Neither CertifiedStreamSlices nor StreamSlices are ever explicitly created within message routing; instead, one relies on the encoding and decoding routines provided by the state manager: a CertifiedStreamSlice is created by calling the respective encoding routine of the state manager. Such a CertifiedStreamSlice can then be decoded into a StreamSlice using the corresponding decoding routine provided by the state manager.

StreamSlice {
    stream    : Stream,
    begin     : Set<StreamIndex>,
    end       : Set<StreamIndex>
}
CertifiedStreamSlice {
    payload   : PartialCanonicalState,
    witness   : Witness,
    signature : Certification
}

For the precise relation of StreamSlice and CertifiedStreamSlice, refer to the specification of the state manager.

Batch

A batch consists of multiple elements including an ingress_payload constituting a sequence of ingress messages, and an xnet_payload.

Batch {
    batch_number             : Height
    registry_version         : RegistryVersion
    ingress_payload          : ℕ ↦ Message
    xnet_payload             : SubnetId ↦ CertifiedStreamSlice
    requires_full_state_hash : { TRUE, FALSE }
}

Decoded Batch

A decoded batch represents a batch where all transport-specific encodings are decoded into a format suitable for processing, and parts which are not required inside the deterministic state machine are stripped off.

DecodedBatch {
    ingress_payload : ℕ ↦ Message
    xnet_payload : SubnetId ↦ StreamSlice
}

Currently this only means decoding the CertifiedStreamSlices into StreamSlices, because we assume that the ingress payload is suitable to be processed right away. Formally, there is a function which, based on the subnet's own ID and the given batch, decodes the batch into a decoded batch:

decode : SubnetId × Batch → DecodedBatch
decode(own_subnet, b) :=
    DecodedBatch with
        ├─ ingress_payload := b.ingress_payload
        └─ xnet_payload :=
               { (src_subnet ↦ slice) |
                   (src_subnet ↦ cert_slice) ∈ b.xnet_payload ∧
                   slice = StateManager.decode_valid_certified_stream(own_subnet,
                                                                      cert_slice) }
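
A Rust sketch of decode, with a stub standing in for the state manager's decoding routine; the types are pared down to the two payload fields used here.

use std::collections::BTreeMap;

type SubnetId = u64;
struct Message;
struct CertifiedStreamSlice; // opaque transport representation
struct StreamSlice;          // decoded representation

struct Batch {
    ingress_payload: BTreeMap<u64, Message>,
    xnet_payload: BTreeMap<SubnetId, CertifiedStreamSlice>,
}

struct DecodedBatch {
    ingress_payload: BTreeMap<u64, Message>,
    xnet_payload: BTreeMap<SubnetId, StreamSlice>,
}

// Stub for the state manager's decoding routine.
fn decode_valid_certified_stream(_own: SubnetId, _c: CertifiedStreamSlice) -> StreamSlice {
    StreamSlice
}

// decode keeps the ingress payload as-is and decodes every certified
// slice into a plain StreamSlice.
fn decode(own_subnet: SubnetId, b: Batch) -> DecodedBatch {
    DecodedBatch {
        ingress_payload: b.ingress_payload,
        xnet_payload: b
            .xnet_payload
            .into_iter()
            .map(|(src, c)| (src, decode_valid_certified_stream(own_subnet, c)))
            .collect(),
    }
}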

Message Routing

Message routing is triggered by incoming batches from consensus. For each Batch b, message routing will perform the following steps:

Interactions of message routing with other components during a deterministic processing round
  • Obtain the ReplicatedState s of the right version w.r.t. Batch b.
  • Submit s and decode(own_subnet, b) for processing by the deterministic state machine comprising the message routing and execution layers. This includes
    • An induction phase (cf. pre_process), where the valid messages in decode(own_subnet, b) are inducted. Among others, a message m in a StreamSlice from subnet X is considered valid if registry.get_registry_at(b.registry_version).subnet_assignment maps m.src to X.
    • An execution phase (cf. execute), which executes messages available in the induction pool.
    • An XNet message routing phase (cf. post_process), which moves the messages produced in the execution phase from the per-session output queues to the subnet-to-subnet streams according to the mapping defined by the subnet assignment in the registry.
  • Commit the replicated state, incrementally updated by the previous steps, to the state manager via commit_and_certify.
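
To make the round structure concrete, here is a minimal Rust sketch of this driver loop. The StateManager trait, take_state_at, and the stubbed phase functions are hypothetical stand-ins, and the subnet_assignment arguments are elided for brevity; only the phase names and commit_and_certify come from this document.

struct ReplicatedState;
struct Batch { batch_number: u64, requires_full_state_hash: bool }
struct DecodedBatch;

// Hypothetical state manager interface.
trait StateManager {
    fn take_state_at(&self, height: u64) -> ReplicatedState;
    fn commit_and_certify(&self, state: ReplicatedState, height: u64, full_hash: bool);
}

// Stubs for batch decoding and the three deterministic phases.
fn decode(_own_subnet: u64, _b: &Batch) -> DecodedBatch { DecodedBatch }
fn pre_process(s: ReplicatedState, _b: &DecodedBatch) -> ReplicatedState { s }
fn execute(s: ReplicatedState) -> ReplicatedState { s }
fn post_process(s: ReplicatedState) -> ReplicatedState { s }

fn process_batch(sm: &dyn StateManager, own_subnet: u64, b: Batch) {
    // 1. Fetch the state version corresponding to the previous batch.
    let s = sm.take_state_at(b.batch_number - 1);
    // 2. Run the three deterministic phases on the decoded batch.
    let decoded = decode(own_subnet, &b);
    let s = post_process(execute(pre_process(s, &decoded)));
    // 3. Hand the updated state back to the state manager.
    sm.commit_and_certify(s, b.batch_number, b.requires_full_state_hash);
}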


Deterministic State Machine

As shown in the sequence diagram above, the deterministic state machine implemented by message routing and execution applies batches provided by consensus to the appropriate state, additionally using some meta information provided by the registry. As discussed above, we use state of type CanonicalState to describe the message-routing-related operations of this component.

Data flow during batch processing

The flow diagram below details the operation of the component. Its operation is logically split into three phases.

  • The induction phase, where the messages contained in the batch are preprocessed. This includes extracting them from the batch and, subject to their validity and the decision of the VSR, adding them to the induction pool or not.
  • The execution phase, where the hypervisor is triggered to perform an execution cycle. The important thing from a message routing perspective is that it will take messages from the input queues and process them, which causes messages to be added to the output queues.
  • The XNet message routing phase, where the messages produced in the execution cycle are post-processed. This means that they are taken from the canister-to-canister output queues and routed into the appropriate subnet-to-subnet streams.

All messages will be added to the respective destination queue/stream preserving the order they appear in the respective source stream/queue.

API

The deterministic state machine does not provide any external API functions. It only provides the following functions resembling the state transformations implemented by the individual steps of the deterministic state machine depicted above. Refer to the previous section for context regarding when the individual functions are called.

  • pre_process(s : CanonicalState, subnet_assignment : (CanisterId ↦ SubnetId), b : DecodedBatch) → CanonicalState: Triggers the induction phase.
  • execute(s : CanonicalState) → CanonicalState: Triggers the execution phase.
  • post_process(s : CanonicalState, subnet_assignment : (CanisterId ↦ SubnetId)) → CanonicalState: Triggers the XNet message routing phase.

Abstractions of Other Parts of the System

Valid Set Rule (VSR). The VSR is a component that makes the decision of whether to ACCEPT or REJECT a message. For message routing, ACCEPT has the semantics that the execution layer takes responsibility for the message, whereas REJECT has the semantics that the message is dropped and may require action from the message routing layer.

The operation of the VSR on ingress messages is defined as follows, where vsr_check_ingress : CanonicalState × Batch → Set<ℕ> is a deterministic function returning the (possibly empty) set of indices of the messages in the ingress_payload of the batch that the VSR accepts; which messages are accepted is determined by the concrete implementation of the VSR. VSR(state, batch).ingress then maps ingress indices to the accepted messages:

VSR(state, batch).ingress :=
  { ((m_i.dst, j) ↦ m_i) | (i ↦ m_i) ∈ batch.ingress_payload
                           ∧ i ∈ vsr_check_ingress(state, batch)
                           ∧ j = rank(i, vsr_check_ingress(state, batch))
  }

Scheduler and Hypervisor. From the point of view of message routing, one can look at the scheduler and the hypervisor together as one component. We model the functionality of the scheduler and the hypervisor as a deterministic function schedule_and_execute : CanonicalState → (IngressIndex ↦ Message) × (QueueIndex ↦ Message) × (QueueIndex ↦ Message) which computes the change set introduced by the scheduler and the hypervisor: it takes messages from the input queues, executes them, and puts new messages into the output queues.

We will later use this function when we describe how the state transition function execute(CanonicalState) → CanonicalState transforms the state. For the sake of compact notation, we use the following fields to access the individual return values of the schedule_and_execute function.

  • First, we have consumed_ingress_messages, which contains a partial map IngressIndex ↦ Message containing all consumed ingress messages.
  • Second, we have consumed_xnet_messages, which contains a partial map QueueIndex ↦ Message containing all consumed cross-net messages.
  • Third, we have produced_messages which contains a partial map QueueIndex ↦ Message containing all produced messages, where the order of the messages implied by the queue index determines the order in which they need to be added to the queues.

Description of the State Transitions

Induction Phase. In the induction phase, one starts off with a CanonicalState S, some subnet_assignment and a DecodedBatch b and applies b to S relative to subnet_assignment to obtain S', i.e., one computes S' = pre_process(S, subnet_assignment, b).

We describe things here w.r.t. a version of the VSR which accepts all messages, while in reality the VSR may reject some messages in case canisters migrate across subnets or subnets are split. So while messages being REJECTed by the VSR would require specific action from the message routing layer, we omit those actions here for simplicity, as they are not crucial to understanding the basic functionality of message routing.

Before we define the actual state transition we define a couple of helper functions. First we define a function that determines the order of the messages in the queues based on the order of the messages in the incoming stream slices.

% REQUIRES: ∄ (s1 ↦ m1), (s2 ↦ m2) ∈ S :
%           └─ m1 = m2 ∧ s1 ≠ s2
%
% ENSURES: ∀ S satisfying the precondition above,
%          └─ ∀ (q1 ↦ m1), (q2 ↦ m2) ∈ queue_index(S) :
%             ├─ ∃ s1, s2 :
%             │  └─ (s1 ↦ m1) ∈ S ∧ (s2 ↦ m2) ∈ S ∧
%             └─ (m1.dst = m2.dst ∧ s1 ≤ s2) ==> q1 ≤ q2
%
queue_index : ((SubnetId × StreamIndex) ↦ Message) → ((CanisterId × ℕ) ↦ Message)
queue_index(S) := {
  % We do not provide a concrete implementation of this function as there are
  % multiple possible implementations and the choice for one also depends on
  % how priorities/fairness etc. are handled.
  %
  % A trivial implementation is to iterate over the given stream slices S per
  % subnet and for each individual slice iterate over all the messages in the
  % order they appear in the slice and push each message m on the right queue,
  % i.e., the one belonging to the destination canister. This is also the way
  % things are currently implemented.
}
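
The trivial implementation mentioned in the comment above can be sketched in Rust as follows, assuming, as a simplification, that the slices have already been flattened into per-subnet message vectors in slice order.

use std::collections::BTreeMap;

type SubnetId = u64;
type CanisterId = u64;

#[derive(Clone)]
struct Message { dst: CanisterId }

// Walk the slices subnet by subnet and message by message, appending each
// message to its destination canister's sequence, numbered from 1. Per
// destination, this preserves the order of comparable stream indices.
fn queue_index(
    slices: &BTreeMap<SubnetId, Vec<Message>>, // messages in slice order
) -> BTreeMap<(CanisterId, u64), Message> {
    let mut out = BTreeMap::new();
    let mut next: BTreeMap<CanisterId, u64> = BTreeMap::new();
    for msgs in slices.values() {
        for m in msgs {
            let n = next.entry(m.dst).or_insert(1);
            out.insert((m.dst, *n), m.clone());
            *n += 1;
        }
    }
    out
}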

Based on this, we can now define a function that maps the valid XNet messages to queue indices.

map_valid_xnet_messages : (SubnetId ↦ StreamSlice) ×
                          (CanisterId ↦ SubnetId) →
                          ((CanisterId × ℕ) ↦ Message)
map_valid_xnet_messages(slices, subnet_assignment) :=
    queue_index({ ((subnet, index) ↦ m) | (subnet ↦ slice) ∈ slices ∧
                                          (index ↦ m) ∈ slice.msgs ∧
                                          subnet_assignment[m.src] = subnet
                })


Finally, we can define the state S' resulting from computing pre_process(S, subnet_assignment, b):

S with
  % Append the ingress messages accepted by the VSR to the appropriate ingress_queue
  ingress_queues            := push(S.ingress_queues, VSR(S, b).ingress)

  % Append the canister to canister messages accepted by the VSR to the appropriate
  % input queue.
  input_queues              := push(S.input_queues,
                                    map_valid_xnet_messages(VSR(S, b).xnet, subnet_assignment)
                                   )

  % Garbage collect the messages which have been accepted by the target subnet.
  % (As soon as the VSR does no longer ACCEPT all messages, one would have
  %  to make sure that rejected messages are appropriately re-enqueued in
  %  the streams)
  streams.msgs              := delete(S.streams.msgs,
                                 { (index ↦ msg) |
                                       (subnet ↦ slice) ∈ b.xnet_payload ∧
                                       (i ↦ ·) ∈ slice.signals ∧
                                       index = concatenate(subnet, i) ∧
                                       (index ↦ msg) ∈ S.streams.msgs.elements
                                 }
                               )

  % Add the signals reflecting the decisions made by the VSR in the current round and
  % garbage collect the signals which have already been processed on the other subnet
  % (one knows that a signal has been processed when the message is no longer included
  % in a given slice).
  streams.signals           := (S.streams.signals
                                ∪ VSR(S, b).signals)
                               \ { (index ↦ signal) |
                                       (subnet ↦ slice) ∈ b.xnet_payload ∧
                                       (i ↦ signal) ∈ S.streams[subnet].signals ∧
                                       index = concatenate(subnet, i) ∧
                                       j ∈ slice.begin ∧
                                       i < j
                                 }

  % Update the expected XNet indexes so that the block maker can compute which messages
  % to include in a block referencing this state.
  expected_xnet_indices     := { index     | index ∈ S.expected_xnet_indices ∧
                                             ∄ (i ↦ ·) ∈ b.xnet_payload.msgs.elements :
                                             └─ prefix(index) = prefix(i)
                               } ∪
                               { index + 1 | index ∈ max(dom(b.xnet_payload.msgs.elements)) }

Execution Phase. In the execution phase, one starts off with a CanonicalState S, schedules messages for execution by the hypervisor, and triggers the hypervisor to execute them, i.e., one computes S' = execute(S) where S is the state after the induction phase. From the perspective of message routing, the state S' resulting from computing execute(S) looks as follows:

S with
  % Delete the consumed ingress messages from the respective ingress queues
  ingress_queues    := delete(S.ingress_queues, schedule_and_execute(S).consumed_ingress_messages)

  % Delete the consumed canister to canister messages from the respective input queues
  input_queues      := delete(S.input_queues, schedule_and_execute(S).consumed_xnet_messages)

  % Append the produced messages to the respective output queues
  output_queues     := push(S.output_queues, schedule_and_execute(S).produced_messages)

  % Execution specific state is transformed by the execution environment; the precise transition
  % function is out of scope here.
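
A minimal Rust sketch of this transition, with queues flattened to index-to-message maps and an empty stub standing in for the scheduler and hypervisor:

use std::collections::BTreeMap;

type IngressIndex = (u64, u64);    // (dst canister, seq)
type QueueIndex = (u64, u64, u64); // (src, dst, seq)
struct Message;

// The change set computed by schedule_and_execute.
struct ChangeSet {
    consumed_ingress_messages: Vec<IngressIndex>,
    consumed_xnet_messages: Vec<QueueIndex>,
    produced_messages: BTreeMap<QueueIndex, Message>,
}

struct State {
    ingress_queues: BTreeMap<IngressIndex, Message>,
    input_queues: BTreeMap<QueueIndex, Message>,
    output_queues: BTreeMap<QueueIndex, Message>,
}

// Deterministic; the actual behavior belongs to the execution layer.
fn schedule_and_execute(_s: &State) -> ChangeSet {
    ChangeSet {
        consumed_ingress_messages: vec![],
        consumed_xnet_messages: vec![],
        produced_messages: BTreeMap::new(),
    }
}

fn execute(mut s: State) -> State {
    let cs = schedule_and_execute(&s);
    for i in cs.consumed_ingress_messages { s.ingress_queues.remove(&i); }
    for i in cs.consumed_xnet_messages { s.input_queues.remove(&i); }
    s.output_queues.extend(cs.produced_messages); // preserves QueueIndex order
    s
}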

XNet Message Routing Phase. In the XNet message routing phase, one takes all the messages from the canister-to-canister output queues and, according to the subnet_assignment, puts them into a subnet-to-subnet stream, i.e., one computes S' = post_process(S, subnet_assignment), where S is the state after the execution phase and subnet_assignment is taken from a view of the registry.

Before we define the state transition, we define a helper function to appropriately handle messages targeted at canisters that do not exist according to the given subnet assignment.

% Remove all messages from output queues targeted at non-existent canisters according
% to the subnet assignment.
filter : ((CanisterId × CanisterId) ↦ Queue<Message>) × (CanisterId ↦ SubnetId) → ((CanisterId × CanisterId) ↦ Queue<Message>)
filter(queues, subnet_assignment) :=
    delete(queues, { (q_index ↦ msg) | (q_index ↦ msg) ∈ queues.elements ∧
                                       q_index = (·, dst, ·) ∧
                                       dst ∉ dom(subnet_assignment)
                   }
          )

Next, we define a helper function that produces NON_EXISTENT_CANISTER replies telling the sending canister that the destination canister does not exist.

% Produce NON_EXISTENT_CANISTER messages to be pushed to input queues 
% of the senders of messages where the destination does not exist
non_existent_canister_replies : ((CanisterId × CanisterId) ↦ Queue<Message>) × (CanisterId ↦ SubnetId) → (QueueIndex ↦ Message)
non_existent_canister_replies(queues, subnet_assignment) :=
  { ((dst, src, i) ↦ NON_EXISTENT_CANISTER) | (q_index ↦ msg) ∈ queues.elements ∧
                                              q_index = (src, dst, i) ∧
                                              dst ∉ dom(subnet_assignment)
  }
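
Both helpers can be realized in a single pass. The following Rust sketch operates on flattened queues with a hypothetical Message enum: it keeps routable messages and produces one NON_EXISTENT_CANISTER reply per dropped message, keyed in the reverse (dst, src, i) direction.

use std::collections::{BTreeMap, BTreeSet};

type CanisterId = u64;
type QueueIndex = (CanisterId, CanisterId, u64); // (src, dst, seq)

enum Message { App, NonExistentCanister }

// Returns (kept messages, reject replies).
fn route_or_reject(
    queues: BTreeMap<QueueIndex, Message>,
    assigned: &BTreeSet<CanisterId>, // dom(subnet_assignment)
) -> (BTreeMap<QueueIndex, Message>, BTreeMap<QueueIndex, Message>) {
    let mut kept = BTreeMap::new();
    let mut replies = BTreeMap::new();
    for ((src, dst, i), msg) in queues {
        if assigned.contains(&dst) {
            kept.insert((src, dst, i), msg);
        } else {
            // The reply travels in the opposite direction: (dst, src, i).
            replies.insert((dst, src, i), Message::NonExistentCanister);
        }
    }
    (kept, replies)
}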

Non-flat streams. As already mentioned before, the specification leaves it open whether one flat stream is produced per destination subnet, or whether each of the streams has multiple substreams; this can be decided by the implementation. To enable this, a StreamIndex is defined to be a tuple of a SubstreamId and a natural number. If we have a flat stream, SubstreamId is defined to be the unit type (), which effectively means that the implementation can use natural numbers as stream indices, as one does not need to make the SubstreamId explicit in this case. In contrast, if we have per-destination (or per-source) substreams, SubstreamId is defined to be a CanisterId.

Formally, this means that the implementation must fix a mapping function that, based on a given prefix of a QueueIndex (i.e., a src-dst tuple), decides on the prefix of the StreamIndex, i.e., the SubstreamId.

substream_id: (CanisterId × CanisterId) → SubstreamId

% Definition of substream_id for flat streams
substream_id((src, dst)) := ()

% Definition of substream_id for per-destination canister substreams
substream_id((src, dst)) := dst

Description of the actual state transition. The state S' resulting from computing post_process(S, subnet_assignment) is defined as follows:

S with
  % Clear the output queues
  output_queues := clear(S.output_queues)

  % Route the messages produced in the previous execution phase to the appropriate streams,
  % preserving per-queue ordering and mapping queue prefixes to substreams via substream_id.
  streams.msgs  := {
    let msgs = S.streams.msgs

    % Iterate over the filtered messages, preserving the order of the messages in the queues.
    % For q_index = (·, dst, ·), the destination subnet is subnet_assignment[dst].
    for each (q_index ↦ msg) ∈ filter(S.output_queues, subnet_assignment)
      msgs = push(msgs, { (concatenate(subnet_assignment[dst],
                                       substream_id(prefix(q_index)),
                                       postfix(q_index)) ↦ msg) })

    return msgs
  }


  % Push NON_EXISTENT_CANISTER replies to input queues of the respective canisters
  input_queues := push(S.input_queues,
                       non_existent_canister_replies(S.output_queues, subnet_assignment))
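
For flat streams, the routing step can be sketched in Rust as follows. Queues and streams are flattened to ordered vectors, and, matching the transition above, filter() is assumed to have already removed messages to unroutable destinations.

use std::collections::BTreeMap;

type CanisterId = u64;
type SubnetId = u64;

struct Message;

// Drain each output queue in index order and append its messages to the
// stream for the destination canister's subnet.
fn route_to_streams(
    output_queues: &mut BTreeMap<(CanisterId, CanisterId), Vec<Message>>,
    subnet_assignment: &BTreeMap<CanisterId, SubnetId>,
    streams: &mut BTreeMap<SubnetId, Vec<Message>>,
) {
    for ((_src, dst), queue) in output_queues.iter_mut() {
        // filter() has removed unroutable destinations, so the lookup
        // is assumed to succeed here.
        let subnet = subnet_assignment[dst];
        streams.entry(subnet).or_default().append(queue); // preserves order
    }
    output_queues.clear(); // corresponds to clear(S.output_queues)
}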

Ordering of Messages in the Stream & Fairness. As long as the canister-to-canister ordering of messages is preserved when iterating over the filtered messages in the state transition described above, the implementation is free to apply alternative orderings.

Also note that, while the state transition defined above empties the output queues completely, this is not crucial to the design and one could hold back messages as long as this does not violate the ordering requirement.

XNet Transfer

After calling commit_and_certify at the end of a deterministic processing cycle, the state manager will take care of getting the committed state certified. Once certification is complete, the certified stream slices can be made available to block makers on other subnets. The XNetTransfer subcomponent is responsible for enabling this transfer. It consists of:

XNet transfer component diagram
  • An XNetEndpoint which is responsible for serving certified stream slices and making them available to XNetPayloadBuilders on other subnetworks.
  • An XNetPayloadBuilder, which allows the block makers to obtain an XNetPayload containing the currently available certified streams originating from other subnetworks. The XNetPayloadBuilder obtains those streams by interacting with XNetEndpoints exposed by other subnets. The XNetPayloadBuilder also provides functionality for notaries to verify XNetPayloads contained in block proposals.
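
The split into the two subcomponents can be illustrated with a small Rust interface sketch; the trait names follow the component names above, but the signatures are illustrative and do not reflect the replica's actual API.

struct SubnetId(u64);
struct CertifiedStreamSlice;
struct XNetPayload(Vec<CertifiedStreamSlice>);
struct Block;

trait XNetEndpoint {
    // Serve a certified slice of the stream destined for `to`, starting
    // at the stream index the caller expects next.
    fn stream_slice(&self, to: SubnetId, from_index: u64) -> Option<CertifiedStreamSlice>;
}

trait XNetPayloadBuilder {
    // Gather currently available certified slices into a payload for a
    // block proposal.
    fn build_payload(&self) -> XNetPayload;
    // Let notaries validate the XNetPayload in a proposed block.
    fn validate_payload(&self, block: &Block, payload: &XNetPayload) -> bool;
}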

We do not specify anything about the protocol run between the XNetEndpoint and the XNetPayloadBuilder to transfer the streams between two subnetworks. The only requirements we have are that certified streams made available by an XNetEndpoint of an honest replica on some source subnetwork can be obtained by an XNetPayloadBuilder of an honest replica on the destination subnetwork, and that the information regarding which endpoints to contact is available in the Registry.

XNet transfer sequence diagram