IC P2P (peer-to-peer) layer
The peer-to-peer layer of the IC is responsible for delivering messages between peers within subnets. It is designed to ensure efficient delivery of messages even in the presence of malicious activity.
Each client of the P2P layer, i.e., a component in the layers above it such as Consensus, maintains a collection of artifacts (an artifact pool). Artifacts are messages that are exchanged between peers within a subnet to create, validate, and agree on blocks. These can be, for example, consensus block proposals, ingress messages from users, or signature shares of responses for the HTTPS outcalls feature. Nodes distribute these artifacts to their peers so that all peers have the same view of the state of the subnet. This allows nodes to include ingress messages sent to other peers when they create block proposals, and to catch up efficiently if they have been disconnected or are joining a new subnet.
The P2P layer is invoked by the artifact pools when an artifact needs to be delivered to peers. This can happen when the node itself generates a new artifact (e.g., a new block proposal), or when it wants to relay an artifact it received from another peer. Relaying is useful even when all peers are connected to all other peers: a malicious peer can send one block proposal to a subset of peers and a different one (or none) to the rest, which can slow down a subnet. Relaying proposals prevents this. Relaying also helps when some nodes are only connected to some of their peers, which can happen when a node’s ISP changes its BGP peering settings.
Sending an artifact to a peer that already has it in its pool would waste bandwidth. To avoid this, the P2P layer does not immediately send an artifact to all peers when a client asks it to broadcast that artifact. Instead, it creates an advert, a very small message containing the hash of the artifact and some additional metadata, and broadcasts this advert to all peers. Each peer that receives the advert decides whether it wants to download the corresponding artifact from the advertising peer. If so, it sends an explicit request message to that peer, which responds with the artifact itself.
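The advert, request, and response exchange described above can be illustrated with a minimal in-memory sketch. The class and method names (`Advert`, `Peer`, `on_advert`, `on_request`) and the use of SHA-256 are assumptions made for illustration, not the IC implementation.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class Advert:
    """Small message announcing an artifact (hypothetical layout)."""
    artifact_hash: str  # hex digest of the artifact content
    size: int           # example of additional metadata


@dataclass
class Peer:
    name: str
    pool: Dict[str, bytes] = field(default_factory=dict)  # hash -> artifact
    bytes_downloaded: int = 0

    def make_advert(self, artifact: bytes) -> Advert:
        """Store a new artifact locally and build the advert to broadcast."""
        digest = hashlib.sha256(artifact).hexdigest()  # SHA-256 is an assumption
        self.pool[digest] = artifact
        return Advert(digest, len(artifact))

    def on_request(self, artifact_hash: str) -> bytes:
        """Answer an explicit request with the artifact itself."""
        return self.pool[artifact_hash]

    def on_advert(self, advert: Advert, sender: "Peer") -> bool:
        """Download only if the artifact is missing; return True if downloaded."""
        if advert.artifact_hash in self.pool:
            return False  # already have it: no bandwidth spent
        artifact = sender.on_request(advert.artifact_hash)
        self.bytes_downloaded += len(artifact)
        self.pool[advert.artifact_hash] = artifact
        return True
```

In this sketch, a peer that receives the same advert from two different senders downloads the artifact once and ignores the second advert, which is the bandwidth saving the advert mechanism is meant to provide.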
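This approach clearly trades latency for throughput: each artifact needs an extra round trip (advert, then request) before it is delivered, but no bandwidth is spent on artifacts a peer already has. For the requirements of the IC’s consensus protocol, this trade-off is desired.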
To share the bandwidth fairly among all peers and to avoid running out of memory, the number of in-flight requests is limited.
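One way to picture the limit on in-flight requests is a per-peer quota with a waiting queue. The text only states that the number of in-flight requests is limited; the per-peer granularity, the queueing behavior, and the name `RequestLimiter` are assumptions of this sketch.

```python
from collections import defaultdict, deque
from typing import List, Tuple


class RequestLimiter:
    """Caps outstanding artifact requests per peer (hypothetical policy)."""

    def __init__(self, max_in_flight_per_peer: int) -> None:
        self.max_in_flight = max_in_flight_per_peer
        self.in_flight = defaultdict(set)  # peer -> ids of outstanding requests
        self.waiting = deque()             # (peer, artifact_id) pairs not yet sent

    def try_request(self, peer: str, artifact_id: str) -> bool:
        """Return True if the request may be sent now, otherwise queue it."""
        if len(self.in_flight[peer]) >= self.max_in_flight:
            self.waiting.append((peer, artifact_id))
            return False
        self.in_flight[peer].add(artifact_id)
        return True

    def on_response(self, peer: str, artifact_id: str) -> List[Tuple[str, str]]:
        """Free a slot and return the queued requests that can now be sent."""
        self.in_flight[peer].discard(artifact_id)
        started, still_waiting = [], deque()
        while self.waiting:
            p, a = self.waiting.popleft()
            if len(self.in_flight[p]) < self.max_in_flight:
                self.in_flight[p].add(a)
                started.append((p, a))
            else:
                still_waiting.append((p, a))
        self.waiting = still_waiting
        return started
```

Bounding the set of outstanding requests also bounds the memory needed to buffer responses, since at most `max_in_flight_per_peer` artifacts per peer can arrive at once.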
Some artifacts, like state sync artifacts, are too big to be sent as a single chunk. For these artifacts, a single advert represents a set of chunks that together make up a complete artifact. When a downloaded artifact is chunkable, the P2P layer will attempt to download the corresponding chunks from multiple peers which advertised the corresponding artifact. This speeds up the download and better utilizes the bandwidth.
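As a rough illustration of spreading chunk downloads across several advertisers, the sketch below assigns chunks round-robin. The round-robin policy and the function name `assign_chunks` are assumptions; the actual scheduling in the IC may differ.

```python
from typing import Dict, List


def assign_chunks(num_chunks: int, advertisers: List[str]) -> Dict[str, List[int]]:
    """Assign chunk indices to the peers that advertised the artifact.

    Round-robin is used here only as a simple example policy.
    """
    if not advertisers:
        raise ValueError("at least one advertising peer is required")
    plan: Dict[str, List[int]] = {peer: [] for peer in advertisers}
    for chunk_id in range(num_chunks):
        peer = advertisers[chunk_id % len(advertisers)]
        plan[peer].append(chunk_id)
    return plan
```

With five chunks and two advertisers, for example, one peer serves chunks 0, 2, and 4 while the other serves chunks 1 and 3, so both links are used in parallel.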
Verification of Artifacts
Each advert contains an integrity hash of the corresponding artifact. The integrity hash is the result of applying a cryptographic hash function to the content of the artifact. When an artifact download is complete, the same function is applied locally to the downloaded content. The artifact is processed and handed to the artifact pool of the corresponding client only if the result matches the advertised hash value.
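The integrity check can be sketched in a few lines. SHA-256 is assumed here as the hash function, and `integrity_hash` and `accept_download` are hypothetical names used only for this example.

```python
import hashlib
import hmac


def integrity_hash(content: bytes) -> bytes:
    """Hash of the artifact content (SHA-256 is an assumption)."""
    return hashlib.sha256(content).digest()


def accept_download(advertised_hash: bytes, content: bytes) -> bool:
    """Accept the download only if its hash equals the advertised one."""
    # compare_digest performs a constant-time comparison of the two digests.
    return hmac.compare_digest(integrity_hash(content), advertised_hash)
```

A download whose content differs from what was advertised by even one byte is rejected before it reaches the client's artifact pool.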
It is important to note that this is only a first layer of defense: a malicious node can send an advert with some integrity hash, and then an artifact that matches the advertised hash. This will pass all checks in the P2P layer, and the client will then have to notice that the artifact does not meet its requirements, e.g., is not signed correctly. The clients in the IC implementation always perform validation checks to catch such attacks before processing artifacts further or relaying them to other peers.
Chunkable artifacts are validated both per-chunk and as a whole, where the corresponding client is responsible for per-chunk validation.
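A possible shape for two-level validation is sketched below. It assumes the client has a manifest with one expected hash per chunk plus a hash of the whole artifact; the manifest format and the function `validate_chunked` are illustrative assumptions, not the IC's actual state sync format.

```python
import hashlib
from typing import List


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def validate_chunked(chunks: List[bytes], manifest: List[str], artifact_hash: str) -> bool:
    """Check every chunk against the manifest, then the assembled artifact."""
    if len(chunks) != len(manifest):
        return False
    # Per-chunk validation: a bad chunk is detected without needing the rest.
    for chunk, expected in zip(chunks, manifest):
        if sha256_hex(chunk) != expected:
            return False
    # Whole-artifact validation over the concatenated chunks.
    return sha256_hex(b"".join(chunks)) == artifact_hash
```

Validating per chunk means a corrupted chunk can be identified, and only that chunk fetched again, rather than discarding the entire download.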
In some cases, for example when a node notices an error in receiving adverts, or when it joins a new subnet, it sends out a retransmission request to its peers. The request can go to one specific peer (e.g., when a connectivity issue is detected with that peer), or to all peers (e.g., when the node joins). The request contains information about the node’s current state, and peers can respond with adverts that can help the node make progress from its current state, as reported in the retransmission request.
While retransmission requests are meant to help nodes catch up on missed adverts, nodes may also issue them periodically to make sure they have not missed anything.
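How a peer might answer a retransmission request can be sketched as a filter over the adverts it knows about. Representing the requester's state as a single height, and the names `Advert` and `adverts_for_retransmission`, are simplifying assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Advert:
    artifact_hash: str
    height: int  # hypothetical progress marker attached to each advert


def adverts_for_retransmission(known: List[Advert], requester_height: int) -> List[Advert]:
    """Return the adverts that lie beyond the requester's reported state."""
    newer = [a for a in known if a.height > requester_height]
    return sorted(newer, key=lambda a: a.height)
```

The requester then treats the returned adverts like any others: it requests only the artifacts it does not already hold.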
The underlying Transport component of the P2P layer is responsible for maintaining the actual connections between peers in a subnet. It creates TLS over TCP connections between the peers, where both sides authenticate using their private keys.
Each peer has its public key stored in the registry canister of the Network Nervous System (NNS). The registry canister also contains the most up-to-date subnet membership information (which nodes belong to which subnet), as well as historic information. Each node learns its own membership, as well as who its peers are, what their IP addresses are, and what their public keys are, by querying the registry canister. Nodes always make sure they only connect to peers in their subnet by enforcing two-way authentication when establishing the TLS connections.
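The membership side of this check can be sketched as a lookup against the keys learned from the registry. In a real TLS handshake the peer also proves possession of the matching private key; this sketch covers only the comparison with registry data, and `authenticate_peer` and the dictionary layout are assumptions.

```python
import hmac
from typing import Dict


def authenticate_peer(subnet_keys: Dict[str, bytes], node_id: str, presented_key: bytes) -> bool:
    """Accept a connection only from a subnet member presenting its registered key.

    subnet_keys maps node IDs in this node's subnet to the public keys
    obtained from the registry (hypothetical representation).
    """
    expected = subnet_keys.get(node_id)
    if expected is None:
        return False  # not a member of this subnet
    return hmac.compare_digest(expected, presented_key)
```

Because both sides run the same check, a node outside the subnet can neither connect to subnet members nor be connected to by them.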
Subnet memberships change over time, with nodes being added to and removed from subnets regularly based on proposals accepted by the NNS. The Transport component continuously tracks these changes and adjusts the connections accordingly. In some cases, consensus still considers a peer to be part of the subnet even though that peer no longer appears in the subnet record of the newest registry version. The Transport component keeps the connections to such peers open until they are no longer required.
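The connection bookkeeping described above can be pictured as set arithmetic. The function names and the representation of membership as sets of node IDs are assumptions of this sketch.

```python
from typing import Set, Tuple


def desired_peers(registry_members: Set[str], needed_by_consensus: Set[str], me: str) -> Set[str]:
    """Peers to stay connected to: current members plus peers consensus still needs."""
    return (registry_members | needed_by_consensus) - {me}


def reconcile(current: Set[str], desired: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (peers to connect to, peers to disconnect from)."""
    return desired - current, current - desired
```

A peer that has been removed from the newest subnet record stays in the desired set as long as it appears in `needed_by_consensus`, and is disconnected on the first reconciliation after it drops out of both sets.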
The Transport component also provides mechanisms for keeping connections alive, quickly identifying connection issues (using a heartbeat mechanism), and automatically reconnecting when a connection breaks. Whenever a TCP connection is idle for more than 200 ms, a heartbeat message is sent over it. On the receiving side, when no data (including heartbeats) is received for more than 5 s, the connection is dropped and re-established. This avoids waiting for the possibly very long timeouts of the TCP protocol, which are sometimes hit after Internet routing changes. After a connection is re-established, a retransmission request is sent to the corresponding peer.
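The two timers can be sketched as follows, using the 200 ms and 5 s values from the text. The class `ConnectionTimers`, and the reading of "idle" as "nothing sent recently", are assumptions; timestamps are passed in explicitly so the logic can be exercised without a real clock.

```python
HEARTBEAT_IDLE_S = 0.2   # send a heartbeat after 200 ms without sending
RECEIVE_TIMEOUT_S = 5.0  # drop the connection after 5 s without receiving


class ConnectionTimers:
    """Tracks send and receive activity of one connection (illustrative only)."""

    def __init__(self, now: float) -> None:
        self.last_sent = now
        self.last_received = now

    def on_send(self, now: float) -> None:
        self.last_sent = now  # any outgoing data, heartbeats included

    def on_receive(self, now: float) -> None:
        self.last_received = now  # any incoming data, heartbeats included

    def should_send_heartbeat(self, now: float) -> bool:
        return now - self.last_sent > HEARTBEAT_IDLE_S

    def should_reconnect(self, now: float) -> bool:
        return now - self.last_received > RECEIVE_TIMEOUT_S
```

Since a healthy peer sends something at least every 200 ms, 5 s of silence corresponds to many consecutive missed heartbeats, which makes a short network hiccup unlikely to trigger a reconnect.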