The Internet Computer for Computer Scientists

To a first approximation, the IC (Internet Computer) is a network of replicated state machines.

Each replicated state machine comprises a subnet of replicas. Subnets may communicate with one another, but otherwise they operate (for the most part) independently of each other.

As in any replicated state machine, a series of transaction requests is processed. A transaction request may come from either an external client or from another state machine in the IC. The replicas in a subnet must run a consensus protocol to order the incoming transaction requests, so that each replica processes the transaction requests in the same order. Each replica processes the transaction requests in the agreed-upon order. In processing a transaction request, each replica will update its internal state according to a deterministic function that maps the pair (current state, transaction request) to a new state. Because all replicas in a subnet process transaction requests in the same order, their internal states will evolve over time in exactly the same way. In response to processing a transaction, a subnet may also generate an outgoing message, which can be sent to either an external client or to another state machine in the IC.

The reason for using a replicated state machine, rather than just a single state machine, is to achieve fault tolerance: a subnet should continue to function — and to function correctly — even if some replicas are faulty. Generally in this area, one considers two types of replica failures: crash failure and Byzantine failures. A crash failure occurs when a replica abruptly stops and does not resume. Byzantine failures are failures in which a replica may deviate in an arbitrary way from its prescribed protocol. Moreover, with Byzantine failures, one or possibly several replicas may be directly under the control of a malicious adversary. Of the two types of failures, Byzantine failures are potentially far more disruptive.

Protocols for consensus and for implementing replicated state machines typically make assumptions about how many replicas may be faulty and to to what degree (crash or Byzantine) they may be faulty. In the IC, the assumption is that if a given subnet has [math]\displaystyle{ n }[/math] replicas, then less than [math]\displaystyle{ n/3 }[/math] of these replicas are faulty and these faults may be Byzantine. (Note that the different subnets in the IC may have different sizes.)

Protocols for consensus and for implementing replicated state machines also typically make assumptions about the communication model, which characterizes the ability of an adversary to delay the delivery of messages between replicas. At opposite ends of the spectrum, there are the following models:

In the synchronous model, there exists some known finite time bound [math]\displaystyle{ \delta }[/math], such that for any message sent, it will be delivered in less than time [math]\displaystyle{ \delta }[/math].
In the asynchronous model, for any message sent, the adversary can delay its delivery by any finite amount of time, so that there is no bound on the time to deliver a message; however, each message must eventually be delivered.

Since the replicas in an IC-subnet are typically distributed around the globe, a synchronous communication model would be highly unrealistic. In this setting, the most robust model is the asynchronous model. Unfortunately, there are no known consensus protocols in this model that are truly practical. So like most other practical Byzantine fault tolerant systems that cannot rely on synchronous communication, the IC opts for a compromise: a partial synchrony communication model. Such partial synchrony models can be formulated in various ways. The partial synchrony assumption used by the IC says, roughly speaking, that for each subnet, communication among replicas in that subnet is periodically synchronous for short intervals of time; moreover, the synchrony bound [math]\displaystyle{ \delta }[/math] does not need to be known in advance. This partial synchrony assumption is only needed to ensure that the consensus protocol makes progress (the so-called liveness property) — it is not needed to ensure correct behavior of consensus (the so-called safety property), nor is it needed anywhere else in the IC protocol stack.

Under the assumption of partial synchrony and Byzantine faults, it is known that our bound of [math]\displaystyle{ f \lt n/3 }[/math] on the number of faults is optimal.