|
|
(26 intermediate revisions by 5 users not shown) |
Line 1: |
Line 1: |
− | The job of the consensus layer of the IC is to order transaction requests so that all replicas in a subnet will process
| + | Please see [https://internetcomputer.org/how-it-works/consensus/ How Consensus Layer works] |
− | such requests in the same order.
| |
− | There are many protocols in the literature for this problem.
| |
− | The IC uses a new consensus protocol, which is described here at a high level.
| |
− | For more details, see the paper https://eprint.iacr.org/2021/1330 (in particular, Protocol ICC1 in that paper).
| |
− | | |
− | Any secure consensus protocol should guarantee two properties, which (roughly stated) are:
| |
− | * <b>safety</b>: all replicas in fact agree on the same ordering of transaction requests, and
| |
− | * <b>liveness</b>: all replicas should make steady progress.
| |
− | The paper https://eprint.iacr.org/2021/1330 proves that the IC consensus protocol satisfies both of these properties
| |
− | | |
− | The IC consensus protocol is designed to be
| |
− | * extremely simple, and
| |
− | * robust (performance degrades gracefully when some replicas are malicious).
| |
− | | |
− | ==Assumptions==
| |
− | | |
− | As discussed [[The Internet Computer for Computer Scientists|in the introduction]], we assume
| |
− | * a subnet of <math>n</math> replicas, and
| |
− | * at most <math>f < n/3</math> of the replicas are faulty.
| |
− | Faulty replicas may exhibit arbitrary, malicious (i.e., Byzantine) behavior.
| |
− | To simplify the presentation, we assume here that <math>n=3f+1</math>.
| |
− | | |
− | | |
− | We assume that communication is <b>asynchronous</b>, with no <i>a priori</i>
| |
− | bound on the delay of messages sent between replicas.
| |
− | In fact, the scheduling of message delivery may be completely under adversarial control.
| |
− | The IC consensus protocol guarantees safety under this very weak communication assumption.
| |
− | However, to guarantee liveness, we need to assume a form of <b>partial synchrony</b>,
| |
− | which (roughly stated) says that the network will be periodically synchronous
| |
− | for short intervals of time.
| |
− | In such intervals of synchrony, all undelivered
| |
− | messages will be delivered in less than time <math>\delta</math>,
| |
− | for some fixed bound <math>\delta</math>.
| |
− | The bound <math>\delta</math> does not have to be known in advance
| |
− | (the protocol is initialized with a reasonable bound, but will dynamically
| |
− | adapt and increase this bound if it is too small).
| |
− | Regardless of whether we are assuming an asynchronous
| |
− | or a partially synchronous network,
| |
− | we assume that every message sent from one honest
| |
− | party to another will <i>eventually</i> be delivered.
| |
− | | |
− | ==Protocol overview==
| |
− | | |
− | Like a number of consensus protocols,
| |
− | the IC consensus protocol is based in a [[wikipedia:blockchain|blockchain]].
| |
− | As the protocol progresses, a tree of blocks is grown,
| |
− | starting from a special <i>genesis block</i> that is the root of the tree.
| |
− | Each non-genesis block in the tree contains (among other things)
| |
− | a <i>payload</i>, consisting of a sequence of transaction requests,
| |
− | and a hash of the block's parent in the tree.
| |
− | The honest replicas have a consistent view of this tree:
| |
− | while each replica may have a different, partial view
| |
− | of this tree, all the replicas have a view of the <i>same</i> tree.
| |
− | In addition, as the protocol progresses,
| |
− | there is always a path of <i>finalized</i> blocks in this tree.
| |
− | Again, the honest replicas have a consistent view of this path:
| |
− | while each replica may have a different, partial view
| |
− | of this path, all the parties have a view of the <i>same</i> path.
| |
− | The transaction requests in the payloads of the blocks along this
| |
− | path are the ordered transaction requests will be processed by the [[IC execution layer|execution layer]]
| |
− | of the Internet Computer.
| |
− | | |
− | The protocol proceeds in <b>rounds</b>.
| |
− | In round <math>h</math> of the protocol,
| |
− | one or more blocks of <b>height</b> <math>h</math> are added to the tree.
| |
− | That is, the blocks added in round <math>h</math> are always at a distance
| |
− | of exactly <math>h</math> from the root.
| |
− | In each round, a pseudo-random process is used to assign
| |
− | each replica a unique <b>rank</b>, which is an
| |
− | integer in the range <math>0,\ldots,h-1</math>.
| |
− | This pseudo-random process is implemented using a <b>random beacon</b>
| |
− | [[#Random Beacon|(see below)]].
| |
− | The party of lowest rank is the <b>leader</b> of that round.
| |
− | When the leader is honest and the network is synchronous,
| |
− | the leader will propose a block, which will be added to the tree;
| |
− | moreover, this will be the <i>only</i> block added to the tree
| |
− | in this round and it will extend the finalized path.
| |
− | If the leader is not honest or the network is not synchronous,
| |
− | some other parties of higher rank may also propose blocks,
| |
− | and also have their blocks added to the tree.
| |
− | In any case, the logic of the protocol gives highest priority
| |
− | to the leader's proposed block
| |
− | and <i>some</i> block or blocks will be added to this tree in this round.
| |
− | Even if the protocol proceeds for a few rounds without extending the
| |
− | finalized path, the height of the tree will continue to grow
| |
− | with each round, so that when the finalized path is extended in round <math>h</math>,
| |
− | the finalized path will be of length <math>h</math>.
| |
− | A consequence of this,
| |
− | even if the <i>latency</i> occasionally
| |
− | occasionally increases because of faulty replicas or unexpectedly
| |
− | high network latency,
| |
− | the <i>throughput</i> of the protocol
| |
− | remains essentially constant.
| |
− | | |
− | The protocol also enjoys the property known as <b>optimistic responsiveness</b>,
| |
− | which means that
| |
− | so long as the leader is honest, the protocol may proceed at the pace
| |
− | of the actual network delay, rather than some upper bound on
| |
− | the network delay.
| |
− | | |
− | ==Public keys==
| |
− | | |
− | To implement the protocol, each replica is associated with a
| |
− | public key for the [[wikipedia:BLS_digital_signature|BLS signature scheme]], and each replica also holds
| |
− | the corresponding secret signing key.
| |
− | The association of replicas to public keys is maintained by the
| |
− | [[IC network nervous system (NNS)|network nervous system (NNS) of the Internet Computer]].
| |
− | These BLS signatures will be used to authenticate messages
| |
− | sent by replicas.
| |
− | The protocol also uses the
| |
− | <b>signature aggregation</b> feature of BLS signatures,
| |
− | which allows many signatures on the same message to be aggregated into
| |
− | a compact multi-signature.
| |
− | | |
− | ==Random Beacon==
| |
− | | |
− | In addition to BLS signatures and multi-signatures as discussed above,
| |
− | the protocol makes use of a BLS [[wikipedia:Threshold_cryptosystem|threshold signature scheme]] to implement the
| |
− | above-mentioned <b>random beacon</b>.
| |
− | The random beacon for height <math>h</math> is a <math>(f+1)</math>-threshold signature
| |
− | on a message unique to height <math>h</math>.
| |
− | In each round of the protocol, each replica broadcasts its share of the beacon for
| |
− | the next round, so that when the next round begins, all replicas should have enough shares
| |
− | to reconstruct the beacon for that round.
| |
− | As discussed above, the random beacon at height <math>h</math> is used to
| |
− | assign a pseudo-random rank to each replica that will be used in round <math>h</math> of the protocol.
| |
− | Because of the security properties of the threshold signature,
| |
− | an adversary will not be able to predict the ranking of the replicas more than one round in advance,
| |
− | and these rankings will effectively be as good as random.
| |
− | To set up such a threshold signature scheme, the Internet Computer runs a
| |
− | [[wikipedia:Distributed_key_generation|distributed key generation (DKG)]] protocol.
| |
− | There are many DKG protocols in the literature.
| |
− | The Internet Computer implements a new [[non-interactive DKG (NiDKG) protocol]].
| |
− | | |
− | | |
− | ==Block making==
| |
− | | |
− | Each replica may at different points in time play the role of a <b>block maker</b>.
| |
− | As a block maker in round <math>h</math>,
| |
− | the replica proposes a block <math>B</math> of height <math>h</math>
| |
− | that to be child of a block <math>B'</math> of height <math>h-1</math>
| |
− | in the tree of blocks.
| |
− | To do this, the block maker first gathers together a <b>payload</b>
| |
− | consisting of all transaction requests it knows about
| |
− | (but not including those already included in payloads in blocks in the
| |
− | path through the tree ending at <math>B'</math>).
| |
− | The block <math>B</math> consists of
| |
− | * the payload,
| |
− | * the hash of <math>B'</math>,
| |
− | * the rank of the block maker,
| |
− | * the height <math>h</math> of the block.
| |
− | After forming the block <math>B</math>, the block maker forms
| |
− | a <b>block proposal</b>, consisting of
| |
− | * the block <math>B</math>,
| |
− | * the block maker's identity, and
| |
− | * the block maker's signature on <math>B</math>.
| |
− | A block maker will broadcast its block proposal to all other replicas.
| |
− | | |
− | ==Notarization==
| |
− | | |
− | A block is effectively added to the tree of blocks when it becomes
| |
− | <b>notarized</b>.
| |
− | For a block to become notarized,
| |
− | <math>2f+1</math>
| |
− | distinct replicas must <b>support</b> its notarization.
| |
− | | |
− | Given a proposed block <math>B</math> at height <math>h</math>,
| |
− | a replica will determine if the proposal is <b>valid</b>,
| |
− | which means that <math>B</math> has the syntactic form described above.
| |
− | In particular, <math>B</math> should contain the hash of
| |
− | a block <math>B'</math> of height <math>h'</math> that is already
| |
− | in the tree of blocks (i.e., already notarized).
| |
− | In addition, the payload of <math>B</math> must satisfy certain
| |
− | conditions (in particulat, all of the transaction requests in the payload
| |
− | must satisfy various constraints, but these constraints are unrelated
| |
− | to consensus).
| |
− | Also, the rank of the block maker (as recorded in the block <math>B</math>)
| |
− | must match
| |
− | the rank assigned in round <math>h</math> by the random beacon
| |
− | to the replica that proposed the block (as recorded in the block proposal) .
| |
− | | |
− | If the block is valid and certain other constraints hold,
| |
− | the replica will <b>support</b> the notarization of the block
| |
− | by broadcasting a <b>notarization share</b> for <math>B</math>,
| |
− | consisting of
| |
− | * the hash of <math>B</math>,
| |
− | * the height <math>h</math> of <math>B</math>,
| |
− | * the identity of the supporting replica, and
| |
− | * the supporting replica's signature on a message comprising the hash of <math>B</math> and the height <math>h</math>.
| |
− | Any set of <math>2f+1</math> notarization shares on <math>B</math>
| |
− | may be aggregated together to form a
| |
− | <b>notarization</b> for <math>B</math>,
| |
− | consisting of
| |
− | * the hash of <math>B</math>,
| |
− | * the height <math>h</math> of <math>B</math>,
| |
− | * the set of identities of the <math>2f+1</math> supporting replicas,
| |
− | * an aggregation of the <math>2f+1</math> signatures on the message comprising the hash of <math>B</math> and the height <math>h</math>.
| |
− | Each replica will finish round <math>h</math>
| |
− | as soon as it obtains a notarized block of height <math>h</math>,
| |
− | and will not subsequently support the notarization of
| |
− | any other blocks at height <math>h</math>.
| |
− | At this point in time, such a replica will also broadcast this
| |
− | notarization to all other replicas.
| |
− | | |
− | | |
− | | |
− | | |
− | ==Finalization==
| |
− | | |
− | There may be more than one
| |
− | notarized block at a given height <math>h</math>.
| |
− | However, if a block is <b>finalized</b>, then we can be sure that there
| |
− | is no other notarized block at height <math>h</math>.
| |
− | Let us call this the <b>safety invariant</b>.
| |
− | | |
− | For a block to become finalized,
| |
− | <math>2f+1</math>
| |
− | distinct replicas must support its finalization.
| |
− | Recall that round <math>h</math> ends for a replica when
| |
− | it obtains a notarized block <math>B</math> of height <math>h</math>.
| |
− | At that point in time, such a replica will
| |
− | check if it supported the notarization of any block at height <math>h</math>
| |
− | <i>other</i> than block <math>B</math> (it may or may not have
| |
− | supported the notarization of <math>B</math> itself).
| |
− | If not, the replica will support the finalization of <math>B</math>
| |
− | by broadcasting a
| |
− | <b>finalization share</b> for <math>B</math>.
| |
− | A finalization share has exactly the same format as a notarization share
| |
− | (but is tagged in such a way notarization shares
| |
− | and finalization shares cannot be confused with one another).
| |
− | Any set of <math>2f+1</math> finalization shares on <math>B</math>
| |
− | may be aggregated together to form a <b>finalization</b> for <math>B</math>,
| |
− | which has exactly the same format as a notarization
| |
− | (but again, is appropriately tagged).
| |
− | Any replica that obtains a finalized block will broadcast the finalization
| |
− | to all other replicas.
| |
− | | |
− | We prove the safety invariant
| |
− | [[#Proof of safety invariant|below]].
| |
− | One consequence of the safety invariant is the following.
| |
− | Suppose two blocks <math>B</math> and <math>B'</math> are finalized,
| |
− | where <math>B</math> has height <math>h</math>, <math>B'</math> has height
| |
− | <math>h'\le h</math>.
| |
− | Then the safety invariant implies that the path
| |
− | in the tree of notarized blocks
| |
− | ending at <math>B'</math> is a prefix of the path ending at <math>B</math>
| |
− | (if not, then there would be two notarized blocks at height <math>h'</math>,
| |
− | contradicting the finalization invariant).
| |
− | Thus, whenever a replica sees a finalized block <math>B</math>,
| |
− | it may view all ancestors of <math>B</math> as being
| |
− | <b>implicitly finalized</b>, and because of the
| |
− | safety invariant, the <b>safety property</b> is
| |
− | guaranteed to hold for these (explicitly and implicitly) finalized blocks.
| |
− | | |
− | | |
− | | |
− | ==Delay functions==
| |
− | | |
− | The protocol makes use of two <b>delay functions</b>,
| |
− | <math>\Delta_\textrm{m}</math> and <math>\Delta_\textrm{n}</math>,
| |
− | which control
| |
− | the timing of block making and notarization activity.
| |
− | Both of these functions map the rank <math>r</math> of the proposing replica
| |
− | no a nonnegative delay amount,
| |
− | and it is assumed that each function is monotonely increasing in <math>r</math>,
| |
− | and that
| |
− | <math>\Delta_\textrm{m}(r) \le \Delta_\textrm{n}(r)</math>
| |
− | for all <math>r = 0, \ldots, n-1</math>.
| |
− | The recommended definition of these functions is
| |
− | <math>\Delta_\textrm{m}(r) = 2 \delta r</math>
| |
− | and
| |
− | <math>\Delta_\textrm{n}(r) = 2 \delta r + \epsilon</math>,
| |
− | where <math>\delta</math> is an upper bound on the time to deliver
| |
− | messages from one honest replica to another,
| |
− | and <math>\epsilon \ge 0</math> is a "governor" to keep the
| |
− | protocol from running too fast.
| |
− | With these definitions, liveness will be ensured in those
| |
− | rounds in which (1) the leader is honest, and (2) messages really
| |
− | are delivered between honest parties within time <math>\delta</math>.
| |
− | Indeed,
| |
− | if (1) and (2) both hold in a given round, then the block proposed
| |
− | by the leader in that round will be finalized.
| |
− | Let us call this the <b>liveness invariant</b>.
| |
− | We prove this
| |
− | [[#Proof of liveness invariant|below]].
| |
− | | |
− | | |
− | ==Putting it all together==
| |
− | | |
− | We now describe in more detail how the protocol works;
| |
− | specifically, we describe more precisely when a replica will
| |
− | propose a block and when a replica will support the notarization of a block.
| |
− | A given replica <math>P</math> will record the time at which it
| |
− | enters a given round <math>h</math>,
| |
− | which happens when it has obtained (1) some notarization for a
| |
− | block of height <math>h-1</math>, and (2) the random beacon for
| |
− | round <math>h</math>.
| |
− | Since the random beacon for round <math>h</math> has been determined,
| |
− | <math>P</math> can determine its own rank <math>r_P</math>, as well as the
| |
− | rank <math>r_Q</math> of each other replica <math>Q</math>
| |
− | for round <math>h</math>.
| |
− | | |
− | ===Block making===
| |
− | | |
− | Replica <math>P</math> will only propose its own block <math>B_P</math>
| |
− | provided (1) at least <math>\Delta_\textrm{m}(r_P)</math> time
| |
− | units have passed since the beginning of the round,
| |
− | and (2) there is no valid lower ranked block currently seen by <math>P</math>.
| |
− | | |
− | Note that since <math>P</math> is guaranteed to have a notarized block
| |
− | of height <math>h-1</math>
| |
− | when it enters round <math>h</math>, it can make its proposed
| |
− | block a child of this notarized block (or any other
| |
− | notarized block of height <math>h-1</math> that it may have).
| |
− | | |
− | ===Notarization===
| |
− | | |
− | Replica <math>P</math> will support the notarization
| |
− | of a valid block <math>B_Q</math> proposed by a
| |
− | replica <math>Q</math> of rank
| |
− | <math>r_Q</math>
| |
− | provided
| |
− | (1) at least <math>\Delta_\textrm{n}(r_Q)</math> time
| |
− | units have passed since the beginning of the round,
| |
− | and
| |
− | (2) there is no block of rank less than <math>r_Q</math>
| |
− | currently seen by <math>P</math>.
| |
− | | |
− | ===Block proposal relay===
| |
− | | |
− | | |
− | Suppose <math>P</math> sees
| |
− | a valid block proposal that is proposed by a
| |
− | replica <math>Q</math> of rank
| |
− | <math>r_Q < r_P</math>
| |
− | such that
| |
− | (1) at least <math>\Delta_\textrm{m}(r_Q)</math> time
| |
− | units have passed since the beginning of the round,
| |
− | and (2) there is no block of rank less than <math>r_Q</math>
| |
− | currently seen by <math>P</math>.
| |
− | Then at this point in time, if it has not already done so,
| |
− | <math>P</math> will broadcast this block proposal to all other replicas.
| |
− | | |
− | ===Proof of liveness invariant===
| |
− | | |
− | Here is a proof of the liveness invariant:
| |
− | # Suppose the first honest <math>Q</math> replica finishes round <math>h-1</math> at time <math>t</math>.
| |
− | # Under partial synchrony at time <math>t</math>, all honest replicas will enter round <math>h</math> before time <math>t+\delta</math> (the notarization that ended round <math>h-1</math> for <math>Q</math>, as well as the necessary round <math>h</math> random beacon shares, will arrive at all honest replicas before this time).
| |
− | # The leader <math>P</math> in round <math>h</math> will propose a block <math>B</math> before time <math>t+\delta</math>, and again by partial synchrony, this block proposal will be delivered to all other replicas before time <math>t+2\delta</math>.
| |
− | # Assuming <math>\Delta_\mathrm{n}(1) \ge \Delta_\mathrm{m}(0) + 2\delta</math>, the protocol logic guarantees that each honest replicas support the notarization of block <math>B</math> and no other block, and thus <math>B</math> will become notarized and finalized.
| |
− | | |
− | | |
− | ===Proof of safety invariant===
| |
− | | |
− | Here is a proof of the safety invariant:
| |
− | # Suppose that the number of corrupt parties is exactly <math>f^* \le f < n/3</math>.
| |
− | # If a block <math>B</math> is finalized, then its finalization must have been supported by a set <math>S</math> of at least <math>n-f-f^*</math> honest replicas (by the security property for aggregate signatures).
| |
− | # Suppose (by way of contradction) that another block <math>B' \ne B</math> were notarized. Then its notarization must have been supported by a set <math>S'</math> of at least <math>n-f-f^*</math> honest replicas (again, by the security property for aggregate signatures).
| |
− | # The sets <math>S</math> and <math>S'</math> are disjoint (by the finalization logic).
| |
− | # Therefore, <math>n-f^* \ge |S \cup S'| = |S| + |S'| \ge 2 (n-f-f^*)</math>, which implies <math>n \le 3f</math>, a contradiction.
| |