|
|
(27 intermediate revisions by 5 users not shown) |
Line 1: |
Line 1: |
− | The job of the consensus layer of the IC is to order transaction requests so that all replicas in a subnet will process
| + | Please see [https://internetcomputer.org/how-it-works/consensus/ How Consensus Layer works] |
− | such requests in the same order.
| |
− | There are many protocols in the literature for this problem.
| |
− | The IC uses a new consensus protocol, which is described here at a high level.
| |
− | For more details, see the paper https://eprint.iacr.org/2021/1330 (in particular, Protocol ICC1 in that paper).
| |
− | | |
− | Any secure consensus protocol should guarantee two properties, which (roughly stated) are:
| |
− | * <b>safety</b>: all replicas in fact agree on the same ordering of transaction requests, and
| |
− | * <b>liveness</b>: all replicas should make steady progress.
| |
− | The paper https://eprint.iacr.org/2021/1330 proves that the IC consensus protocol satisfies both of these properties
| |
− | | |
− | The IC consensus protocol is design to be
| |
− | * extremely simple, and
| |
− | * robust (performance degrades gracefully when some replicas are malicious).
| |
− | | |
− | As discussed [[The Internet Computer for Computer Scientists|in the introduction]], we assume
| |
− | * a subnet of <math>n</math> replicas, and
| |
− | * at most <math>f < n/3</math> of the replicas are faulty.
| |
− | Faulty replicas may exhibit arbitrary, malicious (i.e., Byzantine) behavior.
| |
− | To simplify the presentation, we assume here that <math>n=3f+1</math>.
| |
− | | |
− | | |
− | We assume that communication is <b>asynchronous</b>, with no <i>a priori</i>
| |
− | bound on the delay of messages sent between replicas.
| |
− | In fact, the scheduling of message delivery may be completely under adversarial control.
| |
− | The IC consensus protocol guarantees safety under this very weak communication assumption.
| |
− | However, to guarantee liveness, we need to assume a form of <b>partial synchrony</b>,
| |
− | which (roughly stated) says that the network will be periodically <b>timely</b> .
| |
− | Somewhat more precisely, there exists a bound <math>\Delta</math> such that
| |
− | periodically all undelivered
| |
− | messages will be delivered within time <math>\Delta</math>.
| |
− | The bound <math>\Delta</math> does not have to be known in advance
| |
− | (the protocol is initialized with a bound, but will dynamically
| |
− | adapt and increase this bound if it is too small).
| |
− | Regardless of whether we are assuming an asynchronous
| |
− | or a partially synchronous network,
| |
− | we assume that every message sent from one honest
| |
− | party to another will <i>eventually</i> be delivered.
| |
− | | |
− | Like a number of consensus protocols,
| |
− | the IC consensus protocol is based in a [[wikipedia:blockchain|blockchain]].
| |
− | As the protocol progresses, a tree of blocks is grown,
| |
− | starting from a special <i>genesis block</i> that is the root of the tree.
| |
− | Each non-genesis block in the tree contains (among other things)
| |
− | a <i>payload</i>, consisting of a sequence of transaction requests,
| |
− | and a hash of the block's parent in the tree.
| |
− | The honest replicas have a consistent view of this tree:
| |
− | while each replica may have a different, partial view
| |
− | of this tree, all the replicas have a view of the <i>same</i> tree.
| |
− | In addition, as the protocol progresses,
| |
− | there is always a path of <i>finalized</i> blocks in this tree.
| |
− | Again, the honest replicas have a consistent view of this path:
| |
− | while each replica may have a different, partial view
| |
− | of this path, all the parties have a view of the <i>same</i> path.
| |
− | The transaction requests in the payloads of the blocks along this
| |
− | path are the ordered transaction requests will be processed by the [[IC execution layer|execution layer]]
| |
− | of the Internet Computer.
| |
− | | |
− | The protocol proceeds in <b>rounds</b>.
| |
− | In round <math>h</math> of the protocol,
| |
− | one or more blocks of <b>height</b> <math>h</math> are added to the tree.
| |
− | That is, the blocks added in round <math>h</math> are always at a distance
| |
− | of exactly <math>h</math> from the root.
| |
− | In each round, a pseudo-random process is used to assign
| |
− | each replica a unique <b>rank</b>, which is an
| |
− | integer in the range <math>0,\ldots,h-1</math>.
| |
− | This is pseudo-random process is implemented using a <b>random beacon</b> (see below).
| |
− | The party of lowest rank is the <b>leader</b> of that round.
| |
− | When the leader is honest and the network is timely,
| |
− | the leader will propose a block, which will be added to the tree;
| |
− | moreover, this will be the <i>only</i> block added to the tree
| |
− | in this round and it will extend the finalized path.
| |
− | If the leader is not honest or the network is not timely,
| |
− | some other parties of higher rank may also propose blocks,
| |
− | and also have their blocks added to the tree.
| |
− | In any case, the logic of the protocol gives highest priority
| |
− | to the leader's proposed block
| |
− | and <i>some</i> block or blocks will be added to this tree in this round.
| |
− | | |
− | When the network is timely,
| |
− | the message complexity in a given round of the IC consensus protocol is <math>O(n^2)</math>
| |
− | with overwhelming probability.
| |
− | The protocol also enjoys the property known as <b>optimistic responsiveness</b>,
| |
− | which means that
| |
− | so long as the leader is honest, the protocol will proceed at the pace
| |
− | of the actual network delay, rather than some upper bound on
| |
− | the network delay.
| |
− | | |
− | ==Public keys==
| |
− | | |
− | To implement the protocol, each replica is associated with a
| |
− | public key for the [[wikipedia:BLS_digital_signature|BLS signature scheme]], and each replica also holds
| |
− | the corresponding secret signing key.
| |
− | The association of replicas to public keys is maintained by the
| |
− | [[IC network nervous system (NNS)|network nervous system (NNS) of the Internet Computer]].
| |
− | These BLS signatures will be used to authenticate messages
| |
− | sent by replicas.
| |
− | The protocol also uses the
| |
− | <b>signature aggregation</b> feature of BLS signatures,
| |
− | which allows many signatures on the same message to be aggregated into
| |
− | a compact multi-signature.
| |
− | | |
− | ==Random Beacon==
| |
− | | |
− | In addition to BLS signatures and multi-signatures as discussed above,
| |
− | the protocol makes use of a BLS [[wikipedia:Threshold_cryptosystem|threshold signature scheme]] to implement the
| |
− | above-mentioned <b>random beacon</b>.
| |
− | The random beacon for height <math>h</math> is a <math>(f+1)</math>-threshold signature
| |
− | on a message unique to height <math>h</math>.
| |
− | In each round of the protocol, each replica broadcasts its share of the beacon for
| |
− | the next round, so that when the next round begins, all replicas should have enough shares
| |
− | to reconstruct the beacon for that round.
| |
− | As discussed above, the random beacon at height <math>h</math> is used to
| |
− | assign a pseudo-random rank to each replica that will be used in round <math>h</math> of the protocol.
| |
− | Because of the security properties of the threshold signature,
| |
− | an adversary will not be able to predict the ranking of the replicas more than one round in advance,
| |
− | and these rankings will effectively be as good as random.
| |
− | To set up such a threshold signature scheme, the Internet Computer runs a
| |
− | [[wikipedia:Distributed_key_generation|distributed key generation (DKG)]] protocol.
| |
− | There are many DKG protocols in the literature.
| |
− | The Internet Computer implements a new [[non-interactive DKG (NiDKG) protocol]].
| |
− | | |
− | | |
− | ==Block making==
| |
− | | |
− | Each replica may at different points in time play the role of a <b>block maker</b>.
| |
− | As a block maker in round <math>h</math>,
| |
− | the replica proposes a block <math>B</math> of height <math>h</math>
| |
− | that to be child of a block <math>B'</math> of height <math>h-1</math>
| |
− | in the tree of blocks.
| |
− | To do this, the block maker first gathers together a <b>payload</b>
| |
− | consisting of all transaction requests it knows about
| |
− | (but not including those already included in payloads in blocks in the
| |
− | path through the tree ending at <math>B'</math>).
| |
− | The block <math>B</math> consists of
| |
− | * the payload,
| |
− | * the hash of <math>B'</math>,
| |
− | * the rank of the block maker,
| |
− | * the height <math>h</math> of the block.
| |
− | After forming the block <math>B</math>, the block maker forms
| |
− | a <b>block proposal</b>, consisting of
| |
− | * the block <math>B</math>,
| |
− | * the block maker's identity, and
| |
− | * the block maker's signature on <math>B</math>.
| |
− | A block maker will broadcast its block proposal to all other replicas.
| |
− | | |
− | ==Notarization==
| |
− | | |
− | A block is only added to the tree of blocks when it becomes
| |
− | <b>notarized</b>.
| |
− | For a block to become notarized,
| |
− | <math>2f+1</math>
| |
− | distinct replicas must support its notarization.
| |
− | Given a proposed <math>B</math> at height <math>h</math>,
| |
− | a replica will verify a number of conditions,
| |
− | and if these tests pass, the replica will support the notarization of the block
| |
− | by broadcasting a <b>notarization share</b> for <math>B</math>,
| |
− | consisting of
| |
− | * the hash of <math>B</math>,
| |
− | * the height <math>h</math> of <math>B</math>,
| |
− | * the identity of the supporting replica, and
| |
− | * the notarizing replica's signature on a message comprising the hash of <math>B</math> and the height <math>h</math>.
| |
− | Any set of <math>2f+1</math> notarization shares on <math>B</math>
| |
− | may be aggregated together to form a
| |
− | <b>notarization</b> for <math>B</math>,
| |
− | consisting of
| |
− | * the hash of <math>B</math>,
| |
− | * the height <math>h</math> of <math>B</math>,
| |
− | * the set of identities of the <math>2f+1</math> supporting replicas,
| |
− | * an aggregation of the <math>2f+1</math> signatures on the message comprising the hash of <math>B</math> and the height <math>h</math>.
| |
− | Each replica will finish round <math>h</math>
| |
− | as soon as it obtains a notarized block of height <math>h</math>,
| |
− | and will not subsequently notarize any other blocks at height <math>h</math>.
| |
− | | |
− | ==Finalization==
| |
− | | |
− | There may be more than one
| |
− | notarized block at a given height <math>h</math>.
| |
− | However, if a block is <b>finalized</b>, then we can be sure that there
| |
− | is no other notarized block at height <math>h</math>.
| |
− | Let us call this the <b>finalization invariant</b>.
| |
− | | |
− | For a block to become finalized,
| |
− | <math>2f+1</math>
| |
− | distinct replicas must support its finalization.
| |
− | Recall that round <math>h</math> ends for a replica when
| |
− | it obtains a notarized block <math>B</math> of height <math>h</math>.
| |
− | At that point in time, such a replica will
| |
− | check if it supported the notarization of any block at height <math>h</math>
| |
− | other than block <math>B</math>.
| |
− | If not, the replica will support the finalization of <math>B</math>
| |
− | by broadcasting a
| |
− | <b>finalization share</b> for <math>B</math>.
| |
− | A finalization share has exactly the same format as a notarization share
| |
− | (but is tagged in such a way notarization shares
| |
− | and finalization shares cannot be confused with one another).
| |
− | Any set of <math>2f+1</math> finalization shares on <math>B</math>
| |
− | may be aggregated together to form a <b>finalization</b> for <math>B</math>,
| |
− | which has exactly the same format as a notarization.
| |
− | | |
− | One can easily see that the logic for finalizing blocks
| |
− | implies that the finalization invariant must hold.
| |
− | Here is a proof:
| |
− | # Suppose that the number of corrupt parties is exactly <math>f^* \le f < n/3</math>.
| |
− | # If a block <math>B</math> is finalized, then its finalization must have been supported by a set <math>S</math> of at least <math>n-f-f^*</math> honest replicas (by the security property for aggregate signatures).
| |
− | # If another block <math>B' \ne B</math> were notarized, then its notarization must have been supported by a set <math>S'</math> of at least <math>n-f-f^*</math> honest replicas (again, by the security property for aggregate signatures).
| |
− | # The sets <math>S</math> and <math>S'</math> are disjoint (by the finalization logic).
| |
− | # Therefore, <math>n-f^* \ge |S| + |S'| \ge 2 (n-f-f^*)</math>, which implies <math>n \le 3f</math>, a contradiction.
| |
− | | |
− | | |
− | | |
− | One consequence of the finalization invariant is the following.
| |
− | Suppose one replica sees a finalized block <math>B</math> at height <math>h</math>
| |
− | and that another replica sees a finalized block
| |
− | <math>B'</math> at height <math>h' \le h</math>.
| |
− | Then the finalization invariant implies that the path in the tree of notarized blocks
| |
− | ending at <math>B'</math> is a prefix of the path ending at <math>B</math>
| |
− | (if not, then there would be two notarized blocks at height <math>h'</math>,
| |
− | contradicting the finalization invariant).
| |
− | Thus, whenever replica sees a finalized block <math>B</math>,
| |
− | it may view all ancestors of <math>B</math> as being
| |
− | <b>implicitly finalized</b>, and because of the
| |
− | finalization invariant, the <b>safety property</b> is
| |
− | guaranteed to hold for these (explicitly and implicitly) finalized blocks.
| |
− | | |
− | | |
− | | |
− | ==Delay functions==
| |
− | | |
− | The protocol makes use of two <b>delay functions</b>,
| |
− | <math>\Delta_\textrm{p}</math> and <math>\Delta_\textrm{n}</math>,
| |
− | which control
| |
− | the timing of block making and notarization activity.
| |
− | Both of these functions map the rank of the proposing replica
| |
− | no a nonnegative delay amount,
| |
− | and it is assumed that
| |
− | <math>\Delta_\textrm{p}(r) \le \Delta_\textrm{n}(r)</math>
| |
− | for all <math>r = 0, \ldots, n-1</math>.
| |
− | The recommended definition of these functions is
| |
− | <math>\Delta_\textrm{p}(r) = 2 \delta r</math>
| |
− | and
| |
− | <math>\Delta_\textrm{n}(r) = 2 \delta r + \epsilon</math>,
| |
− | where <math>\delta</math> is an upper bound on the time to deliver
| |
− | messages from one honest replica to another,
| |
− | and <math>\epsilon</math> is a "governor" to keep the
| |
− | protocol from running too fast.
| |
− | With these definitions, liveness will be ensured in those
| |
− | rounds in which (1) the leader is honest, and (2) messages really
| |
− | are delivered between honest parties within time <math>\delta</math>.
| |
− | Indeed,
| |
− | if (1) and (2) both hold in a given round, then the block proposed
| |
− | by the leader in that round will be finalized.
| |
− | Let us call this the <b>liveness invariant</b>.
| |
− | Note that even if (1) or (2) do not hold in a given round,
| |
− | the protocol is still guaranteed to make progress in the sense
| |
− | that at least one notarized block will be added to the tree in that round.
| |
− | Also note that the safety property of the protocol is always
| |
− | guaranteed.
| |
− | | |
− | ==Putting it all together==
| |
− | | |
− | We now describe in more detail how the protocol works;
| |
− | specifically, we describe more precisely when a replica will
| |
− | propose a block and when a replica will support the notarization of a block.
| |
− | A given replica <math>P</math> will record the time at which it
| |
− | enters a given round <math>h</math>,
| |
− | which happens when it has obtained (1) some notarization for a
| |
− | block of height <math>h-1</math>, and (2) the random beacon for
| |
− | round <math>h</math>.
| |
− | Since the random beacon for round <math>h</math> has been determined,
| |
− | <math>P</math> can determine its own rank <math>r_P</math>, as well as the
| |
− | rank <math>r_Q</math> of each other replica <math>Q</math>
| |
− | for round <math>h</math>.
| |
− | | |
− | ===Block making===
| |
− | | |
− | Replica <math>P</math> will only propose its own block <math>B_P</math>
| |
− | provided (1) at least <math>\Delta_\textrm{p}(r_P)</math> time
| |
− | units have passed since the beginning of the round,
| |
− | and (2) there is no <i>better</i> block available to <math>P</math>.
| |
− | A block is deemed to be <i>better</i> than <math>B_P</math>
| |
− | if it was proposed by a replica of rank less than <math>r_P</math>
| |
− | that has not been <i>disqualified</i>
| |
− | [[#Disqualify or relay|(see below)]].
| |
− | | |
− | Note that since <math>P</math> is guaranteed to have a notarized block
| |
− | of height <math>h-1</math>
| |
− | when it enters round <math>h</math>, it can make its proposed
| |
− | block a child of this notarized block (or any other
| |
− | notarized block of height <math>h-1</math> that it may have).
| |
− | | |
− | ===Notarization===
| |
− | | |
− | Replica <math>P</math> will support the notarization
| |
− | of a valid block <math>B_Q</math> proposed by a
| |
− | non-disqualified replica <math>Q</math> of rank
| |
− | <math>r_Q</math>
| |
− | provided
| |
− | (1) at least <math>\Delta_\textrm{n}(r_Q)</math> time
| |
− | units have passed since the beginning of the round,
| |
− | (2) there is no better block available to <math>P</math>,
| |
− | and (3) there is no
| |
− | <i>evidence of inconsistent behavior</i> by <math>Q</math> in this round
| |
− | [[#Disqualify or relay|(see below)]].
| |
− | | |
− | ===Disqualify or relay===
| |
− | | |
− | | |
− | Suppose <math>P</math> sees
| |
− | of a valid block <math>B_Q</math> proposed by a
| |
− | non-disqualified replica <math>Q</math> of rank
| |
− | <math>r_Q < r_P</math>
| |
− | such that
| |
− | (1) at least <math>\Delta_\textrm{p}(r_Q)</math> time
| |
− | units have passed since the beginning of the round,
| |
− | and (2) there is no better block available to <math>P</math>.
| |
− | At this point in time, <math>P</math> will either
| |
− | <b>disqualify</b> the replica <math>Q</math> (for the current round)
| |
− | or
| |
− | <b>relay</b> the block <math>B_Q</math>.
| |
− | | |
− | Party <math>P</math> will <b>disqualify</b> <math>Q</math> if there is
| |
− | <i>evidence of inconsistent behavior</i> by <math>Q</math> in this round.
| |
− | Such evidence consists of either (1) two different block proposals
| |
− | by <math>Q</math> in this round, or (2) a <i>proof of inconsistent behavior</i>
| |
− | obtained from another replica.
| |
− | Such a proof of inconsistent behavior is simply a message consisting
| |
− | of two distinct block proposals from <math>Q</math>.
| |
− | In either case, <math>P</math> will mark <math>Q</math> as <i>disqualified</i>
| |
− | for the current round, and broadcast to all replicas a proof of inconsistent
| |
− | behavior.
| |
− | Note that since block proposals are signed, it is infeasible
| |
− | to construct a proof of inconsistent behavior against an honest replica.
| |
− | | |
− | If <math>P</math> does not disqualify <math>Q</math>, and has not already
| |
− | relayed the block <math>B_Q</math>, it will <b>relay</b> <math>B_Q</math>, which
| |
− | means it will broadcast <math>B_Q</math> to all replicas and mark
| |
− | <math>B_Q</math> as <i>relayed</i>.
| |