Difference between revisions of "IC consensus layer"

From Internet Computer Wiki
Jump to: navigation, search
m
(Replaced content with "Please see [https://internetcomputer.org/how-it-works/consensus/ How Consensus Layer works]")
Tag: Replaced
 
Line 1: Line 1:
 
+
Please see [https://internetcomputer.org/how-it-works/consensus/ How Consensus Layer works]
The job of the consensus layer of the IC is to order transaction requests so that all replicas in a subnet will process
 
such requests in the same order.
 
There are many protocols in the literature for this problem.
 
The IC uses a new consensus protocol, which is described here at a high level.
 
For more details, see the paper https://eprint.iacr.org/2021/632 (in particular, Protocol ICC1 in that paper).
 
 
 
Any secure consensus protocol should guarantee two properties, which (roughly stated) are:
 
* <b>safety</b>: all replicas in fact agree on the same ordering of transaction requests, and
 
* <b>liveness</b>: all replicas should make steady progress.
 
The paper https://eprint.iacr.org/2021/632 proves that the IC consensus protocol satisfies both of these properties.
 
 
 
The IC consensus protocol is designed to be
 
* <b>simple</b>,
 
* <b>robust</b> (performance degrades gracefully when some replicas are crashed or malicious), and
 
* <b>optimistically responsive</b> (when certain optimistic conditions are met, the protocol may proceed at the pace of the actual network delay, rather than some worst-case upper bound on the network delay).
 
 
 
==Assumptions==
 
 
 
As discussed [[The Internet Computer for Computer Scientists|in the introduction]], assume
 
* a subnet of <math>n</math> replicas, and
 
* at most <math>f < n/3</math> of the replicas are faulty.
 
Faulty replicas may exhibit  arbitrary, malicious (i.e., Byzantine) behavior.
 
 
 
 
 
Assume communication is <b>asynchronous</b>, with no <i>a priori</i>
 
bound on the delay of messages sent between replicas.
 
In fact, the scheduling of message delivery may be completely under adversarial control.
 
The IC consensus protocol guarantees safety under this very weak communication assumption.
 
However, to guarantee liveness, a form of <b>partial synchrony</b> is needed,
 
which (roughly stated) says that the network will be periodically synchronous
 
for short intervals of time.
 
In such intervals of synchrony, all undelivered messages will be delivered in less than time <math>\delta</math>,
 
for some fixed bound <math>\delta</math>.
 
The bound <math>\delta</math> does not have to be known in advance (the protocol is initialized with a reasonable bound, but will dynamically adapt and increase this bound if it is too small).
 
Regardless of whether an asynchronous
 
or a partially synchronous network is assumed,
 
it is assumed that every message sent from one honest
 
party to another will <i>eventually</i> be delivered.
 
 
 
==Protocol overview==
 
 
 
Like a number of consensus protocols,
 
the IC consensus protocol is based on a [[wikipedia:blockchain|blockchain]].
 
As the protocol progresses, a tree of blocks is grown,
 
starting from a special <i>genesis block</i> that is the root of the tree.
 
Each non-genesis block in the tree contains (among other things)
 
a <i>payload</i>, consisting of a sequence of transaction requests,
 
and a hash of the block's parent in the tree.
 
The honest replicas have a consistent view of this tree:
 
while each replica may have a different, partial view
 
of this tree, all the replicas have a view of the <i>same</i> tree.
 
In addition, as the protocol progresses,
 
there is always a path of <i>finalized</i> blocks in this tree.
 
Again, the honest replicas have a consistent view of this path:
 
while each replica may have a different, partial view
 
of this path, all the parties have a view of the <i>same</i> path.
 
The transaction requests in the payloads of the blocks along this
 
path are the ordered transaction requests will be processed by the [[IC execution layer|execution layer]]
 
of the Internet Computer.
 
 
 
The protocol proceeds in <b>rounds</b>.
 
In round <math>h</math> of the protocol,
 
one or more blocks of <b>height</b> <math>h</math> are added to the tree.
 
That is, the blocks added in round <math>h</math> are always at a distance
 
of exactly <math>h</math> from the root.
 
In each round, a pseudo-random process is used to assign
 
each replica a unique <b>rank</b>, which is an
 
integer in the range <math>0,\ldots,n-1</math>.
 
This pseudo-random process is implemented using a <b>random beacon</b>
 
[[#Random Beacon|(see below)]].
 
The party of lowest rank is the <b>leader</b> of that round.
 
When the leader is honest and the network is synchronous,
 
the leader will propose a block, which will be added to the tree;
 
moreover, this will be the <i>only</i> block added to the tree
 
in this round and it will extend the finalized path.
 
If the leader is not honest or the network is not synchronous,
 
some other parties of higher rank may also propose blocks,
 
and also have their blocks added to the tree.
 
In any case, the logic of the protocol gives highest priority
 
to the leader's proposed block
 
and <i>some</i> block or blocks will be added to this tree in this round.
 
Even if the protocol proceeds for a few rounds without extending the
 
finalized path, the height of the tree will continue to grow
 
with each round, so that when the finalized path is extended in round <math>h</math>,
 
the finalized path will be of length <math>h</math>.
 
A consequence of this,
 
even if the <i>latency</i> occasionally
 
occasionally increases because of faulty replicas or unexpectedly
 
high network latency,
 
the <i>throughput</i> of the protocol
 
remains essentially constant.
 
 
 
 
 
 
 
==Public keys==
 
 
 
To implement the protocol, each replica is associated with a
 
public key for the [[wikipedia:BLS_digital_signature|BLS signature scheme]], and each replica also holds
 
the corresponding secret signing key.
 
The association of replicas to public keys is maintained by the [[Network Nervous System]] (NNS) of the Internet Computer.
 
These BLS signatures will be used to authenticate messages
 
sent by replicas.
 
 
 
The protocol also uses the
 
<b>signature aggregation</b> feature of BLS signatures,
 
which allows many signatures on the same message to be aggregated into
 
a compact multi-signature.
 
The protocol will use these multi-signatures
 
for [[#Notarization|notarizations]] and [[#Finalization|finalizations]],
 
which are aggregations of <math>n-f</math> signatures on messages
 
of a certain form.
 
 
 
==Random Beacon==
 
 
 
In addition to BLS signatures and multi-signatures as discussed above,
 
the protocol makes use of a BLS [[wikipedia:Threshold_cryptosystem|threshold signature scheme]] to implement the
 
above-mentioned <b>random beacon</b>.
 
The random beacon for height <math>h</math> is a <math>(f+1)</math>-threshold signature
 
on a message unique to height <math>h</math>.
 
In each round of the protocol, each replica broadcasts its share of the beacon for
 
the next round, so that when the next round begins, all replicas should have enough shares
 
to reconstruct the beacon for that round.
 
As discussed above, the random beacon at height <math>h</math> is used to
 
assign a pseudo-random rank to each replica that will be used in round <math>h</math> of the protocol.
 
Because of the security properties of the threshold signature,
 
an adversary will not be able to predict the ranking of the replicas more than one round in advance,
 
and these rankings will effectively be as good as random.
 
To set up such a threshold signature scheme, the Internet Computer runs a
 
[[wikipedia:Distributed_key_generation|distributed key generation (DKG)]] protocol.
 
There are many DKG protocols in the literature.
 
The Internet Computer implements a new [[non-interactive DKG (NiDKG) protocol]].
 
 
 
 
 
==Block making==
 
 
 
Each replica may at different points in time play the role of a <b>block maker</b>.
 
As a block maker in round <math>h</math>,
 
the replica proposes a block <math>B</math> of height <math>h</math>
 
that is to be a child of some block <math>B'</math> of height <math>h-1</math>
 
in the tree of blocks.
 
To do this, the block maker first gathers together a <b>payload</b>
 
consisting of all transaction requests it knows about
 
(but not including those already included in payloads in blocks in the
 
path through the tree ending at <math>B'</math>).
 
The block <math>B</math> consists of
 
* the payload,
 
* the hash of  <math>B'</math>,
 
* the rank of the block maker,
 
* the height <math>h</math> of the block.
 
After forming the block <math>B</math>, the block maker forms
 
a <b>block proposal</b>, consisting of
 
* the block <math>B</math>,
 
* the block maker's identity, and
 
* the block maker's signature on <math>B</math>.
 
A block maker will broadcast its block proposal to all other replicas.
 
 
 
==Notarization==
 
 
 
A block is effectively added to the tree of blocks when it becomes
 
<b>notarized</b>.
 
For a block to become notarized,
 
<math>n-f</math>
 
distinct replicas must <b>support</b> its notarization.
 
 
 
Given a proposed block <math>B</math> at height <math>h</math>,
 
a replica will determine if the proposal is <b>valid</b>,
 
which means that <math>B</math> has the syntactic form described above.
 
In particular, <math>B</math> should contain the hash of
 
a block <math>B'</math> of height <math>h'</math> that is already
 
in the tree of blocks (i.e., already notarized).
 
In addition, the payload of <math>B</math> must satisfy certain
 
conditions (in particular, all of the transaction requests in the payload
 
must satisfy various constraints, but these constraints are unrelated
 
to consensus).
 
Also, the rank of the block maker (as recorded in the block <math>B</math>)
 
must match
 
the rank assigned in round <math>h</math> by the random beacon
 
to the replica that proposed the block (as recorded in the block proposal) .
 
 
 
If the block is valid and certain other constraints hold,
 
the replica will <b>support</b> the notarization of the block
 
by broadcasting a <b>notarization share</b> for <math>B</math>,
 
consisting of
 
* the hash of <math>B</math>,
 
* the height <math>h</math> of <math>B</math>,
 
* the identity of the supporting replica, and
 
* the supporting replica's signature on a message comprising the hash of <math>B</math> and the height <math>h</math>.
 
Any set of <math>n-f</math> notarization shares on <math>B</math>
 
may be aggregated together to form a
 
<b>notarization</b> for <math>B</math>,
 
consisting of
 
* the hash of <math>B</math>,
 
* the height <math>h</math> of <math>B</math>,
 
* the set of identities of the <math>n-f</math> supporting replicas,
 
* an aggregation of the <math>n-f</math> signatures on the message comprising the hash of <math>B</math> and the height <math>h</math>.
 
 
 
As soon as a replica obtains a notarized block of height <math>h</math>,
 
it will finish round <math>h</math>,
 
and will subsequently not support the notarization of
 
any other blocks at height <math>h</math>.
 
At this point in time, such a replica will also relay this
 
notarization to all other replicas.
 
Note that this replica may have obtained the notarization either by
 
(1) receiving it from another replica, or (2) aggregating <math>n-f</math>
 
notarization shares that it has received.
 
 
 
 
 
The <b>growth invariant</b> states that each honest replica will
 
eventually complete each round and start the next, so that the tree of notarized blocks
 
continues to grow (and this holds only assuming asynchronous eventual delivery,
 
and not partial synchrony).
 
The proof of growth invariant can be found [[#Proof of growth invariant|below]].
 
 
 
 
 
 
 
==Finalization==
 
 
 
There may be more than one
 
notarized block at a given height <math>h</math>.
 
However, if a block is <b>finalized</b>, then it is sure that there
 
is no other notarized block at height <math>h</math>.
 
Let us call this the <b>safety invariant</b>.
 
 
 
For a block to become finalized,
 
<math>n-f</math>
 
distinct replicas must <b>support</b> its finalization.
 
Recall that round <math>h</math> ends for a replica when
 
it obtains a notarized block <math>B</math> of height <math>h</math>.
 
At that point in time, such a replica will
 
check if it supported the notarization of any block at height <math>h</math>
 
<i>other</i> than block <math>B</math> (it may or may not have
 
supported the notarization of <math>B</math> itself).
 
If not, the replica will <b>support</b> the finalization of <math>B</math>
 
by broadcasting a
 
<b>finalization share</b> for <math>B</math>.
 
A finalization share has exactly the same format as a notarization share
 
(but is tagged in such a way notarization shares
 
and finalization shares cannot be confused with one another).
 
Any set of <math>n-f</math> finalization shares on <math>B</math>
 
may be aggregated together to form a <b>finalization</b> for <math>B</math>,
 
which has exactly the same format as a notarization
 
(but again, is appropriately tagged).
 
Any replica that obtains a finalized block will broadcast the finalization
 
to all other replicas.
 
 
 
The proof of safety invariant can be found [[#Proof of safety invariant|below]].
 
One consequence of the safety invariant is the following.
 
Suppose two blocks <math>B</math> and <math>B'</math> are finalized,
 
where <math>B</math> has height <math>h</math>, <math>B'</math> has height
 
<math>h'\le h</math>.
 
Then the safety invariant implies that the path
 
in the tree of notarized blocks
 
ending at <math>B'</math> is a prefix of the path ending at <math>B</math>
 
(if not, then there would be two notarized blocks at height <math>h'</math>,
 
contradicting the finalization invariant). 
 
Thus, whenever a replica sees a finalized block <math>B</math>,
 
it may view all ancestors of <math>B</math> as being
 
<b>implicitly finalized</b>, and because of the
 
safety invariant, the <b>safety property</b> is
 
guaranteed to hold for these (explicitly and implicitly) finalized blocks.
 
 
 
 
 
 
 
==Delay functions==
 
 
 
The protocol makes use of two <b>delay functions</b>,
 
<math>\Delta_\textrm{m}</math> and <math>\Delta_\textrm{n}</math>,
 
which control
 
the timing of block making and notarization activity.
 
Both of these functions map the rank <math>r</math> of the proposing replica
 
to a nonnegative delay amount,
 
and it is assumed that each function is monotonically increasing in <math>r</math>,
 
and that
 
<math>\Delta_\textrm{m}(r) \le \Delta_\textrm{n}(r)</math>
 
for all <math>r = 0, \ldots, n-1</math>.
 
The recommended definition of these functions is
 
<math>\Delta_\textrm{m}(r) = 2 \delta r</math>
 
and
 
<math>\Delta_\textrm{n}(r) = 2 \delta r + \epsilon</math>,
 
where <math>\delta</math> is an upper bound on the time to deliver
 
messages from one honest replica to another,
 
and <math>\epsilon \ge 0</math> is a "governor" to keep the
 
protocol from running too fast.
 
With these definitions, liveness will be ensured in those
 
rounds in which (1) the leader is honest, and (2) messages really
 
are delivered between honest parties within time <math>\delta</math>.
 
Indeed,
 
if (1) and (2) both hold in a given round, then the block proposed
 
by the leader in that round will be finalized.
 
Let us call this the <b>liveness invariant</b>.
 
The proof of liveness invariant can be found [[#Proof of liveness invariant|below]].
 
 
 
==An example==
 
 
 
[[File:Consensus-example.png|600px]]
 
 
 
The figure above illustrates a block tree.
 
Each block is labeled with its height (30, 31, 32, ...)
 
and the rank of its block maker.
 
The figure also shows that each block in the tree is notarized,
 
as indicated by the diamond-N symbol.
 
This means that for each notarized block in the tree,
 
at least <math>n-f</math> distinct replicas supported its notarization.
 
As one can see, there can be more than one notarized block
 
in the tree at a given height.
 
For example, at height 31, note there are two notarized blocks,
 
proposed by block makers of rank 1 and 2.
 
The same thing happens at height 34.
 
Note that the block at height 36 is also explicitly
 
finalized, as indicated by the diamond-F symbol.
 
This means that <math>n-f</math> distinct replicas supported this block's finalization,
 
which means that these replicas (or at least, the honest replicas
 
among these)
 
did not support the notarization of any other block.
 
All of the ancestors of this block,
 
which are shaded gray, are considered implicitly finalized.
 
 
 
 
 
 
 
 
 
==Putting it all together==
 
 
 
This section describes in more detail how the protocol works;
 
specifically, this section describes more precisely when a replica will
 
propose a block and when a replica will support the notarization of a block.
 
A given replica <math>P</math> will record the time at which it
 
enters a given round <math>h</math>,
 
which happens when it has obtained (1) some notarization for a
 
block of height <math>h-1</math>, and (2) the random beacon for
 
round <math>h</math>.
 
Since the random beacon for round <math>h</math> has been determined,
 
<math>P</math> can determine its own rank <math>r_P</math>, as well as the
 
rank <math>r_Q</math> of each other replica <math>Q</math>
 
for round <math>h</math>.
 
 
 
===Random beacon details===
 
 
 
As soon as a replica enters round <math>h</math>, it will generate and broadcast
 
its share of the random beacon at round <math>h+1</math>.
 
 
 
===Block making details===
 
 
 
Replica <math>P</math> will only propose its own block <math>B_P</math>
 
provided (1) at least <math>\Delta_\textrm{m}(r_P)</math> time
 
units have passed since the beginning of the round,
 
and (2) there is no valid lower ranked block currently seen by <math>P</math>.
 
 
 
Note that since <math>P</math> is guaranteed to have a notarized block
 
of height <math>h-1</math>
 
when it enters round <math>h</math>, it can make its proposed
 
block a child of this notarized block (or any other
 
notarized block of height <math>h-1</math> that it may have).
 
Also note that when <math>P</math> broadcasts its proposal for <math>B_P</math>,
 
it must also ensure that it also has relayed the notarization of
 
<math>B_P</math>'s parent to all replicas.
 
 
 
Suppose a replica <math>Q</math> sees
 
a valid block proposal from a replica <math>P</math>
 
of rank
 
<math>r_P < r_Q</math>
 
such that
 
(1) at least <math>\Delta_\textrm{m}(r_P)</math> time
 
units have passed since the beginning of the round,
 
and (2) there is no block of rank less than <math>r_P</math>
 
currently seen by <math>Q</math>.
 
Then at this point in time, if it has not already done so,
 
<math>Q</math> will relay this block proposal
 
(along with the notarization of the proposed block's parent)
 
to all other replicas.
 
 
 
===Notarization details===
 
 
 
Replica <math>P</math> will support the notarization
 
of a valid block <math>B_Q</math> proposed by a
 
replica <math>Q</math> of rank
 
<math>r_Q</math>
 
provided
 
(1) at least <math>\Delta_\textrm{n}(r_Q)</math> time
 
units have passed since the beginning of the round,
 
and
 
(2) there is no block of rank less than <math>r_Q</math>
 
currently seen by <math>P</math>.
 
 
 
===Proof of growth invariant===
 
 
 
The growth invariant states that
 
each honest replica will
 
eventually complete each round and start the next.
 
Assume that all honest replicas have started round <math>h</math>.
 
Let <math>r^*</math> be the rank of the lowest ranked
 
honest replica <math>P^*</math> in round <math>h</math>.
 
Eventually, <math>P^*</math> will either (1) propose its own block,
 
or (2) relay a valid block proposed by a lower ranked replica.
 
In either case, some block must eventually be supported by all honest replicas,
 
which means that some block will become notarized and all honest replicas
 
will finish round <math>h</math>.
 
All honest replicas will also receive the shares needed to construct the random beacon for round <math>h+1</math>, and so will start round <math>h+1</math>.
 
 
 
 
 
===Proof of safety invariant===
 
 
 
The <b>safety invariant</b> states that if a block is finalized in a given round,
 
then no other block may be notarized in that round.
 
Here is a proof of the safety invariant:
 
# Suppose that the number of corrupt parties is exactly <math>f^* \le f < n/3</math>.
 
# If a block <math>B</math> is finalized, then its finalization must have been supported by a set <math>S</math> of at least <math>n-f-f^*</math> honest replicas (by the security property for aggregate signatures).
 
# Suppose (by way of contradiction) that another block <math>B' \ne B</math> were notarized. Then its notarization must have been supported by a set <math>S'</math> of at least <math>n-f-f^*</math> honest replicas (again, by the security property for aggregate signatures).
 
# The sets <math>S</math> and <math>S'</math> are disjoint (by the finalization logic).
 
# Therefore, <math>n-f^* \ge |S \cup S'| = |S| + |S'| \ge 2 (n-f-f^*)</math>, which implies <math>n \le 3f</math>, a contradiction.
 
 
 
===Proof of liveness invariant===
 
 
 
It is said that the network is <b><math>\delta</math>-synchronous at time <math>t</math></b> if
 
if all messages that have been sent by honest replicas at or before time <math>t</math> arrive at their destinations before time <math>t+\delta</math>.
 
 
 
The <b>liveness invariant</b> may be stated as follows.
 
Suppose that <math>\Delta_\mathrm{n}(1) \ge \Delta_\mathrm{m}(0) + 2\delta</math>.
 
Also
 
suppose that in a given round <math>h</math>,
 
* the leader <math>P</math> in round <math>h</math> is honest,
 
* the first honest replica <math>Q</math> to enter round <math>h</math> does so at time <math>t</math>, and
 
* the network is <math>\delta</math>-synchronous at times <math>t</math> and <math>t+\delta+\Delta_\mathrm{m}(0)</math>.
 
Then the block proposed by <math>P</math> in round <math>h</math> will be finalized.
 
 
 
Here is a proof of the liveness invariant:
 
# Under partial synchrony at time <math>t</math>, all honest replicas will enter round <math>h</math> before time <math>t+\delta</math> (the notarization that ended round <math>h-1</math> for <math>Q</math>, as well as the necessary round <math>h</math> random beacon shares, will arrive at all honest replicas before this time).
 
# The leader <math>P</math> in round <math>h</math> will propose a block <math>B</math> before time <math>t+\delta+\Delta_\mathrm{m}(0)</math>, and again by partial synchrony, this block proposal will be delivered to all other replicas before time <math>t+2\delta+\Delta_\mathrm{m}(0)</math>.
 
# Since <math>\Delta_\mathrm{n}(1) \ge \Delta_\mathrm{m}(0) + 2\delta</math>, the protocol logic guarantees that each honest replica supports the notarization of block <math>B</math> and no other block, and thus <math>B</math> will become notarized and finalized. 
 
 
 
==Other issues==
 
 
 
===Growth latency===
 
 
 
Under a partial synchrony assumption, a quantitative version of the growth invariant can also be formulated and proven.
 
For simplicity, assume that the delay functions are defined
 
as recommended above: <math>\Delta_\textrm{m}(r) = 2 \delta r</math>
 
and
 
<math>\Delta_\textrm{n}(r) = 2 \delta r + \epsilon</math>,
 
and further assume that <math>\epsilon \le \delta</math>.
 
Suppose that at time <math>t</math>, the highest numbered round entered
 
by any honest replica is <math>h</math>.
 
Let <math>r^*</math> be the rank of the lowest ranked
 
honest replica <math>P^*</math> in round <math>h</math>.
 
Finally, suppose that the network is <math>\delta</math>-synchronous
 
at all times in the interval <math>[t,t+(3r^*+2)\delta]</math>.
 
Then all honest replicas will start round <math>h+1</math> before time
 
<math>t+3(r^*+1)\delta</math>.
 
 
 
===Locally adjusted delay functions===
 
 
When a replica does not see any finalized blocks for several rounds, it will
 
start increasing its own delay function <math>\Delta_\textrm{n}</math>
 
for notarization.
 
Replicas need not agree on these locally adjusted notarization delay functions.
 
 
 
Also, while replicas do not explicitly adjust the delay function
 
<math>\Delta_\textrm{p}</math>, local clock drift can be mathematically modeled by locally adjusting both delay functions.
 
 
 
Thus, there are many delay functions,
 
parameterized by replica and round.
 
The critical condition <math>\Delta_\mathrm{n}(1) \ge \Delta_\mathrm{m}(0) + 2\delta</math> needed for liveness then becomes
 
<math>\max \Delta_\mathrm{n}(1) \ge \min \Delta_\mathrm{m}(0) + 2\delta</math>,
 
where the <math>\max</math> and <math>\min</math> are taken over all the honest replicas in a given round.
 
Thus, if finalization fails for enough rounds, all honest replicas
 
will eventually increase their notarization delay until this holds and finalization
 
will then resume.
 
If some honest replicas increase their notarization latency function
 
more than other replicas, there is no penalty in terms of liveness
 
(but there may be in terms of growth latency).
 
 
 
 
 
===Fairness===
 
 
 
Another property that is important in consensus protocols is <b>fairness</b>.
 
Rather than give a general definition, observe that
 
the liveness invariant also implies a useful fairness property.
 
Recall that the liveness invariant basically says that in any round
 
where the leader is honest and the network is synchronous,
 
then the block proposed by the leader will be finalized.
 
In those rounds where this happens, the fact that the leader is
 
honest ensures that it will include in the payload of its block
 
all of the transaction requests it knows about (modulo limits on
 
the payload size).
 
So, very roughly speaking,
 
any transaction request that is disseminated to enough
 
replicas will be included in a finalized block in a reasonable amount
 
of time with high probability.
 

Latest revision as of 21:18, 8 March 2024