Replica Upgrades
Revision as of 19:17, 9 May 2023
Overview
In general, blockchain upgrades are hard:
- First, there has to be agreement by the community that a new version should be adopted.
- Next, the point in time at which the upgrade should be applied has to be agreed upon.
- Finally, the actual upgrade needs to take place, with each step having to cope with node failures and malicious activities.
Because this is a difficult undertaking, other blockchain projects have traditionally often realized such upgrades as a fork of the chain. The ICP blockchain supports evolution of its protocol via its replica software stack in an integrated fashion: decisions on software stack upgrades are controlled by NNS voting, and the actual upgrades are then executed autonomously, without human interaction.
Architecture
The governance system in the Network Nervous System (NNS) lets neuron holders vote on replica upgrades. This happens in two steps:
- A first NNS proposal asks to “elect” a new replica version, meaning to add it to the list of versions that are currently supported by the IC.
- Subsequent NNS proposals can then be made to upgrade individual subnets to the new version.
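The two-step flow above can be sketched as a toy model. This is only an illustration: the class `ToyGovernance`, its method names, and the rule that an upgrade to a non-elected version is rejected are assumptions made for the example, not the actual NNS interface.

```python
class ToyGovernance:
    """Hypothetical model of the two proposal types (not the real NNS API)."""

    def __init__(self, initial_version, subnets):
        # Versions the IC currently supports.
        self.elected_versions = {initial_version}
        # Desired replica version per subnet.
        self.subnet_versions = {s: initial_version for s in subnets}

    def elect_version(self, version):
        """Step 1: add a version to the list of supported versions."""
        self.elected_versions.add(version)

    def upgrade_subnet(self, subnet, version):
        """Step 2: point one subnet at an elected version.

        Returns False (and changes nothing) if the version was never elected;
        this rejection rule is an assumption of the sketch.
        """
        if version not in self.elected_versions:
            return False
        self.subnet_versions[subnet] = version
        return True


gov = ToyGovernance("v1", ["subnet_a", "subnet_b"])
rejected = gov.upgrade_subnet("subnet_a", "v2")  # v2 is not elected yet
gov.elect_version("v2")
accepted = gov.upgrade_subnet("subnet_a", "v2")  # only subnet_a moves to v2
```

Because subnets are upgraded individually, `subnet_b` keeps running `v1` until a separate proposal targets it.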
Once the governance system triggers a replica upgrade for a subnet, the subnet’s consensus layer autonomously decides when an upgrade should be executed based on agreement among the subnet’s nodes.
- The nodes have built-in support for downloading and applying upgrade packages without human intervention.
- An upgrade package contains the entire software stack needed to run a node. After verifying the package content corresponds to the version the community voted to run, nodes then reboot into the new version autonomously.
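One plausible way to check that a downloaded package corresponds to the voted version is to compare a cryptographic digest of the package with a digest recorded alongside the elected version. The sketch below assumes SHA-256 and a hex-encoded digest; the function name and the package bytes are hypothetical, and the actual verification mechanism may differ.

```python
import hashlib


def package_matches_elected_version(package_bytes, elected_sha256_hex):
    """Return True if the package digest equals the digest that was voted on.

    Assumption: the elected version is identified by a SHA-256 hex digest.
    """
    actual = hashlib.sha256(package_bytes).hexdigest()
    return actual == elected_sha256_hex


# Hypothetical package contents, for illustration only.
package = b"replica image v2"
voted_digest = hashlib.sha256(package).hexdigest()

ok = package_matches_elected_version(package, voted_digest)
tampered = package_matches_elected_version(package + b"!", voted_digest)
```

A node following this sketch would reboot into the new version only when the check succeeds.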
Technical details
Goals
- Allow arbitrary changes to all layers of the protocol stack.
- Preserve all canister state.
- Minimize downtime.
- Roll out upgrades autonomously.
Registry
There is a component called the registry, which is implemented as a canister smart contract in the Network Nervous System subnet. Essentially, it stores all configuration information for the Internet Computer. It is implemented as a versioned key/value store, where each mutation shows up as a new version in the registry.
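A minimal sketch of a versioned key/value store, to make the idea concrete. The class `VersionedRegistry` and the key name `subnet_a/replica_version` are made up for this example and do not reflect the real registry canister's interface.

```python
class VersionedRegistry:
    """Toy versioned key/value store: every mutation creates a new version."""

    def __init__(self):
        # Append-only log of (version, key, value); version 0 is the empty registry.
        self._log = []
        self.latest_version = 0

    def set(self, key, value):
        """Record a mutation and return the new registry version."""
        self.latest_version += 1
        self._log.append((self.latest_version, key, value))
        return self.latest_version

    def get(self, key, version=None):
        """Return the value of `key` as of `version` (default: latest)."""
        if version is None:
            version = self.latest_version
        result = None
        for v, k, val in self._log:
            if v > version:
                break  # the log is ordered by version, so we can stop here
            if k == key:
                result = val
        return result


registry = VersionedRegistry()
v1 = registry.set("subnet_a/replica_version", "v1")
v2 = registry.set("subnet_a/replica_version", "v2")
```

Reading at an older version still returns the old value, which is what lets every node in a subnet evaluate the configuration at the same agreed-upon registry version.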
The network is upgraded on a per-subnet basis, and each subnet has a record in the registry that indicates which nodes constitute the subnet and what protocol version they should be running. In order to trigger an upgrade, the network simply changes the version information for the subnet that it wants to upgrade. Whether or not a subnet should be upgraded to a certain version is decided by votes in the governance system rather than by individual node providers.
Note that the registry contains the desired configuration. Because a subnet needs time to observe a registry change and to actually perform the upgrade, the NNS subnet does not know which version is currently running in a subnet.
State Machine Replication
The Internet Computer implements state machine replication in each subnet. The idea of state machine replication is that the state is guaranteed to be identical on all honest nodes. Because each state transition is triggered by a sequence of agreed-upon ordered inputs and because the transition function is deterministic, the state is guaranteed to remain identical on honest nodes.
On the Internet Computer, the inputs to the state machine are update calls sent by users as well as inter-canister messages from other subnets. The consistent ordering of inputs is guaranteed by consensus, and the state includes the state of the message routing layer and execution layer (which itself includes all canister code and data).
In order to build the state at height h, the state at the previous height h-1 is taken and the input messages from block h are applied to that state. The input in these blocks was agreed upon by consensus. For state machine replication, the code we are executing needs to be deterministic when processing a block at height h.
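The rule "state at height h = transition(state at height h-1, block h)" can be sketched as follows. The state layout (a dictionary of counters per canister) and the message format are invented for the example; only the structure of the computation is taken from the text.

```python
import hashlib


def apply_block(state, block):
    """Deterministic transition: state at h from state at h-1 and block h."""
    new_state = dict(state)
    for canister, delta in block:  # messages are applied in the agreed order
        new_state[canister] = new_state.get(canister, 0) + delta
    return new_state


def state_hash(state):
    """Digest of a state, so that replicas can compare their results."""
    encoded = repr(sorted(state.items())).encode()
    return hashlib.sha256(encoded).hexdigest()


def replay(genesis, blocks):
    """Apply all blocks in height order, starting from the genesis state."""
    state = genesis
    for block in blocks:
        state = apply_block(state, block)
    return state


# Two nodes that process the same agreed-upon blocks end up in the same state.
blocks = [[("a", 1), ("b", 2)], [("a", 5)], []]
node_1 = replay({}, blocks)
node_2 = replay({}, blocks)
```

If two nodes ran different versions of `apply_block` for the same height, their hashes could diverge, which is why an upgrade has to take effect at the same block height everywhere.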
Upgrades can change the state machine, i.e., how execution and message routing layers operate. Hence, the network needs to make sure that upgrades are running at the same time everywhere, relative to the block height.
Upgrades can also change details in the consensus layer, e.g., how notarization happens. Hence upgrades must be executed at the same block height, as otherwise different and potentially conflicting notarization schemes might be used to notarize blocks of the same height.
Finally, upgrades can also entail protocol-breaking changes to networking details, which might make it impossible for nodes running different versions to communicate. In order to reach consensus, all honest nodes need to be able to communicate with each other, which again implies that upgrades need to be executed at the same block height on all node machines.
Note that nodes are not going to arrive at a block height h at the same physical time, because the Internet Computer is a distributed system that does not utilize any notion of a global time.
In theory, some nodes might be far behind other nodes, which means that for some period of time, the nodes in a subnet are going to run different Internet Computer versions, and the network needs to be able to cope with that.
Upgrade Process
The overall process looks as follows:
Subnet A is running Internet Computer v1 (version 1). An upgrade to v2 is triggered at registry version r. Nodes in subnet A will eventually agree, at a certain block height h, to use registry version r and with it the new Internet Computer version.
Nodes running v1 will create blocks and compute state up to and including that height h, and nodes running v2 take over at height h+1.
Between states h and h+1, state needs to be handed over between the two versions. For this to work, a snapshot of the state at height h needs to be taken.
Since these two versions are clearly separated, the network can even switch to a completely different consensus algorithm starting from height h+1, because there is never direct communication between the v2 protocol and the v1 protocol.
There is an artifact that can be used to do exactly that: the catch-up package (“CUP”). It contains all relevant information required for consensus to resume from it. Moreover, the CUP for height h refers to a registry version, which in turn indicates which replica version is to be run at heights greater than h.
CUPs are signed by a subnet, so their integrity can be verified by means of the subnet’s threshold signature. One requirement for the CUPs in the context of upgrades is that they have to be readable from both the old version (v1) and the new version (v2), so the CUPs need to be backward and forward compatible.
One challenge is that the network needs to make sure that each node in the subnet runs the correct Internet Computer version. All honest nodes must participate in version v1 until the handover CUP is created, and then join as a v2 node and start producing blocks as in v2. If some of the honest nodes run an incorrect version of the Internet Computer, the entire subnet could get stuck.
In order for a node to decide which version it should run, it first queries the registry to find out which subnet it should join. With that information, it also finds out the identity of the peers in that subnet. Then the protocol asks all peers for their latest CUPs, checks whether it already has a local CUP, and also checks a CUP that is included in the registry (“Genesis CUP”). The highest of all the CUPs that have been found can be used to determine the Internet Computer version that is running in that subnet at the moment.
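The selection described above can be sketched as picking the CUP with the greatest height among peer CUPs, the local CUP, and the genesis CUP. The `CatchUpPackage` fields, the function names, and the mapping from registry version to replica version are simplifications invented for this example; signature verification is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatchUpPackage:
    height: int            # block height the CUP was built for
    registry_version: int  # registry version the CUP refers to


def highest_cup(peer_cups, local_cup, genesis_cup):
    """Return the CUP with the greatest height among all known sources."""
    candidates = [genesis_cup, *peer_cups]
    if local_cup is not None:
        candidates.append(local_cup)
    return max(candidates, key=lambda cup: cup.height)


def replica_version_to_run(cup, replica_version_by_registry_version):
    """Look up the replica version indicated by the CUP's registry version."""
    return replica_version_by_registry_version[cup.registry_version]


# Hypothetical data: registry version 8 switched the subnet record to v2.
versions = {7: "v1", 8: "v2"}
genesis = CatchUpPackage(height=0, registry_version=7)
local = CatchUpPackage(height=20, registry_version=7)
from_peers = [CatchUpPackage(20, 7), CatchUpPackage(30, 8)]

best = highest_cup(from_peers, local, genesis)
version = replica_version_to_run(best, versions)
```

In this example the node learns from a peer's CUP at height 30 that it should run `v2`, even though its own local CUP still points at `v1`.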
Since the IC is a decentralized system and building CUPs is a collective effort by multiple nodes in the subnet, it is not known which of the nodes participated in creating the most up-to-date CUP. In order to tolerate f Byzantine nodes, 2f+1 nodes must sign a CUP for it to be valid. Consequently, if fewer than f+1 of the nodes are queried for CUPs, there is no guarantee that one of them has the most recent version of the CUP.
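The counting argument can be made concrete with a little arithmetic. The sketch below assumes a subnet of n nodes sized so that n >= 3f+1, which is a common setting for Byzantine fault tolerance but is not stated in the text; all function names are hypothetical.

```python
def max_faults(n):
    """Largest f with n >= 3f + 1 (assumed subnet sizing)."""
    return (n - 1) // 3


def signing_threshold(n):
    """Number of nodes that must sign a CUP: 2f + 1."""
    return 2 * max_faults(n) + 1


def max_non_signers(n):
    """Nodes that may not have taken part in signing the latest CUP."""
    return n - signing_threshold(n)


def query_may_miss_latest_cup(n, queried):
    """True if all queried nodes could be non-signers of the latest CUP.

    Note: even when this returns False, a queried signer could be Byzantine
    and withhold the CUP, so this is only a necessary condition.
    """
    return queried <= max_non_signers(n)


n = 13                      # example subnet size, so f = 4
f = max_faults(n)
threshold = signing_threshold(n)
```

With n = 13, up to 4 nodes may not have signed the latest CUP, so querying 4 or fewer peers gives no guarantee of seeing it.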
In the worst case, the CUPs that a node processes are outdated and a scheduled upgrade is not immediately detected. CUPs supported by fewer than 2f+1 nodes are always ignored, as in this case the CUP’s signature cannot be verified successfully.
As discussed earlier, nodes in a subnet might be running different Internet Computer versions. To avoid incompatibilities between versions, CUPs are fetched from peers over a separate communication channel using an endpoint dedicated to the exchange of CUPs. The peer-to-peer layer is not used for this purpose. That allows the logic for exchanging CUPs to be kept relatively simple, and makes it easier to keep it backward and forward compatible.
Consensus Details
Consider an example of a series of blocks, from 27 to 30, and a series of states (in this case, 27 and 28). In block 30, consensus chooses to use a new registry version r, which triggers an upgrade. Consensus now knows that it has to build a CUP referring to the new registry version r, but it cannot currently do it because it did not yet compute the state at height 30.
Before the state at height 30 can be computed, block 30 needs to be finalized. In order to reach finalization on that block, the protocol produces blocks until a block at a height greater than or equal to 30 is finalized. In case of an upgrade, these blocks don’t contain messages to canisters, to avoid changing the state beyond height 30.
Once a block of height >= 30 is finalized, state 30 can be computed and finally certified. With that, the system has all of the information necessary to build a CUP for that height. Honest nodes retain their state during upgrades. The new IC version may choose to convert that state as a post-upgrade step if it wishes to; otherwise the state from the previous version can be used directly by nodes after the upgrade is complete.
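The rule that blocks above the upgrade height carry no canister messages can be sketched as a payload builder. The function name and the list-of-messages payload are assumptions for illustration; real block payloads are more involved.

```python
def build_payload(height, upgrade_height, pending_messages):
    """Return the canister messages to include in the block at `height`.

    `upgrade_height` is the height of the block that triggered the upgrade
    (30 in the example above), or None if no upgrade is pending.
    """
    if upgrade_height is not None and height > upgrade_height:
        # Empty block: it helps finalization but must not change the state.
        return []
    return list(pending_messages)


pending = ["update call 1", "update call 2"]
block_30 = build_payload(30, 30, pending)  # still allowed to carry messages
block_31 = build_payload(31, 30, pending)  # empty while the CUP is being built
```

Since every block after height 30 is empty, the state stays exactly as it was at height 30 until the new version takes over.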
The CUP is used as a handover point between the two versions. Hence, artifacts from v1 must not spill over to v2, as otherwise we could end up with multiple non-empty blocks for the same block height, which would be incorrect and possibly lead to a fork in the chain. By annotating each consensus artifact with the protocol version number it was produced with, the protocol can ignore artifacts for a different version and thus avoid this problem.
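A rough sketch of the version annotation: each artifact carries the replica version that produced it, and a node drops everything that does not match its own version. The `Artifact` type and its fields are hypothetical and only serve to illustrate the filtering step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    kind: str             # e.g. "block" or "notarization" (illustrative)
    height: int
    replica_version: str  # version of the replica that produced the artifact


def accept_artifacts(artifacts, own_version):
    """Keep only artifacts produced by the same replica version as ours."""
    return [a for a in artifacts if a.replica_version == own_version]


incoming = [
    Artifact("block", 31, "v1"),  # late artifact from a node still on v1
    Artifact("block", 31, "v2"),
    Artifact("notarization", 31, "v2"),
]
accepted = accept_artifacts(incoming, own_version="v2")
```

A v2 node therefore never sees the v1 block for height 31, so the two versions cannot contribute competing blocks for the same height.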
User experience
The user experience is affected between the creation of block 30 and the execution of the upgrade. While the CUP for height 30 is being created, query calls continue to be processed, but no further update calls can be accepted, since nodes running v1 are not allowed to create non-empty blocks after height 30. Essentially, canisters on the subnet are running in read-only mode at this point. After the upgrade CUP has been signed, the subnet also briefly becomes unavailable for query calls, as the upgrade needs to be installed and the VMs need to be rebooted to apply it. Overall, the downtime during subnet upgrades is on the order of a few minutes.