Difference between revisions of "Manual Node Recovery Guide"

From Internet Computer Wiki
m
(Drafting new recovery guide)
Line 1: Line 1:
🚧🚧🚧 UNDER CONSTRUCTION 🚧🚧🚧
+
This runbook describes the steps node providers need to take during an NNS recovery.
  
'''⚠️⚠️⚠️ Disclaimer:''' Do not attempt this unless you are certain it is appropriate. Manual node recovery should only occur in the event of a critical IC failure. In such cases, the process will be coordinated by reputable IC community leaders and discussed publicly with node providers.
+
=== Security warning ===
 +
⚠️⚠️⚠️ Don’t get tricked into compromising your nodes. Only complete a manual node recovery if all of the following conditions are met:
  
and it is being actively discussed or recommended in trusted community forums.
+
* A subnet recovery is announced on the Internet Computer Status Page
 +
* The DFINITY team has reached out on the dedicated Matrix channel #ic-node-providers-incident-response:matrix.org.
 +
** Normally, only the DFINITY team can send messages on this channel. During an incident, permissions are changed so that everyone can send messages.
  
===1. Receive recovery version and short hash===
+
=== Prerequisites ===
A recovery coordinator will notify all subnet Node Providers of the recovery image <code>version</code> and the associated short (6-character) <code>hash</code> that Node Providers will use to apply the upgrade.
 
  
===2. Complete Recovery===
+
* The recovery coordinator should have communicated the following to you:
Note that the recovery can be completed from the node's remote BMC console view or from the physical console
+
** The recovery parameters:
 
+
*** The <code>VERSION</code>: the commit ID of the recovery-GuestOS update image.
#Reboot the machine
+
*** The <code>VERSION-HASH</code>: the SHA256 sum of the recovery-GuestOS update image.
#During reboot, at the GRUB menu, press 'e' to edit the boot parameters (you must press 'e' before the 15-second timeout)
+
*** The <code>RECOVERY-HASH</code>: the SHA256 sum of the recovery.tar.zst archive.
#:''If you miss the timeout, don't worry. Just reboot the machine and try again.''
+
** The node(s): which specific nodes managed by the node provider or node operator (NP/NO) are part of the target subnet.
#:[[File:Host grub boot menu.png|580px|screenshot]]
+
* Obtain console access to all nodes you run that are part of the target subnet.
#From the GRUB edit mode screen, add the boot parameters: <code>recovery=1 version=ABC hash=XYZ</code> ''(Replace ABC and XYZ with the version and hash provided by the recovery coordinator. The hash should be only six characters long.)''
+
** Note that the recovery can be completed from a physical console or from the node's remote BMC virtual console view.
#:[[File:Grub boot edit 1.png|480px|screenshot]]
 
#:→→→
 
#:[[File:Grub boot menu 2.png|480px|screenshot]]
 
#:🚧 Do not follow the screenshot's version and hash values! Use the <code>version</code> and <code>hash</code> values provided by the recovery coordinator! 🚧
 
#Press ctrl-X or F10 to boot
 
 
 
===3. Wait for recovery confirmation===
 
...
 

Revision as of 19:06, 4 December 2025

This runbook describes the steps node providers need to take during an NNS recovery.

Security warning

⚠️⚠️⚠️ Don’t get tricked into compromising your nodes. Only complete a manual node recovery if all of the following conditions are met:

  • A subnet recovery is announced on the Internet Computer Status Page
  • The DFINITY team has reached out on the dedicated Matrix channel #ic-node-providers-incident-response:matrix.org.
    • Normally, only the DFINITY team can send messages on this channel. During an incident, permissions are changed so that everyone can send messages.

Prerequisites

  • The recovery coordinator should have communicated the following to you:
    • The recovery parameters:
      • The VERSION: the commit ID of the recovery-GuestOS update image.
      • The VERSION-HASH: the SHA256 sum of the recovery-GuestOS update image.
      • The RECOVERY-HASH: the SHA256 sum of the recovery.tar.zst archive.
    • The node(s): which specific nodes managed by the node provider or node operator (NP/NO) are part of the target subnet.
  • Obtain console access to all nodes you run that are part of the target subnet.
    • Note that the recovery can be completed from a physical console or from the node's remote BMC virtual console view.
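
Both hash parameters above are SHA256 sums, so a node provider who happens to hold a local copy of either artifact could cross-check it against the value received from the recovery coordinator. The following is a minimal Python sketch of such a check, not part of the official procedure. The helper names sha256_of_file and matches_expected are hypothetical, and this runbook does not say whether or where the artifacts are available for download.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the lowercase hex SHA256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so that a large image does not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with the value sent by the coordinator.

    Surrounding whitespace and letter case in `expected_hex` are ignored,
    since hex digests are often pasted from chat messages.
    """
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, matches_expected(Path("recovery.tar.zst"), recovery_hash) would return True only if the local file's digest equals the RECOVERY-HASH; here recovery_hash is a placeholder for the value you were given. A mismatch is a reason to stop and ask in the incident channel rather than proceed.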