Difference between revisions of "Node Provider Troubleshooting"

From Internet Computer Wiki
m (add missing period)
 
(37 intermediate revisions by 4 users not shown)
Line 1: Line 1:
==Troubleshooting individual Nodes==
+
==Specific troubleshooting guides==
  
* [[Possible Node Onboarding Errors]]
+
* [[Troubleshooting Node Deployment Errors]]
* [[Unhealthy Nodes|Troubleshooting Unhealthy Nodes]]
+
* [[Troubleshooting Unhealthy Nodes]]
* [[Updating Firmware]]
+
* [[Troubleshooting Networking Issues]]
* [[iDRAC access and TSR logs]]
+
* [[Troubleshooting Failed NNS proposals]]
  
==Node Status on the Dashboard==
+
==Getting the node ID from a node==
The dashboard lists each node by the principal of the currently-running OS. Node Providers track privately which server corresponds to each principal. This includes updating their records when a node is redeployed and gets a new principal.
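
As an illustration of the private record-keeping described above, the following is a minimal sketch, not an official tool. The class name <code>NodeInventory</code>, its methods, and the principal and server labels are all hypothetical; it assumes the Node Provider keeps a simple mapping from node principal to a server label of their own choosing.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class NodeInventory:
    """Hypothetical private record: node principal -> physical server label."""

    by_principal: Dict[str, str] = field(default_factory=dict)

    def register(self, principal: str, server: str) -> None:
        # Record which server is currently running under this principal.
        self.by_principal[principal] = server

    def redeploy(self, old_principal: str, new_principal: str) -> str:
        # A redeployed node gets a new principal, so move the server
        # label from the old principal to the new one.
        server = self.by_principal.pop(old_principal)
        self.by_principal[new_principal] = server
        return server


# Example with placeholder values (not real principals).
inventory = NodeInventory()
inventory.register("example-principal-old", "rack1-unit10")
inventory.redeploy("example-principal-old", "example-principal-new")
```

After the redeploy call, looking up <code>example-principal-new</code> returns the same server label, and the old principal is no longer in the record.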
 
  
A node shows one of four statuses on the dashboard, or is not listed at all:
+
# Hook up a console to the node.
 +
# The node ID will print to the screen upon a fresh boot and every 10 minutes thereafter.
 +
# If a node does not show its principal, consult the [[Troubleshooting Node Deployment Errors]] page.
  
* '''Active in Subnet''' - This is a node which is healthy and is currently running a subnet.
+
==Node Provider Matrix channel==
* '''Awaiting Subnet''' - This is a node which is healthy and is currently a spare node. It is not running a subnet, but it is keeping itself updated so that it is ready at a moment's notice to take part in a subnet.
+
'''<u>After first consulting relevant documentation</u>''', discuss your issue with other Node Providers in the [[Node Provider Matrix channel]].
* '''Offline''' - This is a node which has completely failed, recently enough that it has not yet been removed from the registry. If the cause is an outage at the data center, the node should come back online and become healthy once the outage is resolved, as long as that does not take too long. Verify connectivity to the node before doing anything else. If connectivity is fine, follow the [[Unhealthy Nodes|troubleshooting steps]]. Note that the node must be removed from the registry before it can be redeployed, if redeployment is needed.
 
* '''Degraded''' - This node is struggling to keep up with the blockchain. If it's a temporary issue then it should catch back up and become healthy again. If it's a permanent issue, then it will eventually fail and go offline. If it's removed from the registry before it fails completely then it will disappear from the dashboard.
 
* '''Not listed at all''' - If a node is not listed at all, it had an issue and has already been removed from the registry. Follow the [[Unhealthy Nodes|troubleshooting steps]].
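
The status list above can be summarized as a small lookup. This is only a sketch for a Node Provider's own notes or scripts: the names <code>NodeStatus</code>, <code>NEXT_STEP</code>, and <code>next_step</code> are hypothetical, and the "no action" and "monitor" entries are assumptions inferred from the descriptions rather than official guidance.

```python
from enum import Enum


class NodeStatus(Enum):
    """Dashboard states as described in the list above."""

    ACTIVE_IN_SUBNET = "Active in Subnet"
    AWAITING_SUBNET = "Awaiting Subnet"
    OFFLINE = "Offline"
    DEGRADED = "Degraded"
    NOT_LISTED = "Not listed at all"


# Suggested next step per state. Entries marked "assumption" are not
# stated in the text; they are inferred from the node being healthy
# or from the issue possibly being temporary.
NEXT_STEP = {
    NodeStatus.ACTIVE_IN_SUBNET: "No action: node is healthy (assumption).",
    NodeStatus.AWAITING_SUBNET: "No action: healthy spare node (assumption).",
    NodeStatus.OFFLINE: (
        "Check connectivity first; if it is fine, follow the "
        "Unhealthy Nodes troubleshooting steps."
    ),
    NodeStatus.DEGRADED: (
        "Monitor: the node may catch up, or fail and go offline (assumption)."
    ),
    NodeStatus.NOT_LISTED: (
        "Already removed from the registry; follow the "
        "Unhealthy Nodes troubleshooting steps."
    ),
}


def next_step(dashboard_label: str) -> str:
    """Map a dashboard label to the suggested next step."""
    # NodeStatus(...) raises ValueError for a label not in the list.
    return NEXT_STEP[NodeStatus(dashboard_label)]
```

For example, <code>next_step("Offline")</code> returns the connectivity-first advice, matching the order of checks given for offline nodes.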
 
 
 
==Changing your Node Provider principal in the NNS==
 
* [[Changing Your Node Provider Principal]]
 
 
 
== Changing a DC principal ==
 
If the HSM that was used to deploy your nodes gets lost or corrupted, you can either replace the HSM with a new one, or you can replace the principal using the HSM-less method.
 
 
 
To replace using the HSM-less method, use steps 1, 4, 5, 6, 9, and 10 of the [[Node Provider Onboarding|Node Provider Onboarding instructions]]. Please note:
 
 
 
* When the proposal is submitted in step 9, you will need to wait several days for the proposal to pass.
 
* Make sure you explain in your proposal who you are and why you are replacing the principal to help ensure that your proposal is accepted by the community.
 
* You will then create the IC-OS image with the new principal, and it will be used to onboard the nodes using the options in the onboarding directions for using a <code>node_operator_private_key.pem</code> file.
 
 
 
== Node Provider Matrix channel ==
 
Discuss your issue with other Node Providers in the [[Node Provider Matrix channel]].
 

Latest revision as of 16:10, 12 July 2024

Specific troubleshooting guides

Getting the node ID from a node

  1. Hook up a console to the node.
  2. The node ID will print to the screen upon a fresh boot and every 10 minutes thereafter.
  3. If a node does not show its principal, consult the Troubleshooting Node Deployment Errors page.

Node Provider Matrix channel

After first consulting relevant documentation, discuss your issue with other Node Providers in the Node Provider Matrix channel.