Difference between revisions of "Node Provider Troubleshooting"

From Internet Computer Wiki
m (Added instructions for joining Matrix/Element channel)
(Added the "Node Status on the Dashboard" section)

Revision as of 20:23, 21 June 2023

Troubleshooting individual Nodes

Node Status on the Dashboard

The dashboard lists each node by the principal of the currently-running OS. Node Providers track privately which server corresponds to each principal. This includes updating their records when a node is redeployed and gets a new principal.
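The record-keeping described above can be sketched as a small lookup table. This is only an illustration, not an official tool: the class name `NodeInventory`, its methods, and the principal and server labels are all hypothetical, and a provider may just as well keep these records in a spreadsheet.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class NodeInventory:
    """Private record of which physical server runs under which node principal."""

    # Maps a node principal (as listed on the dashboard) to the provider's
    # own label for the server. Both are plain strings in this sketch.
    server_by_principal: Dict[str, str] = field(default_factory=dict)

    def register(self, principal: str, server: str) -> None:
        """Record that `server` currently runs under `principal`."""
        self.server_by_principal[principal] = server

    def redeploy(self, old_principal: str, new_principal: str) -> None:
        """Move a server's record to the new principal it got on redeployment."""
        server = self.server_by_principal.pop(old_principal)
        self.server_by_principal[new_principal] = server

    def server_for(self, principal: str) -> Optional[str]:
        """Return the server label for a principal, or None if unknown."""
        return self.server_by_principal.get(principal)


# Hypothetical usage: placeholder principal and server names.
inventory = NodeInventory()
inventory.register("principal-aaa", "rack1-server07")
inventory.redeploy("principal-aaa", "principal-bbb")
```

After the redeployment, looking up the old principal returns nothing, which mirrors the fact that the dashboard only knows the node by its current principal.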

A node shown on the dashboard has one of four statuses; a node can also be missing from the dashboard entirely:

  • Active in Subnet - This is a node which is healthy and is currently running a subnet.
  • Awaiting Subnet - This is a node which is healthy and is currently a spare node. It is not running a subnet, but it keeps itself updated so that it is ready at a moment's notice to take part in a subnet.
  • Offline - This is a node which has completely failed. The failure is recent enough that it hasn't been removed from the registry yet. If the cause is an outage of some sort at the data center, then the node should come back online and be healthy once the outage is resolved, as long as that doesn't take too long. Verify that the node has working connectivity before doing anything else. If there are no issues with connectivity, then troubleshooting steps should be taken. Note that if redeployment is needed, the node will have to be removed from the registry before it can be redeployed.
  • Degraded - This node is struggling to keep up with the blockchain. If it's a temporary issue then it should catch back up and become healthy again. If it's a permanent issue, then it will eventually fail and go offline. If it's removed from the registry before it fails completely then it will disappear from the dashboard.
  • Not listed at all - If a node is not listed at all, then it had an issue and has already been removed from the registry. Troubleshooting steps should be taken.
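
The list above amounts to a small decision table, which can be sketched as follows. This is a hedged illustration only: the function `next_step`, the `connectivity_ok` flag, and the wording of the returned messages are assumptions made for the example, and the status strings are simply copied from the list rather than taken from any dashboard API.

```python
from typing import Optional

# Statuses that need no immediate intervention, keyed by the names used in
# the list above. The messages paraphrase that list.
PASSIVE_STATUSES = {
    "Active in Subnet": "No action needed: the node is healthy and running a subnet.",
    "Awaiting Subnet": "No action needed: the node is a healthy spare.",
    "Degraded": "Monitor: the node may catch up, or may fail and go offline.",
}


def next_step(status: Optional[str], connectivity_ok: bool = True) -> str:
    """Suggest a next step for a node.

    `status` is None when the node is not listed on the dashboard at all.
    `connectivity_ok` is the provider's own finding about connectivity to
    the node (hypothetical input; how it is checked is up to the provider).
    """
    if status is None:
        # Not listed: the node was already removed from the registry.
        return "Removed from the registry: follow the troubleshooting steps."
    if status == "Offline":
        if not connectivity_ok:
            return "Restore connectivity to the node before doing anything else."
        return ("Follow the troubleshooting steps; the node must be removed "
                "from the registry before any redeployment.")
    if status in PASSIVE_STATUSES:
        return PASSIVE_STATUSES[status]
    raise ValueError(f"Unknown status: {status!r}")
```

For example, `next_step("Offline", connectivity_ok=False)` points at connectivity first, matching the order of checks given for offline nodes.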

Changing your Node Provider principal in the NNS

IC Node Providers Matrix/Element channel

There is an open Matrix channel that's intended to bring together all existing, future, and potential future Node Providers: https://app.element.io/#/room/#ic-node-providers:matrix.org

The channel runs on the open and decentralized Matrix network. Among other ways, the channel is accessible from element.io and from the Element desktop app. Element is similar in functionality to Slack and offers a web UI, a desktop client, and a mobile app.

We recommend that you add an email address in the Element Profile settings and enable notifications for missed messages.

How do I set up email notifications?
You can set Element up to email you when you have missed some activity (new messages, new invites…). To do this, open the Notification section of your Settings and turn on the toggle labelled ‘Enable email notifications’.