Node Provider Maintenance Guide
Troubleshooting
See the Node Provider Troubleshooting guide for information on troubleshooting failed onboardings, unhealthy nodes, networking, and more.
Submitting NNS proposals
As a Node Provider, you will likely need to submit NNS proposals from time to time. Some of these proposals are described on the following page: Node Provider NNS proposals
Monitoring
You are expected to regularly monitor the health of your nodes. Node health status is available on the public dashboard. Example: node status.
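As a rough illustration of what regular monitoring can look like, the sketch below compares two snapshots of node statuses and reports the nodes that have left a healthy status. It assumes you already have the statuses as plain strings, for example copied from the dashboard. The node IDs, the `DEGRADED` label, and the function name are hypothetical, and no dashboard API is called.

```python
# Statuses this guide treats as healthy.
HEALTHY_STATUSES = {"Active in Subnet", "Awaiting Subnet"}


def newly_unhealthy(previous, current):
    """Return IDs of nodes that were healthy in `previous` but are not in `current`.

    Both arguments map a node ID to its status string. A node that is
    missing from `current` is treated as unhealthy.
    """
    alerts = []
    for node_id, old_status in previous.items():
        new_status = current.get(node_id)
        if old_status in HEALTHY_STATUSES and new_status not in HEALTHY_STATUSES:
            alerts.append(node_id)
    return sorted(alerts)


# Hypothetical node IDs and snapshots, for illustration only.
before = {"node-a": "Active in Subnet", "node-b": "Awaiting Subnet"}
after = {"node-a": "Active in Subnet", "node-b": "DEGRADED"}
print(newly_unhealthy(before, after))  # prints ['node-b']
```

Comparing snapshots rather than alerting on every unhealthy node keeps notifications limited to status changes, which may be easier to act on.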
Community Tools and Resources
Several node providers have generously shared tools to facilitate monitoring node health. These tools can provide notifications in case of node issues.
Aviate Labs Node Monitor
- Turnkey Solution: Receive email alerts for unhealthy nodes.
- Link: Aviate Labs Node Monitor
DIY Node Monitoring
- GitHub Repository: Run your own node monitoring system.
- Link: Aviate Labs GitHub
Prometheus Exporter for Node Status
- GitHub Repository: A tool for exporting node status to a Prometheus-compatible format.
- Link: IC Node Status Prometheus Exporter
Common maintenance tasks
- Removing a Node From the Registry
- Adding additional node machines to existing Node Allowance
- Updating your node's IPv4 and domain name
- Moving a node from one DC to another
Permitted tools
For security and confidentiality reasons, no other tools may run on the same machine alongside the replica. If you need to troubleshoot an issue, either boot the machine from a USB drive with a live Linux distribution (e.g. Ubuntu), or debug from an auxiliary machine in the same rack over which you have complete control, as described in Unhealthy Nodes#Setting Up an Auxiliary Machine for Network Diagnostics.
Scheduled DC outages
When your DC notifies you of a scheduled outage, you must:
- Notify DFINITY on the Node Provider Matrix channel
- Make sure your nodes return to one of the healthy statuses when the DC outage is resolved:
  - Active in Subnet: The node is healthy and actively functioning within a subnet.
  - Awaiting Subnet: The node is operational and prepared to join a subnet when necessary.
- If a node is degraded at first, give it some time to catch up, but make sure that it eventually returns to one of the two healthy statuses.
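The post-outage check described above can be sketched as follows. This is a minimal illustration, not an official tool: the one-hour grace period is an assumption (the guide does not specify a duration), and the node IDs, the `DEGRADED` label, and the function name are hypothetical.

```python
from datetime import datetime, timedelta

# Statuses this guide treats as healthy.
HEALTHY_STATUSES = {"Active in Subnet", "Awaiting Subnet"}

# Assumed catch-up window; the guide does not give an exact duration.
CATCH_UP_GRACE = timedelta(hours=1)


def nodes_needing_action(statuses, outage_resolved_at, now):
    """List nodes that are still not healthy once the grace period has passed.

    `statuses` maps a node ID to its current status string.
    """
    if now - outage_resolved_at < CATCH_UP_GRACE:
        return []  # still inside the catch-up window, keep watching
    return sorted(n for n, s in statuses.items() if s not in HEALTHY_STATUSES)


# Hypothetical data, for illustration only.
resolved = datetime(2024, 1, 1, 12, 0)
statuses = {"node-a": "Active in Subnet", "node-b": "DEGRADED"}
print(nodes_needing_action(statuses, resolved, resolved + timedelta(minutes=10)))
print(nodes_needing_action(statuses, resolved, resolved + timedelta(hours=2)))
```

The first call returns an empty list because the node may still be catching up; the second returns `node-b`, which would then need investigation.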
Node rewards based on useful work
The Internet Computer protocol can tolerate fewer than 1/3 of nodes misbehaving. Work is under way to automatically issue node rewards based on useful work, and to automatically reduce remuneration for nodes that misbehave. This will provide a financial incentive for honest behavior. Please follow the forum and the Matrix channel to stay informed about these activities.
In the meantime, prepare for this by keeping your nodes online and healthy at all times. Otherwise, you risk penalties even before the automatic node rewards based on useful work become active.
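To make the fault tolerance mentioned in this section concrete, the sketch below computes how many misbehaving nodes a group of n nodes can tolerate. It assumes the standard Byzantine fault tolerance bound n ≥ 3f + 1, which means strictly fewer than one third of the nodes. The function name and the group sizes are illustrative only.

```python
def max_tolerated_faulty(n):
    """Return the largest f such that n >= 3 * f + 1."""
    if n < 1:
        raise ValueError("n must be a positive number of nodes")
    return (n - 1) // 3


# Example group sizes, chosen for illustration only.
for size in (13, 28, 40):
    print(size, "nodes tolerate up to", max_tolerated_faulty(size), "misbehaving")
# prints:
# 13 nodes tolerate up to 4 misbehaving
# 28 nodes tolerate up to 9 misbehaving
# 40 nodes tolerate up to 13 misbehaving
```

Because the margin is small in absolute terms, every node that is offline or degraded reduces the remaining tolerance for everyone else.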
Subnet recovery
If subnet recovery is needed, we may have to reach out to you for assistance. Please make sure you closely follow activities in the Matrix channel and enable notifications for new messages, especially direct mentions.
Peer-support and bug reports / resolution: Node Provider Matrix Channel
Node Providers are encouraged to join the dedicated Node Provider Matrix channel. This platform can be used to discuss maintenance-related questions, share insights, report issues, and search for previous resolutions to operational problems.
Please consult the Node Provider Troubleshooting guide first, and turn to the Matrix channel for troubleshooting only if the guide does not resolve your issue.
Communication Guidelines on the Matrix Channel
As a Node Provider, ensure your notifications are enabled to receive new messages promptly. Your input or intervention might be crucial, especially in urgent situations.
It is recommended to add your node provider name to your alias (handle) on the Matrix channel, so that others can identify and mention you quickly.