Difference between revisions of "Node Provider Troubleshooting"

Latest revision as of 16:10, 12 July 2024

Specific troubleshooting guides

Getting the node ID from a node

Hook up a console to the node.
The node ID will print to the screen upon a fresh boot and every 10 minutes thereafter.
If a node does not show its principal, consult the Troubleshooting Node Deployment Errors page.

Node Provider Matrix channel

After first consulting relevant documentation, discuss your issue with other Node Providers in the Node Provider Matrix channel.

@@ Line 5: / Line 5: @@
 * [[Troubleshooting Networking Issues]]
 * [[Troubleshooting Failed NNS proposals]]
-* [[Updating Firmware]]
-* [[iDRAC access and TSR logs]]
-* [[Checking node CPU and memory speed]]
-* For changing your Node Provider or DC principal, please refer to [[Node Provider NNS proposals]]
 ==Getting the node ID from a node==
-* Hook up a console to the node
-* The node ID will print to the screen upon a fresh boot and every 10 minutes thereafter.
-* If a node does not show its principal, consult the [[Troubleshooting Node Deployment Errors]] page.
-==Node Status on the Dashboard==
+# Hook up a console to the node.
-The dashboard provides real-time status of each node in the network. Nodes are identified by the principal of the currently deployed operating system, so the principal of the node will change upon node redeployment. Node Providers are expected to maintain a private record correlating each server with its principal. This record is crucial for tracking, especially when nodes are redeployed with new principals.
+# The node ID will print to the screen upon a fresh boot and every 10 minutes thereafter.
+# If a node does not show its principal, consult the [[Troubleshooting Node Deployment Errors]] page.
-===== Metrics and Monitoring =====
-Metrics are collected from nodes situated in three key geographical locations: Frankfurt (FR1), Chicago (CH1), and San Francisco (SF1). Each location is equipped with an independent monitoring and observability system. These systems apply specific rules to identify normal and abnormal node behaviors.
-===== Alerts and Troubleshooting =====
-When a node exhibits abnormal behavior, an ALERT is triggered by the monitoring system. The nature of the alert is indicated on the dashboard under the node's status.[[File:Dashboard-degraded-node.png|center|frameless|499x499px|Screenshot of a degraded node status page]]
-In the event of an ALERT, follow the provided [[Unhealthy Nodes|troubleshooting steps]]. If your issue and solution are not listed, please contribute by adding them to the page.
-The dashboard indicates four possible statuses for each node:
-*'''Active in Subnet''' - The node is healthy and actively functioning within a subnet.
-*'''Awaiting Subnet''' - The node is operational and prepared to join a subnet when necessary. Node providers still get full rewards for the node.
-*'''Degraded''' - Metrics can be scraped from the node, indicating it is online, but an ALERT has been raised. This status suggests the node may be struggling to keep up with network demands. The node provider should intervene and follow the [[Unhealthy Nodes|troubleshooting steps]] to resolve the issue. If you need to remove a node from the registry to service it, see [[Removing a Node From the Registry]].
-*'''Offline''' - The monitoring system is unable to scrape metrics, possibly due to node failure or data center outage. Prioritize verifying network connectivity and hardware functionality. [[Unhealthy Nodes|Troubleshooting steps]] should be followed to resolve the issue.
-*'''Not listed at all''' - A node missing from the list may indicate significant issues that require immediate attention and troubleshooting. If the node was functioning previously and is now not listed at all, this generally means that it started encountering issues and was removed from the registry. [[Unhealthy Nodes|Troubleshooting steps]] should be followed to resolve the issue.
 ==Node Provider Matrix channel ==
-'''After first consulting relevant documentation''', discuss your issue with other Node Providers in the [[Node Provider Matrix channel]].
+'''<u>After first consulting relevant documentation</u>''', discuss your issue with other Node Providers in the [[Node Provider Matrix channel]].
-Back to [[Node Provider Documentation]]