Difference between revisions of "Troubleshooting Node Deployment Errors"
m (Andrew.battat moved page Possible Node Onboarding Errors to Troubleshooting Node Deployment Errors: Better name in line with the "Node Deployment Guides" page) |
m (Add details on accessing setupOS shell) |
Revision as of 16:49, 7 March 2024
This page has some error codes that may display as you are onboarding your nodes. Please review the examples, causes, and proposed solutions before reaching out on the IC Node Provider Matrix channel.
If you need Dell to service your machine, then these links will assist in retrieving a Dell TSR Log and in resetting the iDRAC password.
If you encounter an error not listed here, please capture a screenshot and detail when it happened, which stage in onboarding you were at, the status of any lights on the server, and any other relevant details. Post your issue and accompanying screenshots in the IC Node Provider Matrix channel.
Node won't register to the IC
Example Error
You successfully installed a node and you don't see any errors, but your node ID is not visible on the dashboard.
Common Causes
The node has installed and launched successfully, but is unable to join the network. This could be due to an out-of-date IC-OS installation image, trouble contacting the NNS, or node installation limits on the network.
Suggested Solutions
Please verify that a recent IC-OS installation image version is being used, and check https://dashboard.internetcomputer.org/ to see how many nodes are currently registered under your Node Provider. If there are more nodes listed than expected, or if there are multiple nodes overlapping, please have any extra nodes removed from the network before attempting to install again. This can happen when multiple installations have been performed on the same hardware without cleaning up the records from the network.
Additionally, verify that there is no external network filtering (external firewalls, packet filters, rate limiters).
Getting a shell during Node (SetupOS) installation, to troubleshoot a failure
- Hit enter until you see a login prompt
- Log in with user root and empty password
- Now you have root access for diagnostics, etc
Orchestrator Started
This message is not an error, nor is it confirmation that the node is running properly.
- Check the dashboard for the status of that particular node. (Status explanations are here.) To identify the node, use the principal ID that was assigned to it when it was onboarded.
- If the node is not visible on the dashboard then it has not registered with the Internet Computer.
- If you have recently installed a current IC-OS image, then you can try inserting the HSM and/or rebooting to see if it joins. This would work if the IC-OS installation was successful and only the registration and joining were interrupted.
- If you have not recently installed a current IC-OS image, then do not insert the HSM. You do not want the node to rejoin with an old IC-OS image, as it will only fail again. Instead, you should consider upgrading the firmware if it is running on old versions, and then redeploy the node with a fresh/current IC-OS image (which will assign a new principal to the node so that you can identify it in the dashboard).
General Troubleshooting
During the IC-OS installation, you may hit enter to obtain console access to troubleshoot any issues you are encountering. You can also hit enter at the error page in order to access the console.
Once you have console access, in order to stop the IC-OS installation service, enter:
$ systemctl stop setupos
Missing Drives
Example Error
--------------------------------------------------------------------------------
INTERNET COMPUTER - SETUP - FAILED
--------------------------------------------------------------------------------
Please contact the Node Provider Matrix channel for support.
--------------------------------------------------------------------------------
ERROR
--------------------------------------------------------------------------------
Not enough drives found. Are all drives correctly installed?
--------------------------------------------------------------------------------
ERROR
--------------------------------------------------------------------------------
A variant of this error may instead say "Aggregate Disk size does not meet requirements".
Common Causes
This error means that the IC-OS installation medium could not detect all required drives. This is a common issue, even if you believe that all drives are installed correctly. Some of them may not be functioning properly, or may not be fully seated into the chassis.
Suggested Solutions
Check that all drives are fully seated and installed correctly, or install the required number of drives. The indicator LEDs on the drives may show which ones are not installed or not functioning correctly.
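To see how many drives the system actually detects, you can count them from the console described under General Troubleshooting. The following is a minimal sketch, not part of the official tooling: it assumes `lsblk` is available in the console, and `count_disks` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: count whole disks in the output of
# `lsblk -d -n -o NAME,TYPE` (one "NAME TYPE" pair per line).
# It reads from stdin, so it also works on saved output.
count_disks() {
    awk '$2 == "disk" { n++ } END { print n + 0 }'
}

# Usage on the node console (assumes lsblk is installed):
#   lsblk -d -n -o NAME,TYPE | count_disks
```

If the count is lower than the number of drives physically installed, the missing device names can help narrow down which drive to reseat.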
Invalid CPU Configuration
Example Error
--------------------------------------------------------------------------------
INTERNET COMPUTER - SETUP - FAILED
--------------------------------------------------------------------------------
Please contact the Node Provider Matrix channel for support.
--------------------------------------------------------------------------------
ERROR
--------------------------------------------------------------------------------
Number of threads (16/32) does NOT meet system requirements.
--------------------------------------------------------------------------------
ERROR
--------------------------------------------------------------------------------
Common Causes
Issues related to CPU capability usually mean that the CPUs are not configured correctly in the system BIOS.
Suggested Solutions
Please check that BIOS settings are configured correctly. It may be helpful to reset all settings to factory defaults, and go through the BIOS configuration again.
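After changing BIOS settings, you can check from the console how many threads the operating system sees. This sketch assumes that the first number in the example message is the detected count and the second the required count (the message itself does not say), and that `nproc` is available. `check_threads` is a hypothetical helper, and the value 32 is taken from the example error, not from a confirmed requirement.

```shell
#!/bin/sh
# Hypothetical helper: compare a detected thread count against a
# required minimum. Both values are passed in by the caller.
check_threads() {
    detected=$1
    required=$2
    if [ "$detected" -ge "$required" ]; then
        echo "ok: $detected threads detected, $required required"
    else
        echo "too few: $detected threads detected, $required required"
        return 1
    fi
}

# Usage on the node console (32 is an assumed requirement):
#   check_threads "$(nproc)" 32
```

A detected count of exactly half the required value may point to simultaneous multithreading being disabled in the BIOS, though that is only one possible cause.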
Unable to Reach Internet
Example Error
--------------------------------------------------------------------------------
INTERNET COMPUTER - SETUP - FAILED
--------------------------------------------------------------------------------
Please contact the Node Provider Matrix channel for support.
--------------------------------------------------------------------------------
ERROR
--------------------------------------------------------------------------------
Unable to ping IPv6 gateway.
--------------------------------------------------------------------------------
ERROR
--------------------------------------------------------------------------------
Common Causes
This error means that the node is not able to communicate with the network properly. This can be due to a network misconfiguration, or to issues somewhere between the node and the rest of the internet.
Suggested Solutions
Please try to capture any output that is displayed before this error shows. For example:
* Printing user defined network settings...
  IPv6 Prefix : XXX
  IPv6 Subnet : XXX
  IPv6 Gateway: XXX
* Printing system's network settings...
  IPv6 Prefix : XXX
  IPv6 Subnet : XXX
  IPv6 Gateway: XXX
* Printing IPv6 addresses...
  SetupOS: XXX
  HostOS : XXX
  GuestOS: XXX
Please compare this, and the initial configuration, to what you expect. If this configuration does not match, please update the initial configuration, and try again.
If this does match the expected configuration, please attempt to diagnose any machines between this node and the rest of the internet. This could be due to improper firewall configuration, or an issue with the data center’s network. If all configuration looks correct, please attempt to reboot any machines between this node and the rest of the Internet. In most cases, this would be a firewall. Rebooting the firewall - even if it seems to be operating correctly - has resolved this issue many times.
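To repeat the gateway check by hand from the console, something like the following may help. It is a sketch under assumptions: `ping` and `ip` are assumed to be available in the console, `check_gateway` is a hypothetical helper, and `2001:db8::1` is a documentation placeholder to be replaced by your own IPv6 gateway.

```shell
#!/bin/sh
# Hypothetical helper: report whether an IPv6 gateway answers ping.
# The optional second argument replaces the ping command, which
# allows a dry run without touching the network.
check_gateway() {
    gateway=$1
    ping_cmd=${2:-"ping -6 -c 3"}
    if $ping_cmd "$gateway" >/dev/null 2>&1; then
        echo "gateway $gateway reachable"
    else
        echo "gateway $gateway unreachable"
        return 1
    fi
}

# Usage on the node console (placeholder address):
#   check_gateway 2001:db8::1
# To inspect the addresses and default route actually configured:
#   ip -6 addr show
#   ip -6 route show default
```

Comparing the output of the `ip` commands with the settings printed before the error can show whether the problem lies in the node's configuration or further along the path.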
Unable to setup PV
Example Error
--------------------------------------------------------------------------------
INTERNET COMPUTER - SETUP - FAILED
--------------------------------------------------------------------------------
Please contact the Node Provider Matrix channel for support.
--------------------------------------------------------------------------------
ERROR
--------------------------------------------------------------------------------
Unable to setup PV on drive '/dev/nvme8n1'.
--------------------------------------------------------------------------------
ERROR
--------------------------------------------------------------------------------
Common Causes
This error means that the node is able to recognize that a drive is installed, but is unable to write to it. This could indicate that there is a hardware issue with the drive.
Suggested Solutions
Please try to remove and re-install all drives, before attempting to install the node again. It may be helpful to independently verify that each drive is functioning correctly.
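As a first, non-destructive check of a single drive, you can try reading a small amount of data from it at the console. This is only a sketch: it assumes `dd` is available, `read_check` is a hypothetical helper, and a successful read shows only that the device responds, not that it can be written to.

```shell
#!/bin/sh
# Hypothetical helper: read the first MiB(s) of a device or file and
# discard the data. Nothing is written to the target.
read_check() {
    target=$1
    mib=${2:-16}
    if dd if="$target" of=/dev/null bs=1M count="$mib" 2>/dev/null; then
        echo "read ok: $target"
    else
        echo "read failed: $target"
        return 1
    fi
}

# Usage on the node console, with the drive named in the error:
#   read_check /dev/nvme8n1
```

A drive that fails even this read is a likely candidate for reseating or replacement. A drive that passes may still have a write fault and should be verified independently.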