Troubleshooting Node Deployment Errors

From Internet Computer Wiki
Latest revision as of 15:19, 27 March 2024

This page has some error codes that may display as you are onboarding your nodes. Please review this guide in its entirety before reaching out on the IC Node Provider Matrix channel.

If you need Dell to service your machine, see the guides on retrieving a Dell TSR Log and on resetting the iDRAC password.

General troubleshooting steps

Please complete ALL these steps before messaging in the Matrix channel.

  1. Make sure you are using the latest IC-OS release. If you are not sure whether you are, download the latest release and retry your node deployment.
  2. Make sure you are using the proper Node deployment guide:
      • Legacy (Gen-1) Node Deployment Guide (with an HSM)
      • Current (Gen-2) Node Deployment Guide (without an HSM)
  3. Reread all the directions in your node deployment guide to make sure you aren’t missing anything. The directions are precise, and they change slightly over time.
  4. Reread the Node Provider Networking Guide. Make sure you aren’t violating anything in its “What NOT to do” section.
  5. Restart the node deployment process from the very beginning, and try to reproduce the error you are encountering.
  6. Try to deploy to a different node machine, and try to reproduce the error on multiple node machines.
  7. Take extra care to make sure you set up your Node Operator keys correctly.

Support request information requirements

If you are still encountering deployment issues, read the rest of this guide. If you still can't successfully deploy your nodes, post a support request message in the IC Node Provider Matrix channel containing ALL the following information:

  • A screenshot of your issue
  • The stage of the onboarding in which you are failing
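  • What deployment method you are using:
      • Legacy (Gen-1) Node Deployment Guide (with an HSM)
      • Current (Gen-2) Node Deployment Guide (without an HSM)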
  • Is this your first time performing an IC-OS installation?
  • Is this your first time performing an IC-OS installation in this data center?
  • Is this your first time performing an IC-OS installation with this Node Operator Key?
  • Can you reproduce this issue?
  • Machine hardware details (Gen1 / Gen2, server brand)
  • A confirmation that you ran through the above general troubleshooting steps
  • Any other details you see as relevant

If you post a support request message that doesn't include ALL the above information, you will be asked to provide the missing items.
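You can use the list above as a checklist before posting. The following is a minimal Python sketch of that checklist. The `SupportRequest` class and `missing_fields` function are hypothetical names, not part of any IC tooling. The sketch reports which required items are still empty. "Any other details" is treated as optional here, because it depends on what you consider relevant.

```python
from __future__ import annotations

from dataclasses import dataclass, fields
from typing import Optional

# Confirmations must be True before the request is complete.
CONFIRMATIONS = {"screenshot_attached", "ran_general_troubleshooting"}
# Free-form extras that may legitimately stay empty.
OPTIONAL_FIELDS = {"other_details"}


@dataclass
class SupportRequest:
    screenshot_attached: bool = False
    onboarding_stage: str = ""
    deployment_guide: str = ""  # Gen-1 (with an HSM) or Gen-2 (without an HSM)
    # Yes/no answers: None means "not answered yet"; False is a valid answer.
    first_icos_install: Optional[bool] = None
    first_install_in_data_center: Optional[bool] = None
    first_install_with_node_operator_key: Optional[bool] = None
    reproducible: Optional[bool] = None
    hardware_details: str = ""  # e.g. Gen1 / Gen2, server brand
    ran_general_troubleshooting: bool = False
    other_details: str = ""


def missing_fields(request: SupportRequest) -> list[str]:
    """Return the names of required items that have not been filled in."""
    missing = []
    for field in fields(request):
        if field.name in OPTIONAL_FIELDS:
            continue
        value = getattr(request, field.name)
        if field.name in CONFIRMATIONS:
            if value is not True:
                missing.append(field.name)
        elif value is None or value == "":
            missing.append(field.name)
    return missing


if __name__ == "__main__":
    draft = SupportRequest(screenshot_attached=True, onboarding_stage="node registration")
    print("still missing:", missing_fields(draft))
```

A "no" answer (`False`) to one of the first-time questions still counts as answered. Only `None` marks it as missing.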

Node registration failure

Example Error

You installed your node without errors, but the node fails to register with the IC and its Node ID is not visible on the dashboard.

Common Causes

The node has installed and launched successfully, but is unable to join the network. This could be due to an out-of-date IC-OS installation image, trouble contacting the NNS, or node installation limits on the network.

Suggested Solutions

Verify that you are using a recent IC-OS installation image, and check https://dashboard.internetcomputer.org/ to see how many nodes are currently registered under your Node Provider. If more nodes are listed than you expect, or if multiple nodes overlap, have the extra nodes removed from the network before attempting to install again. Extra or overlapping entries can appear when the same hardware has been installed multiple times without the old records being cleaned up from the network.
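Comparing dashboard entries by hand is error-prone. The following minimal Python sketch assumes you have copied each node's ID and IPv6 address from the dashboard into a list of pairs. That input format and the name `find_registration_problems` are hypothetical.

The sketch flags two things:
- a total above the count you expect;
- any address that appears under more than one node ID. Treating a shared address as a sign of overlapping registrations of the same hardware is an assumption, not something this page states.

```python
import ipaddress
from collections import defaultdict


def find_registration_problems(nodes, expected_count):
    """nodes: list of (node_id, ipv6_address) pairs copied from the dashboard.

    Returns human-readable problem descriptions; an empty list means none found.
    """
    problems = []
    if len(nodes) > expected_count:
        problems.append(f"{len(nodes)} nodes registered, expected {expected_count}")

    # Normalise addresses so different spellings of the same IPv6 address match.
    ids_by_address = defaultdict(list)
    for node_id, address in nodes:
        ids_by_address[str(ipaddress.ip_address(address))].append(node_id)

    for address, ids in sorted(ids_by_address.items()):
        if len(ids) > 1:
            problems.append(f"{address} is shared by {len(ids)} node IDs: {', '.join(ids)}")
    return problems


if __name__ == "__main__":
    # Hypothetical values using the IPv6 documentation prefix 2001:db8::/32.
    registered = [
        ("node-a", "2001:db8::1"),
        ("node-b", "2001:0db8:0:0:0:0:0:1"),
        ("node-c", "2001:db8::2"),
    ]
    for problem in find_registration_problems(registered, expected_count=2):
        print(problem)
```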

Additionally, verify that there is no external network filtering (external firewalls, packet filters, rate limiters).
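One quick check for external filtering is to test whether a TCP connection to a given endpoint succeeds, from a machine on the same network segment as the node. The following minimal sketch uses Python's standard `socket.create_connection`. The host and port are placeholders you must supply, and `can_open_tcp` is a hypothetical helper name. A successful connection to one endpoint does not rule out packet filters or rate limiters affecting other traffic.

```python
import socket


def can_open_tcp(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout.

    Refused connections, timeouts and DNS failures are all subclasses of
    OSError, so any of them yields False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `can_open_tcp(host, port)` returns `False` for a refused connection, a timeout, or a failed name lookup, without distinguishing between them.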

IC-OS installation failure: Missing Drives

Example Error

--------------------------------------------------------------------------------
                      INTERNET COMPUTER - SETUP - FAILED
--------------------------------------------------------------------------------



       Please contact the Node Provider Matrix channel for support.

 

--------------------------------------------------------------------------------
                                    ERROR
--------------------------------------------------------------------------------


Not enough drives found. Are all drives correctly installed?
 

--------------------------------------------------------------------------------
                                    ERROR
--------------------------------------------------------------------------------

Another version of this error may say "Aggregate Disk size does not meet requirements".

Common Causes

This error means that the IC-OS installation medium could not detect all required drives. This is a common issue, even if you believe that all drives are installed correctly. Some of them may not be functioning properly, or may not be fully seated into the chassis.

Suggested Solutions

Check that all drives are fully seated and installed correctly, or install the required number of drives. Indicator LEDs on the drives may show which ones are not installed or not functioning correctly.
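You can list the NVMe block devices to see which drives the system actually detects, using the root shell described in the section on getting a shell. The following minimal sketch makes these assumptions:
- Python 3 is available in that shell. This page does not say that it is.
- The drives appear as `nvmeXnY` entries under `/sys/block`, as they do on a standard Linux kernel.

This page does not state the required drive count or the minimum aggregate size, so the helpers only report what they find. `nvme_namespaces` and `aggregate_size_bytes` are hypothetical names.

```python
import os
import re

# Whole NVMe namespaces such as nvme0n1; partitions are not listed at the top of /sys/block.
NVME_NAMESPACE = re.compile(r"^nvme\d+n\d+$")
# The kernel reports /sys/block/<dev>/size in 512-byte sectors.
SECTOR_BYTES = 512


def nvme_namespaces(block_devices):
    """Filter a listing of /sys/block down to NVMe namespaces, sorted by name."""
    return sorted(dev for dev in block_devices if NVME_NAMESPACE.match(dev))


def aggregate_size_bytes(devices, sys_block="/sys/block"):
    """Sum the sizes of the given block devices, in bytes."""
    total = 0
    for dev in devices:
        with open(os.path.join(sys_block, dev, "size")) as f:
            total += int(f.read().strip()) * SECTOR_BYTES
    return total


if __name__ == "__main__" and os.path.isdir("/sys/block"):
    drives = nvme_namespaces(os.listdir("/sys/block"))
    print(f"{len(drives)} NVMe drive(s) detected: {', '.join(drives) or 'none'}")
    print(f"aggregate size: {aggregate_size_bytes(drives) / 1e12:.2f} TB")
```

If a drive you installed is missing from this listing, reseat that drive first.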

IC-OS installation failure: Invalid CPU Configuration

Example Error

--------------------------------------------------------------------------------
                      INTERNET COMPUTER - SETUP - FAILED
--------------------------------------------------------------------------------



       Please contact the Node Provider Matrix channel for support.


--------------------------------------------------------------------------------
                                    ERROR
--------------------------------------------------------------------------------

Number of threads (16/32) does NOT meet system requirements.

--------------------------------------------------------------------------------
                                    ERROR
--------------------------------------------------------------------------------

Common Causes

Issues related to CPU capability usually mean that the CPUs are not configured correctly in the system BIOS.

Suggested Solutions

Please check that BIOS settings are configured correctly. It may be helpful to reset all settings to factory defaults, and go through the BIOS configuration again.
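The error message contains two thread counts. The following minimal sketch extracts them from the message and prints the number of logical CPUs that the running system reports through `os.cpu_count()`. You can then compare all of these against what you expect after reviewing the BIOS settings. This page does not state which number is the detected count and which is the requirement, so the sketch deliberately does not interpret them. It assumes Python 3 is available where you run it, and `parse_thread_counts` is a hypothetical name.

```python
import os
import re

# Matches e.g. "Number of threads (16/32) does NOT meet system requirements."
THREADS_MSG = re.compile(r"Number of threads \((\d+)/(\d+)\)")


def parse_thread_counts(message):
    """Return the two numbers from the error message as ints, or None if absent."""
    match = THREADS_MSG.search(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    counts = parse_thread_counts("Number of threads (16/32) does NOT meet system requirements.")
    print("numbers in error message:", counts)
    # Logical CPUs (hardware threads) visible to the OS; None if it cannot be determined.
    print("logical CPUs reported by the OS:", os.cpu_count())
```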

IC-OS installation failure: Unable to Reach Internet

Example Error

--------------------------------------------------------------------------------
                      INTERNET COMPUTER - SETUP - FAILED
--------------------------------------------------------------------------------



       Please contact the Node Provider Matrix channel for support.
 


--------------------------------------------------------------------------------
                                    ERROR
--------------------------------------------------------------------------------
 
 
 Unable to ping IPv6 gateway.
 
 
--------------------------------------------------------------------------------
                                    ERROR
--------------------------------------------------------------------------------

Common Causes

This error means that the node cannot communicate with the network properly. This can be due to an incorrect network configuration, or to issues somewhere between the node and the rest of the internet.

Suggested Solutions

Please try to capture any output that is displayed before this error shows. For example:

* Printing user defined network settings...
 IPv6 Prefix : XXX
 IPv6 Subnet : XXX
 IPv6 Gateway: XXX
 
* Printing system's network settings...
 IPv6 Prefix : XXX
 IPv6 Subnet : XXX
 IPv6 Gateway: XXX
 
* Printing IPv6 addresses...
 SetupOS: XXX
 HostOS : XXX
 GuestOS: XXX

Compare this output, and your initial configuration, against what you expect. If the configuration does not match, update the initial configuration and try again.
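Long IPv6 values are easy to misread when comparing the two settings blocks by eye. The following minimal Python sketch assumes the console output has the shape shown above: a `* Printing ...` header followed by `Key : value` lines. The section titles are taken from that example output. `parse_sections` and `mismatches` are hypothetical names. Paste the captured output into a string and pass it to `mismatches`.

```python
def parse_sections(text):
    """Split console output into {section title: {key: value}}.

    A section starts at a line beginning with "* Printing"; inside it, each
    "Key : value" line is split on its first colon, so IPv6 values keep theirs.
    """
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("* Printing"):
            current = line[len("* Printing"):].strip(" .")
            sections[current] = {}
        elif current is not None and ":" in line:
            key, value = line.split(":", 1)
            sections[current][key.strip()] = value.strip()
    return sections


def mismatches(text):
    """Return {key: (user_value, system_value)} for settings that differ."""
    sections = parse_sections(text)
    user = sections.get("user defined network settings", {})
    system = sections.get("system's network settings", {})
    return {
        key: (user.get(key), system.get(key))
        for key in sorted(set(user) | set(system))
        if user.get(key) != system.get(key)
    }
```

An empty result means the user-defined and system settings agree. In that case, the next step is the path between the node and the internet.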

If the configuration does match what you expect, diagnose any machines between this node and the rest of the internet. The cause could be an improper firewall configuration or an issue with the data center’s network. If all configuration looks correct, reboot the machines between this node and the rest of the internet; in most cases, this means the firewall. Rebooting the firewall, even if it seems to be operating correctly, has resolved this issue many times.

IC-OS installation failure: Unable to setup PV

Example Error

--------------------------------------------------------------------------------
                      INTERNET COMPUTER - SETUP - FAILED
--------------------------------------------------------------------------------



       Please contact the Node Provider Matrix channel for support.
 


--------------------------------------------------------------------------------
                                    ERROR
--------------------------------------------------------------------------------
 
 
 Unable to setup PV on drive '/dev/nvme8n1'.
 
 
--------------------------------------------------------------------------------
                                    ERROR
--------------------------------------------------------------------------------

Common Causes

This error means that the node is able to recognize that a drive is installed, but is unable to write to it. This could indicate that there is a hardware issue with the drive.

Suggested Solutions

Remove and re-install all drives before attempting to install the node again. It may also help to verify independently that each drive is functioning correctly.
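The following minimal sketch helps confirm which drive failed and whether it can be read at all. It extracts the device path from the error message, then reads the first megabyte of that device in read-only mode. It assumes Python 3 and root access on the node (see the section on getting a shell). A successful read does not prove the drive can be written, which is what setting up the PV needs, so treat it only as a first check. `failed_drive` and `read_probe` are hypothetical names, and the probe never opens the device for writing.

```python
import re

# Matches e.g. "Unable to setup PV on drive '/dev/nvme8n1'."
DRIVE_IN_MESSAGE = re.compile(r"Unable to setup PV on drive '([^']+)'")


def failed_drive(message):
    """Return the device path named in the error message, or None."""
    match = DRIVE_IN_MESSAGE.search(message)
    return match.group(1) if match else None


def read_probe(path, nbytes=1024 * 1024):
    """Read up to nbytes from the start of path, opened read-only; return the count read."""
    with open(path, "rb") as device:
        return len(device.read(nbytes))


if __name__ == "__main__":
    drive = failed_drive("Unable to setup PV on drive '/dev/nvme8n1'.")
    print("drive named in the error:", drive)
    # On the node itself (as root), this reads but never writes:
    # print("bytes read:", read_probe(drive))
```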

Troubleshooting IC-OS installation failure: Getting a shell

  • During the IC-OS installation, you can press Enter to obtain console access for troubleshooting. You can also press Enter at the error page to access the console. Keep pressing Enter until you see a login prompt.
  • Log in as user root with an empty password.
  • You now have root access for diagnostics.