Node Provider Networking Troubleshooting Guide

From Internet Computer Wiki
Revision as of 20:34, 7 August 2023 by Andrew.battat

This page is designed to guide you through common Node Provider Networking issues and processes.


Checking the port status of a deployed node/server

To verify the port status of the deployed node on the switch, follow these steps:

  1. Identify the switch: Determine the switch to which the node is connected.
  2. Access the switch: Use a console cable or remote management interface (SSH, Telnet, etc.) to connect to the switch.
  3. Log in to the switch: Enter the appropriate credentials (username and password) to access the switch's command line interface (CLI).
  4. Identify the port: Determine the port on the switch to which the node is connected. This information may be provided during the deployment or can be obtained by physically tracing the network cable.
  5. Check the port status: Use the commands below, depending on your switch platform, to check the status of the specific port.
  6. Analyze the output: The command output will provide details about the port's status, including its operational state, link status, speed, duplex mode, and any error or drop counters. Look for the following key information:
    1. Operational state: It should be "up" or "connected" for the port to be active.
    2. Link status: It should indicate "up" for the port to have a functional connection.
    3. Speed and duplex: Verify that the configured speed and duplex settings match the expected values.
    4. Error counters: A high number of errors or drops may indicate a problem with the connection.
  7. Troubleshooting: If the port status is not as expected or indicates any issues, you can perform further troubleshooting steps. Some common troubleshooting actions include checking the physical cable connections, restarting the node and switch, verifying VLAN configurations, and ensuring the switch port configuration matches the requirements of the node.
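The checks in step 6 can also be run against a saved copy of the `show interface status` output. The following is a minimal Python sketch. It assumes the Cisco Nexus column layout shown in the example below (Port, Name, Status, Vlan, Duplex, Speed, Type) and that every column except Name is a single word. The function name `check_nxos_status` and the default expected speed of 10G are illustrative assumptions, not part of any Cisco tool.

```python
# Hypothetical helper (not a Cisco tool): flag Ethernet ports in saved
# NX-OS "show interface status" output whose status, duplex or speed
# differs from what a node-facing port is expected to show.


def check_nxos_status(output: str, expected_speed: str = "10G") -> list[tuple[str, str]]:
    problems = []
    for line in output.splitlines():
        tokens = line.split()
        # Skip blank lines, dashed separators, header rows, mgmt0 and "...".
        if len(tokens) < 6 or not tokens[0].startswith("Eth"):
            continue
        port = tokens[0]
        # The Name column may contain spaces, so read the last five columns
        # from the right: Status, Vlan, Duplex, Speed, Type.
        status, _vlan, duplex, speed, _media = tokens[-5:]
        if status != "connected":
            problems.append((port, f"status is {status}"))
        if duplex != "full":
            problems.append((port, f"duplex is {duplex}"))
        if speed != expected_speed:
            problems.append((port, f"speed is {speed}, expected {expected_speed}"))
    return problems
```

An empty result means every Eth port in the saved output is connected at the expected speed in full duplex. Any other entry names the port and the column that needs attention.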


Command examples:

Cisco Nexus

switch# show interface status

--------------------------------------------------------------------------------
Port          Name               Status    Vlan      Duplex  Speed   Type
--------------------------------------------------------------------------------
mgmt0         --                 connected routed    full    1000    --

--------------------------------------------------------------------------------
Port          Name               Status    Vlan      Duplex  Speed   Type
--------------------------------------------------------------------------------
Eth1/1        Server:WAN         connected 1         full    10G     10Gbase-SR
Eth1/2        Server:WAN         connected 1         full    10G     10Gbase-SR
Eth1/3        Server:WAN         connected 1         full    10G     10Gbase-SR
Eth1/4        Server:WAN         connected 1         full    10G     10Gbase-SR
Eth1/5        Server:WAN         connected 1         full    10G     10Gbase-SR
Eth1/6        Server:WAN         connected 1         full    10G     10Gbase-SR
Eth1/7        Server:WAN         connected 1         full    10G     10Gbase-SR
Eth1/8        Server:WAN         connected 1         full    10G     10Gbase-SR
Eth1/9        Server:WAN         connected 1         full    10G     10Gbase-SR
Eth1/10       Server:WAN         connected 1         full    10G     10Gbase-SR
Eth1/11       Server:WAN         connected 1         full    10G     10Gbase-SR
Eth1/12       Server:WAN         connected 1         full    10G     10Gbase-SR
Eth1/13       Server:WAN         connected 1         full    10G     10Gbase-SR
Eth1/14       Server:WAN         connected 1         full    10G     10Gbase-SR
...
switch# show interface ethernet 1/1
Ethernet1/1 is up
admin state is up, Dedicated Interface
  Hardware: 1000/10000 Ethernet, address: 0cb4.0000.0101 (bia 0cb4.0000.0101)
  Description: Server:WAN
  MTU 1500 bytes, BW 10000000 Kbit , DLY 10 usec
  reliability 255/255, txload 1/255, rxload 1/255
  Encapsulation ARPA, medium is broadcast
  Port mode is access
  full-duplex, 10 Gb/s, media type is 10G
  Beacon is turned off
  Input flow-control is off, output flow-control is off
  Rate mode is dedicated
  Switchport monitor is off
  EtherType is 0x8100
  Last link flapped 00:01:14
  Last clearing of "show interface" counters never
  4 interface resets
  Load-Interval #1: 30 seconds
    30 seconds input rate 0 bits/sec, 0 packets/sec
    30 seconds output rate 296 bits/sec, 0 packets/sec
    input rate 0 bps, 0 pps; output rate 296 bps, 0 pps
  Load-Interval #2: 5 minute (300 seconds)
    300 seconds input rate 0 bits/sec, 0 packets/sec
    300 seconds output rate 200 bits/sec, 0 packets/sec
    input rate 0 bps, 0 pps; output rate 200 bps, 0 pps
  RX
    0 unicast packets  0 multicast packets  0 broadcast packets
    0 input packets  0 bytes
    0 jumbo packets  0 storm suppression packets
    0 runts  0 giants  0 CRC  0 no buffer
    0 input error  0 short frame  0 overrun   0 underrun  0 ignored
    0 watchdog  0 bad etype drop  0 bad proto drop  0 if down drop
    0 input with dribble  0 input discard
    0 Rx pause
  TX
    125 unicast packets  127 multicast packets  110 broadcast packets
    362 output packets  72269 bytes
    0 jumbo packets
    0 output error  0 collision  0 deferred  0 late collision
    0 lost carrier  0 no carrier  0 babble  0 output discard
    0 Tx pause

Dell OS10

OS10# show interface status

--------------------------------------------------------------------------------------------------
Port            Description     Status   Speed    Duplex   Mode Vlan Tagged-Vlans
--------------------------------------------------------------------------------------------------
Eth 1/1/1       Server:WAN      up       10G      full     A    1    -
Eth 1/1/2       Server:WAN      up       10G      full     A    1    -
Eth 1/1/3       Server:WAN      up       10G      full     A    1    -
Eth 1/1/4       Server:WAN      up       10G      full     A    1    -
Eth 1/1/5       Server:WAN      up       10G      full     A    1    -
Eth 1/1/6       Server:WAN      up       10G      full     A    1    -
Eth 1/1/7       Server:WAN      up       10G      full     A    1    -
Eth 1/1/8       Server:WAN      up       10G      full     A    1    -
Eth 1/1/9       Server:WAN      up       10G      full     A    1    -
Eth 1/1/10      Server:WAN      up       10G      full     A    1    -
Eth 1/1/11      Server:WAN      up       10G      full     A    1    -
Eth 1/1/12      Server:WAN      up       10G      full     A    1    -
Eth 1/1/13      Server:WAN      up       10G      full     A    1    -
Eth 1/1/14      Server:WAN      up       10G      full     A    1    -
Eth 1/1/15      Server:WAN      up       10G      full     A    1    -
...
OS10# show interface ethernet 1/1/1
Ethernet 1/1/1 is up, line protocol is up
Description: Server:WAN
Hardware is Eth, address is 0c:a6:36:d9:00:01
    Current address is 0c:a6:36:d9:00:01
Pluggable media present, RJ45 type is 10GBASE-T-RJ45
    Wavelength is 0
Interface index is 16
Internet address is not set
Mode of IPv4 Address Assignment: not set
Interface IPv6 oper status: Disabled
MTU 9216 bytes, IP MTU 9184 bytes
LineSpeed 10G, Auto-Negotiation on
Flowcontrol rx on tx off
ARP type: ARPA, ARP Timeout: 60
Tag Protocol IDentifier (TPID) value: 0x8100
Last clearing of "show interface" counters: 00:06:49
Queuing strategy: fifo
Input statistics:
     0 packets, 0 octets
     0 64-byte pkts, 0 over 64-byte pkts, 0 over 127-byte pkts
     0 over 255-byte pkts, 0 over 511-byte pkts, 0 over 1023-byte pkts
     0 Multicasts, 0 Broadcasts, 0 Unicasts
     0 runts, 0 giants, 0 throttles
     0 CRC, 0 overrun, 0 discarded
Output statistics:
     0 packets, 0 octets
     0 64-byte pkts, 0 over 64-byte pkts, 0 over 127-byte pkts
     0 over 255-byte pkts, 0 over 511-byte pkts, 0 over 1023-byte pkts
     0 Multicasts, 0 Broadcasts, 0 Unicasts
     0 throttles, 0 discarded, 0 Collisions,  wred drops
Rate Info(interval 30 seconds):
     Input 0 Mbits/sec, 0 packets/sec, 0% of line rate
     Output 0 Mbits/sec, 0 packets/sec, 0% of line rate
Time since last interface status change: 00:01:37

Cumulus

cumulus@cumulus:mgmt:~$ net show interface
State  Name     Spd  MTU    Mode       LLDP                           Summary
-----  -------  ---  -----  ---------  -----------------------------  --------------------------
UP     lo       N/A  65536  Loopback                                  IP: 127.0.0.1/8
       lo                                                             IP: ::1/128
UP     eth0     1G   1500   Mgmt                                      Master: mgmt(UP)
       eth0                                                           IP: 192.168.1.10/24(DHCP)
UP     swp1     10G  9216   Access/L2  host-xxxxxxxxxxxx (ens2f1np1)  Master: bridge(UP)       
UP     swp2     10G  9216   Access/L2  host-xxxxxxxxxxxx (ens2f1np1)  Master: bridge(UP)
UP     swp3     10G  9216   Access/L2  host-xxxxxxxxxxxx (ens2f1np1)  Master: bridge(UP)
UP     swp4     10G  9216   Access/L2  host-xxxxxxxxxxxx (ens2f1np1)  Master: bridge(UP)
UP     swp5     10G  9216   Access/L2  host-xxxxxxxxxxxx (ens2f1np1)  Master: bridge(UP)
UP     swp6     10G  9216   Access/L2  host-xxxxxxxxxxxx (ens2f1np1)  Master: bridge(UP)
UP     swp7     10G  9216   Access/L2  host-xxxxxxxxxxxx (ens2f1np1)  Master: bridge(UP)
UP     swp8     10G  9216   Access/L2  host-xxxxxxxxxxxx (ens2f1np1)  Master: bridge(UP)
UP     swp9     10G  9216   Access/L2  host-xxxxxxxxxxxx (ens2f1np1)  Master: bridge(UP)
UP     swp10    10G  9216   Access/L2  host-xxxxxxxxxxxx (ens2f1np1)  Master: bridge(UP)
UP     swp11    10G  9216   Access/L2  host-xxxxxxxxxxxx (ens2f1np1)  Master: bridge(UP)
...
UP     bridge   N/A  9216   Bridge/L2
UP     mgmt     N/A  65536  VRF                                       IP: 127.0.0.1/8
cumulus@cumulus:mgmt:~$ net show interface swp2
    Name  MAC                Speed  MTU   Mode
--  ----  -----------------  -----  ----  ---------
UP  swp2  0c:e1:54:56:00:02  10G    9216  Access/L2

All VLANs on L2 Port
--------------------
1

Untagged
--------
1

cl-netstat counters
-------------------
RX_OK  RX_ERR  RX_DRP  RX_OVR  TX_OK  TX_ERR  TX_DRP  TX_OVR
-----  ------  ------  ------  -----  ------  ------  ------
    1       0       0       0     36       0       0       0

LLDP Details
------------
LocalPort  RemotePort(RemoteHost)
---------  ----------------------------
swp2       ens2f1np1(host-xxxxxxxxxxxx)

Routing
-------
  Interface swp2 is up, line protocol is up
  Link ups:       1    last: 2023/07/14 07:04:29.71
  Link downs:     0    last: (never)
  PTM status: disabled
  vrf: default
  index 4 metric 0 mtu 9216 speed 1000
  flags: <UP,BROADCAST,RUNNING,MULTICAST>
  Type: Ethernet
  HWaddr: 0c:e1:54:56:00:02
  Interface Type Other
  Master interface: bridge
  protodown: off
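Step 6 also asks you to look for errors or drops. On Cumulus, these appear in the cl-netstat counters table of `net show interface <port>`, as in the swp2 example above. The following Python sketch parses that table and reports error, drop and overrun counters above a threshold. It assumes the layout shown above: a row of counter names, a dashed row, then a row of values. The function names and the default threshold of 0 are illustrative choices.

```python
# Hypothetical helpers: parse the "cl-netstat counters" table printed by
# Cumulus "net show interface <port>" and report problem counters.

PROBLEM_COUNTERS = ("RX_ERR", "RX_DRP", "RX_OVR", "TX_ERR", "TX_DRP", "TX_OVR")


def parse_cl_netstat(text: str) -> dict[str, int]:
    rows = [line.split() for line in text.splitlines() if line.strip()]
    for i, row in enumerate(rows):
        if row[0] == "RX_OK":
            # The values row follows the dashed separator under the header.
            return dict(zip(row, (int(v) for v in rows[i + 2])))
    raise ValueError("no cl-netstat counters table found")


def problem_counters(counters: dict[str, int], threshold: int = 0) -> dict[str, int]:
    # Keep only error/drop/overrun counters whose value exceeds the threshold.
    return {
        name: counters[name]
        for name in PROBLEM_COUNTERS
        if counters.get(name, 0) > threshold
    }
```

Counters accumulate from the time they were last cleared. Comparing two runs taken a few minutes apart therefore shows whether errors are still increasing.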





Checking if the MAC address of the server is learned on the switch port

To check whether the server's MAC address has been learned on the switch port, follow these steps:

  1. Access the switch, log in, and identify the port: Follow steps 1 to 4 of "Checking the port status of a deployed node/server" above.
  2. View the MAC address table: Use the command below for your platform to display the MAC address table on the switch.
  3. Check for the server's MAC address: Look for the server's MAC address in the output of the previous command. It should be associated with the switch port to which the server is connected.
  4. Verify MAC address learning: If the server's MAC address does not appear in the MAC address table, the switch has not yet learned it from the server. In that case, try the following:
    1. Ensure the server is powered on and connected to the correct switch port.
    2. Check the physical network connection, including the Ethernet/fiber cable.
    3. Verify that the server's network interface is functioning properly.
    4. If the server is configured with a static MAC address, ensure it matches the expected MAC address.
  5. Further troubleshooting: If issues persist, check the server's network configuration, examine the switch port configuration, verify VLAN assignments, and investigate any other network connectivity problems.
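Vendors print MAC addresses differently. The Cisco Nexus example below uses dotted groups of four hex digits (3612.407f.41d6), while Dell OS10 and Cumulus use colon-separated pairs (36:12:40:7f:41:d6). When comparing against the server's MAC address, for example as reported by `ip link` on the server, it helps to normalize both sides first. This Python sketch does that and checks whether the MAC appears on the expected port in saved MAC-table output. The helper names are hypothetical, and the sketch assumes one MAC-table entry per line with the port name as a separate word.

```python
import re

# Hypothetical helpers: compare MAC addresses regardless of vendor notation.
MAC_RE = re.compile(
    r"(?:[0-9a-f]{4}\.){2}[0-9a-f]{4}"      # Cisco style: 3612.407f.41d6
    r"|(?:[0-9a-f]{2}[:-]){5}[0-9a-f]{2}",  # colon/dash style: 36:12:40:7f:41:d6
    re.IGNORECASE,
)


def normalize_mac(mac: str) -> str:
    """Return the MAC as 12 lowercase hex digits with no separators."""
    digits = re.sub(r"[^0-9a-f]", "", mac.lower())
    if len(digits) != 12:
        raise ValueError(f"not a MAC address: {mac!r}")
    return digits


def mac_learned_on(table_output: str, server_mac: str, port: str) -> bool:
    """True if some line of the MAC table lists server_mac on the given port."""
    wanted = normalize_mac(server_mac)
    for line in table_output.splitlines():
        tokens = line.split()
        macs = {normalize_mac(t) for t in tokens if MAC_RE.fullmatch(t)}
        if wanted in macs and port in tokens:
            return True
    return False
```

Pass the port name exactly as the switch prints it: Eth1/1 on Nexus, ethernet1/1/1 on OS10, or swp2 on Cumulus.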

Command examples:

Cisco Nexus

switch# show mac address-table interface ethernet 1/1
Legend:
        * - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
        age - seconds since last seen,+ - primary entry using vPC Peer-Link,
        (T) - True, (F) - False, C - ControlPlane MAC, ~ - vsan
   VLAN     MAC Address      Type      age     Secure NTFY Ports
---------+-----------------+--------+---------+------+----+------------------
*    1     3612.407f.41d6   dynamic  0         F      F    Eth1/1

Dell OS10

OS10# show mac address-table interface ethernet 1/1/1
Codes: pv <vlan-id> - private vlan where the mac is originally learnt
VlanId        Mac Address           Type        Interface
1             36:12:40:7f:41:d6     dynamic     ethernet1/1/1

Cumulus

cumulus@cumulus:mgmt:~$ net show bridge macs dynamic

VLAN  Master  Interface  MAC                TunnelDest  State  Flags  LastSeen
----  ------  ---------  -----------------  ----------  -----  -----  --------
1     bridge  swp2       36:12:40:7f:41:d6                            00:01:24

How to verify the IPv6 Neighbors from the gateway

To verify IPv6 neighbors from the gateway on Cisco, Dell OS10, and Cumulus network devices, you can use the following commands:

  1. Access the switch/router and log in.
  2. Use the command below for your platform to view the IPv6 neighbors.
  3. Review the output: it lists each IPv6 neighbor with its IPv6 address, its MAC address, and the interface on which it was learned.
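If a neighbor entry is missing, or you need to tell which server an address belongs to, it helps to know how SLAAC addresses are formed. In the example outputs below, each neighbor's link-local (fe80::) and global addresses are built from its MAC address using modified EUI-64: flip the universal/local bit (0x02) of the first MAC byte and insert ff:fe in the middle. For instance, 0c:94:ad:2c:00:00 becomes fe80::e94:adff:fe2c:0. This Python sketch reproduces that rule so you can predict which address a server should appear with. The function name is hypothetical. Hosts that use privacy or stable-opaque interface identifiers instead of EUI-64 will not match its output.

```python
import ipaddress


def eui64_address(prefix: str, mac: str) -> ipaddress.IPv6Address:
    """Build the modified EUI-64 (SLAAC) address for `mac` inside a /64 `prefix`."""
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        raise ValueError("modified EUI-64 interface IDs need a /64 prefix")
    # Accept colon, dash or Cisco dotted notation.
    octets = bytes.fromhex(mac.replace(":", "").replace("-", "").replace(".", ""))
    if len(octets) != 6:
        raise ValueError(f"not a MAC address: {mac!r}")
    # Flip the universal/local bit and insert ff:fe between the two halves.
    iid = bytes([octets[0] ^ 0x02]) + octets[1:3] + b"\xff\xfe" + octets[3:]
    return network[int.from_bytes(iid, "big")]
```

For example, `eui64_address("2a00:fb01:400:200::/64", "2e:31:77:28:19:96")` gives 2a00:fb01:400:200:2c31:77ff:fe28:1996, which is one of the global neighbor addresses in the examples below.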

Command examples:

Cisco Nexus

switch# show ipv6 neighbor

Flags: # - Adjacencies Throttled for Glean
       G - Adjacencies of vPC peer with G/W bit
       R - Adjacencies learnt remotely
       CP - Added via L2RIB, Control plane Adjacencies
       PS - Added via L2RIB, Peer Sync
       RO - Re-Originated Peer Sync Entry
       CC - Consistency check pending

IPv6 Adjacency Table for VRF default
Total number of entries: 6
Address        Age       MAC Address     Pref Source     Interface Flags
2a00:fb01:400:100::1
               00:03:25  0c94.ad2c.0000  50   icmpv6     Ethernet1/1
fe80::e94:adff:fe2c:0
               00:03:20  0c94.ad2c.0000  50   icmpv6     Ethernet1/1
2a00:fb01:400:200:2c31:77ff:fe28:1996
               00:00:39  2e31.7728.1996  50   icmpv6     Vlan10
2a00:fb01:400:200:949a:afff:fe31:b3d7
               00:00:24  969a.af31.b3d7  50   icmpv6     Vlan10
fe80::2c31:77ff:fe28:1996
               00:01:11  2e31.7728.1996  50   icmpv6     Vlan10
fe80::949a:afff:fe31:b3d7
               00:00:59  969a.af31.b3d7  50   icmpv6     Vlan10

Dell OS10

OS10# show ipv6 neighbors
Codes: pv <vlan-id> - private vlan where the mac is originally learnt
IPv6 Address                            Hardware Address    State       Interface      Egress Int
2a00:fb01:400:100::1                    0c:94:ad:2c:00:00   reachable   ethernet1/1/1
2a00:fb01:400:200:2c31:77ff:fe28:1996   2e:31:77:28:19:96   reachable   vlan10         ethernet1/1/2
2a00:fb01:400:200:949a:afff:fe31:b3d7   96:9a:af:31:b3:d7   reachable   vlan10         ethernet1/1/3
fe80::e94:adff:fe2c:0                   0c:94:ad:2c:00:00   reachable   ethernet1/1/1
fe80::2c31:77ff:fe28:1996               2e:31:77:28:19:96   reachable   vlan10         ethernet1/1/2
fe80::949a:afff:fe31:b3d7               96:9a:af:31:b3:d7   reachable   vlan10         ethernet1/1/3

Cumulus

cumulus@cumulus:mgmt:~$ net show neighbor ipv6
Neighbor                               MAC                Interface  AF    STATE
-------------------------------------  -----------------  ---------  ----  ---------
fe80::e94:adff:fe2c:0                  0c:94:ad:2c:00:00  vlan1      IPv6  STALE
2a00:fb01:400:100::1                   0c:94:ad:2c:00:00  swp4       IPv6  STALE
2a00:fb01:400:200:949a:afff:fe31:b3d7  96:9a:af:31:b3:d7  vlan1      IPv6  REACHABLE
2a00:fb01:400:200:2c31:77ff:fe28:1996  2e:31:77:28:19:96  vlan1      IPv6  REACHABLE
fe80::e94:adff:fe2c:0                  0c:94:ad:2c:00:00  swp4       IPv6  STALE
fe80::949a:afff:fe31:b3d7              96:9a:af:31:b3:d7  vlan1      IPv6  REACHABLE
fe80::2c31:77ff:fe28:1996              2e:31:77:28:19:96  vlan1      IPv6  REACHABLE
cumulus@cumulus:mgmt:~$


Verify connectivity using a server/laptop: ping the gateway and an external IPv6 address, and resolve DNS names

By performing the steps below from a server or laptop, you can verify that it can ping the gateway, reach external IPv6 addresses, and resolve hostnames via DNS. Together, these checks confirm that your setup is ready for deployment.

To verify connectivity using a server or laptop, follow these steps:

  1. Connect the server or laptop to the network: Ensure that it is connected to the network where the gateway is located, for example by connecting an Ethernet cable to the same switch to which the IC nodes will be connected.
  2. Obtain IPv6 address information: Configure the server or laptop with an IPv6 address, either manually or automatically via SLAAC. Ensure that the IPv6 address is in the same subnet as the gateway.
  3. Ping the gateway: Run the following command, replacing <gateway IPv6 address> with the actual IPv6 address of the gateway:
       ping6 <gateway IPv6 address>
     This sends ICMPv6 echo requests to the gateway and waits for replies. Successful replies indicate that the server or laptop can reach the gateway over IPv6. "Destination unreachable" or "Request timed out" messages suggest a connectivity problem between the server or laptop and the gateway. In that case, check the network configuration, confirm that the gateway is reachable, and verify the firewall settings.
  4. Ping an external IPv6 address: Test connectivity beyond the gateway with the following command. Replace <external IPv6 address> with the IPv6 address of a known external host, such as a public IPv6 address or another device on the internet:
       ping6 <external IPv6 address>
     For example, you can use the Google and Cloudflare public DNS IPv6 addresses:
       ping6 2001:4860:4860::8888
       ping6 2606:4700:4700::1111
     Successful replies confirm end-to-end IPv6 connectivity from the server or laptop. If the pings fail, check for firewall rules, routing problems, or other network configuration issues that may be affecting connectivity.
  5. Resolve the NNS nodes: To test DNS resolution, resolve the hostnames icp0.io, icp-api.io and ic0.app to their IPv6 addresses:
       nslookup -query=AAAA icp0.io
       nslookup -query=AAAA icp-api.io
       nslookup -query=AAAA ic0.app
     Each command queries the DNS server for the AAAA record (IPv6 address) of the domain. If an IPv6 address is returned, DNS resolution is working correctly. If resolution fails, verify the DNS configuration, check for firewall blocks and routing problems, and check the hosts file or DNS cache on the server or laptop.
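The subnet check and the AAAA lookups above can also be scripted from the laptop with Python's standard library. This sketch checks that a host address lies in the gateway's subnet (a /64 is assumed by default) and looks up IPv6 addresses for a hostname with `socket.getaddrinfo`. Note that `getaddrinfo` goes through the system resolver and so also consults the hosts file, whereas `nslookup` queries the DNS server directly. The function names are illustrative. The `resolver` parameter exists only so the lookup can be tested without network access.

```python
import ipaddress
import socket


def same_subnet(host_addr: str, gateway_addr: str, prefixlen: int = 64) -> bool:
    """True if host_addr falls inside the gateway's /prefixlen network."""
    network = ipaddress.IPv6Interface(f"{gateway_addr}/{prefixlen}").network
    return ipaddress.IPv6Address(host_addr) in network


def ipv6_addresses(name: str, resolver=socket.getaddrinfo) -> list[str]:
    """Resolve name to its IPv6 addresses; return an empty list on failure."""
    try:
        results = resolver(name, None, socket.AF_INET6, socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    # Each result is (family, type, proto, canonname, sockaddr); for IPv6
    # the sockaddr is (address, port, flowinfo, scope_id).
    return sorted({info[4][0] for info in results})
```

For a quick pass/fail check, loop over icp0.io, icp-api.io and ic0.app, and report any name for which `ipv6_addresses` returns an empty list.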

How to verify the IPv6 Routing from the gateway

To verify IPv6 routing on the switch/gateway, follow these steps:

  1. Access the switch/gateway: Connect to the switch/gateway using a console cable or remote management interface (SSH, Telnet, etc.), and log in with the appropriate credentials.
  2. Identify the routing table command: Determine the command that displays the IPv6 routing table on your switch/gateway platform. The command varies with the device and its operating system.

Command examples:

Cisco Nexus

switch# show ipv6 route vrf all
IPv6 Routing Table for VRF "default"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]

0::/0, ubest/mbest: 1/0
    *via 2a00:fb01:400:100::1/128, [1/0], 00:01:01, static
2a00:fb01:400:100::/126, ubest/mbest: 1/0, attached
    *via 2a00:fb01:400:100::3, Eth1/1, [0/0], 00:01:02, direct,
2a00:fb01:400:100::3/128, ubest/mbest: 1/0, attached
    *via 2a00:fb01:400:100::3, Eth1/1, [0/0], 00:01:02, local
2a00:fb01:400:200::/64, ubest/mbest: 1/0, attached
    *via 2a00:fb01:400:200::1, Vlan10, [0/0], 00:05:39, direct,
2a00:fb01:400:200::1/128, ubest/mbest: 1/0, attached
    *via 2a00:fb01:400:200::1, Vlan10, [0/0], 00:05:39, local

Dell OS10

OS10# show ipv6 route
Codes: C - connected
       S - static
       B - BGP, IN - internal BGP, EX - external BGP, EV - EVPN BGP
       O - OSPF, IA - OSPF inter area, N1 - OSPF NSSA external type 1,
       N2 - OSPF NSSA external type 2, E1 - OSPF external type 1,
       E2 - OSPF external type 2, * - candidate default,
       + - summary route, > - non-active route
Gateway of last resort is via 2a00:fb01:400:100::1 to network ::/0

      Destination                                 Gateway                                                   Dist/Metric   Last Change

 *S    ::/0                                  via 2a00:fb01:400:100::1                ethernet1/1/1           1/0           00:02:44
 C     2a00:fb01:400:100::/126               via 2a00:fb01:400:100::3                ethernet1/1/1           0/0           00:00:23
 C     2a00:fb01:400:200::/64                via 2a00:fb01:400:200::1                vlan10                  0/0           00:02:56


Cumulus

cumulus@cumulus:mgmt:~$ net show route ipv6
Codes: K - kernel route, C - connected, S - static, R - RIPng,
       O - OSPFv3, I - IS-IS, B - BGP, N - NHRP, T - Table,
       v - VNC, V - VNC-Direct, A - Babel, D - SHARP, F - PBR,
       f - OpenFabric,
       > - selected route, * - FIB route, q - queued route, r - rejected route

S>* ::/0 [1/0] via 2a00:fb01:400:100::1, swp4, weight 1, 00:01:11
C>* 2a00:fb01:400:100::/126 is directly connected, swp4, 00:01:11
C>* 2a00:fb01:400:200::/64 is directly connected, vlan1, 00:01:08
C * fe80::/64 is directly connected, vlan1, 00:01:08
C * fe80::/64 is directly connected, bridge, 00:01:09
C>* fe80::/64 is directly connected, swp4, 00:01:11
cumulus@cumulus:mgmt:~$


Consult the device documentation or vendor resources for the exact command to view the IPv6 routing table on your specific device. Then:

  1. Examine the routing table: Check the output for IPv6 routes, each with a destination prefix and an associated next hop or outgoing interface. A destination network listed in the table means the switch/gateway has routing information for it. If the table is empty or is missing expected IPv6 routes, there may be a problem with the routing configuration or with connectivity.
  2. Verify the default route: Check that a default route is present. It is shown as "::/0" (Cisco Nexus displays it as "0::/0") and is used to forward IPv6 traffic when no more specific route matches. Make sure it has a valid next hop or outgoing interface. A valid default route lets the switch/gateway forward IPv6 traffic to destinations outside its directly connected networks. A missing or incorrect default route causes connectivity problems for any IPv6 traffic that does not match a specific route.
  3. Troubleshoot routing issues: If you find routing problems, you can:
    1. Check the configuration of IPv6 routing protocols (e.g., OSPFv3, BGP), if used.
    2. Verify that the switch/gateway has the necessary interfaces configured with IPv6 addresses.
    3. Ensure that the routing table entries are correct and reflect the expected network topology.
    4. Verify that neighboring routers have the appropriate IPv6 routing information.
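The default-route check can be partly automated. This Python sketch parses `net show route ipv6` output in the FRR format of the Cumulus example above, with one route per line. It then confirms that a `::/0` route exists and that its next hop lies inside one of the directly connected prefixes. Both function names are hypothetical, and only `via` and `is directly connected` lines are recognized. Link-local next hops are accepted without the prefix check, since they are valid on any interface.

```python
import ipaddress
import re

# Matches FRR-style lines such as
#   S>* ::/0 [1/0] via 2a00:fb01:400:100::1, swp4, weight 1, 00:01:11
#   C>* 2a00:fb01:400:100::/126 is directly connected, swp4, 00:01:11
ROUTE_RE = re.compile(
    r"^(?P<code>[A-Za-z])[>* ]+(?P<prefix>[0-9a-fA-F:]+/\d+)\s+"
    r"(?:\[\d+/\d+\]\s+via\s+(?P<via>[0-9a-fA-F:]+)|is directly connected)"
)


def parse_frr_ipv6_routes(text: str) -> tuple[dict[str, str], list[str]]:
    """Return ({prefix: next hop} for routed prefixes, [connected prefixes])."""
    via_routes: dict[str, str] = {}
    connected: list[str] = []
    for line in text.splitlines():
        match = ROUTE_RE.match(line.strip())
        if not match:
            continue  # legend lines, prompts, unrecognized route formats
        if match["via"]:
            via_routes[match["prefix"]] = match["via"]
        else:
            connected.append(match["prefix"])
    return via_routes, connected


def default_route_problems(via_routes: dict[str, str], connected: list[str]) -> list[str]:
    """List problems with the IPv6 default route; empty if it looks sane."""
    next_hop = via_routes.get("::/0")
    if next_hop is None:
        return ["no default route (::/0) in the routing table"]
    hop = ipaddress.IPv6Address(next_hop)
    if hop.is_link_local:
        return []  # link-local next hops need no matching connected prefix
    if not any(hop in ipaddress.IPv6Network(p) for p in connected):
        return [f"next hop {hop} is not inside any connected prefix"]
    return []
```

With the Cumulus example, the default route's next hop 2a00:fb01:400:100::1 falls inside the connected 2a00:fb01:400:100::/126, so no problems are reported.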


Please note that the commands provided are general examples and may differ slightly depending on the specific device model and software version. Refer to the documentation or vendor resources for more precise command syntax and options for your particular network device.