Node Provider Networking Guide

From Internet Computer Wiki

This guide gives an overview of the networking requirements and walks Node Providers through installing their servers in a rack with functioning networking.

Configuring networks is not trivial. You should be familiar with IP networking, network equipment and network cabling.

DFINITY does not provide support for network configuration.

If you hire technical assistance, keep decentralization and security in mind. Use a local technician you personally know and carefully monitor their work.

Requirements

To join your servers to the Internet Computer (IC) you will need:

  • 10G Network equipment
    • SFP+ or Ethernet
    • Switch(es)
    • Cabling
    • Quantity determined by number of nodes deployed
  • Rackspace in a data center
  • Internet connection
    • Bandwidth
      • ~300Mbps per node
      • Ingress/egress ratio is currently 1:1. We expect egress (serving responses to client queries) to increase faster than ingress in the future.
      • This should guide how many servers to deploy and the appropriate ISP connection speed
      • E.g. a 1Gbps connection will support up to 3 IC nodes.
    • One IPv6 /64 subnet - each node gets multiple IPv6 addresses
    • Two IPv4 addresses per data center - All Node Providers are requested to deploy two nodes with IPv4 for every data center they operate in. Node Providers should deploy IPv4 to the first two nodes in their first rack.
    • All IP addresses are assigned statically and automatically by IC-OS
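
The bandwidth guideline above can be turned into a quick sizing calculation. The following is a minimal sketch, assuming the ~300Mbps per-node figure from this guide; the function names `max_nodes` and `required_mbps` are hypothetical helpers, not part of any IC tooling.

```python
# Assumption: ~300 Mbps per node, taken from the guideline in this guide.
PER_NODE_MBPS = 300


def max_nodes(link_mbps: int, per_node_mbps: int = PER_NODE_MBPS) -> int:
    """Return how many nodes a link of the given speed can carry (rounded down)."""
    if link_mbps < 0 or per_node_mbps <= 0:
        raise ValueError("link speed must be >= 0 and per-node bandwidth > 0")
    return link_mbps // per_node_mbps


def required_mbps(nodes: int, per_node_mbps: int = PER_NODE_MBPS) -> int:
    """Return the approximate link speed needed for the given number of nodes."""
    if nodes < 0 or per_node_mbps <= 0:
        raise ValueError("node count must be >= 0 and per-node bandwidth > 0")
    return nodes * per_node_mbps


print(max_nodes(1000))    # 3, matching the 1Gbps example above
print(max_nodes(10000))   # 33
print(required_mbps(10))  # 3000
```

Rounding down is deliberate: a partially provisioned node would fall short of the per-node guideline.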

Network Cabling

When racking and stacking your servers, ensure that at least one 10G network port on each server is connected to the 10G switch. SFP+ and Ethernet are supported.

[Screenshot: 10G network ports on a Supermicro 1U server]

For example, on a Supermicro 1U server, the 10G ports are in a cluster as seen above. Vendors differ.

Connect the 10G switch to the ISP endpoint. This could be the Top of Rack (ToR) switch or another device.

Network Configuration

Node machines require:

  • The ability to acquire a public static IPv6 address on a /64 subnet
  • An IPv6 gateway to communicate with other nodes on the broad internet
  • Unfiltered internet access

Two nodes per data center require:

  • The ability to acquire a public static IPv4 address
  • An IPv4 gateway to communicate with other nodes on the broad internet
  • Unfiltered internet access

Note: IPv4 should be deployed to the first two nodes in the first rack.
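
Before deployment, an address plan can be sanity-checked against the requirements above. The following is a minimal sketch using Python's standard `ipaddress` module. The addresses come from documentation and private ranges and are placeholders only, and the helper names `check_node_ipv6` and `is_public_ipv4` are hypothetical.

```python
import ipaddress


def check_node_ipv6(address: str, prefix: str) -> list[str]:
    """Return a list of problems; an empty list means the plan looks consistent."""
    problems = []
    # Raises ValueError if the prefix is malformed or has host bits set.
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        problems.append(f"prefix {network} is a /{network.prefixlen}, expected /64")
    if ipaddress.IPv6Address(address) not in network:
        problems.append(f"{address} is outside {network}")
    return problems


def is_public_ipv4(address: str) -> bool:
    """Return True if the IPv4 address is globally routable (not private or reserved)."""
    return ipaddress.IPv4Address(address).is_global


# Placeholder values: replace with the prefix and addresses from your ISP.
print(check_node_ipv6("2001:db8:1:2::10", "2001:db8:1:2::/64"))  # []
print(check_node_ipv6("2001:db8:1:3::10", "2001:db8:1:2::/64"))  # one problem reported
print(is_public_ipv4("192.168.1.10"))                            # False
```

This only checks that the plan is internally consistent. It does not verify that the addresses are actually routed to your rack or that traffic is unfiltered.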


There are many ways to configure the network, and some details depend on the ISP and data center. See the Example Network Configuration Scenarios for some common setups.

See the Node Provider Networking Troubleshooting Guide for help.

BMC Setup Recommendations

What’s a BMC?

The Baseboard Management Controller (BMC) grants control of the underlying server hardware.

BMCs have notoriously poor security. Vendors may name their implementation differently (Dell -> iDRAC, HPE -> iLO, etc.).

Recommendations

Change the password

BMCs usually ship with a common default password. Log in via crash cart, KVM or the web interface and change it to something strong.

No broad internet access

It is highly recommended that you do not expose your BMC to the broad internet. This is a safety precaution against attackers.

Options:

  • Don’t connect the BMC to the internet.
    • Maintenance or node recovery will require physical access in this case.
    • Any BMC activities occur via SSH on the host (unreliable with many mainboard vendors) or via crash cart.
  • Connect the BMC to a separate dumb switch, and the dumb switch connects to a Rack Mounted Unit (RMU).
  • Connect the BMC to a managed switch, and create a separate VLAN

This can get complicated. It’s outside the scope of this document to explain how to do this.

Network monitoring

SNMP-based Network Monitoring:

  • Device Compatibility: Make sure your network devices support SNMP and that their SNMP agents are enabled. Choose the SNMP version that matches your security needs, as different versions offer varying levels of security and functionality.
  • Secure Configuration: Implement SNMPv3 to enhance security through authentication and encryption, protecting against unauthorized access and data interception.
  • Monitoring Points: Select the specific network parameters that are critical for performance (such as bandwidth utilization, CPU usage, and memory usage). Set up SNMP polling for these parameters.
  • Thresholds and Alerts: Define alerts that trigger when monitored parameters exceed their limits, so that you can identify issues proactively and take corrective action.
  • Data Retention: Establish data retention policies for storing SNMP data for trend and capacity analysis.
  • Regular Review: Regularly review SNMP monitoring configurations and thresholds to ensure that they are up to date and aligned with the changing network environment.
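
Bandwidth utilization is usually derived from two successive readings of an interface octet counter rather than read directly. The following is a minimal sketch of that arithmetic, assuming a 64-bit counter such as IF-MIB's `ifHCInOctets` and that the counter samples have already been collected by your SNMP poller. The function name and sample values are hypothetical.

```python
COUNTER64_MODULUS = 2**64  # 64-bit SNMP counters wrap around at this value


def utilization_percent(prev_octets: int, curr_octets: int,
                        interval_s: float, link_bps: float) -> float:
    """Return link utilization in percent between two counter samples."""
    if interval_s <= 0 or link_bps <= 0:
        raise ValueError("interval and link speed must be positive")
    # The modulo handles a single counter wrap between the two samples.
    delta_octets = (curr_octets - prev_octets) % COUNTER64_MODULUS
    bits_per_second = delta_octets * 8 / interval_s
    return 100.0 * bits_per_second / link_bps


# 2.25 GB received in 60 seconds on a 10G port is 300 Mbps, i.e. 3% utilization.
print(utilization_percent(0, 2_250_000_000, 60, 10_000_000_000))  # 3.0
```

The same calculation applies to egress by sampling the corresponding outbound counter.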

gNMI/gRPC-based Network Monitoring:

  • Protocol Familiarity: Get familiar with the gNMI data models for your network devices and understand how gNMI uses gRPC (a remote procedure call framework) for network management.
  • Device Support: Verify that your network devices support gNMI, which is more commonly found in modern networking equipment that supports programmability.
  • Authentication and Encryption: Implement TLS for gRPC security to protect communication between the monitoring system and devices.
  • Model Definitions: Make sure you either have access to or create gNMI data models for the devices you're monitoring. These models define the structure and hierarchy of the data that is accessible through gNMI.
  • Data Subscription: gNMI allows for real-time updates through subscriptions. Set up subscriptions for relevant data points to receive continuous updates without frequent polling.
  • Streaming Mode: Use gNMI's streaming mode for efficient real-time data transfer.

Server monitoring

SNMP-based Server Hardware Monitoring:

  • Determine SNMP Compatibility: Before configuring SNMP monitoring, make sure you enable SNMP agents on your servers. Also, verify the compatibility of the SNMP version with your monitoring system.
  • Secure Configuration: Implement SNMPv3 to enhance security through authentication and encryption, protecting against unauthorized access and data interception.
  • Monitoring Parameters: Monitor CPU utilization to identify performance bottlenecks, memory usage to prevent resource exhaustion, network interface traffic to identify bandwidth bottlenecks, and server temperatures and hardware health indicators to detect hardware issues.
  • SNMP Polling: Set up regular SNMP polling to collect data on critical parameters.
  • Thresholds and Alerts: Define appropriate thresholds for each monitored parameter. These thresholds determine when alerts are triggered.
  • Data Retention and Trend Analysis: Retain historical SNMP data for trend analysis, capacity planning and performance identification.
  • Regular Review: It is important to regularly check the SNMP monitoring configurations and thresholds in order to ensure that they are appropriate for the environment. This helps to maintain proper alignment and accuracy in monitoring.
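
The threshold-and-alert step above can be sketched as a simple comparison of each polled sample against per-parameter limits. This is a minimal illustration, not a monitoring system: the parameter names and threshold values below are hypothetical and should be tuned to your hardware.

```python
# Hypothetical limits; choose values appropriate for your servers.
THRESHOLDS = {
    "cpu_percent": 90.0,
    "memory_percent": 85.0,
    "temperature_c": 80.0,
}


def evaluate(sample: dict, thresholds: dict = THRESHOLDS) -> list[str]:
    """Return one alert message per parameter that is missing or over its limit."""
    alerts = []
    for name, limit in thresholds.items():
        value = sample.get(name)
        if value is None:
            # A missing reading is itself worth alerting on.
            alerts.append(f"{name}: no data")
        elif value > limit:
            alerts.append(f"{name}: {value} exceeds {limit}")
    return alerts


sample = {"cpu_percent": 95.0, "memory_percent": 40.0, "temperature_c": 60.0}
print(evaluate(sample))  # ['cpu_percent: 95.0 exceeds 90.0']
```

Treating a missing reading as an alert helps catch a stopped SNMP agent, which would otherwise look like a healthy server.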

What NOT to do

Don’t use external firewalls, packet filters, or rate limiters

Don’t block or interfere with any traffic to the node machines. This can disrupt node machine functionality. Occasionally ports are opened for incoming (and outgoing) connections when new versions of node software are deployed.

What about network security?

IC-OS manages its own software firewalls and rate limiters strictly and is designed with security as a primary principle.

Don't configure the switch to use LACP bonding

This feature is on the roadmap for investigation but IC nodes do not support LACP bonding at the moment. Configuring it on the switch may cause problems with nodes.

How DFINITY manages its servers

For reference, see the DFINITY data center runbook.

Final Checklist

  • Did you deploy a 10G switch?
  • Is at least one 10G port on each server plugged into the 10G switch?
  • Do you have one IPv6 /64 prefix allocated from your ISP?
  • Do you have two IPv4 addresses allocated for each data center you plan to operate in?
  • Do you have one domain name for each node you plan to configure with an IPv4 address?
  • Does each node have ~300Mbps bandwidth?
  • Is your BMC inaccessible from the broad internet?
