Difference between revisions of "Node Provider Networking Guide"
Gary.mcelroy (talk | contribs) (→Requirements: Distinguish Gen1 vs Gen2 ipv4 req's) |
Gary.mcelroy (talk | contribs) (→Appendix 1: Number of IPv4 Addresses Required: Clarify table - #'s imply nothing about ipv4 configuration) |
Revision as of 18:26, 20 February 2024
This guide gives an overview of the networking requirements and walks Node Providers through racking their servers with functioning networking.
Configuring networks is not trivial. You should be familiar with IP networking, network equipment and network cabling.
Resources to learn about networking:
- CCNA Study Materials
- Kevin Wallace YouTube Training Videos
DFINITY does not provide support for network configuration.
If you hire technical assistance, keep decentralization and security in mind. Use a local technician you personally know and carefully monitor their work.
Requirements
To join your servers to the Internet Computer (IC) you will need:
- 10G Network equipment
- Rackspace in a data center
- Internet connection
- Bandwidth
  - ~300Mbps per node
  - Ingress/egress ratio is currently 1:1. We expect egress (serving responses to client queries) to increase faster than ingress in the future.
  - This should guide how many servers to deploy and the appropriate ISP connection speed
    - E.g. a 1Gbps connection will support up to 3 IC nodes.
- One IPv6 /64 subnet - each node gets multiple IPv6 addresses
- (For Gen2 Node Providers - Gen1 Node Providers will receive different requirements) One IPv4 address allocated for every 4 nodes in a given data center per node provider (IPv4 addresses cannot be shared between node providers). See Appendix 1 for table.
  - Additionally, one domain name for each node configured with an IPv4 address. See Node Provider Domain Name Guide for details.
- All IP addresses are assigned statically and automatically by IC-OS
  - This is configured in the IC-OS Installation Runbook
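The bandwidth rule of thumb above can be turned into a quick sizing calculation. The following is a minimal sketch, assuming the ~300Mbps per node figure from this guide; the function names are illustrative and not part of any IC tooling.

```python
# Approximate per-node bandwidth taken from the requirements above.
MBPS_PER_NODE = 300


def max_nodes_for_link(link_mbps: float, per_node_mbps: float = MBPS_PER_NODE) -> int:
    """Return how many nodes a link of the given speed can carry."""
    if link_mbps < 0 or per_node_mbps <= 0:
        raise ValueError("link speed must be >= 0 and per-node bandwidth > 0")
    # Round down: a partially served node does not count.
    return int(link_mbps // per_node_mbps)


def min_link_mbps(node_count: int, per_node_mbps: float = MBPS_PER_NODE) -> float:
    """Return the smallest link speed (in Mbps) that covers node_count nodes."""
    if node_count < 0:
        raise ValueError("node_count must be >= 0")
    return node_count * per_node_mbps


print(max_nodes_for_link(1000))  # a 1Gbps connection
print(min_link_mbps(4))          # link speed needed for 4 nodes
```

For a 1Gbps connection this gives 3 nodes, matching the example in the list above. Because the ingress/egress ratio is currently 1:1, the same figure applies in both directions.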
Network Cabling
When racking and stacking your servers, ensure that at least one 10G network port on each server is connected to the 10G switch. SFP+ and Ethernet are supported.
For example, on a Supermicro 1U server, the 10G ports are grouped together in a cluster. Port layout differs between vendors.
Connect the 10G switch to the ISP endpoint. This could be the Top of Rack (TOR) switch or another device.
Network Configuration
Node machines require:
- The ability to acquire a public static IPv6 address on a /64 subnet
- An IPv6 gateway to communicate with other nodes on the broad internet
- Unfiltered internet access
(For Gen2 Node Providers - Gen1 Node Providers will receive different requirements) One of every four nodes requires:
- The ability to acquire a public static IPv4 address
- An IPv4 gateway to communicate with other nodes on the broad internet
- Unfiltered internet access
There are many ways to configure the network, and some details depend on the ISP and data center. Here are some Example Network Configuration Scenarios.
See the Node Provider Networking Troubleshooting Guide for help.
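When planning addresses, it can help to sanity-check that a candidate node address really falls inside the /64 prefix your ISP allocated. The sketch below uses Python's standard `ipaddress` module. The prefix and addresses shown are from the IPv6 documentation range (2001:db8::/32) and are placeholders only. IC-OS assigns the real addresses, so this is a planning aid under those assumptions, not part of the installation procedure.

```python
import ipaddress


def check_node_ipv6(address: str, prefix: str) -> bool:
    """Return True if `prefix` is a /64 and `address` lies inside it."""
    # strict=True rejects prefixes with host bits set, e.g. "2001:db8::1/64".
    network = ipaddress.IPv6Network(prefix, strict=True)
    if network.prefixlen != 64:
        return False
    return ipaddress.IPv6Address(address) in network


# Placeholder values from the documentation range, not real allocations.
print(check_node_ipv6("2001:db8:0:1::10", "2001:db8:0:1::/64"))
print(check_node_ipv6("2001:db8:0:2::10", "2001:db8:0:1::/64"))
```

The first call reports an address inside the prefix, the second an address from a neighbouring /64.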
BMC Setup Recommendations
What’s a BMC?
The Baseboard Management Controller (BMC) grants control of the underlying server hardware.
BMCs have notoriously poor security. Vendors may name their implementation differently (Dell -> iDRAC, HPE -> iLO, etc.).
Recommendations
Change the password
BMCs usually ship with a common default password. Log in via crash cart, KVM or the web interface and change it to something strong.
No broad internet access
It is highly recommended that you do not expose your BMC to the broad internet. This is a safety precaution against attackers.
Options:
- Don’t connect the BMC to the internet.
- Maintenance or node recovery will require physical access in this case.
- Any BMC activities occur via SSH on the host (unreliable on mainboards from many vendors) or via crash cart.
- Connect the BMC to a separate dumb switch, and the dumb switch connects to a Rack Mounted Unit (RMU).
- Connect the BMC to a managed switch, and create a separate VLAN
This can get complicated. It’s outside the scope of this document to explain how to do this.
Resources:
- StackExchange - Best practice for accessing management port of firewall
- Supermicro Guidance
- Unicom Guidance
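One way to confirm that a BMC is not reachable from the broad internet is to attempt TCP connections to it from a machine outside your network. The following is a minimal sketch using only the Python standard library. The port list (22, 80, 443) is an assumption covering common SSH and web interface ports, and your BMC may listen elsewhere. The address in the usage comment is a documentation placeholder.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def exposed_ports(host: str, ports=(22, 80, 443), timeout: float = 3.0) -> list:
    """Return the subset of `ports` that accept TCP connections on `host`."""
    return [port for port in ports if tcp_port_open(host, port, timeout)]


# Usage, run from OUTSIDE your network (placeholder address):
#   exposed_ports("203.0.113.10")
# An empty list means none of the probed ports answered.
```

An empty result only covers the ports that were probed over TCP, so treat it as a quick check and not as proof that the BMC is isolated.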
Network monitoring
SNMP-based Network Monitoring:
- Device Compatibility: Make sure your network devices support SNMP and have their SNMP agents enabled. Choose the version of SNMP that aligns with your security needs, as different versions offer varying security levels and functionality.
- Secure Configuration: Implement SNMPv3 to enhance security through authentication and encryption, protecting against unauthorized access and data interception.
- Monitoring Points: Select specific network parameters critical for performance (such as bandwidth utilization, CPU usage, and memory usage). Set up SNMP polling for these parameters.
- Thresholds and Alerts: Define alerts that fire when monitored parameters exceed their limits, so you can identify issues proactively and take corrective action.
- Data Retention: Establish data retention policies for storing SNMP data for trend and capacity analysis.
- Regular Review: Regularly review SNMP monitoring configurations and thresholds to ensure that they are up to date and aligned with the changing network environment.
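Bandwidth utilization is usually derived from two samples of an interface octet counter, not read directly. The sketch below shows the arithmetic, assuming a 64-bit counter such as `ifHCInOctets` from the standard IF-MIB. It deliberately leaves out the SNMP polling itself, which depends on the library or monitoring system you use. The function name and the sample values are illustrative.

```python
COUNTER64_MODULUS = 2**64  # 64-bit SNMP counters wrap around at this value


def utilization_percent(octets_start: int, octets_end: int,
                        interval_s: float, link_bps: float) -> float:
    """Return link utilization in percent between two counter samples."""
    if interval_s <= 0 or link_bps <= 0:
        raise ValueError("interval_s and link_bps must be positive")
    # The modulo handles a single counter wrap between the two samples.
    delta_octets = (octets_end - octets_start) % COUNTER64_MODULUS
    bits_per_second = delta_octets * 8 / interval_s
    return 100.0 * bits_per_second / link_bps


# Illustrative samples: 375,000,000 octets in 10 seconds on a 1Gbps link.
usage = utilization_percent(0, 375_000_000, interval_s=10, link_bps=1_000_000_000)
THRESHOLD_PERCENT = 80.0  # hypothetical alert limit
print(usage, usage > THRESHOLD_PERCENT)
```

The illustrative samples work out to 300Mbps, or 30% of a 1Gbps link, which is the per-node bandwidth figure used elsewhere in this guide.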
gNMI/gRPC-based Network Monitoring:
- Protocol Familiarity: Get familiar with the gNMI data models for your network devices and understand how they use gRPC, a remote procedure call framework, for network management.
- Device Support: Verify that your network devices support gNMI, which is more commonly found in modern networking equipment that supports programmability.
- Authentication and Encryption: Implement TLS for gRPC security to protect communication between the monitoring system and devices.
- Model Definitions: Make sure you either have access to or create gNMI data models for the devices you're monitoring. These models define the structure and hierarchy of the data that is accessible through gNMI.
- Data Subscription: gNMI allows for real-time updates through subscriptions. Set up subscriptions for relevant data points to receive continuous updates without frequent polling.
- Streaming Mode: Use gNMI's streaming mode for efficient real-time data transfer.
Server monitoring
SNMP-based Server Hardware Monitoring:
- Determine SNMP Compatibility: Before configuring SNMP monitoring, make sure you enable SNMP agents on your servers. Also, verify the compatibility of the SNMP version with your monitoring system.
- Secure Configuration: Implement SNMPv3 to enhance security through authentication and encryption, protecting against unauthorized access and data interception.
- Monitoring Parameters: It's important to monitor the CPU utilization to ensure that the performance is optimal and to identify any potential bottlenecks. Keeping track of the memory usage is crucial to prevent resource exhaustion. It's also important to check the network interface traffic to identify any bandwidth bottlenecks. Finally, monitoring the server temperatures and hardware health indicators can help detect any hardware issues.
- SNMP Polling: Set up SNMP polling at regular intervals to collect data on critical parameters.
- Thresholds and Alerts: Define appropriate thresholds for each monitored parameter. These thresholds determine when alerts should be triggered.
- Data Retention and Trend Analysis: Retain historical SNMP data for trend analysis, capacity planning and identifying performance issues.
- Regular Review: It is important to regularly check the SNMP monitoring configurations and thresholds in order to ensure that they are appropriate for the environment. This helps to maintain proper alignment and accuracy in monitoring.
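The threshold and alert step above can be sketched as a comparison of each polled sample against a table of limits. This is a minimal illustration only: the metric names and limit values are hypothetical, and in practice your monitoring system would supply both the samples and the alert delivery.

```python
from typing import Dict, List

# Hypothetical limits; choose values that suit your hardware and environment.
THRESHOLDS: Dict[str, float] = {
    "cpu_percent": 90.0,
    "memory_percent": 85.0,
    "temperature_c": 80.0,
}


def find_alerts(sample: Dict[str, float],
                thresholds: Dict[str, float] = THRESHOLDS) -> List[str]:
    """Return one message per metric that is missing or above its limit."""
    alerts = []
    for name, limit in thresholds.items():
        value = sample.get(name)
        if value is None:
            # A missing reading is worth flagging: the agent may be down.
            alerts.append(f"{name}: no data")
        elif value > limit:
            alerts.append(f"{name}: {value} exceeds {limit}")
    return alerts


print(find_alerts({"cpu_percent": 95.0, "memory_percent": 50.0, "temperature_c": 60.0}))
```

Treating a missing reading as an alert is a design choice in this sketch. It avoids silently ignoring a server whose SNMP agent has stopped responding.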
What NOT to do
Don’t use external firewalls, packet filters, rate limiters
Don’t block or interfere with any traffic to the node machines. This can disrupt node machine functionality. Occasionally ports are opened for incoming (and outgoing) connections when new versions of node software are deployed.
What about network security?
IC-OS manages its own software firewalls and rate limiters strictly and is designed with security as a primary principle.
Don't configure the switch to use LACP bonding
This feature is on the roadmap for investigation but IC nodes do not support LACP bonding at the moment. Configuring it on the switch may cause problems with nodes.
How DFINITY manages its servers
See the reference DFINITY data center runbook.
Final Checklist
- Did you deploy a 10G switch?
- Is at least one 10G port on each server plugged into the 10G switch?
- Do you have one IPv6 /64 prefix allocated from your ISP?
- (Gen2) Do you have at least one IPv4 address allocated for every four nodes?
- Does each node have ~300Mbps bandwidth?
- Is your BMC inaccessible from the broad internet?
References
- Gen2 Network Requirements - more detailed, possibly out of date.
Appendix 1: Number of IPv4 Addresses Required
(For Gen2 Node Providers - Gen1 Node Providers will receive different requirements)
This table lists quantities only. The same IPv4 address should not be reused on different nodes.
How many nodes you have | IPv4 Addresses needed for the whole DC |
1 to 4 | 1 |
5 to 8 | 2 |
9 to 12 | 3 |
13 to 16 | 4 |
17 to 20 | 5 |
21 to 24 | 6 |
25 to 28 | 7 |
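The table follows from the Gen2 rule stated in the requirements: one IPv4 address for every four nodes in a data center, rounded up. For node counts beyond the table, the same rule can be computed directly. This is a minimal sketch, and the function name is illustrative.

```python
import math

NODES_PER_IPV4 = 4  # Gen2 rule: one IPv4 address for every 4 nodes


def ipv4_addresses_needed(node_count: int) -> int:
    """Return the number of IPv4 addresses needed for node_count nodes in one DC."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # Round up: a partial group of nodes still needs its own address.
    return math.ceil(node_count / NODES_PER_IPV4)


for nodes in (1, 4, 5, 12, 28):
    print(nodes, ipv4_addresses_needed(nodes))
```

The count applies per node provider and per data center, since IPv4 addresses cannot be shared between node providers.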