Difference between revisions of "Node Provider Networking Guide"
Gary.mcelroy (talk | contribs) (→Appendix 1: Number of IPv4 Addresses Required: Clarify table - #'s imply nothing about ipv4 configuration) |
Latest revision as of 15:37, 23 February 2024
This guide gives an overview of the networking requirements and walks Node Providers through racking their servers with functioning networking.
Configuring networks is not trivial. You should be familiar with IP networking, network equipment and network cabling.
Resources to learn about networking:
- CCNA Study Materials
- Kevin Wallace YouTube Training Videos
DFINITY does not provide support for network configuration.
If you hire technical assistance, keep decentralization and security in mind. Use a local technician you personally know and carefully monitor their work.
Requirements
To join your servers to the Internet Computer (IC) you will need:
- 10G Network equipment
- Rackspace in a data center
- Internet connection
  - Bandwidth
    - ~300Mbps per node
    - Ingress/egress ratio is currently 1:1. We expect egress (serving responses to client queries) to increase faster than ingress in the future.
    - This should guide how many servers to deploy and the appropriate ISP connection speed
    - E.g. a 1Gbps connection will support up to 3 IC nodes.
  - One IPv6 /64 subnet - each node gets multiple IPv6 addresses
  - Two IPv4 addresses per data center - All Node Providers are requested to deploy two nodes with IPv4 for every data center they operate in. Node Providers should deploy IPv4 to the first two nodes in their first rack.
    - Additionally, one domain name for each node configured with an IPv4 address. See Node Provider Domain Name Guide for details.
  - All IP addresses are assigned statically and automatically by IC-OS
    - This is configured in the IC-OS Installation Runbook
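The bandwidth guidance above reduces to simple arithmetic. The following is a minimal sketch in Python, assuming a flat 300 Mbps per node and no headroom for other traffic; the function names are illustrative and not part of any IC tooling.

```python
import math

MBPS_PER_NODE = 300  # approximate per-node requirement stated in this guide


def max_nodes(link_mbps: float, per_node_mbps: float = MBPS_PER_NODE) -> int:
    """Return how many nodes a link of link_mbps can carry."""
    if link_mbps < 0 or per_node_mbps <= 0:
        raise ValueError("link_mbps must be >= 0 and per_node_mbps must be > 0")
    return math.floor(link_mbps / per_node_mbps)


def required_link_mbps(node_count: int, per_node_mbps: float = MBPS_PER_NODE) -> float:
    """Return the minimum link speed (in Mbps) needed for node_count nodes."""
    if node_count < 0:
        raise ValueError("node_count must be >= 0")
    return node_count * per_node_mbps


print(max_nodes(1000))         # 1 Gbps link -> 3 nodes
print(required_link_mbps(10))  # 10 nodes -> 3000 Mbps
```

Because the ingress/egress ratio is currently 1:1, the same figure applies in both directions; if egress grows faster as expected, the per-node value would need to be revisited.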
Network Cabling
When racking and stacking your servers, ensure that at least one 10G network port on each server is connected to the 10G switch. SFP+ and Ethernet are supported.
For example, on a Supermicro 1U server, the 10G ports are grouped together in a cluster. Port layout differs between vendors.
Connect the 10G switch to the ISP endpoint - this could be the Top Of Rack (TOR) switch or other box.
Network Configuration
Node machines require:
- The ability to acquire a public static IPv6 address on a /64 subnet
- An IPv6 gateway to communicate with other nodes on the broad internet
- Unfiltered internet access
Two nodes per data center require:
- The ability to acquire a public static IPv4 address
- An IPv4 gateway to communicate with other nodes on the broad internet
- Unfiltered internet access
Note: IPv4 should be deployed to the first two nodes in the first rack.
There are many ways to configure the network, and some details depend on the ISP and data center. Here are some Example Network Configuration Scenarios.
See the Node Provider Networking Troubleshooting Guide for help.
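Before entering network details during installation, it can help to sanity-check the values received from the ISP. The sketch below uses Python's standard `ipaddress` module and assumes you have the IPv6 prefix, an IPv4 address with prefix length, and an IPv4 gateway written down as strings. The function names and the specific checks are illustrative; they are not part of IC-OS.

```python
import ipaddress


def check_ipv6_prefix(prefix: str) -> list[str]:
    """Check that the ISP-allocated IPv6 prefix is a /64."""
    network = ipaddress.IPv6Network(prefix, strict=False)
    if network.prefixlen != 64:
        return [f"{prefix}: expected a /64, got a /{network.prefixlen}"]
    return []


def check_ipv4_config(address_with_prefix: str, gateway: str) -> list[str]:
    """Check an IPv4 address/gateway pair for common mistakes."""
    interface = ipaddress.IPv4Interface(address_with_prefix)
    gateway_ip = ipaddress.IPv4Address(gateway)
    problems = []
    if gateway_ip not in interface.network:
        problems.append(f"gateway {gateway} is outside {interface.network}")
    if gateway_ip == interface.ip:
        problems.append("gateway and node address are identical")
    if interface.ip.is_private:
        # Private or otherwise reserved ranges are not reachable from the internet.
        problems.append(f"{interface.ip} is not a public address")
    return problems


# Example with deliberately wrong values: a /48 prefix and a private IPv4 address.
for problem in check_ipv6_prefix("2001:db8::/48") + check_ipv4_config("10.0.0.5/24", "10.0.1.1"):
    print(problem)
```

An empty result only means the values are internally consistent; it does not confirm that the ISP actually routes them to your rack.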
BMC Setup Recommendations
What’s a BMC?
The Baseboard Management Controller (BMC) grants control of the underlying server hardware.
BMCs have notoriously poor security. Vendors may name their implementation differently (Dell -> iDRAC, HPE -> iLO, etc.).
Recommendations
Change the password
BMCs usually ship with a common default password. Log in via crash cart, KVM or the web interface and change it to something strong.
No broad internet access
It is highly recommended that you do not expose your BMC to the broad internet. This is a safety precaution against attackers.
Options:
- Don’t connect the BMC to the internet.
  - Maintenance or node recovery will require physical access in this case.
  - Any BMC activities occur via SSH on the host (unreliable on many mainboard vendors) or via crash cart.
- Connect the BMC to a separate dumb switch, and the dumb switch connects to a Rack Mounted Unit (RMU).
- Connect the BMC to a managed switch, and create a separate VLAN.
This can get complicated. It’s outside the scope of this document to explain how to do this.
Resources:
- StackExchange - Best practice for accessing management port of firewall
- Supermicro Guidance
- Unicom Guidance
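One way to confirm that a BMC is not exposed is to attempt TCP connections to it from a machine outside your network. The following is a minimal sketch using Python's standard `socket` module. The port list is an assumption about what BMC firmware commonly exposes and should be adjusted for your vendor; IPMI itself typically uses UDP port 623, which a TCP connect test does not cover.

```python
import socket

# Assumed set of TCP ports commonly exposed by BMC firmware; adjust per vendor.
BMC_TCP_PORTS = {22: "SSH", 80: "HTTP", 443: "HTTPS", 5900: "remote console (VNC)"}


def is_tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def exposed_bmc_ports(host: str, timeout: float = 3.0) -> list[int]:
    """Return the ports from BMC_TCP_PORTS that accept connections on host."""
    return [port for port in sorted(BMC_TCP_PORTS) if is_tcp_port_open(host, port, timeout)]
```

Calling `exposed_bmc_ports("<BMC address>")` from an external host should return an empty list if the BMC is properly isolated. Run the check only against equipment you operate.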
Network monitoring
SNMP-based Network Monitoring:
- Device Compatibility: Make sure your network devices support and enable SNMP agents. Choose the version of SNMP that aligns with your security needs as different versions offer varying security levels and functionality.
- Secure Configuration: Implement SNMPv3 to enhance security through authentication and encryption, protecting against unauthorized access and data interception.
- Monitoring Points: Select the specific network parameters that are critical for performance (such as bandwidth utilization, CPU usage, and memory usage). Set up SNMP polling for these parameters.
- Thresholds and Alerts: Define alerts that trigger when monitored parameters exceed their limits, so that you can identify issues proactively and take corrective action.
- Data Retention: Establish data retention policies for storing SNMP data for trend and capacity analysis.
- Regular Review: Regularly review SNMP monitoring configurations and thresholds to ensure that they are up to date and aligned with the changing network environment.
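Bandwidth utilization is usually derived from two successive polls of an interface octet counter rather than read directly. The sketch below shows the calculation in Python, assuming the 64-bit IF-MIB counters (`ifHCInOctets` / `ifHCOutOctets`) and at most one counter wrap between polls; how the counter values are fetched is left to your SNMP tooling.

```python
COUNTER64_MODULUS = 2 ** 64  # ifHCInOctets / ifHCOutOctets are 64-bit counters


def utilization_percent(prev_octets: int, curr_octets: int,
                        interval_seconds: float, link_bps: float) -> float:
    """Return link utilization in percent between two counter samples."""
    if interval_seconds <= 0 or link_bps <= 0:
        raise ValueError("interval_seconds and link_bps must be positive")
    # The modulo handles a single counter wrap between the two polls.
    delta_octets = (curr_octets - prev_octets) % COUNTER64_MODULUS
    bits_per_second = delta_octets * 8 / interval_seconds
    return 100.0 * bits_per_second / link_bps


# 11.25 GB transferred in 60 s on a 10 Gbps port is 1.5 Gbps, i.e. 15 %.
print(utilization_percent(0, 11_250_000_000, 60, 10_000_000_000))
```

Comparing the result against the ~300Mbps per-node requirement gives a quick indication of whether a port has enough headroom.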
gNMI/gRPC-based Network Monitoring:
- Protocol Familiarity: Get familiar with the gNMI data models for your network devices and understand how gNMI uses gRPC (a remote procedure call framework) for network management.
- Device Support: Verify that your network devices support gNMI, which is more commonly found in modern networking equipment that supports programmability.
- Authentication and Encryption: Implement TLS for gRPC security to protect communication between the monitoring system and devices.
- Model Definitions: Make sure you either have access to or create gNMI data models for the devices you're monitoring. These models define the structure and hierarchy of the data that is accessible through gNMI.
- Data Subscription: gNMI allows for real-time updates through subscriptions. Set up subscriptions for relevant data points to receive continuous updates without frequent polling.
- Streaming Mode: Use gNMI's streaming mode for efficient real-time data transfer.
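To make the subscription and streaming points concrete, the sketch below describes a streaming subscription as plain Python data. It mirrors the field names of the gNMI `SubscriptionList` and `Subscription` messages (`mode`, `path`, `sample_interval` in nanoseconds) but is not tied to any client library. The example paths assume a device that implements the OpenConfig interfaces model, and the interface name `eth0` is hypothetical.

```python
from dataclasses import dataclass

# Per-subscription modes defined by gNMI for STREAM subscriptions.
STREAM_MODES = {"TARGET_DEFINED", "ON_CHANGE", "SAMPLE"}


@dataclass(frozen=True)
class Subscription:
    path: str
    mode: str = "SAMPLE"
    sample_interval_ns: int = 10_000_000_000  # 10 seconds, expressed in nanoseconds

    def __post_init__(self):
        if self.mode not in STREAM_MODES:
            raise ValueError(f"unknown subscription mode: {self.mode}")
        if self.sample_interval_ns < 0:
            raise ValueError("sample_interval_ns must not be negative")


def build_subscription_list(subscriptions: list[Subscription]) -> dict:
    """Return a dict shaped like a gNMI SubscriptionList in STREAM mode."""
    return {
        "mode": "STREAM",
        "subscription": [
            {"path": s.path, "mode": s.mode, "sample_interval": s.sample_interval_ns}
            for s in subscriptions
        ],
    }


request = build_subscription_list([
    # Periodic samples of the traffic counters.
    Subscription("/interfaces/interface[name=eth0]/state/counters"),
    # Link state is only reported when it changes.
    Subscription("/interfaces/interface[name=eth0]/state/oper-status", mode="ON_CHANGE"),
])
print(request)
```

Using `ON_CHANGE` for state that rarely changes and `SAMPLE` for counters keeps the stream small while still capturing events promptly. The resulting structure would need to be translated into whatever request format your gNMI client expects.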
Server monitoring
SNMP-based Server Hardware Monitoring:
- Determine SNMP Compatibility: Before configuring SNMP monitoring, make sure you enable SNMP agents on your servers. Also, verify the compatibility of the SNMP version with your monitoring system.
- Secure Configuration: Implement SNMPv3 to enhance security through authentication and encryption, protecting against unauthorized access and data interception.
- Monitoring Parameters: Monitor CPU utilization to confirm that performance is adequate and to identify potential bottlenecks. Track memory usage to prevent resource exhaustion. Check network interface traffic to identify bandwidth bottlenecks. Finally, monitor server temperatures and hardware health indicators to detect hardware issues.
- SNMP Polling: Set up SNMP polling at regular intervals to collect data on critical parameters.
- Thresholds and Alerts: Define appropriate thresholds for each monitored parameter. These thresholds determine when alerts should be triggered.
- Data Retention and Trend Analysis: Retain historical SNMP data for trend analysis, capacity planning and identifying performance issues.
- Regular Review: It is important to regularly check the SNMP monitoring configurations and thresholds in order to ensure that they are appropriate for the environment. This helps to maintain proper alignment and accuracy in monitoring.
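The threshold-and-alert logic described above can be sketched in a few lines of Python. The metric names and limits below are hypothetical placeholders, not recommended values; the sample is assumed to be a dictionary already populated by your SNMP poller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    warning: float
    critical: float


# Hypothetical limits; tune them to your hardware and environment.
THRESHOLDS = {
    "cpu_percent": Threshold(warning=80, critical=95),
    "memory_percent": Threshold(warning=85, critical=95),
    "inlet_temp_celsius": Threshold(warning=35, critical=42),
}


def evaluate(sample: dict[str, float],
             thresholds: dict[str, Threshold] = THRESHOLDS) -> dict[str, str]:
    """Return an alert level for every metric that is missing or over a limit."""
    alerts = {}
    for name, limit in thresholds.items():
        value = sample.get(name)
        if value is None:
            alerts[name] = "missing"  # a metric that stops reporting is itself a signal
        elif value >= limit.critical:
            alerts[name] = "critical"
        elif value >= limit.warning:
            alerts[name] = "warning"
    return alerts


print(evaluate({"cpu_percent": 97, "memory_percent": 50, "inlet_temp_celsius": 36}))
```

Keeping the limits in one table makes the regular review step straightforward: the thresholds can be revisited without touching the evaluation logic.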
What NOT to do
Don’t use external firewalls, packet filters, rate limiters
Don’t block or interfere with any traffic to the node machines. This can disrupt node machine functionality. Occasionally ports are opened for incoming (and outgoing) connections when new versions of node software are deployed.
What about network security?
IC-OS manages its own software firewalls and rate limiters strictly and is designed with security as a primary principle.
Don't configure the switch to use LACP bonding
This feature is on the roadmap for investigation but IC nodes do not support LACP bonding at the moment. Configuring it on the switch may cause problems with nodes.
How DFINITY manages its servers
See the DFINITY reference data center runbook.
Final Checklist
- Did you deploy a 10G switch?
- Is at least one 10G port on each server plugged into the 10G switch?
- Do you have one IPv6 /64 prefix allocated from your ISP?
- Do you have two IPv4 addresses allocated for each data center you plan to operate in?
- Do you have one domain name for each node you plan to configure with an IPv4 address?
- Does each node have ~300Mbps bandwidth?
- Is your BMC inaccessible from the broad internet?
References
- Gen2 Network Requirements - more detailed, possibly out of date.