Validation of Candidate Node Machines
Revision comparison: first edit by Sven.fischer ("First Edit. Validation of Candidate Node Machines"); latest edit "minor grammar fixes" (7 intermediate revisions by 3 users not shown).
== Background ==
In order to improve the decentralization of the network, an optimization model has been proposed on the forum (see [https://forum.dfinity.org/t/ic-topology-series-node-diversification-part-i/23402 node diversification part 1] and [https://forum.dfinity.org/t/ic-topology-node-diversification-part-ii/23553 node diversification part 2]) and approved (see [https://dashboard.internetcomputer.org/proposal/125367 proposal 125367]). Given a target topology, the model balances node rewards (for onboarding additional new nodes and for existing nodes) against decentralization, calculating the minimum number of additional node machines required to achieve specific decentralization targets.
The basis for the optimization model is a target IC topology for the next 6 to 12 months, which may extend into Q1 2025 to account for ending Gen1 node machine contracts. This target topology has been proposed on the forum in [https://forum.dfinity.org/t/ic-topology-node-diversification-part-ii/23553 node diversification part 2] and approved by the community in [https://dashboard.internetcomputer.org/proposal/125549 proposal 125549] on 12 November 2023. It sets targets for the number of Gen1 and Gen2 nodes per subnet and for the decentralization coefficients (Nakamoto coefficients) per subnet.
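The Nakamoto coefficients mentioned above can be illustrated with a short sketch. It assumes the usual BFT reading, in which a subnet of n nodes tolerates f = (n - 1) // 3 faulty nodes, so the coefficient for an attribute (node provider, data center, country) is the smallest number of distinct values that together control at least f + 1 nodes. This threshold and the function name `nakamoto_coefficient` are assumptions for illustration; the forum posts and the optimization tool may define the coefficient differently.

```python
from collections import Counter


def nakamoto_coefficient(attribute_values: list[str]) -> int:
    """Smallest number of distinct attribute values (e.g. node providers)
    that together control more nodes than the subnet can tolerate failing.

    Assumes the BFT bound n >= 3f + 1, i.e. f = (n - 1) // 3, so f + 1
    colluding nodes are enough to break the subnet.
    """
    n = len(attribute_values)
    if n == 0:
        raise ValueError("subnet has no nodes")
    threshold = (n - 1) // 3 + 1
    # Taking the largest groups first minimises the number of groups needed.
    counts = sorted(Counter(attribute_values).values(), reverse=True)
    controlled = 0
    for k, count in enumerate(counts, start=1):
        controlled += count
        if controlled >= threshold:
            return k
    raise AssertionError("unreachable: all n nodes always reach the threshold")
```

For example, in a 13-node subnet (f = 4) where one provider runs 3 nodes, two run 2 and six run 1, the two largest providers together hold 5 = f + 1 nodes, so the coefficient for node providers is 2.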
[[File:Validation Candidate Nodes - figure 1.png|center|thumb|800x800px|Validation of Candidate Node Machines - figure 1|alt=]]
Running the optimization tool with the current topology produces a graph like the one above, with the red blocks showing the number of additional node machines required to reach the decentralization targets set in the IC Target Topology. This number is reported as the <code>ObjectiveValue</code>, which in the example graph is 68 additional Gen2 node machines.
With the optimization model and the IC Target Topology, the goal is to implement a transparent, objective and reproducible approach to node provider onboarding. When run with the existing IC topology and the number of new node machines intended to be onboarded, the model allows the proposer:
[…]
If the IC Target Topology is reached, meaning the <code>ObjectiveValue</code> in the optimization model has reached zero, no new node machines should be onboarded. Hence, NNS proposals for increasing the Node Allowance or defining a new Node Allowance should be rejected by the community. Once a new IC Target Topology is defined or (other) Node Providers have reduced their Node Allowance, the <code>ObjectiveValue</code> might increase again, which would allow new proposals to be submitted and approved.
== Steps to follow to assess node relevance ==
Assessing the relevance of adding new node machines with the optimization tool involves four basic steps:
# Installing and running the latest version of the optimization tool.
# Determining your candidate node configuration.
# Updating your candidate nodes in the optimization tooling.
# If adding new node machines is relevant, submitting the Node Operator proposal with the necessary background description.
Each of these steps is discussed in detail below.
=== Step 1: Downloading and installing the optimization tool ===
The optimization tool can be found in the following GitHub repository: https://github.com/dfinity/decentralization/. The repository is open source so that the community can help improve the tooling with additional functionality and visualizations.
[…]
*** You can subsequently rerun the model and adjust parameters in the Jupyter Notebook interface.
* Carefully read the forum posts on [https://forum.dfinity.org/t/ic-topology-series-node-diversification-part-i/23402 node diversification part 1] and [https://forum.dfinity.org/t/ic-topology-node-diversification-part-ii/23553 node diversification part 2] to understand how the tooling works.
* Check the outcome of the model without making any updates to it, in particular the <code>ObjectiveValue</code> as described above. As of 23 November 2023, the <code>ObjectiveValue</code> should be similar to the graph shown above, with an additional 72 nodes needed to reach the target topology.
=== Step 2: Determine your candidate node configuration ===
The Internet Computer dashboard shows the worldwide distribution of node machines. Based on this distribution, as a potential new Node Provider you can identify potential locations and data center providers for new node machines.
For the Node Provider economics, including the costs and rewards of running a node machine, see the detailed NP documentation on the Internet Computer wiki, in particular the pages on [[Node Provider Machine Hardware Guide|hardware configuration]], [[Node Provider Data Center and ISP Guide|data center requirements]], and [[Node Provider Remuneration|node rewards]].
[…]
** The number of nodes you intend to onboard
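Because Step 4 asks for the exact input data, it can help to keep the information collected above in one structured record and generate the <code>df_candidate_nodes</code> line from it. The sketch below is hypothetical: <code>CandidateConfig</code> and <code>to_call</code> are illustrative names, not part of the decentralization repository; only the keyword names mirror the <code>create_candidate_node_dataframe</code> call shown in Step 3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateConfig:
    """Hypothetical record of the Step 2 inputs for one candidate configuration."""
    node_provider: str
    data_center: str
    data_center_provider: str
    country: str  # two-letter country code, e.g. 'AR'
    is_sev: bool  # True for Gen2 node machines
    no_nodes: int

    def __post_init__(self) -> None:
        if self.no_nodes < 0:
            raise ValueError("no_nodes must be non-negative")

    def to_call(self) -> str:
        """Render the df_candidate_nodes line to paste into main.py."""
        return (
            "df_candidate_nodes = create_candidate_node_dataframe("
            f"node_provider={self.node_provider!r}, "
            f"data_center={self.data_center!r}, "
            f"data_center_provider={self.data_center_provider!r}, "
            f"country={self.country!r}, "
            f"is_sev={self.is_sev}, "
            f"no_nodes={self.no_nodes})"
        )
```

Keeping the record alongside the proposal makes it easy to state the exact configuration the model was run with.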
=== Step 3: Update the candidate node configuration in the optimization tooling ===
The optimization tool includes a function that adds candidate nodes to the node configuration, so that the optimization can be run with these candidate nodes included. With the information collected in step 2, follow these steps to run the optimization tool including the candidate nodes:
* Update the latest topology in the optimization tooling:
** Find the <code>df_candidate_nodes</code> entry in <code>main.py</code> in the <code>ic_topology</code> directory.
** You will see the following example entry: <code>df_candidate_nodes = create_candidate_node_dataframe(node_provider ='Lionel Messi',data_center ='Buenos Aires',data_center_provider ='Perron Corporation',country = 'AR',is_sev = True,no_nodes = 0)</code>
** Replace ‘Lionel Messi’ with the Node Provider name.
** Replace ‘Buenos Aires’ with the data center name.
** Replace ‘Perron Corporation’ with the data center provider name.
** Replace ‘AR’ with the country, given as a two-letter country code (‘AR’ is Argentina).
** Replace 0 in <code>no_nodes</code> with the number of nodes intended to be onboarded.
* Run the model and determine the <code>ObjectiveValue</code>:
[…]
** If the <code>ObjectiveValue</code> is the same as the <code>ObjectiveValue</code> obtained without changes to <code>df_candidate_nodes</code>, adding the new nodes does not improve the decentralization of the IC.
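The comparison in this step reduces to a simple rule, sketched below. <code>assess_candidates</code> is a hypothetical helper, not part of the tool; its two arguments are the <code>ObjectiveValue</code> readings taken from the tool's output before and after editing <code>df_candidate_nodes</code>. It also encodes the starting point that no new nodes should be onboarded once the <code>ObjectiveValue</code> is zero.

```python
def assess_candidates(baseline: int, with_candidates: int) -> str:
    """Classify a candidate configuration from two ObjectiveValue readings.

    baseline: ObjectiveValue with no_nodes = 0 (existing topology only).
    with_candidates: ObjectiveValue after adding the candidate nodes.
    """
    if baseline < 0 or with_candidates < 0:
        raise ValueError("ObjectiveValue counts nodes and cannot be negative")
    if baseline == 0:
        # Target topology already reached: new allowance proposals should be rejected.
        return "target topology reached: no new nodes should be onboarded"
    if with_candidates < baseline:
        improvement = baseline - with_candidates
        return f"improves decentralization by {improvement} node(s)"
    return "no improvement: candidate nodes do not help decentralization"
```

With the readings from the example section, <code>assess_candidates(68, 67)</code> reports an improvement of one node.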
=== Step 4: Submitting a Node Operator proposal ===
If step 3 shows that adding node machines improves the decentralization of the IC network, the final step is to prepare and submit a proposal for onboarding these nodes. To submit this proposal, take the following steps:
[…]
* If you intend to onboard node machines in a new data center, follow the subsequent steps on the same [[Node Provider Onboarding|NP onboarding wiki page]] to submit a Data Center Proposal.
* Follow the steps on the same [[Node Provider Onboarding|NP onboarding wiki page]] to submit a Node Operator Proposal. The summary text of the Node Operator Proposal should include:
** The number of nodes planned to be onboarded.
** The Node Provider name (as per the public dashboard).
** The data center and country where the nodes are to be onboarded.
** Evidence of running the optimization model, with reproducible results showing the decrease in <code>ObjectiveValue</code>. This can be done as follows:
*** Upload a PDF with screenshots of the optimization model runs to the wiki page holding your Node Provider self-declaration documentation.
*** Include the link to this PDF document and the hash of the document in the Node Operator Proposal.
*** Describe exactly which input data and configuration the optimization model was run with.
*** Clearly state the <code>ObjectiveValue</code> before adding the candidate nodes.
*** Clearly state the <code>ObjectiveValue</code> after adding the candidate nodes.
* Complete the steps of the [[Node Provider Onboarding|NP onboarding wiki page]].
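The proposal must include the hash of the uploaded PDF, but this page does not name the algorithm. A minimal sketch, assuming SHA-256 is acceptable (check the onboarding page for the expected format), computes the hex digest with Python's standard <code>hashlib</code>; the file name in the usage note is a placeholder.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks
    so that large PDFs do not have to fit in memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, <code>sha256_of_file(Path("optimization-evidence.pdf"))</code> returns the 64-character hex string to paste into the proposal next to the PDF link.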
== Example ==
The example below shows how to use the optimization model to validate whether candidate nodes improve the decentralization of the IC network. In the optimization model code in the GitHub repository, you will find <code>df_candidate_nodes</code> defined as follows, through a call to <code>create_candidate_node_dataframe</code>:
<code>df_candidate_nodes = create_candidate_node_dataframe(node_provider ='Lionel Messi', data_center ='Buenos Aires', data_center_provider ='Perron Corporation', country = 'AR', is_sev = True, no_nodes = 0)</code>
This entry uses a hypothetical new Node Provider, called Lionel Messi, that is interested in setting up new node machines in the Buenos Aires data center, with Perron Corporation as the data center provider. Note that all nodes are marked as <code>is_sev = True</code>, which means that Gen2 node machines (with SEV-SNP enabled) will be set up. Because <code>no_nodes = 0</code>, running the current optimization tooling adds zero nodes; the optimization tool only uses the existing IC topology to calculate the number of required Gen2 nodes, reported as the <code>ObjectiveValue</code>.
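To make the role of <code>is_sev</code> and <code>no_nodes</code> concrete, here is a hypothetical stand-in for <code>create_candidate_node_dataframe</code>. It is not the repository's implementation (which, judging by its name, builds a DataFrame); it simply assumes one record per candidate node machine, so <code>no_nodes = 0</code> adds nothing and the tool falls back to the existing topology.

```python
def candidate_node_rows(node_provider: str, data_center: str,
                        data_center_provider: str, country: str,
                        is_sev: bool, no_nodes: int) -> list[dict]:
    """Hypothetical stand-in: one record per candidate node machine."""
    if no_nodes < 0:
        raise ValueError("no_nodes must be non-negative")
    return [
        {
            "node_provider": node_provider,
            "data_center": data_center,
            "data_center_provider": data_center_provider,
            "country": country,
            "is_sev": is_sev,  # True: Gen2 machine with SEV-SNP enabled
        }
        for _ in range(no_nodes)  # no_nodes = 0 yields an empty list
    ]
```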
Now let’s add one node machine to the candidate node list of Lionel Messi. For this, the entry is updated as follows:
<code>df_candidate_nodes = create_candidate_node_dataframe(node_provider ='Lionel Messi', data_center ='Buenos Aires', data_center_provider ='Perron Corporation', country = 'AR', is_sev = True, no_nodes = 1)</code>
Re-running the optimization model generates, among other things, the following output. As the Node Provider topology matrix shows, Lionel Messi is added as a Node Provider with one node (the top orange block in the graph, and the bottom name in the Node Provider legend on the right).
[[File:Validation of Candidate Node Machines - figure 2.png|alt=Validation of Candidate Node Machines - figure 2|center|thumb|800x800px|Validation of Candidate Node Machines - figure 2]]
The output also shows the data center added to the Data Center topology. Again, the node appears near the top (as the second block from the top), and Perron Corporation appears at the bottom of the data center legend.
[[File:Candidate Node Machine Validation - figure 3.png|center|thumb|800x800px|Validation of Candidate Node Machines - figure 3|alt=]]
More importantly, the node allocation per subnet now shows an updated <code>ObjectiveValue</code> of 67, meaning that 67 more Gen2 node machines are required to reach the IC Target Topology. Compared to running the model without Lionel Messi’s candidate node machine, the <code>ObjectiveValue</code> drops from 68 (see the beginning of this article) to 67, so adding one extra node machine in Buenos Aires supports the decentralization of the IC network.
[[File:Validation of Canidate Node Machines - figure 4.png|center|thumb|800x800px|Validation of Candidate Node Machines - figure 4|alt=]]

The next question is whether adding more node machines in Buenos Aires further improves the decentralization of the IC. Let’s increase the number of candidate nodes from 1 to 4 by updating the <code>df_candidate_nodes</code> entry as follows:
<code>df_candidate_nodes = create_candidate_node_dataframe(node_provider ='Lionel Messi', data_center ='Buenos Aires', data_center_provider ='Perron Corporation', country = 'AR', is_sev = True, no_nodes = 4)</code>
The node allocation per subnet now shows an <code>ObjectiveValue</code> of 64, meaning that 64 more Gen2 node machines are required to reach the decentralization targets of the IC network. This is a reduction of 4 from the original 68, one for each added node. Hence, the model shows that adding 4 candidate nodes in Buenos Aires still improves the decentralization of the IC network.
Further increasing the number of candidate nodes to 6 shows that the decentralization of the IC network still improves. With 6 candidate nodes in the updated <code>df_candidate_nodes</code> entry, the <code>ObjectiveValue</code> drops from 68 to 62.
[[File:Validation of Candidate node Machines - figure 5.png|center|thumb|800x800px|Validation of Candidate Node Machines - figure 5]]
Similarly, adding 8 candidate node machines still improves decentralization, as the <code>ObjectiveValue</code> drops to 60.
<code>df_candidate_nodes = create_candidate_node_dataframe(node_provider ='Lionel Messi', data_center ='Buenos Aires', data_center_provider ='Perron Corporation', country = 'AR', is_sev = True, no_nodes = 8)</code>
[[File:Validation of Candidate node Machines - figure 6.png|center|thumb|800x800px|Validation of Candidate Node Machines - figure 6]]
However, if we increase the number of candidate node machines in Buenos Aires from 8 to 9, the <code>ObjectiveValue</code> no longer decreases. Hence, 8 is the optimal number of node machines to add in Buenos Aires for improving decentralization towards the targets.
<code>df_candidate_nodes = create_candidate_node_dataframe(node_provider ='Lionel Messi', data_center ='Buenos Aires', data_center_provider ='Perron Corporation', country = 'AR', is_sev = True, no_nodes = 9)</code>
[[File:Validation of Candidate Node Machines - figure 7.png|center|thumb|800x800px|Validation of Candidate Node Machines - figure 7]]
This can be explained by the fact that most subnets (marked in blue in the Node Allocation subnet graph) already meet their country, node provider, and data center limits. Only the subnets with blocks marked in red need additional node machines from unique countries, unique data centers and unique node providers. Adding 8 nodes achieves this. A ninth node machine cannot be placed in any existing subnet in a way that improves the country, data center or node provider limits.
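The sweep above (0, 1, 4, 6, 8 and 9 candidate nodes) can be summarised programmatically. The sketch below takes the <code>ObjectiveValue</code> readings reported in this example and returns the largest candidate count that still lowered the <code>ObjectiveValue</code> compared with the previous count tried. The dictionary is copied from the text; <code>last_improving_count</code> is a hypothetical helper, not part of the tool.

```python
def last_improving_count(readings: dict[int, int]) -> int:
    """Return the largest no_nodes whose ObjectiveValue is lower than the
    reading for the next-smaller no_nodes that was tried."""
    counts = sorted(readings)
    best = counts[0]
    for prev, curr in zip(counts, counts[1:]):
        if readings[curr] < readings[prev]:
            best = curr
    return best


# no_nodes -> ObjectiveValue, as reported in the example above
readings = {0: 68, 1: 67, 4: 64, 6: 62, 8: 60, 9: 60}
```

Here <code>last_improving_count(readings)</code> returns 8. Only the sampled counts are compared, so a gap such as 6 to 8 is not checked node by node; the cut-off itself is exact because 8 and 9 are consecutive. The readings also show that, up to 8 nodes, every added machine lowers the value by exactly one (68 − n).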

== Q&A ==

* What is the maximum number of nodes for one country?
** As per the agreed target topology ([https://dashboard.internetcomputer.org/proposal/125549 proposal 125549]), the country limit per subnet ranges between 2 and 3. Given that there are currently 40 subnets, once the proposed topology is fully implemented, each country will have the capacity to host between 80 and 120 nodes in total. This range is obtained by multiplying the number of subnets (40) by the per-subnet country limit (2 to 3).
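The arithmetic in this answer can be written out as follows. The subnet count (40) and the per-subnet country limit (2 to 3) are the figures quoted above; the function name is illustrative.

```python
def country_capacity(num_subnets: int, country_limit: int) -> int:
    """Maximum nodes one country can host if it fills its per-subnet
    country limit in every subnet."""
    return num_subnets * country_limit


low = country_capacity(40, 2)   # 80
high = country_capacity(40, 3)  # 120
```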
Latest revision as of 16:03, 12 July 2024
Background
In order to improve the decentralization of the network, an optimization model has been proposed on the forum (see node diversification part 1 and node diversification part 2) and approved (see proposal 125367). Given a target topology, the model balances node rewards (for onboarding additional new nodes and for existing nodes) against decentralization, calculating the minimum number of additional node machines required to achieve specific decentralization targets.
The basis for the optimization model is a target IC topology for the next 6 to 12 months, which may extend into Q1 2025 to account for ending Gen1 node machine contracts. This target topology has been proposed on the forum in node diversification part 2 and approved by the community in proposal 125549 on 12 November 2023. It sets targets for the number of Gen1 and Gen2 nodes per subnet and for the decentralization coefficients (Nakamoto coefficients) per subnet.
Running the optimization tool with the current topology produces a graph like the one above, with the red blocks showing the number of additional node machines required to reach the decentralization targets set in the IC Target Topology. This number is reported as the ObjectiveValue, which in the example graph is 68 additional Gen2 node machines.
With the optimization model and the IC Target Topology, the goal is to implement a transparent, objective and reproducible approach to node provider onboarding. When run with the existing IC topology and the number of new node machines intended to be onboarded, the model:
- allows the proposer to verify whether adding additional nodes actually improves decentralization, and
- allows the community to verify this improvement and vote on the proposal to add additional nodes.
The steps to run the optimization model and verify any new proposal are described below, together with an example for a specific (fictional) proposal.
Basic starting points
There are a few basic starting points that everybody should be aware of when using the optimization tool and the IC Target Topology. First of all, using the model requires some basic knowledge of Python and GitHub, so please familiarize yourself with these before starting.
In addition, it is important to note that the IC Target Topology is not fixed. It is voted in by the community as the target to be achieved within a certain timeframe. Proposal 125549 describes the target topology for the next six to twelve months, estimated from the current growth of the IC network. If expected growth changes, the community can vote on an updated IC Target Topology with either more or fewer node machines. Note that the IC Target Topology is intended to assess the effectiveness of adding additional nodes and to identify potential nodes that are not relevant from a decentralization perspective. It does not assess or propose any changes in node rewards, which the community discusses and votes on separately.
If the IC Target Topology is reached, meaning the ObjectiveValue in the optimization model has reached zero, no new node machines should be onboarded. Hence, NNS proposals for increasing the Node Allowance or defining a new Node Allowance should be rejected by the community. Once a new IC Target Topology is defined or (other) Node Providers have reduced their Node Allowance, the ObjectiveValue might increase again, which would allow new proposals to be submitted and approved.
Steps to follow to assess node relevance
Assessing the relevance of adding new node machines using the optimization tool requires you to follow four basic steps:
- Installing and running the latest version of the optimization tool.
- Determining your candidate node configuration.
- Updating your candidate nodes in the optimization tooling.
- If adding new node machines is relevant, submitting the Node Operator proposal with the necessary background description.
Each of these steps is discussed in detail below.
Step 1: Downloading and installing the optimization tool
The optimization tool can be found in the following GitHub repository: https://github.com/dfinity/decentralization/. The repository is open source so that the community can help improve the tooling with additional functionality and visualizations.
To run the model, follow these steps:
- Find the repository on https://github.com/dfinity/decentralization/
- Follow one of these two approaches:
  - Command Line approach: follow the instructions in the README file, i.e.:
    - Clone the repository to your computer
    - Install Python Poetry
    - Run the model in the command line as described in the README file
  - Jupyter Notebook approach: frequent Python users might prefer Jupyter Notebook. To use Jupyter Notebook, follow these steps:
    - Copy the code from the repository files and delete the import commands and main() sections from the separate Python files (data_preparation.py, helper_functions.py, linear_solver.py, visualization.py)
    - Run each file in subsequent cells in Jupyter Notebook
    - You can subsequently rerun the model and adjust parameters in the Jupyter Notebook interface.
- Carefully read the forum posts on node diversification part 1 and node diversification part 2 to understand the working of the tooling.
- Check the outcome of the model without making any updates to the model, in particular the
ObjectiveValue
as described above. As per 23 November 2023, the Objective Value should be similar to the graph shown above, with an additional 72 nodes needed to reach the target topology.
Step 2: Determine your candidate node configuration
The Internet Computer dashboard shows the worldwide distribution of node machines. Based on this distribution, as a potential new Node Provider you can identify potential locations and data center providers for new node machines.
To understand Node Provider economics, including the costs and rewards of running a node machine, please see the detailed NP documentation on the Internet Computer wiki, in particular the pages on hardware configuration, data center requirements, and node rewards.
Once you have decided on potential data center locations and a number of new node machines with viable economics, follow these steps to add this candidate configuration to the optimization tool:
- Collect the following information:
- The node provider name (if you are an existing node provider, use your existing Node Provider name)
- The data center name (if you are intending to add node machines to an existing data center, use the existing data center name)
- The data center provider name (if you are intending to use an existing data center provider, use the existing data center provider name)
- The country name (if you are intending to add node machines to an existing country, use the existing country name)
- The number of nodes you intend to onboard
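The information collected above maps directly onto the keyword arguments of the create_candidate_node_dataframe call used in Step 3. As a minimal sketch, it can be kept in a small, validated record before being copied into main.py. The CandidateConfig class and its checks are hypothetical helpers, not part of the repository; in particular, the two-letter country check is an assumption based on the 'AR' value in the example entry.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class CandidateConfig:
    """Hypothetical record of a candidate node configuration (not part of the repository)."""
    node_provider: str
    data_center: str
    data_center_provider: str
    country: str          # assumed two-letter upper-case code, as in the 'AR' example
    is_sev: bool = True   # Gen2 node machines with SEV-SNP enabled
    no_nodes: int = 0     # number of candidate nodes to onboard

    def __post_init__(self) -> None:
        # Basic sanity checks before the values are used in main.py.
        for name in ("node_provider", "data_center", "data_center_provider"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} must not be empty")
        if len(self.country) != 2 or not self.country.isalpha() or not self.country.isupper():
            raise ValueError("country is assumed to be a two-letter upper-case code, e.g. 'AR'")
        if self.no_nodes < 0:
            raise ValueError("no_nodes must be non-negative")

    def to_kwargs(self) -> dict:
        # Field names match the keyword arguments of the example call.
        return asdict(self)


config = CandidateConfig(
    node_provider="Lionel Messi",
    data_center="Buenos Aires",
    data_center_provider="Perron Corporation",
    country="AR",
    no_nodes=1,
)
print(config.to_kwargs())
```

Because the field names match the example entry, the values could then be passed in main.py as create_candidate_node_dataframe(**config.to_kwargs()).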
Step 3: Update the candidate node configuration in the optimization tooling
The optimization tool includes a function that adds candidate nodes to the node configuration, so that the optimization can be run with these candidate nodes included. With the information collected in Step 2, the following steps run the optimization tool with the candidate nodes:
- Update the optimization tooling with the latest topology.
- Find where df_candidate_nodes is defined in the main.py file of the ic_topology directory.
- You will see the following example entry:
df_candidate_nodes = create_candidate_node_dataframe(node_provider ='Lionel Messi',data_center ='Buenos Aires',data_center_provider ='Perron Corporation',country = 'AR',is_sev = True,no_nodes = 0)
- Replace ‘Lionel Messi’ with the Node Provider name.
- Replace ‘Buenos Aires’ with the data center name.
- Replace ‘Perron Corporation’ with the data center provider name.
- Replace ‘AR’ with the country code, in the same two-letter format as ‘AR’.
- Replace “0” in no_nodes with the number of nodes intended to be onboarded.
- Run the model and determine the ObjectiveValue:
- If the ObjectiveValue is lower than the ObjectiveValue obtained without changing df_candidate_nodes, adding one or more new nodes increases the decentralization of the IC. Note that each individual node machine should reduce the ObjectiveValue by exactly 1. For example, if the ObjectiveValue without any candidate nodes is 68 and 6 candidate nodes are added, the ObjectiveValue should fall to 68 - 6 = 62. If it falls only to, say, 66, then only 2 candidate node machines support further decentralization of the IC network, and no more than 2 candidate nodes should be added.
- If the ObjectiveValue is the same as the ObjectiveValue obtained without changing df_candidate_nodes, adding new nodes does not improve the decentralization of the IC.
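The 1:1 rule above can be expressed as a small check. The sketch below is a hypothetical helper, not part of the optimization tool: it takes the ObjectiveValue from a run without candidate nodes, the ObjectiveValue from a run with them, and the number of candidate nodes, and returns how many of those nodes actually reduce the ObjectiveValue.

```python
def effective_candidate_nodes(baseline: int, with_candidates: int, no_nodes: int) -> int:
    """Return how many candidate nodes reduce the ObjectiveValue (hypothetical helper).

    Under the 1:1 rule, each useful node machine lowers the ObjectiveValue
    by exactly 1, so the number of useful nodes equals the drop.
    """
    if no_nodes < 0:
        raise ValueError("no_nodes must be non-negative")
    drop = baseline - with_candidates
    if drop < 0 or drop > no_nodes:
        # Not expected under the 1:1 rule; treat it as a sign that the two
        # runs did not use the same input data and configuration.
        raise ValueError(f"unexpected change in ObjectiveValue: {baseline} -> {with_candidates}")
    return drop


# The cases from the text: a baseline of 68 with 6 candidate nodes.
for new_value in (62, 66, 68):
    useful = effective_candidate_nodes(68, new_value, 6)
    print(f"ObjectiveValue {new_value}: {useful} of 6 candidate nodes help; "
          f"add at most {useful}")
```

A result lower than the number of candidate nodes means the remaining candidates should be dropped from the proposal.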
Step 4: Submitting a Node Operator proposal
If the conclusion from Step 3 is that adding node machines helps the decentralization of the IC network, the final step is to submit a proposal for onboarding these nodes. Take the following steps to submit this proposal:
- If you are not yet a Node Provider, please follow the steps on the NP onboarding wiki page to submit a Node Provider Proposal. Please make sure to submit the self-declaration form and the identity document as part of the proposal, as described in the instructions.
- If you are intending to onboard node machines in a new data center, follow the subsequent steps on the same NP onboarding wiki page to submit a Data Center Proposal.
- Follow the steps on the same NP onboarding wiki page to submit a Node Operator Proposal. The Node Operator Proposal should include the following input in the summary text:
- The number of nodes planned to be onboarded.
- Node provider name (as per the public dashboard).
- The data center and country where nodes are to be onboarded.
- Evidence of running the optimization model, with reproducible results showing the decrease in ObjectiveValue. This can be provided as follows:
- Upload a PDF with screenshots of the optimization model run to the wiki, on the page with your Node Provider self-declaration documentation.
- Include the link to this PDF document and its hash in the Node Operator Proposal.
- Describe exactly the input data and configuration the optimization model was run with.
- Clearly state the ObjectiveValue before adding the candidate nodes.
- Clearly state the ObjectiveValue after adding the candidate nodes.
- Complete the steps of the NP onboarding wiki page.
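The proposal must include the hash of the uploaded PDF, but this page does not name a hash algorithm. A minimal sketch, assuming SHA-256 is used, computes the digest with Python's standard hashlib module; the file name in the usage comment is hypothetical. Anyone can run the same code on the downloaded PDF and compare the result with the hash stated in the proposal.

```python
import hashlib
import sys
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Read fixed-size chunks until f.read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Usage (hypothetical file name): python hash_pdf.py optimization_evidence.pdf
    for arg in sys.argv[1:]:
        print(f"{arg}: {sha256_of_file(Path(arg))}")
```

Reading in chunks keeps memory use constant, which matters little for a PDF of screenshots but costs nothing.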
Example
The example below shows how to use the optimization model to validate whether candidate nodes improve the decentralization of the IC network. In the optimization model code in the GitHub repository, you will find df_candidate_nodes defined as follows:
df_candidate_nodes = create_candidate_node_dataframe(node_provider ='Lionel Messi', data_center ='Buenos Aires', data_center_provider ='Perron Corporation', country = 'AR', is_sev = True, no_nodes = 0)
This call uses a hypothetical new Node Provider, called Lionel Messi, who is interested in setting up new node machines in the Buenos Aires data center, with Perron Corporation as the data center provider. Note that all nodes are marked as is_sev = True, which means that Gen2 node machines (with SEV-SNP enabled) will be set up, and that no_nodes = 0, so running the current optimization tooling adds zero nodes: the optimization tool only uses the existing IC topology to calculate the number of required Gen2 nodes, defined as the ObjectiveValue.
Now let’s add one node machine to the candidate node list of Lionel Messi. For this, the function is updated as follows:
df_candidate_nodes = create_candidate_node_dataframe(node_provider ='Lionel Messi', data_center ='Buenos Aires', data_center_provider ='Perron Corporation', country = 'AR', is_sev = True, no_nodes = 1)
Rerunning the optimization model generates, among other things, the following output. As the Node Provider topology matrix shows, Lionel Messi is added as a Node Provider with one node (the top orange block in the graph, and the bottom name in the Node Provider legend on the right).
The output also shows the data center being added to the Data Center topology. Again, the node appears near the top of the graph (the second block from the top), and the data center is listed at the bottom of the data center legend as Perron Corporation.
More importantly, the node allocation per subnet now shows an updated ObjectiveValue of 67, hence 67 more Gen2 node machines are required to reach the IC target topology. Compared to running the model without Lionel Messi’s candidate node machine, the ObjectiveValue is reduced from 68 (see the beginning of this article) to 67, so adding one extra node machine in Buenos Aires supports the decentralization of the IC network.
It is worth validating whether adding more node machines in Buenos Aires further supports the decentralization of the IC. Let’s increase the number of candidate nodes from 1 to 4 by updating df_candidate_nodes as follows:
df_candidate_nodes = create_candidate_node_dataframe(node_provider ='Lionel Messi', data_center ='Buenos Aires', data_center_provider ='Perron Corporation', country = 'AR', is_sev = True, no_nodes = 4)
The node allocation per subnet now shows an ObjectiveValue of 64, hence 64 more Gen2 node machines are required to reach the decentralization targets of the IC network. This is a reduction of 4 from the original 68 required node machines, one for each of the 4 candidate nodes. Hence, the model shows that adding 4 candidate nodes in Buenos Aires still adds to the decentralization of the IC network.
Further increasing the number of candidate nodes to 6 shows that the decentralization of the IC network still improves: with 6 candidate nodes in the updated df_candidate_nodes, the ObjectiveValue falls from 68 to 62.
Similarly, adding 8 candidate node machines still improves decentralization, as the ObjectiveValue falls to 60:
df_candidate_nodes = create_candidate_node_dataframe(node_provider ='Lionel Messi', data_center ='Buenos Aires', data_center_provider ='Perron Corporation', country = 'AR', is_sev = True, no_nodes = 8)
However, if we increase the number of candidate node machines in Buenos Aires from 8 to 9, the ObjectiveValue no longer decreases. Hence, adding 8 node machines in Buenos Aires is the optimal solution for improving decentralization.
df_candidate_nodes = create_candidate_node_dataframe(node_provider ='Lionel Messi', data_center ='Buenos Aires', data_center_provider ='Perron Corporation', country = 'AR', is_sev = True, no_nodes = 9)
This can be explained by the fact that most subnets (marked in blue in the Node Allocation subnet graph) are already optimized with respect to the country limit, node provider limit, and data center limit. Only the subnets with blocks marked in red need additional node machines from unique countries, unique data centers and unique node providers, and adding 8 nodes achieves this. A ninth node machine cannot be added to any existing subnet in a way that improves its country, data center, or node provider limits.
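The search in this example, trying several values of no_nodes and keeping the largest reduction in ObjectiveValue, can be summarized in a short sketch. It is a hypothetical helper, not part of the repository: it takes the ObjectiveValue recorded for each tried value of no_nodes, including the baseline run with no_nodes = 0, and applies the 1:1 rule to return how many candidate nodes are worth proposing.

```python
def max_useful_candidates(results: dict[int, int]) -> int:
    """Largest ObjectiveValue reduction across recorded runs (hypothetical helper).

    `results` maps each tried no_nodes value to the ObjectiveValue of that run
    and must include the baseline run with no_nodes = 0. Under the 1:1 rule,
    the largest reduction equals the number of useful candidate nodes.
    """
    baseline = results[0]
    best = 0
    for no_nodes, value in sorted(results.items()):
        drop = baseline - value
        if not 0 <= drop <= no_nodes:
            raise ValueError(f"run with no_nodes={no_nodes} breaks the 1:1 rule")
        best = max(best, drop)
    return best


# ObjectiveValue per no_nodes as reported in this example; with 9 nodes the
# value does not fall below the 60 already reached with 8 nodes.
runs = {0: 68, 1: 67, 4: 64, 6: 62, 8: 60, 9: 60}
print(max_useful_candidates(runs))
```

For the Buenos Aires runs this returns 8, matching the conclusion above; for a single run with 6 candidates that only lowers 68 to 66 it would return 2.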
Q&A
- What is the maximum number of nodes for one country?
- As per the agreed target topology (proposal link) the country subnet limit ranges between 2 and 3. Given that there are currently 40 subnets, once the proposed topology is fully implemented, each country will have the capacity to host between 80 and 120 nodes in total. This range is calculated by multiplying the number of subnets (40) by the per-subnet country limit (2 to 3).