Difference between revisions of "Internet Computer performance"

From Internet Computer Wiki
(Most recent performance numbers as measured on CD.)
 
(15 intermediate revisions by 3 users not shown)
Line 1: Line 1:
A key objective of the Internet Computer is to provide a public compute layer that replaces traditional IT. A natural concern is that this will cause an increase in global energy consumption, which is bad for the environment, because the Internet Computer is a blockchain network. However, this is not the case.
+
'''While having the security of Web3 blockchains, the performance of the Internet Computer (IC) is comparable to Web2 and cloud technology stacks. The IC far outperforms traditional blockchain protocols in efficiency.'''
 +
 
 +
== Performance goals==
 +
A key objective of the Internet Computer is to provide a public compute layer that replaces traditional IT. A natural concern is that this will cause far less efficient computation.
  
 
The Internet Computer works very differently to other blockchains, and is powered by advanced new cryptography. Internally, the network is able to strictly limit the replication of data and computation, while still providing the liveness and security guarantees expected of a blockchain. It also has the ability to assign different “trust levels” to units of blockchain code that it hosts (“smart contracts”), which changes the level of replication applied to their computations and data. In its current state of development, it is already orders of magnitude more efficient than other blockchains, but it is designed to eventually become more efficient that traditional IT too.
 
The Internet Computer works very differently to other blockchains, and is powered by advanced new cryptography. Internally, the network is able to strictly limit the replication of data and computation, while still providing the liveness and security guarantees expected of a blockchain. It also has the ability to assign different “trust levels” to units of blockchain code that it hosts (“smart contracts”), which changes the level of replication applied to their computations and data. In its current state of development, it is already orders of magnitude more efficient than other blockchains, but it is designed to eventually become more efficient that traditional IT too.
  
Like all blockchains, the Internet Computer network directly applies replication, in combination with advanced cryptography, to create a tamperproof platform with far better liveness guarantees than traditional IT. Yet, it also limits replication, while using the replication that occurs to drive efficiency, for example by scaling out “query” transactions. Systems and services built using traditional IT platform often heavily reply upon replication, but because replication is tagged-on, rather than being a core part of how the underlying platform works, we believe that in the long run the Internet Computer will be substantially more efficient.
+
Like all blockchains, the Internet Computer network directly applies replication, in combination with advanced cryptography, to create a tamperproof platform with better liveness guarantees than traditional IT. Yet, it also limits replication, while using the replication that occurs to drive efficiency, for example by scaling out “query” transactions.
  
 
For example, a large online service might be built on Amazon Web Services using a database in a master-slave configuration, Kubernetes instances of web workers, memcached instances for caching the results of database queries, and a CDN (content distribution network) that caches web content they serve on the edge of the network. This already creates a large amount of replication without creating a tamperproof platform, nor providing liveness guarantees. For example, each slave node of the database replicates its computations and data, and regular snapshots will also be taken as backups, data used by the web workers is replicated by the memcached instances, and each work will also cache data in its memory, while the product of web queries will be replicated all over the world on CDN nodes.
 
For example, a large online service might be built on Amazon Web Services using a database in a master-slave configuration, Kubernetes instances of web workers, memcached instances for caching the results of database queries, and a CDN (content distribution network) that caches web content they serve on the edge of the network. This already creates a large amount of replication without creating a tamperproof platform, nor providing liveness guarantees. For example, each slave node of the database replicates its computations and data, and regular snapshots will also be taken as backups, data used by the web workers is replicated by the memcached instances, and each work will also cache data in its memory, while the product of web queries will be replicated all over the world on CDN nodes.
Line 10: Line 13:
  
 
== Performance experiments ==  
 
== Performance experiments ==  
Here, we describe the DFINITY Foundation's performance evaluation of the Internet Computer. The [https://forum.dfinity.org/t/internet-computer-performance-dec-1-2021-load-testing/9240 current measurements] are from May 2022.
 
 
 
Scalability of the Internet Computer is facilitated by sharding the IC into subnet blockchains. Every subnet blockchain can process '''update calls''' (writes) from ingress messages independently from other subnets. The IC can scale up by adding more subnets at the cost of having more network traffic (as applications potentially need to communicate across subnets). In its current form, the IC should be able to scale out to hundreds of subnets.
 
Scalability of the Internet Computer is facilitated by sharding the IC into subnet blockchains. Every subnet blockchain can process '''update calls''' (writes) from ingress messages independently from other subnets. The IC can scale up by adding more subnets at the cost of having more network traffic (as applications potentially need to communicate across subnets). In its current form, the IC should be able to scale out to hundreds of subnets.
  
Line 20: Line 21:
 
The experiments were run concurrently against all subnets other than the NNS and some of the most utilized application subnets to avoid disturbance of active IC users.  
 
The experiments were run concurrently against all subnets other than the NNS and some of the most utilized application subnets to avoid disturbance of active IC users.  
 
The IC has a set of boundary nodes that route calls to the core nodes that host the subnets. The experiments sent loads against the subnets directly and are did not route traffic through the boundary nodes. Boundary nodes have additional rate limiting, which is currently set slightly more conservative compared to what the IC can handle and running against the boundary nodes would therefore be  unsuitable for performance evaluation.  
 
The IC has a set of boundary nodes that route calls to the core nodes that host the subnets. The experiments sent loads against the subnets directly and are did not route traffic through the boundary nodes. Boundary nodes have additional rate limiting, which is currently set slightly more conservative compared to what the IC can handle and running against the boundary nodes would therefore be  unsuitable for performance evaluation.  
The experiment targeted all nodes in every subnet concurrently, much the same as what boundary nodes would be doing if we would use them.
+
The experiment targeted all nodes in every subnet concurrently, much the same as what boundary nodes would be doing if they would be used.
  
 
The experiment consisted of installing one counter canister in every subnet. This counter canister is essentially a no-op canister. It only maintains a counter, which can be queried via query calls and incremented via update calls. The counter value is not using orthogonal persistence, so the overhead for the execution layer of the IC is minimal. Stressing the counter canister can be seen as a way to determine the system overhead or baseline performance.
 
The experiment consisted of installing one counter canister in every subnet. This counter canister is essentially a no-op canister. It only maintains a counter, which can be queried via query calls and incremented via update calls. The counter value is not using orthogonal persistence, so the overhead for the execution layer of the IC is minimal. Stressing the counter canister can be seen as a way to determine the system overhead or baseline performance.
Line 26: Line 27:
 
== Measurements ==
 
== Measurements ==
  
The following measurements were made on May 24, 2022, with 31 application subnets (having each 13 nodes) out of a total of 35 subnets (4 are system subnets such as the NNS and SNS subnets that have more nodes).
+
We evaluate the performance of the IC on a CD pipeline, which is running periodically. Those benchmarks target a single subnetwork with a configuration close to IC nodes on mainnet. We scale up those numbers to the current number of nodes and subnetworks on mainnet, which yields the following numbers:
  
=== Update calls ===
+
Query calls: '''3,196,225''' queries/s    (7,025 queries/s per node scaled up to 455 nodes in application subnetworks)
The Internet Computer sustained more than '''20'841 updates/second''' calls to application canisters for a period of four minute (averaging '''672 updates/second''' per subnet).
 
The update calls measured here are triggered from ingress messages sent from outside the IC.
 
  
=== Query calls ===
+
Update calls: '''33,749''' updates/s    (1,023 updates/s per subnetwork scaled up to 33 application subnetworks)
Arguably more important are query calls, since they contribute to more than 90% of the traffic observed on the IC.
 
The Internet Computer processed '''1'125'982 queries per second''' calls to application canisters (averaging '''2'792 queries per second''' per node).
 
During the experiment each load is increased incrementally and run for a period of 5 minutes.
 
  
== Energy consumption ==
+
Above calculation is based on measurements from: '''2023-11-22'''.
The following is an approximation of mainnet power consumption.
 
The average power consumption of an Internet Computer node is 700 W.
 
If we assume a power usage effectiveness (PUE)  [https://en.wikipedia.org/wiki/Power_usage_effectiveness 1], [https://energyinnovation.org/2020/03/17/how-much-energy-do-data-centers-really-use/ 2],  of 2.33 that leads to a total power consumption of 1631.0 W including cooling and other data center operations costs.
 
Given a total of 518 nodes and 11 boundary nodes in mainnet, resulting in a worst case of 862799W to operate all IC nodes for mainnet (including also system subnets).
 
This is a worst case analysis for power consumption of nodes as we would normally expect them to throttle when not fully utilized and thereby reducing power consumption.
 
  
Given the maximum rate of updates and queries that we can currently support in the IC, one update call would consume 38.95 J (Joules) and one query call 0.59 J. These figures are for a hypothetically fully utilized IC.
+
All benchmark run against a small number of canister that simply return, as the goal of this benchmark is to measure throughput of the messaging subsystem and to determine runtime overhead of message processing.
With the current approximate rate of 3300 transactions/s, the IC uses 261.45 J per transaction.
 
  
In the future, the energy consumption will be much lower as the overhead of the system subnets will be comparatively smaller, boundary nodes will contain caching, and the replica software much more optimised.
+
Canister code can be (almost) arbitrarily complex and therefor significantly lower the throughput if canister execution is becoming the bottleneck (and not messaging).
  
 +
===Previous measurements ===
 +
The following measurements were made on '''May 24, 2022''', with 31 application subnets (having each 13 nodes) out of a total of 35 subnets (4 are system subnets such as the NNS and SNS subnets that have more nodes). Benchmarks where executed by simultaneously stressing all subnetworks on mainnet.
  
== Putting this in context ==  
+
====Update calls====
We can see that even with conservative estimations, the energy consumption of the Internet Computer is substantially lower than competing blockchain projects, but also existing (highly optimized) web2 tech. See the table below to put IC performance in perspective.
+
The Internet Computer sustained more than '''20'841 updates/second''' calls to application canisters for a period of four minutes (averaging '''672 updates/second''' per subnet).
{| class="wikitable"
+
The update calls measured here are triggered from ingress messages sent from outside the IC.
|+ Energy consumption comparison
 
|-
 
! Source !! Cost (measured in Joules (J))
 
|-
 
| One Internet Computer transaction || 261 J
 
|-
 
| One Google search || 1'080&thinsp;J<ref>https://store.chipkin.com/articles/did-you-know-it-takes-00003-kwh-per-google-search-and-more</ref>
 
|-
 
| One Solana transaction || 1'837&thinsp;J<ref>https://solana.com/news/solana-energy-usage-report-november-2021#ref1</ref>
 
|-
 
| One Ethereum 2 transaction || 126'000&thinsp;J<ref>https://blog.ethereum.org/2021/05/18/country-power-no-more/</ref>
 
|-
 
| One Cardano transaction || 1'972'440&thinsp;J<ref>https://www.trgdatacenters.com/most-environment-friendly-cryptocurrencies/</ref>
 
|-
 
| One Ethereum transaction || 692'820'000&thinsp;J<ref>https://digiconomist.net/ethereum-energy-consumption/</ref>
 
|-
 
| One Bitcoin transaction || 6'995'592'000&thinsp;J<ref>https://digiconomist.net/bitcoin-energy-consumption</ref>
 
|}
 
  
 +
====Query calls====
 +
Arguably more important are query calls, since they contribute to more than 90% of the traffic observed on the IC.
 +
The Internet Computer processed '''1'125'982 queries per second''' calls to application canisters (averaging '''2'792 queries per second''' per node).
 +
During the experiment each load is increased incrementally and run for a period of 5 minutes.
  
  
== Conclusion and next steps ==
+
==Conclusion and next steps==
 
The Internet Computer today already shows impressive performance. On top of that, it should be possible to further scale out the IC using:
 
The Internet Computer today already shows impressive performance. On top of that, it should be possible to further scale out the IC using:
* More subnets: This will immediately increase the query and update call throughput. While adding subnets might eventually lead to other scalability problems, the IC in its current shape should be able to support hundreds of subnets.
+
*More subnets: This will immediately increase the query and update call throughput. While adding subnets might eventually lead to other scalability problems, the IC in its current shape should be able to support hundreds of subnets.
* Performance improvements: Performance can also be improved by better single machine, network and consensus performance tuning. Increasing the performance by at least an order of magnitude is plausible.
+
*Performance improvements: Performance can also be improved by better single machine, network and consensus performance tuning. Increasing the performance by at least an order of magnitude is plausible.
  
 
==See Also==
 
==See Also==
 
+
*'''The Internet Computer project website (hosted on the IC): [https://internetcomputer.org/ internetcomputer.org]'''
* [https://medium.com/dfinity/the-internet-computers-transaction-speed-and-finality-outpace-other-l1-blockchains-8e7d25e4b2ef The Internet Computer’s Transaction Speed and Finality Outpace Other L1 Blockchains]
+
*[https://medium.com/dfinity/the-internet-computers-transaction-speed-and-finality-outpace-other-l1-blockchains-8e7d25e4b2ef The Internet Computer’s Transaction Speed and Finality Outpace Other L1 Blockchains]
* [https://forum.dfinity.org/t/internet-computer-performance-dec-1-2021-load-testing/9240 Internet Computer Performance - Dec 1, 2021 Load testing]
+
*[https://forum.dfinity.org/t/internet-computer-performance-dec-1-2021-load-testing/9240 Internet Computer Performance - Dec 1, 2021 Load testing]
=== References ===  
+
===References===  
 
<references />
 
<references />

Latest revision as of 10:47, 22 November 2023

The Internet Computer (IC) offers the security of Web3 blockchains with performance comparable to Web2 and cloud technology stacks. The IC far outperforms traditional blockchain protocols in efficiency.

Performance goals

A key objective of the Internet Computer is to provide a public compute layer that replaces traditional IT. A natural concern is that this will make computation far less efficient.

The Internet Computer works very differently to other blockchains, and is powered by advanced new cryptography. Internally, the network is able to strictly limit the replication of data and computation, while still providing the liveness and security guarantees expected of a blockchain. It can also assign different “trust levels” to the units of blockchain code that it hosts (“smart contracts”), which changes the level of replication applied to their computations and data. In its current state of development, it is already orders of magnitude more efficient than other blockchains, and it is designed to eventually become more efficient than traditional IT too.

Like all blockchains, the Internet Computer network directly applies replication, in combination with advanced cryptography, to create a tamperproof platform with better liveness guarantees than traditional IT. Yet, it also limits replication, while using the replication that occurs to drive efficiency, for example by scaling out “query” transactions.

For example, a large online service might be built on Amazon Web Services using a database in a master-slave configuration, Kubernetes instances of web workers, memcached instances for caching the results of database queries, and a CDN (content distribution network) that caches web content they serve on the edge of the network. This already creates a large amount of replication without creating a tamperproof platform or providing liveness guarantees. Each slave node of the database replicates its computations and data, and regular snapshots will also be taken as backups. Data used by the web workers is replicated by the memcached instances, and each worker will also cache data in its memory, while the product of web queries will be replicated all over the world on CDN nodes.

Because replication is at the core of the design of the Internet Computer, it can derive powerful security, liveness and other properties from replication, while also applying it more efficiently. For example, because the Internet Computer is a single logical blockchain and platform, as it grows larger, the utilization of the underlying node hardware upon which it runs can be made higher than, say, a standalone server machine in a data center. A key objective of the Internet Computer is, over time, to provide a public compute platform that provides a more power efficient way for the world to build systems and services.

Performance experiments

Scalability of the Internet Computer is facilitated by sharding the IC into subnet blockchains. Every subnet blockchain can process update calls (writes) from ingress messages independently from other subnets. The IC can scale up by adding more subnets at the cost of having more network traffic (as applications potentially need to communicate across subnets). In its current form, the IC should be able to scale out to hundreds of subnets.

Query calls (reads) can be processed locally by nodes in a subnet. The response to a query call can therefore have low latency since the query just needs a response by a single node and does not require inter-node communication or agreement. The more nodes a subnet has, the more query calls it can handle; and the more nodes the IC has, the more query calls it can handle.

Test setup

The experiments were run concurrently against all subnets other than the NNS and some of the most utilized application subnets, to avoid disturbing active IC users. The IC has a set of boundary nodes that route calls to the core nodes that host the subnets. The experiments sent load against the subnets directly and did not route traffic through the boundary nodes. Boundary nodes apply additional rate limiting, which is currently set slightly more conservatively than what the IC can handle, so running against the boundary nodes would be unsuitable for performance evaluation. The experiment targeted all nodes in every subnet concurrently, much as boundary nodes would if they were used.

The experiment consisted of installing one counter canister in every subnet. This counter canister is essentially a no-op canister. It only maintains a counter, which can be read via query calls and incremented via update calls. The counter value does not use orthogonal persistence, so the overhead for the execution layer of the IC is minimal. Stressing the counter canister can be seen as a way to determine the system overhead or baseline performance.
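To make the shape of such a counter canister concrete, here is a toy in-memory model in Python. It is only an illustration: the class and method names (`CounterCanister`, `read`, `increment`) are hypothetical and not taken from the benchmark code, and a real canister runs inside the IC rather than as a Python object.

```python
class CounterCanister:
    """Toy stand-in for a counter canister (hypothetical names, not the real benchmark code)."""

    def __init__(self) -> None:
        # State lives in ordinary memory only; nothing is persisted.
        self._count = 0

    def read(self) -> int:
        # Models a query call: read-only, does not change state.
        return self._count

    def increment(self) -> None:
        # Models an update call: the only operation that changes state.
        self._count += 1


canister = CounterCanister()
for _ in range(3):
    canister.increment()
print(canister.read())  # prints 3
```

Because both operations do almost no work, any cost measured while stressing such a canister is dominated by the system around it, which is what the baseline measurement is after.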

Measurements

We evaluate the performance of the IC on a CD pipeline, which runs periodically. These benchmarks target a single subnetwork with a configuration close to that of IC nodes on mainnet. We scale up those numbers to the current number of nodes and subnetworks on mainnet, which yields the following numbers:

Query calls: 3,196,225 queries/s (7,025 queries/s per node scaled up to 455 nodes in application subnetworks)

Update calls: 33,749 updates/s (1,023 updates/s per subnetwork scaled up to 33 application subnetworks)

The above calculation is based on measurements from 2023-11-22.
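The scale-up above is a simple multiplication of a per-unit rate by the number of units. The following sketch redoes that arithmetic with the figures quoted in the text. The helper names are illustrative, and the explanation that the small differences come from rounding of the per-unit rates is an assumption, not something the source states.

```python
# Figures as quoted in the text (per-unit rates appear to be rounded there).
QUERIES_PER_NODE = 7_025
APP_SUBNET_NODES = 455
UPDATES_PER_SUBNET = 1_023
APP_SUBNETS = 33

REPORTED_QUERIES = 3_196_225
REPORTED_UPDATES = 33_749


def scale_up(per_unit_rate: int, units: int) -> int:
    """Linear scale-up: throughput of one unit times the number of units."""
    return per_unit_rate * units


def within_rounding(reported: int, per_unit_rate: int, units: int) -> bool:
    """True if the gap is explainable by rounding the per-unit rate to an integer."""
    return abs(reported - scale_up(per_unit_rate, units)) <= 0.5 * units


scaled_queries = scale_up(QUERIES_PER_NODE, APP_SUBNET_NODES)
scaled_updates = scale_up(UPDATES_PER_SUBNET, APP_SUBNETS)

print(scaled_queries)  # 3196375
print(scaled_updates)  # 33759
print(within_rounding(REPORTED_QUERIES, QUERIES_PER_NODE, APP_SUBNET_NODES))  # True
print(within_rounding(REPORTED_UPDATES, UPDATES_PER_SUBNET, APP_SUBNETS))     # True
```

The products differ slightly from the reported totals (by 150 queries/s and 10 updates/s), which is within what rounding the per-node and per-subnetwork rates to whole numbers would produce.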

All benchmarks run against a small number of canisters that simply return, as the goal of these benchmarks is to measure the throughput of the messaging subsystem and to determine the runtime overhead of message processing.

Canister code can be (almost) arbitrarily complex and can therefore significantly lower the throughput if canister execution, rather than messaging, becomes the bottleneck.

Previous measurements

The following measurements were made on May 24, 2022, with 31 application subnets (each having 13 nodes) out of a total of 35 subnets (4 are system subnets, such as the NNS and SNS subnets, that have more nodes). Benchmarks were executed by simultaneously stressing all subnetworks on mainnet.

Update calls

The Internet Computer sustained more than 20'841 update calls per second to application canisters for a period of four minutes (averaging 672 updates/second per subnet). The update calls measured here were triggered by ingress messages sent from outside the IC.
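The per-subnet average can be recovered from the total with one division. The sketch below assumes the load was spread over all 31 application subnets mentioned for this measurement; the variable names are illustrative.

```python
TOTAL_UPDATES_PER_SECOND = 20_841  # sustained total quoted above
APPLICATION_SUBNETS = 31           # assumed number of subnets sharing the load

average_updates_per_subnet = TOTAL_UPDATES_PER_SECOND / APPLICATION_SUBNETS
print(round(average_updates_per_subnet))  # 672
```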

Query calls

Arguably more important are query calls, since they contribute more than 90% of the traffic observed on the IC. The Internet Computer processed 1'125'982 query calls per second to application canisters (averaging 2'792 queries per second per node). During the experiment, the load was increased incrementally, and each load level was run for a period of 5 minutes.


Conclusion and next steps

The Internet Computer today already shows impressive performance. On top of that, it should be possible to further scale out the IC using:

  • More subnets: This will immediately increase the query and update call throughput. While adding subnets might eventually lead to other scalability problems, the IC in its current shape should be able to support hundreds of subnets.
  • Performance improvements: Performance can also be improved by better single machine, network and consensus performance tuning. Increasing the performance by at least an order of magnitude is plausible.
