IC Smart Contract Memory

Overall Architecture

Figure 1. The two memories that can be accessed by the canister smart contracts.

Canister smart contracts running on the Internet Computer (IC) store data just like most other programs would. To this end, the IC offers developers two types of memory where data can be stored, as depicted in Figure 1. The first is the regular heap memory that is exposed as the Web Assembly virtual machine heap. This should be used as a scratch, temporary memory that will be cleared after any canister upgrade. The second type of memory is the stable memory, which is a larger memory used for permanent data storage.

Orthogonal Persistence

use ic_cdk_macros::{query, update};
use std::{cell::RefCell, collections::HashMap};

thread_local! {
    static STORE: RefCell<HashMap<String, u64>> = RefCell::default();
}

#[update]
fn insert(key: String, value: u64) {
    STORE.with(|store| store.borrow_mut().insert(key, value));
}

#[query]
fn lookup(key: String) -> u64 {
    STORE.with(|store| *store.borrow().get(&key).unwrap_or(&0))
}

The IC offers orthogonal persistence, an illusion given to programs to run forever: the heap of each canister is automatically preserved and restored the next time it is called. For that, the execution environment needs to determine efficiently which memory pages have been dirtied during message execution so that the modified pages are tracked and periodically persisted to disk. The listing above shows an example key-value store that illustrates how easy it is to use orthogonal persistence. The key-value store in this case is backed by a simple Rust HashMap stored on the heap by means of a thread-local variable. A RefCell is used to provide interior mutability. The example would also be possible without it, but mutating the thread-local variable would be unsafe in that case, as the Rust compiler cannot guarantee exclusive access to it.

Heap Memory

Canisters running on the IC are programmed either in Rust or Motoko. The canisters are then compiled down to web assembly (Wasm). All the variables and data structures defined in these higher-level languages are then stored in the Wasm heap. All accesses to data structures and variables defined in the higher-level languages are then translated to memory copy operations in Wasm (e.g., load, store, copy, grow). The Wasm heap memory is a 4GiB, 32-bit address space that backs the Wasm programs. Due to possible changes in data structures and in Wasm (and high-level language) compilers, the heap should not be used as a permanent memory, but rather as a (faster) scratch, temporary memory. This is because during canister upgrades, the heap layout might change (i.e., data structure layouts) which could leave the canister in an unusable state. However, in-between updates the heap memory is persisted thanks to orthogonal persistence.

Stable Memory

Next to the heap memory, canister developers can make use of the stable memory. This is an additional 64-bit addressable memory, which is currently 48GiB in size, with plans to increase it further in the future. Programs written in either Rust or Motoko need to explicitly use stable memory by using the API. This API offers primitives to copy memory back and forth between the Wasm heap and the stable memory. An alternative to using this lower level API directly is to use the stable structures API, which offers developers a collection of Rust data structures (e.g., B-trees) that operate directly in stable memory. Next to using the stable memory through stable data structures, a pattern often used by developers is to persist heap state between canister upgrades. This is achieved via serializing heap memory (or data structures), saving it to stable memory and applying the opposite operations (copying back and deserializing) when the upgrade is done.

Behind the scenes: Implementation

To serve memory contents to canister smart contracts, the IC software stack has the following design. First, it is important to mention that every N (consensus) rounds, canister state (heap, stable memory and other data structures) are checkpointed on disk. This is called a checkpoint file. Whenever a canister executes messages after a checkpoint, all its memory resides in the checkpoint file. Therefore, all memory requested will be served from the checkpoint file. Memory modifications (i.e., dirtied pages in terms of operating systems) are saved in a data structure called the heap delta. The following paragraphs describe how this design enables orthogonal persistence.

Figure 2. The memory faulting architecture which encompasses the checkpoint file and the heap delta. .

Any implementation of orthogonal persistence has to solve two problems: (1) How to map the persisted memory into the Wasm memory?; and (2) How to keep track of all modifications in the Wasm memory so that they can be persisted later. Page protection is used to solve both problems.The entire address space of the Wasm memory is divided into 4KiB pages. All pages are initially marked as inaccessible using the page protection flags of the operating system.

The first memory access triggers a page fault, pauses the execution, and invokes a signal handler. The signal handler then fetches the corresponding page from persisted memory and marks the page as read-only. Subsequent read accesses to that page will succeed without any help from the signal handler. The first write access will trigger another page fault, however, and allow the signal handler to remember the page as modified and mark the page as readable and writable. All subsequent accesses to that page (both r/w) will succeed without invoking the signal handler.

Invoking a signal handler and changing page protection flags are expensive operations. Messages that read or write large chunks of memory cause a storm of such operations, degrading performance of the whole system. This can cause severe slowdowns under heavy load.

Versioning: Heap Delta and Checkpoint Files

A canister executes update messages sequentially, one by one. Queries, in contrast, can run concurrently to each other and to update messages. The support for concurrent execution makes the memory implementation much more challenging. Imagine that a canister is executing an update message at (blockchain) block height H. At the same time, there could still be a previous long-running query that started earlier, at block height H-K. This means the same canister can have multiple versions of its memory active at the same time; this is used for the parallel execution of queries and update calls.

A naive solution to this problem would be to copy the entire memory after each update message. That would be slow and use too much storage. Thus, our implementation takes a different route. It keeps track of the modified memory pages in a persistent tree data-structure called Heap Delta that is based on Fast Mergeable Integer Maps. At a regular interval (i.e., every N rounds), there is a checkpoint event that commits the modified pages into the checkpoint file after cloning the file to preserve its previous version. Figure 2 shows how the Wasm memory is constructed from Heap Delta and the checkpoint file.

Memory-related performance optimizations

Optimization 1: Memory mapping the checkpoint file pages. This reduces the memory usage by sharing the pages between multiple calls being executed concurrently. This optimization also improves performance by avoiding page copying on read accesses. The number of signal handler invocations remains the same as before, so the issue of signal storms is still open after this optimization.

Optimization 2: Page Tracking in Queries All pages dirtied by a query are discarded after execution. This means that the signal handler does not have to keep track of modified pages for query calls. As opposed to update calls, queries saw the introduction of a fast path that marks pages as readable and writable on the first access. This low-hanging fruit optimization made queries 1.5x-2x faster on average.

Optimization 3: Amortized Prefetching of Pages The idea behind the most impactful optimization is simple: to reduce the number of page faults, more work is needed per signal handler invocation. Instead of fetching a single page at a time, the signal handler tries to speculatively prefetch pages. The right balance is required here because prefetching too many pages may degrade performance of small messages that access only a few pages. The optimization computes the largest contiguous range of accessed pages immediately preceding the current page. It uses the size of the range as a hint for prefetching more pages. This way the cost of prefetching is amortized by previously accessed pages. As a result, the optimization reduces the number of page faults in memory intensive messages by an order of magnitude.

A downside of this approach is that prefetched page content needs to be compared with previous content after message execution to determine if a page was modified instead of relying on tracking write accesses via signal handlers.

These optimizations bring substantial benefits for the performance of the memory faulting component of the execution environment. The optimizations allow the IC to improve its throughput for memory-intensive workloads.