HTTP asset certification

Motivation

A user interacting with the Internet Computer needs to be able to confirm that the responses they receive are actually coming from the Internet Computer and have not been tampered with. Traditionally, on the Internet, this problem is solved using public-key cryptography. The server running the service has a secret key and uses that to sign all its responses. A user can then verify the signature on the response using the server’s public key.

Just like a web server in Web2 maintains a public-key/secret-key pair, the Internet Computer blockchain as a whole maintains a public-key/secret-key pair. Additionally, each individual subnet in the Internet Computer maintains its own public-key/secret-key pair. When a new subnet is formed, the NNS issues a certificate for the subnet, which contains a signature on the subnet's public key made with the Internet Computer's secret key. When the subnet responds to a user's message, the response contains a certificate chain, which includes a signature on the response made with the subnet's secret key and the certificate issued by the NNS to the subnet. The user can verify the certificate chain using the Internet Computer's public key, similar to verifying a certificate chain in Web2.

Each blockchain node holds only a share of its subnet's secret key. As a result, no node can sign a message by itself. But if at least two-thirds of the nodes of a subnet agree on a message, they can combine their secret key shares to sign the message. The signed message can be verified easily using the subnet's public key. If the verification succeeds, it means that at least two-thirds of the blockchain nodes running the canister agreed to deliver that message. The technology used by the Internet Computer to generate and maintain the secret key shares, and to sign messages using them, is called chain-key technology.

The Internet Computer supports two types of messages: query calls and update calls. Query calls are similar to HTTP GET requests and do not modify the state of the Internet Computer. Query calls do not go through the consensus protocol. The user can make a query call to any blockchain node in the subnet, and only that (possibly malicious) node answers the query. As generating a certificate requires agreement among at least two-thirds of the nodes of the subnet, the Internet Computer doesn't issue a certificate when responding to query calls.

As query calls have low latency, canisters deliver web pages to the client via query calls. However, as the client needs to verify the received content, the Internet Computer introduces the notion of Certified Variables/Certified Data. In a nutshell, a canister can choose in advance to create a certificate for a piece of data and store it in the replicated state. Any user can later access the data along with its certificate via query calls. The user can use the IC public key to authenticate the body of the response. The notion of certified data can be used to certify all the assets (HTML, CSS, JavaScript files, images, videos, etc.) of an app in advance.

When a canister issues a response along with its certificate, an HTTP Gateway can be used to verify the certificate before passing the response on to the client.

Certified Data

In every round of the Internet Computer Protocol, the message routing layer generates a new per-round system tree. This tree is then Merkleized and the root hash is computed. The nodes in the subnet then engage in a protocol to create a certificate for the root hash of the system tree. This per-round system tree contains, among other information, the "certified data" of each canister. The system tree looks as follows.

*root*
└── canister
    ├── <canister id>
    │   ├── metadata
    │   ├── module_hash
    │   ├── controllers
    │   └── certified_data
    │       └── <blob data>
    ├── <canister id>
    ...

The tree above shows the path of the certified data in the system state tree. The leaf storing the certified data of a canister can be at most 32 bytes long. If a canister wants to certify more than 32 bytes of information, it has to hash the data and certify the hash instead.
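The 32-byte limit can be illustrated with a minimal sketch. The helper name to_certified_data is hypothetical and not part of any IC library; the sketch only assumes that SHA-256 is used as the hash, as elsewhere in this article.

```python
import hashlib

MAX_CERTIFIED_DATA_LEN = 32  # size limit of the certified_data leaf, in bytes


def to_certified_data(payload: bytes) -> bytes:
    """Return a value that fits into the certified_data leaf (hypothetical helper).

    Payloads of up to 32 bytes are kept as-is. Longer payloads are replaced
    by their SHA-256 digest, which is exactly 32 bytes long.
    """
    if len(payload) <= MAX_CERTIFIED_DATA_LEN:
        return payload
    return hashlib.sha256(payload).digest()


blob = to_certified_data(b"a long document " * 100)
print(len(blob))  # the digest always has 32 bytes
```

A verifier then has to recompute the same hash over the data it received and compare it with the certified value.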

A canister can manipulate its certified data by calling the below System API methods. Please look at the interface spec for more details.

ic0.certified_data_set : (src: i32, size : i32) -> ()
ic0.data_certificate_present : () -> i32
ic0.data_certificate_size : () -> i32
ic0.data_certificate_copy : (dst: i32, offset: i32, size: i32) -> ()

The Motoko base library includes a module called CertifiedData (documentation), which contains the following wrappers for the System API methods.

let set : (data : Blob) -> ()
let getCertificate : () -> ?Blob

The Rust Canister Development Kit (ic-cdk) provides the following wrappers for the System API methods. Please refer to GitHub for more details on their implementation.

pub fn set_certified_data(data: &[u8])
pub fn data_certificate() -> Option<Vec<u8>>

A certificate for the certified data consists of

  • Certificate for the root hash of the system tree.
  • Witness/Merkle proof to prove that the certified data belongs to a tree that hashes to the above root hash.

If the certified data is a hash of several assets, then the certificate for a particular asset additionally contains a Merkle proof that the asset belongs to a tree that hashes to the certified data. Refer to the IC-Certificate header section for more details.

Canister protocol

A canister must follow this protocol to certify assets:

  • Construct a hash tree that maps paths of HTTP resources to SHA-256 hashes of their bodies. An example of such a tree:
    *root*
    └── http_assets
        ├── /index.html -> SHA256(body)
        ├── ...
        └── /css/styles.css -> SHA256(body)
    
  • Compute the root hash of the tree and call ic0.certified_data_set with the bytes of the hash as the argument.
  • Add an IC-Certificate header to each certified HTTP response.
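The first two steps can be sketched as follows. This is an illustration under stated assumptions, not the asset canister's actual implementation: nodes are modelled as plain tuples, the hashing rules follow the hash-tree scheme of the interface specification (SHA-256 with a length-prefixed domain separator per node type), and the balanced shape of the fork tree is an arbitrary choice made here. The names build_asset_tree and balanced are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def domain_sep(name: str) -> bytes:
    # Domain separator: one length byte followed by the ASCII name.
    return bytes([len(name)]) + name.encode("ascii")


# Nodes are tuples: ("empty",), ("leaf", value),
# ("labeled", label, subtree), ("fork", left, right).
def reconstruct(tree) -> bytes:
    """Compute the root hash of a hash tree."""
    kind = tree[0]
    if kind == "empty":
        return h(domain_sep("ic-hashtree-empty"))
    if kind == "leaf":
        return h(domain_sep("ic-hashtree-leaf") + tree[1])
    if kind == "labeled":
        return h(domain_sep("ic-hashtree-labeled") + tree[1] + reconstruct(tree[2]))
    if kind == "fork":
        return h(domain_sep("ic-hashtree-fork") + reconstruct(tree[1]) + reconstruct(tree[2]))
    raise ValueError(f"unknown node kind: {kind}")


def balanced(nodes):
    # Combine a non-empty, label-sorted list of nodes into a binary fork tree.
    # The balanced shape is an assumption of this sketch.
    if len(nodes) == 1:
        return nodes[0]
    mid = len(nodes) // 2
    return ("fork", balanced(nodes[:mid]), balanced(nodes[mid:]))


def build_asset_tree(assets: dict):
    """Map each path to a labeled leaf holding SHA-256(body), under http_assets."""
    items = sorted((path.encode("utf-8"), body) for path, body in assets.items())
    leaves = [("labeled", label, ("leaf", h(body))) for label, body in items]
    subtree = balanced(leaves) if leaves else ("empty",)
    return ("labeled", b"http_assets", subtree)


assets = {"/index.html": b"<html>hello</html>", "/css/styles.css": b"body {}"}
root_hash = reconstruct(build_asset_tree(assets))
# root_hash is the 32-byte value a canister would pass to ic0.certified_data_set.
print(root_hash.hex())
```

Because the labels are sorted before the tree is built, the root hash does not depend on the order in which the assets were added.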

Certifying Assets

A canister developer can certify the assets in the following two ways.

  • The canister developer can explicitly write code to manage and certify all the assets. In this case, the developer needs to construct a tree containing all the assets, Merkleize the tree and compute its root hash. To certify the root hash,
    • In Rust, the set_certified_data method provided by the ic-cdk library needs to be called with the root hash as input.
    • In Motoko, the CertifiedData.set method needs to be called with the root hash as input (example on GitHub).

    The developer can also use the ic-certified-assets library (GitHub), which contains many methods to maintain a tree and certify it.

  • Alternatively, the canister developer can create an "asset canister" by creating a canister with type set to "assets" and specifying the folder containing all the assets. The asset canister is a regular canister, except that the boilerplate code for managing and certifying all the assets is already provided. For example, refer to the dfx.json file of the Hello World project (GitHub). The developer can use this method to host even large web projects written with frameworks such as React, Angular and Svelte on the Internet Computer with very little code. The developer just has to create an asset canister and specify the source folder of the web project. All the assets will be automatically uploaded to the Internet Computer and certified.

Generating an HTTP Response

The Internet Computer supports a built-in "query" method called http_request. The method takes the information related to an HTTP request as input and outputs an HTTP response. Specifically, the output contains status, headers and body. A developer who wants a canister to serve HTTP requests should implement this method appropriately. When a client makes an HTTP request to a canister, the boundary node (icx-proxy) converts the HTTP request to a canister call to the http_request method and returns the canister's response. Please refer to the interface spec for more details.

When the canister would like to return a certified asset, the response body should contain the asset and the response headers should include a header with name IC-Certificate and value equal to the certificate of the asset (example on github).

Validator protocol

The validator performs the following steps to validate the certificate of the resource at path PATH served by canister CANISTER_ID:

  • Hash the body of the HTTP response, obtaining hash DATA_HASH.
  • Check that the response contains the IC-Certificate header.
  • Decode the certificate and the tree from the value of the IC-Certificate header.
  • Check the validity of the certificate as described in the Interface Specification: Certification. This step requires knowing the IC root key.
  • Check that lookup(/http_assets/PATH, tree) = Found(DATA_HASH). This check verifies that the path /http_assets/PATH in the tree named tree contains a leaf with value DATA_HASH. In other words, this check verifies that the asset (with hash equal to DATA_HASH) is part of the tree specified in the certificate.
  • Check that lookup(/canister/CANISTER_ID/certified_data, certificate.tree) = Found(reconstruct(tree)). This check verifies that the path /canister/CANISTER_ID/certified_data in tree certificate.tree contains a leaf with value reconstruct(tree). In other words, this check verifies that the root hash of the asset tree is certified.
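The last two checks can be sketched as follows, assuming the header has already been decoded into tuple-shaped trees and the certificate's signature has already been verified against the IC root key (both are out of scope here). The function names and the tuple representation are hypothetical. The lookup is simplified: it returns None whenever the witness does not expose the path, whereas the interface specification distinguishes several outcomes.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def domain_sep(name: str) -> bytes:
    return bytes([len(name)]) + name.encode("ascii")


# Nodes: ("empty",), ("leaf", v), ("labeled", l, t), ("fork", a, b), ("pruned", hash)
def reconstruct(tree) -> bytes:
    kind = tree[0]
    if kind == "pruned":
        return tree[1]  # a pruned node stands for the hash of the omitted subtree
    if kind == "empty":
        return h(domain_sep("ic-hashtree-empty"))
    if kind == "leaf":
        return h(domain_sep("ic-hashtree-leaf") + tree[1])
    if kind == "labeled":
        return h(domain_sep("ic-hashtree-labeled") + tree[1] + reconstruct(tree[2]))
    if kind == "fork":
        return h(domain_sep("ic-hashtree-fork") + reconstruct(tree[1]) + reconstruct(tree[2]))
    raise ValueError(f"unknown node kind: {kind}")


def flatten(tree):
    # List the non-fork children reachable through forks only.
    if tree[0] == "fork":
        return flatten(tree[1]) + flatten(tree[2])
    return [tree]


def lookup(path, tree):
    """Return the leaf value at path (a list of byte labels), or None."""
    if not path:
        return tree[1] if tree[0] == "leaf" else None
    for node in flatten(tree):
        if node[0] == "labeled" and node[1] == path[0]:
            return lookup(path[1:], node[2])
    return None


def validate_asset(body: bytes, path: str, tree, certified_data: bytes) -> bool:
    # certified_data is the leaf already read from the verified certificate.
    data_hash = h(body)
    if lookup([b"http_assets", path.encode("utf-8")], tree) != data_hash:
        return False
    return reconstruct(tree) == certified_data


# Toy data: a two-asset tree and a witness that prunes the other asset.
body = b"<html>hello</html>"
other = ("labeled", b"/a.css", ("leaf", h(b"a {}")))
index = ("labeled", b"/index.html", ("leaf", h(body)))
full_tree = ("labeled", b"http_assets", ("fork", other, index))
witness = ("labeled", b"http_assets", ("fork", ("pruned", reconstruct(other)), index))
print(validate_asset(body, "/index.html", witness, reconstruct(full_tree)))
```

The witness reveals only the requested path; the pruned node carries just enough information to recompute the same root hash as the full tree.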

IC-Certificate header

The IC-Certificate header is a Structured Header (as per the RFC proposal): a dictionary with members certificate and tree, both of which are Byte Sequences:

IC-Certificate: certificate=:<base64(c)>:, tree=:<base64(t)>:

where the certificate must be a valid certificate as described in the Interface Specification: Certification, with

lookup(/canister/<canister_id>/certified_data, certificate.tree)
    = Found (reconstruct(tree))

The tree exposes the relevant nodes in the /http_assets subtree to allow the client to look up the request path and obtain the expected body hash.
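The header layout described above can be sketched with a small formatter and parser. This is not a complete Structured Header parser: it only handles the two Byte Sequence members named in this section, and the function names are hypothetical.

```python
import base64
import re

# Matches one dictionary member of the form name=:<base64>:
_MEMBER = re.compile(r"(certificate|tree)=:([A-Za-z0-9+/=]*):")


def format_ic_certificate(certificate: bytes, tree: bytes) -> str:
    """Build the header value from the CBOR-encoded certificate and tree."""
    def enc(raw: bytes) -> str:
        return base64.b64encode(raw).decode("ascii")
    return f"certificate=:{enc(certificate)}:, tree=:{enc(tree)}:"


def parse_ic_certificate(value: str) -> dict:
    """Extract the raw bytes of both members from a header value."""
    members = {
        name: base64.b64decode(b64, validate=True)
        for name, b64 in _MEMBER.findall(value)
    }
    missing = {"certificate", "tree"} - members.keys()
    if missing:
        raise ValueError(f"missing members: {sorted(missing)}")
    return members


value = format_ic_certificate(b"\x01\x02", b"\xff")
print(value)
print(parse_ic_certificate(value))
```

The decoded bytes are CBOR and still have to be decoded into a certificate and a hash tree before the checks of the validator protocol can run.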

Example

For this example, /index.html of the Internet Identity canister (canister id rdmx6-jaaaa-aaaaa-aaadq-cai) available at https://rdmx6-jaaaa-aaaaa-aaadq-cai.raw.ic0.app/index.html was fetched. The SHA-256 hash of the resource at the moment of fetching is 478afb8206ca0b566a7f138e623accd169fa822602d2f6d717fb67d1045f4f0d. The response contained the following header:

IC-Certificate: certificate=:2dn3omR0cmVlgwGDAYMBgwJIY2FuaXN0ZXKDAYMBggRYIIudikoDwH1gRK637olblUhMUX3HlE0Dihj8MTACxGzHgwGCBFggyCRYc8M/ugt8G7C8RPYayn+l4sdBj8gvFotzJELnQ32DAYIEWCA1/+UHZ9SF67w4ssjOi+Jv3ch7WQNzezGmhtuvB+RDpYMCSgAAAAAAAAAHAQGDAYMBgwJOY2VydGlmaWVkX2RhdGGCA1ggWUt10wjWinx0aAWyrNEi/0R7VeuhalDMjGDErzIbZzqCBFgg/VtZRZdYyK/sr3KF2jWeS1rblF+4ajwfDv2ZbCGpaTiCBFggSoI5JS0pCusHP4nh6h780ebr961E0lVnFkFwzF5pZaeCBFggcKidPEGiPoFMPYfEyNGsDRYWmry1iGX0HNUEoKhIATeCBFggR0zdKUZOMcm5EHNl5Tee3XWqbq1gArwUGzZ2FH4rWtmCBFggTkwJcNrh0eJ9FutJcn6th9eCbM2KXnloxed0acxmQNeDAYIEWCA6SNH8IT1JMHEDEE99csK1kw7bqHh7kGMfNDs6popfCoMCRHRpbWWCA0nFnbXrts/65xZpc2lnbmF0dXJlWDCkXN2tcvH5b+xFCzfkuJMqrZDcplfW8vDziJwzx08WOPI4rh2TIGYZ3R6dgQTF0CA=:, tree=:2dn3gwGDAktodHRwX2Fzc2V0c4MBggRYIDgtAGcz5VvevwiEwwZB9zpkt17C9LE6o/O37bEwQUawgwGDAksvaW5kZXguaHRtbIIDWCBHivuCBsoLVmp/E45iOszRafqCJgLS9tcX+2fRBF9PDYIEWCCx2L8SfJwOydBkUxjc8tKXDVUeoiw8qEYI+8b+HRWIWYIEWCAqZ+3yoFSA9s+jbLFbtcVz+wi0HF9x51Kx38qPcBhiDA==:

The following data can be extracted from the header value:

ROOT HASH: 0b2d843df534ac8ed2331fe2782deb71d23a08d9b4019a8fa695ec7fde93de36
TREE HASH: 594b75d308d68a7c746805b2acd122ff447b55eba16a50cc8c60c4af321b673a
SIGNATURE: a45cddad72f1f96fec450b37e4b8932aad90dca657d6f2f0f3889c33c74f1638f238ae1d93206619dd1e9d8104c5d020
CERTIFICATE TIME: 2022-02-02T08:23:24.851277509+00:00
CERTIFICATE TREE:
HashTree {
    root: Fork(
        Fork(
            Fork(
                Label("canister", Fork(
                    Fork(
                        Pruned(8b9d8a4a03c07d6044aeb7ee895b95484c517dc7944d038a18fc313002c46cc7),
                        Fork(
                            Pruned(c8245873c33fba0b7c1bb0bc44f61aca7fa5e2c7418fc82f168b732442e7437d),
                            Fork(
                                Pruned(35ffe50767d485ebbc38b2c8ce8be26fddc87b5903737b31a686dbaf07e443a5),
                                Label(0x00000000000000070101, Fork(
                                    Fork(
                                        Label("certified_data", Leaf(0x594b75d308d68a7c746805b2acd122ff447b55eba16a50cc8c60c4af321b673a)),
                                        Pruned(fd5b59459758c8afecaf7285da359e4b5adb945fb86a3c1f0efd996c21a96938),
                                    ),
                                    Pruned(4a8239252d290aeb073f89e1ea1efcd1e6ebf7ad44d25567164170cc5e6965a7),
                                )),
                            ),
                        ),
                    ),
                    Pruned(70a89d3c41a23e814c3d87c4c8d1ac0d16169abcb58865f41cd504a0a8480137),
                )),
                Pruned(474cdd29464e31c9b9107365e5379edd75aa6ead6002bc141b3676147e2b5ad9),
            ),
            Pruned(4e4c0970dae1d1e27d16eb49727ead87d7826ccd8a5e7968c5e77469cc6640d7),
        ),
        Fork(
            Pruned(3a48d1fc213d49307103104f7d72c2b5930edba8787b90631f343b3aa68a5f0a),
            Label("time", Leaf(0xc59db5ebb6cffae716)),
        ),
    ),
}
TREE:
HashTree {
    root: Fork(
        Label("http_assets", Fork(
            Pruned(382d006733e55bdebf0884c30641f73a64b75ec2f4b13aa3f3b7edb1304146b0),
            Fork(
                Label("/index.html", Leaf(0x478afb8206ca0b566a7f138e623accd169fa822602d2f6d717fb67d1045f4f0d)),
                Pruned(b1d8bf127c9c0ec9d0645318dcf2d2970d551ea22c3ca84608fbc6fe1d158859),
            ),
        )),
        Pruned(2a67edf2a05480f6cfa36cb15bb5c573fb08b41c5f71e752b1dfca8f7018620c),
    ),
}
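As a small worked check of the data above, the time leaf of the certificate tree can be decoded by hand. The sketch assumes the leaf holds an unsigned LEB128-encoded number of nanoseconds since the Unix epoch, which is consistent with the CERTIFICATE TIME listed above; the function name decode_leb128 is chosen for illustration.

```python
from datetime import datetime, timezone


def decode_leb128(data: bytes) -> int:
    """Decode an unsigned LEB128 integer (7 payload bits per byte, low bits first)."""
    value = 0
    for i, byte in enumerate(data):
        value |= (byte & 0x7F) << (7 * i)
        if (byte & 0x80) == 0:  # the high bit is clear on the last byte
            if i != len(data) - 1:
                raise ValueError("trailing bytes after LEB128 value")
            return value
    raise ValueError("truncated LEB128 value")


time_leaf = bytes.fromhex("c59db5ebb6cffae716")
nanos = decode_leb128(time_leaf)
seconds, fraction = divmod(nanos, 1_000_000_000)
stamp = datetime.fromtimestamp(seconds, tz=timezone.utc)
formatted = f"{stamp:%Y-%m-%dT%H:%M:%S}.{fraction:09d}+00:00"
print(nanos)      # 1643790204851277509
print(formatted)  # 2022-02-02T08:23:24.851277509+00:00
```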

Limitations

  • The protocol supports only one resource per path. This does not work well with HTTP content negotiation.
  • The protocol does not support certification of HTTP statuses and headers. Only resource bodies can be certified.

Serving HTTP Requests Dynamically

This wiki article describes how a canister can create certificates for assets in advance and then serve these assets to the user. What if a canister has to generate an HTTP response dynamically, based on the input of the http_request method? In this case, the HTTP response doesn't include a certificate and cannot be trusted. To establish trust in this case, the HTTP response can include an upgrade flag. If this flag is set, the HTTP gateway processing the response sends the request again to an "update" method called http_request_update. The http_request_update method is a built-in method that also serves HTTP requests. As the responses to update calls are certified by the Internet Computer, this mechanism can be used to serve HTTP requests dynamically in a trustworthy way.
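The gateway side of this flow can be sketched as follows. The sketch is illustrative only: the HttpResponse class, the serve function and the two callables standing in for query and update calls are hypothetical, and only the method names http_request and http_request_update and the upgrade flag come from the description above.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class HttpResponse:
    status_code: int
    headers: List[Tuple[str, str]] = field(default_factory=list)
    body: bytes = b""
    upgrade: bool = False  # set by the canister to request the update path


Call = Callable[[str, dict], HttpResponse]


def serve(request: dict, query_call: Call, update_call: Call) -> HttpResponse:
    """Try the fast query path first and fall back to an update call if asked to."""
    response = query_call("http_request", request)
    if response.upgrade:
        # Re-send the same request as an update call; that response goes
        # through consensus and is certified by the subnet.
        response = update_call("http_request_update", request)
    return response


# Stand-ins for real canister calls, used only to exercise the control flow.
def fake_query(method: str, request: dict) -> HttpResponse:
    return HttpResponse(status_code=200, upgrade=True)


def fake_update(method: str, request: dict) -> HttpResponse:
    return HttpResponse(status_code=200, body=b"dynamic content")


print(serve({"url": "/now"}, fake_query, fake_update).body)
```

The price of the fallback is latency: the second call has to wait for consensus, so it suits responses that cannot be certified in advance.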

More details of the protocol can be found in the interface spec.

Canisters using HTTP asset certification

Validators