App Streaming: a New Development Paradigm for Environments with Constrained Resources
This blog post describes a new paradigm, app streaming, which executes apps without the memory constraints imposed by the hardware they run on. This new paradigm is described in the context of Ledger hardware wallets, but the concepts can be applied more broadly to any embedded system with limited resources.
Ledger devices make use of tamper-resistant processor chips called Secure Elements (SE) to run cryptocurrency apps. Secure Elements aren’t general purpose CPUs and have limited resources. In this context, limited resources basically means a small amount of memory, as shown in the following table:
The Nintendo 64 video game console, released in 1996 and discontinued in 2002, has more RAM and Flash memory than any of the Secure Elements used by these hardware wallets.
As a consequence, the development experience on these embedded systems is unusual to say the least. Code is often written in C and every single statement has to be written with memory constraints in mind. That is: no heap, limited stack and almost no room for static data. Starting from this assessment, almost every development limitation comes from these memory constraints and removing them would dramatically change the way apps are developed on Ledger hardware wallets.
External Memory
Usually, Secure Elements run a dedicated Operating System (OS) which handles hardware interaction, as well as the installation and execution of apps. Due to the limited amount of memory, apps are limited in size and only a limited number can be installed on a device.
In order to get rid of the aforementioned memory constraints, one could consider externalizing memory outside of the Secure Element with the introduction of 2 components:
- A Virtual Machine (VM) run by the Secure Element.
- A cooperative companion on a different CPU such as a smartphone or a laptop.
Apps are developed in a standard way and compiled against the VM architecture.
The OS now runs a VM which launches authorized apps without requiring them to be fully loaded on flash memory. The app is streamed to the VM on-demand. Every single bit of data stored outside of the Secure Element by the companion is encrypted and authenticated.
The VM
An initial idea involved software instrumentation instead of a VM to catch memory accesses, but it resulted in an over-complex, unstable and non-standard solution heavily relying on OS internals. The RISC-V architecture was eventually chosen because of its simplicity (47 instructions only, which can be implemented in a few hundred lines of code) and support by GCC and LLVM. Since there’s an LLVM backend, plenty of languages can be compiled into RISC-V, especially Rust.
The VM executes a RISC-V program instruction by instruction (eg. XOR, JAL, etc.) and updates registers (SP, PC, A0, A1, etc.) and memory (heap, stack, code, data) accordingly. Each access to an uncached address is forwarded to the companion, which acts as an external memory storage.
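To make the fetch, decode and execute loop concrete, here is a minimal Python sketch of a VM step. It is not the project's code: the TinyVM class and its fields are hypothetical, only XOR and ADDI are handled, and a flat byte string stands in for the paged memory. The instruction encodings are the standard RV32I ones.

```python
import struct

MASK32 = 0xFFFFFFFF  # RV32 registers are 32 bits wide


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


class TinyVM:
    """Hypothetical interpreter that only understands XOR and ADDI."""

    def __init__(self, code: bytes):
        self.regs = [0] * 32  # x0..x31, x0 always reads as zero
        self.pc = 0
        self.code = code  # stand-in for the paged memory

    def step(self) -> None:
        # Fetch one little-endian 32-bit instruction at PC.
        (insn,) = struct.unpack_from("<I", self.code, self.pc)
        opcode = insn & 0x7F
        rd = (insn >> 7) & 0x1F
        funct3 = (insn >> 12) & 0x7
        rs1 = (insn >> 15) & 0x1F
        rs2 = (insn >> 20) & 0x1F
        funct7 = insn >> 25

        if opcode == 0x33 and funct3 == 0x4 and funct7 == 0x00:  # XOR rd, rs1, rs2
            result = self.regs[rs1] ^ self.regs[rs2]
        elif opcode == 0x13 and funct3 == 0x0:  # ADDI rd, rs1, imm
            result = self.regs[rs1] + sign_extend(insn >> 20, 12)
        else:
            raise NotImplementedError(f"unsupported instruction {insn:#010x}")

        if rd != 0:  # writes to x0 are discarded
            self.regs[rd] = result & MASK32
        self.pc += 4


# addi a0, zero, 5 ; addi a1, zero, 3 ; xor a0, a0, a1
program = struct.pack("<3I", 0x00500513, 0x00300593, 0x00B54533)
vm = TinyVM(program)
for _ in range(3):
    vm.step()
print(vm.regs[10])  # a0 is register x10
```

A real VM would replace the flat `code` buffer with a page cache that asks the companion for any page it doesn't hold.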
The ECALL instruction is used to make a request to the supporting execution environment. Examples of ECALLs are send() or recv() to exchange data with the outside world, draw() to display messages on a screen, or even exit() to make the app exit. ECALLs are implemented through syscalls by the VM.
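As an illustration, ECALL handling can be thought of as a dispatch on a request number. The sketch below is only an assumption about how this could look: the ECALL numbers, the AppExit exception and the choice of reading the number from A7 and the arguments from A0 and A1 are hypothetical and not taken from the project.

```python
class AppExit(Exception):
    """Raised when the app asks the VM to stop."""


# Hypothetical ECALL numbers; the real project defines its own.
ECALL_EXIT = 0
ECALL_SEND = 1

A0, A1, A7 = 10, 11, 17  # RISC-V ABI names for registers x10, x11 and x17


def handle_ecall(regs: list, memory: bytearray, outbox: list) -> None:
    """Serve one ECALL using the register state of the VM."""
    number = regs[A7]
    if number == ECALL_EXIT:
        raise AppExit(regs[A0])  # exit status is passed in A0
    if number == ECALL_SEND:
        addr, size = regs[A0], regs[A1]
        if addr + size > len(memory):
            raise ValueError("buffer outside of app memory")
        outbox.append(bytes(memory[addr:addr + size]))
        regs[A0] = size  # return value: number of bytes sent
        return
    raise ValueError(f"unknown ECALL {number}")


regs = [0] * 32
memory = bytearray(b"hello world")
outbox = []
regs[A7], regs[A0], regs[A1] = ECALL_SEND, 6, 5
handle_ecall(regs, memory, outbox)
print(outbox)
```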
The Companion
The companion is a third party library running on a different machine, which stores a dictionary of pages (data) indexed by their addresses. This dictionary is initialized with the app code and data. It waits for messages from the Secure Element to either:
- Send a page given its address (page request);
- Store a page given its address (page commit).
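The companion's role can be sketched in a few lines of Python. The Companion class and its method names are hypothetical, the transport between the device and the companion is left out, and the 256-byte page size is the one given later in this post.

```python
PAGE_SIZE = 256  # page size used by the VM, as described later in this post


class Companion:
    """Hypothetical untrusted page store indexed by page address."""

    def __init__(self, image: bytes, base_addr: int):
        # Initialize the dictionary with the app image, one entry per page.
        self.pages = {}
        for offset in range(0, len(image), PAGE_SIZE):
            chunk = image[offset:offset + PAGE_SIZE]
            self.pages[base_addr + offset] = chunk.ljust(PAGE_SIZE, b"\x00")

    def request_page(self, addr: int) -> bytes:
        """Page request: return the page stored at this address."""
        return self.pages[addr]

    def commit_page(self, addr: int, data: bytes) -> None:
        """Page commit: store the page sent by the Secure Element."""
        if addr % PAGE_SIZE != 0 or len(data) != PAGE_SIZE:
            raise ValueError("pages must be aligned and exactly PAGE_SIZE bytes")
        self.pages[addr] = data


companion = Companion(b"\x01" * 300, base_addr=0x1000)
companion.commit_page(0x1200, b"\x02" * PAGE_SIZE)
print(sorted(hex(addr) for addr in companion.pages))
```

In the real protocol each entry also carries a counter, an HMAC and a Merkle proof, which are covered in the security sections.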
The companion isn’t trusted by the Secure Element and is considered malicious. We’ll see later how cryptographic mechanisms are used to ensure the desired security properties.
The Apps
In the context of app streaming, apps are standard RISC-V ELF binaries executed by the VM. A manifest is associated with an app, storing metadata such as its name and version, ELF information (entrypoint, data and code addresses and sizes), stack address and size, etc.
Ensuring the Security of App Streaming
The main security objective is to guarantee the authenticity, integrity and confidentiality of the memory (stack, heap, data) required by an app during its execution.
An app is a static 32-bit RISC-V ELF binary with only 2 sections: code (read-only) and data (read-write). These sections are divided into pages of 256 bytes. A page has an address (4 bytes since ELF are 32-bit binaries) and a counter (4 bytes) initialized to 0.
The companion (ie. the PC or the smartphone) isn’t trusted and has no knowledge of the keys described below.
Please note that a few cryptographic mechanisms (eg. AES-CBC and HMAC) could be replaced with modern ones. However, the SE and the OS only support a limited set of algorithms, hence these choices which might sound old-fashioned.
Pages Encryption and Authentication
There are 2 sets of keys for AES encryption/decryption and HMAC authentication:
- A static set (DevHMACKey) used to authenticate the read-only pages (code) of the app.
- A dynamic set (DynamicKeyAES, DynamicKeyHMAC) initialized randomly by the VM each time an app is launched and used to encrypt and authenticate writable pages (heap, stack, data).
AES-256-CBC is used for encryption. The page data (256 bytes) is encrypted. The IV is addr || counter || '\x00' * 8 where addr and counter are 4 bytes each, encoded in little-endian. The key is DynamicKeyAES.
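The IV layout can be written down directly. The sketch below only builds the 16-byte IV; the function name page_iv is hypothetical and the AES encryption itself is left out because Python's standard library doesn't provide it.

```python
import struct


def page_iv(addr: int, counter: int) -> bytes:
    """Build addr || counter || '\\x00' * 8 with little-endian 32-bit fields."""
    return struct.pack("<II", addr, counter) + b"\x00" * 8


iv = page_iv(0x1000, 1)
print(iv.hex())
```

Since the counter is incremented on every commit, each version of a page is encrypted under a different IV even though DynamicKeyAES stays the same during the execution.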
HMAC-SHA256 is used for authentication in an “encrypt-then-mac” pattern. The following message is authenticated: encrypted_data || addr || counter where encrypted_data is the page data (256 bytes) encrypted using AES followed by addr and counter which are 4 bytes each and encoded in little-endian. The key is either DevHMACKey or DynamicKeyHMAC, respectively for read-only and writable pages. Note that initially, every page is authenticated using DevHMACKey. Each time a writable page is committed after having been modified, its counter is incremented and the key used is DynamicKeyHMAC.
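Here is a minimal sketch of the authentication step, using Python's hmac and hashlib modules. The function names are hypothetical and the encrypted page is treated as an opaque 256-byte input, since the AES step isn't reproduced here.

```python
import hashlib
import hmac
import struct

PAGE_SIZE = 256


def page_mac(key: bytes, encrypted_data: bytes, addr: int, counter: int) -> bytes:
    """HMAC-SHA256 over encrypted_data || addr || counter (little-endian)."""
    if len(encrypted_data) != PAGE_SIZE:
        raise ValueError("a page is exactly 256 bytes")
    message = encrypted_data + struct.pack("<II", addr, counter)
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_page(key: bytes, encrypted_data: bytes, addr: int,
                counter: int, tag: bytes) -> bool:
    """Constant-time comparison of the received tag with the expected one."""
    return hmac.compare_digest(page_mac(key, encrypted_data, addr, counter), tag)


key = b"k" * 32          # placeholder for DynamicKeyHMAC
page = bytes(PAGE_SIZE)  # placeholder for an encrypted page
tag = page_mac(key, page, 0x2000, 0)
print(verify_page(key, page, 0x2000, 0, tag))
print(verify_page(key, page, 0x2100, 0, tag))
```

Binding the address and the counter into the MAC means a valid page can't be served at a different address or passed off as a different version. Replaying an older version at the same address is a separate problem, handled by the anti-replay mechanism.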
App code and read-only data were initially encrypted in the first versions of this project. While it’s technically feasible, it isn’t useful in our scenarios and we dropped this requirement. Everything else is encrypted and authenticated, and can’t be decrypted or tampered with by the companion. It guarantees that secrets can be manipulated and transit through the app memory without being revealed and that the expected behavior of the app or the data on which the app operates cannot be modified without it being detected.
Anti-Replay
Thanks to authentication, an attacker can’t forge or tamper with pages. However, the VM must keep an attacker from replaying valid pages that were later rightfully modified.
Each writable page is part of a Merkle tree whose root hash is kept by the VM. The Merkle tree is initialized with the writable pages from the app. Each time the VM commits a page, this page is either inserted in the Merkle tree (for a new stack page or heap page) or the corresponding node in the Merkle tree is updated.
The nodes of this Merkle tree are made of 8 bytes of data associated to a page: addr || counter. It’s guaranteed that there’s a unique node in the tree associated with an address. To prevent second preimage attacks, as done in Certificate Transparency, a 0x00 byte is prepended to the hash data of leaf nodes, while 0x01 is prepended when computing internal node hashes.
A Merkle tree path for an address contains the shortest list of additional nodes in the tree required to compute the root hash for that tree. A node is prefixed by the character L if it’s the left child of its parent, or R otherwise.
When the VM receives a writable page from the host, a Merkle tree path is used to ensure that the node made of the associated address and counter is actually part of the tree. It guarantees that the counter associated with the page is valid.
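Checking a path boils down to hashing from the leaf up to the root. The sketch below assumes that the path is ordered from the leaf towards the root, that each entry is one L or R byte followed by a 32-byte SHA-256 hash, and that addr and counter are encoded in little-endian. These encoding details and the function names are assumptions made for illustration.

```python
import hashlib
import hmac
import struct


def leaf_hash(addr: int, counter: int) -> bytes:
    return hashlib.sha256(b"\x00" + struct.pack("<II", addr, counter)).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def root_from_path(addr: int, counter: int, path: list) -> bytes:
    """Recompute the root from a leaf and the sibling hashes along its path."""
    current = leaf_hash(addr, counter)
    for entry in path:
        side, sibling = entry[:1], entry[1:]
        if side == b"L":    # the sibling is the left child of the parent
            current = node_hash(sibling, current)
        elif side == b"R":  # the sibling is the right child of the parent
            current = node_hash(current, sibling)
        else:
            raise ValueError("path entries must start with L or R")
    return current


def verify_path(root: bytes, addr: int, counter: int, path: list) -> bool:
    return hmac.compare_digest(root_from_path(addr, counter, path), root)


# Four writable pages, all at counter 0.
leaves = [leaf_hash(0x1000 + 0x100 * i, 0) for i in range(4)]
n01 = node_hash(leaves[0], leaves[1])
n23 = node_hash(leaves[2], leaves[3])
root = node_hash(n01, n23)

path = [b"R" + leaves[3], b"L" + n01]  # path for the page at 0x1200
print(verify_path(root, 0x1200, 0, path))
print(verify_path(root, 0x1200, 1, path))
```

The second call fails because the counter doesn't match the one recorded in the tree, which is exactly how a replayed page is rejected.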
The following graph shows an example of a Merkle tree stored by the host. Each leaf, in green, is a couple of (addr, counter). The existence of a leaf can be proved thanks to the root hash, in red, along with the associated Merkle tree proof composed of a few hashes.
Exchanges
With the exception of the manifest initially sent by the host, there are 2 kinds of exchanges, always initiated by the VM:
- Requesting a page, given its address. The page data, counter, HMAC and Merkle tree proof are returned by the host. Note that for read-only pages, there is no need for a Merkle tree proof since the page data will not change during the execution and thus the counter cannot be anything other than zero.
- Committing a writable page. The page data, counter and HMAC are sent by the VM. Committing a page always increments its associated counter by 1. The host replies with the Merkle tree proof of the old page to allow the VM to recompute the new Merkle root.
Protection Against Cache Attacks
Uncached memory accesses are forwarded to the companion, which acts as an external memory storage. Addresses are transmitted in clear to the companion, allowing an attacker to identify memory access patterns.
It doesn’t matter because cryptographic operations are implemented through VM syscalls. Cryptographic secrets are stored in the VM memory which is never exposed outside of the device. An additional countermeasure, not implemented, could be the introduction of a mechanism to forbid specific pages from being uncached.
Apps Distribution
The memory encryption and authentication mechanisms described above ensure confidentiality, authenticity and integrity on the memory exported outside of the Secure Element. Care has also been taken to avoid replay attacks.
In this section, we describe how the VM ensures that only legit apps are executed, that is apps that are signed by a trusted authority, here a Ledger HSM (Hardware Security Module), a secure server designed to hold sensitive keys.
Manifest
The manifest contains:
- the entrypoint address
- the code, stack and data sections addresses
- the application name and version
- the Merkle tree root hash, size and last entry
Ledger Signature
Each official app is signed by a Ledger HSM. Given an ELF file, the HSM generates a .zip archive where the app’s manifest is signed by the Ledger HSM using ECDSA. The manifest contains the SHA-256 hash of the app (code_start || code_end || data_start || data_end || code.bin || data.bin) as well as ELF addresses and the initial Merkle tree.
Note that code.bin and data.bin aren’t encrypted.
This .zip archive is generated once and available publicly for download by clients such as Ledger Live.
Signed Manifests
The HSM public key is embedded in the VM code, which allows the VM to verify app manifest signature.
The VM is restricted to only run apps signed by the device itself. To produce this signature, each app first needs to be transmitted to the device with the manifest signed by a Ledger HSM. The device checks this signature with its embedded Ledger HSM public key and produces 3 files stored by the companion: the code pages HMACs, the data pages HMACs and the ECDSA signature of the manifest by the device.
We chose to make each device sign apps before launching them (instead of the HSM) to prevent a scenario where a malicious VM app would dump symmetric HMAC and AES keys. In that case, it would only impact the app and the device for which this manifest was generated.
Key Derivation
In order to guarantee the uniqueness of the keys used to authenticate each app, 2 random seeds of 32 bytes are generated by the device during the first launch of the VM:
- DevSigSeed: derives the ECDSA key (DevSigKey) to sign the manifest
- DevHMACSeed: derives the DevHMACKey key
Secrets are derived from these seeds using SHA256(seed || app_hash) where app_hash is the hash of the application. BIP32 derivation could also be used to derive these secrets, especially on hardware wallets, to make them persistent across reinstallations.
This derivation mechanism allows the VM to derive the same set of keys for a given app_hash.
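The derivation is a single hash, as the following sketch shows. The function name derive_secret and the placeholder values are hypothetical, and how the 32-byte output is turned into an ECDSA private key for DevSigKey isn't covered here.

```python
import hashlib


def derive_secret(seed: bytes, app_hash: bytes) -> bytes:
    """Derive a per-app secret as SHA256(seed || app_hash)."""
    if len(seed) != 32 or len(app_hash) != 32:
        raise ValueError("seed and app_hash are both 32 bytes")
    return hashlib.sha256(seed + app_hash).digest()


dev_hmac_seed = bytes(range(32))                  # placeholder for DevHMACSeed
app_hash = hashlib.sha256(b"example app").digest()  # placeholder app hash

dev_hmac_key = derive_secret(dev_hmac_seed, app_hash)
print(dev_hmac_key.hex())
```

The same seed and app_hash always give the same key, so nothing app-specific has to be stored on the device, while a different app (or a modified one) ends up with an unrelated key.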
Code and Data HMACs
Once the device has verified that the manifest’s signature is valid, a random and temporary AES key (TmpAES) is generated by the device. This key is used to encrypt data sent to the host and will eventually be transmitted to the host if and only if the signature over pages is valid.
Computing HMACs of code and data pages is done through the following steps. For each code and each data page:
- The host sends the page;
- The VM computes the HMAC-SHA256 of page_data || addr || counter where the key is DevHMACKey and counter is 0;
- The VM encrypts this HMAC using AES-256-CBC with the key TmpAES;
- The VM sends the encrypted HMAC to the host;
- The VM updates the SHA-256 context used to compute app_hash.
Once each page has been received, the VM is able to compute app_hash. If the SHA-256 digest is equal to the one in the manifest, it means that the code and data pages sent by the host are valid. The VM sends the AES key TmpAES to the host, which eventually decrypts every HMAC.
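The whole setup pass can be sketched as a single function. Several points are assumptions here: the four addresses are hashed as little-endian 32-bit values, code_end and data_end are taken as start plus length, and both sections are multiples of the page size. The AES encryption with TmpAES is also left out since Python's standard library has no AES; instead, the function only returns the HMACs once the hash check has passed, which gives the same guarantee that no usable HMAC is released for an unverified app.

```python
import hashlib
import hmac
import struct

PAGE_SIZE = 256


def compute_page_macs(dev_hmac_key: bytes, expected_app_hash: bytes,
                      code_start: int, code: bytes,
                      data_start: int, data: bytes) -> list:
    """Return the HMAC of every code and data page if app_hash matches."""
    if len(code) % PAGE_SIZE or len(data) % PAGE_SIZE:
        raise ValueError("sections are assumed to be multiples of the page size")
    code_end = code_start + len(code)
    data_end = data_start + len(data)

    # app_hash covers code_start || code_end || data_start || data_end || code || data
    ctx = hashlib.sha256(struct.pack("<4I", code_start, code_end, data_start, data_end))
    macs = []
    for base, section in ((code_start, code), (data_start, data)):
        for offset in range(0, len(section), PAGE_SIZE):
            page = section[offset:offset + PAGE_SIZE]
            # The counter is 0 for every page during this initial pass.
            message = page + struct.pack("<II", base + offset, 0)
            macs.append(hmac.new(dev_hmac_key, message, hashlib.sha256).digest())
            ctx.update(page)

    if not hmac.compare_digest(ctx.digest(), expected_app_hash):
        raise ValueError("app hash does not match the manifest")
    return macs


code = bytes(range(256)) * 2
data = b"\xAA" * 256
manifest_hash = hashlib.sha256(
    struct.pack("<4I", 0x10000, 0x10200, 0x20000, 0x20100) + code + data
).digest()
macs = compute_page_macs(b"k" * 32, manifest_hash, 0x10000, code, 0x20000, data)
print(len(macs))
```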
The VM finally signs the manifest using DevSigKey and sends the signature to the host.
Manifest Signature
The host is now able to decrypt all HMACs received from the device and generate the final .zip archive. As shown below, the .zip archive now has 3 additional files in the device folder: manifest.device.sig, code.mac.bin and data.mac.bin.
The app can now be streamed to the device.
Alternative Approach: Getting Rid of HMACs
In this section, we will not discuss the encryption that is used to ensure the confidentiality of the data but we will only look at an alternative to the authentication mechanism described before. Authentication can be summarized as the property allowing an entity to prove the integrity of a message toward a verifier. This verifier can be public in the case of digital signature or someone sharing a secret with the prover in the case of Message Authentication Codes (MAC).
When outsourcing memory, the device needs to be able to prove the integrity of a message to itself in the future. One way to achieve this is indeed to use a MAC for each page of data or code, as explained above. In the case of HMACs, each MAC requires the computation of two hashes. Additionally, before the first execution of the app, a rather complex setup needs to be performed in order to ensure the app comes from a trusted source, by generating MACs for every page.
How can the fact that there is only one entity (the device) at play during the execution of an app be used to design an alternative approach to authentication?
By computing and storing the hash (and not a MAC) of the page that the device wants to outsource, the device can keep a fingerprint of the page before sending it to the external memory. Then, to verify the authenticity of an incoming page, the device has to hash the page once and compare this to the stored hash to detect any alteration of the data. If the internal storage of the device cannot be tampered with, then this approach guarantees that the outsourced data cannot be modified by the external memory. However, it requires the device to store a state (the hash of the page) for every page. Thus, the memory needed in the device is still linear in the number of pages, and while this reduces the memory footprint on the device by a constant factor, it limits the interest of having an external memory. Also, using an individual hash for each page does not make it simpler to verify that the app is genuine: either the device has already installed the hashes (which defeats the purpose of this work) or at the beginning of every execution, the hashes should be recomputed and their authenticity verified (which does not make it simpler than what is already done with MACs).
Our proposal is to focus on the use of a tool specifically designed to achieve a memory-time trade-off: Merkle trees. Instead of authenticating and storing all the hashes, the device only needs to authenticate and store the Merkle root. This approach using only Merkle trees makes it possible to get rid of every MAC: during the execution, the authenticity of the outsourced pages is guaranteed by checking Merkle proofs and updating the Merkle tree accordingly.
For data pages, it does not increase the time or communication complexity during the execution because a Merkle tree is already used and Merkle proofs are already checked to prevent replay attacks. The major drawback is for code pages: in the previously described approach, after the first setup, only the verification of HMACs (which are computed by computing two hashes) is required by the device to verify an incoming page; in the pure Merkle tree approach, the Merkle proof is required to be checked, which costs log(n) hashes to compute.
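To get a feel for the numbers, the following sketch compares the two hash computations of an HMAC with the length of a Merkle proof for a few code sizes. It assumes a balanced binary tree and 256-byte pages, counts hash invocations only (ignoring that their inputs differ in length) and doesn't count the hash of the leaf itself; the function name is hypothetical.

```python
PAGE_SIZE = 256
HMAC_HASHES = 2  # an HMAC is computed with two hash invocations


def merkle_path_length(num_pages: int) -> int:
    """Number of sibling hashes in a proof, ie. ceil(log2(num_pages))."""
    if num_pages < 1:
        raise ValueError("the tree needs at least one page")
    return (num_pages - 1).bit_length()


for size_kib in (16, 64, 256):
    pages = size_kib * 1024 // PAGE_SIZE
    print(f"{size_kib} KiB of code: {pages} pages, "
          f"HMAC: {HMAC_HASHES} hashes, Merkle proof: {merkle_path_length(pages)} hashes")
```

Under these assumptions, 256 KiB of code means 1024 pages and a proof of 10 hashes per code page, against 2 hashes with an HMAC.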
This approach was implemented and proves to be a solid alternative to HMACs. It shows that several cryptographic approaches can be chosen to guarantee the same security properties. A deeper analysis of the impact on performance, usability and maintainability has to be conducted before making a definitive choice over the one that will be kept for future versions of this project.
Apps Examples
A few apps were developed as examples:
- This app computes the sha256sum over an arbitrary amount of data. It’s developed in C, calls functions from the libc (eg. malloc()) and makes a standard usage of the heap.
- This app implements a few features of the swap app and allows swapping BTC and ETH. This is a Rust proof-of-concept which highlights how usual development can be done. Notice the usage of Protobuf, standard libraries, tests, etc.
The EIP-712 specification is particularly difficult to implement with limited resources because large JSON messages do not fit in memory and recursive algorithms can lead to stack overflows. These issues don’t exist with the app-streaming concept.
The app is streamed to the device during the whole time by the companion software running on the PC (not shown in the video). After having decoded the request, the JSON message is parsed by the app and relevant fields are displayed to the user. Finally, the user agrees to sign the transaction and the signature is computed and sent to the PC.
Conclusion
This new paradigm allows developing apps that run on embedded devices with memory limited only by that of the higher-end device fulfilling the role of the companion (with a limit of 4 GB of RAM, heap and code size due to the 32-bit address space). Moreover, it requires no specific knowledge and allows the use of standard toolchains, libraries and tools. The whole code is public on github.com/LedgerHQ/app-streaming and contributions are welcome.
Follow-up work should be conducted, especially to assess the impact of such an approach on the overall performance and usability. On the one hand, the VM is obviously slower than native execution and data transfers between the companion and the device are an additional source of delay. On the other hand, user experience is entirely different; apps do not need to be installed anymore and the user interface is more responsive and richer since there is no code size limitation. Additionally, intensive computation operations can be delegated to the VM.
We strongly believe that this concept, if developed broadly, will change the way apps are developed and even shape a different future for development on hardware wallets. Outside of the crypto-currency world, the concept can be applied to any embedded processor with limited resources.
Finally, this project wouldn’t have been possible without yhql and Nics feedback, discussions, cryptographic reviews and Rust teaching; Salvatore Ingala for the initial idea around the coffee machine and eventually the whole Donjon Team.