Meltdown and Spectre: Design security

While Meltdown and Spectre exposed and exploited significant design flaws in CPUs, some systems were prepared and protected.

Earlier this year, two critical vulnerabilities called Meltdown and Spectre were identified, triggering a wave of security advisories and patches to manage the underlying design flaws in CPUs.

While the initial focus was on Intel chips, it soon became apparent that other processors, among them ARM and AMD designs, were also affected.

Meltdown and Spectre provide insight into building more resilient systems. While coverage at the time focused on the vulnerabilities themselves and the problems in addressing them, some systems were in fact prepared and protected, requiring no patches, recompiles, or redesigns.

Why? Because those systems used separation kernel technology. Based on the work of John Rushby, a separation kernel gives system developers a stronger ability to separate critical and non-critical computing environments through increased hardware control.

Separation is the key to safety and security and has guided high-assurance system designs for many decades. For safety, partitioning in aviation systems relies on separating components to ensure their protection. For security, the US Department of Defense relies on modular separation of system design and controlled information flow. Within these critical contexts, separation failures are safety and security failures, and Meltdown is a practical litmus test, revealing which designs achieved safe and secure separation and which did not.

Meltdown allows unprivileged attackers to extract memory accessible to an operating system (OS) or hypervisor by exploiting Out-of-Order Execution (OoOE), a performance optimisation technique used by nearly all modern high-performance CPUs. OoOE lets the CPU transiently defer resolving a user application's access permissions to kernel memory, benefitting from address translation caching as data moves to and from application memory. Because nearly all OS kernel designs define user process and kernel address translations in the same page table, this transient deferral of the permission check allows an attacking application to reference unauthorised kernel virtual addresses directly in its program, causing the CPU to load unauthorised memory into the cache.

While the CPU eventually throws a memory access exception, it does not restore the state of the cache, allowing the attacker to use cache timing analysis to deduce the values of transiently referenced kernel memory.
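
The mechanics described above can be sketched as a toy model. This is not exploit code and touches no real hardware: a Python set stands in for the CPU cache, timing is reduced to a hit-or-miss lookup, and the names `ToyCPU`, `KERNEL_MEMORY`, and `recover_byte` are hypothetical, introduced only for illustration.

```python
# Toy model only: a Python set stands in for the CPU cache, and "timing" is
# reduced to a hit/miss lookup. Nothing here touches real hardware.

KERNEL_MEMORY = {0xFFFF0000: 0x2A}  # hypothetical kernel address -> secret byte
PAGE = 4096                          # stride between probe-array entries


class ToyCPU:
    def __init__(self):
        self.cache = set()  # offsets of probe-array lines currently cached

    def transient_read(self, kernel_addr):
        """Model a user-mode read of kernel memory under out-of-order execution."""
        secret = KERNEL_MEMORY[kernel_addr]
        # The dependent load runs transiently, before the permission check
        # resolves, and pulls one probe-array line into the cache.
        self.cache.add(secret * PAGE)
        # The permission check then resolves: the architectural result is
        # discarded, but the cache is not rolled back.
        raise PermissionError("user-mode access to kernel memory")

    def is_fast(self, probe_offset):
        """Stand-in for timing a load: cached lines count as 'fast'."""
        return probe_offset in self.cache


def recover_byte(cpu, kernel_addr):
    try:
        cpu.transient_read(kernel_addr)
    except PermissionError:
        pass  # the attacker expects the fault and carries on
    # Probe all 256 candidate lines; the one that is fast reveals the byte.
    hits = [value for value in range(256) if cpu.is_fast(value * PAGE)]
    return hits[0] if hits else None


print(hex(recover_byte(ToyCPU(), 0xFFFF0000)))  # prints 0x2a
```

The point of the model is the ordering: the fault arrives, yet the cache footprint left by the transient load survives it and encodes the secret byte.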

Solution to Meltdown

The simple solution to Meltdown is page table isolation and least privilege memory access, techniques native to the LynxSecure Separation Kernel, where all kernel address translations are stored in page tables separate from those of applications and guest OSes. As an alternative to traditional centralised, resource- and service-oriented designs, LynxSecure is rooted in a truly least privilege design from both a kernel construction and a user model perspective.

LynxSecure forces each guest computing environment to run self-sufficiently, decentralising the control of resources and execution scheduling. The autonomy of each guest environment obviates the need for the kernel to provide global services, eliminating the design choice of combining kernel address translation records with guest address translation records.

Without kernel address translation records collocated with user space, every Meltdown attempt to transiently load unauthorised memory into the CPU cache faults immediately, making LynxSecure systems immune to Meltdown.
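
A minimal sketch of why isolation closes the hole, under loud assumptions: page tables are modelled as plain dictionaries, the page numbers are invented, and nothing here reflects LynxSecure's actual data structures. The model assumes that a translation present in the active table can be used transiently before its permission bit is enforced, whereas a missing translation faults at once.

```python
# Toy model: page tables are plain dicts mapping virtual page -> entry.
# Page numbers and layouts are hypothetical, not LynxSecure's.

KERNEL_PAGE = 0xFFFF0
USER_PAGE = 0x00400

# Conventional layout: kernel mappings share the user's table, guarded by a
# supervisor-only flag that is checked late.
shared_table = {
    USER_PAGE: {"phys": 0x1000, "supervisor_only": False},
    KERNEL_PAGE: {"phys": 0x9000, "supervisor_only": True},
}

# Isolated layout: the user/guest table holds no kernel translations at all.
isolated_user_table = {
    USER_PAGE: {"phys": 0x1000, "supervisor_only": False},
}


def transient_load_possible(page_table, virtual_page):
    """Return True if a user-mode access could reach memory transiently.

    In this model an existing translation can be used speculatively before
    its permission bit is enforced; a missing translation faults immediately.
    """
    return virtual_page in page_table


print(transient_load_possible(shared_table, KERNEL_PAGE))         # True
print(transient_load_possible(isolated_user_table, KERNEL_PAGE))  # False
```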

Spectre summary

Spectre accesses unauthorised memory by exploiting CPU branch prediction, a performance optimisation technique in which the CPU ‘guesses’ destination addresses of execution control flow operations to keep the execution pipeline from waiting for resolution of a target location. The two identified methods of exploitation are bounds check bypass and branch target injection.

Bounds check bypass targets victim code with boundary protections on caller-controlled indices into memory. By training the branch predictor with successive calls to the function using legal index values, the attacker ‘tricks’ the CPU into assuming that subsequent calls will also pass the bounds check, so it transiently executes the code behind the check. The attacker then calls the function with an out-of-bounds index value, and the CPU transiently loads memory into the cache according to the attacker-controlled index.
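
The training step can be illustrated with a toy simulation. It assumes a 2-bit saturating counter as the branch predictor and records transient loads in a list instead of a real cache; the `Victim` class and the memory layout are hypothetical and do not model any specific CPU.

```python
# Toy model: a 2-bit saturating counter stands in for the branch predictor and
# a list records which values were loaded transiently. Not real speculation.

ARRAY = [1, 2, 3, 4]      # public data the victim may legitimately read
SECRET = [0x53]           # memory just past the array (out of bounds)
MEMORY = ARRAY + SECRET


class Victim:
    def __init__(self):
        self.counter = 0          # 0..3; >= 2 predicts "index is in bounds"
        self.touched_lines = []   # cache footprint left by predicted-path loads

    def read(self, index):
        in_bounds = index < len(ARRAY)
        predicted_in_bounds = self.counter >= 2
        if predicted_in_bounds:
            # Runs before the bounds check resolves, so it may be transient.
            self.touched_lines.append(MEMORY[index])
        # Update the predictor with the real outcome of the check.
        if in_bounds:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)
        return ARRAY[index] if in_bounds else None


victim = Victim()
for i in [0, 1, 2, 3]:    # training calls with legal indices
    victim.read(i)
result = victim.read(4)   # out-of-bounds call: architecturally rejected

print(result)                         # None
print(hex(victim.touched_lines[-1]))  # 0x53
```

The call returns nothing to the attacker, yet the predicted path has already touched the out-of-bounds value, which is the footprint a cache timing probe would then recover.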

Branch target injection targets code containing branch instructions with variable target addresses, steering them into code (called ‘gadgets’) that accesses private memory within the victim’s address space. By training the CPU branch predictor, the attacker causes the victim to transiently jump into gadgets that would normally be inaccessible.
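
As a rough sketch of the training idea, assume the predictor is a small table indexed by the low bits of the branch address, so that an attacker's branch and a victim's branch can land in the same slot. The table size, the addresses, and the `indirect_branch` helper are all hypothetical; real predictors are considerably more complex.

```python
# Toy model: the branch target buffer (BTB) is a small dict shared by attacker
# and victim, indexed by the low bits of the branch address.

BTB_ENTRIES = 16
btb = {}  # slot -> last observed branch target


def indirect_branch(branch_addr, real_target):
    """Execute one indirect branch and return what ran, in order."""
    slot = branch_addr % BTB_ENTRIES  # different branches can alias to one slot
    predicted = btb.get(slot)
    trace = []
    if predicted is not None and predicted != real_target:
        # Mispredicted: the stale target runs transiently and is then squashed.
        trace.append(("transient", predicted))
    trace.append(("retired", real_target))
    btb[slot] = real_target
    return trace


# The attacker trains a branch whose address aliases with the victim's branch.
indirect_branch(0x1234, "gadget")

# The victim's own branch now transiently runs the attacker-chosen target.
victim_trace = indirect_branch(0x5234, "handler")
print(victim_trace)
```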

Little can be done to correct or detect the attack mechanics, as attack code executes normal functions and sets general purpose registers before calling a victim target function. Patches can mitigate Spectre. The most effective is a compiler modification that adopts an alternative machine code generation model for handling conditional application code, one that is not subject to the specific branch target injection exploit. However, this requires all existing software to be recompiled and does not preclude the use of exploitable CPU instructions.

Below, figure 1: Spectre central service exploit possibilities

In a central service-oriented architecture, how does one know if common services like user login and data encryption expose underlying secrets, including passwords, encryption keys, etc.?

Figure 1 shows attack vectors a user application possesses against such architectures.

Spectre measures the activity of victim code that shares a common compute resource. In traditional architectures, preventing an attacker from deducing kernel-maintained system secrets is nearly impossible. Worse, if applications can dynamically gain OS/hypervisor administrative rights, the threat of attack rises dramatically.

Many OSes and hypervisors claim Least Privilege if they feature a mandatory access control system or if they place their drivers outside of the kernel address space. This is insufficient. The LynxSecure separation kernel takes the Principle of Least Privilege further, eliminating the central resource manager, data services, and user administrative control while allowing applications to run without modification in guest environments. Critical code responsible for initialising hardware and managing hardware exceptions is separated from code supporting application services. Furthermore, each application support layer is autonomously segmented. The design decouples application support layers from vital hardware control and ensures that application environments are supported independently, allowing applications only limited transient access to unintended CPU states.

Central services such as data protection are unavoidable in software designs. With a least privilege foundation, a critical security service can be decoupled from the kernel, limiting its exposure to need-to-know applications. However, the service must execute somewhere and be resilient to side-channel attacks.

LynxSecure has been designed to provide system configuration controls that can preclude applications from gaining access to sensitive information. Through fine-grained resource allocation control, code that handles secrets can be guaranteed to execute on CPU resources that are completely independent of user processes, protecting those secrets from malicious applications.

Where there are not enough available resources to dedicate to critical functions, greater attention must be paid to software design. Spectre thrives in software designs with tight coupling between malicious applications and victim code. Traditional OS/hypervisor system APIs are prime targets, as attackers can link against victim interfaces and call target code directly.

LynxSecure provides a distributed system architecture that decouples applications and services: applications must post a request for a service and deposit data for it via message APIs. This increases the separation between attack and victim code, hiding victim service interfaces from attackers and allowing service code to execute on non-intersecting compute resources.
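
The request-and-deposit pattern might look something like the sketch below. The article does not describe LynxSecure's message API, so the mailbox queues and function names are invented for illustration, and a single Python process cannot provide the physical separation the architecture relies on. The sketch shows only the shape of the interface: the application never calls service code and never sees the key.

```python
import hashlib
import hmac
import queue

# Hypothetical mailboxes; stand-ins for whatever message channel the platform provides.
requests = queue.Queue()
replies = queue.Queue()


def app_request_mac(data: bytes) -> None:
    """Application side: deposit data and post a request. No service code is linked in."""
    requests.put({"op": "mac", "data": data})


def service_run_once(key: bytes) -> None:
    """Service side: would run in its own environment, which alone holds the key."""
    msg = requests.get()
    if msg["op"] == "mac":
        tag = hmac.new(key, msg["data"], hashlib.sha256).hexdigest()
        replies.put({"ok": True, "tag": tag})
    else:
        replies.put({"ok": False})


app_request_mac(b"hello")
service_run_once(key=b"service-private-key")
reply = replies.get()
print(reply["ok"], len(reply["tag"]))  # True 64
```

Compared with a system call interface, the application here has no direct call path into the service code to train a predictor against or to time.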

Below, figure 2: The alternative LynxSecure system architecture, in which resources of critical services are physically separated from applications, and services are linked to applications through a message interface rather than a system call interface

Author details:

Will Keegan is CTO, Lynx Software Technologies