A layered approach to enhancing security for safety-critical software


Today’s safety-critical embedded environments are becoming more connected to the outside world. The ongoing growth of the Internet and IoT solutions will further drive connectivity requirements for safety-critical systems for the foreseeable future. Aerospace, defence, automotive, medical and industrial control are just some of the vertical markets that will expand as the ability to interconnect and remotely work with devices grows.

The benefits of being able to monitor and control safety-critical embedded devices are real and substantial. Remote monitoring and command-and-control of embedded safety-critical devices will benefit military applications (e.g. UAV surveillance and payload delivery) as well as give consumers benefits such as enhanced personal mobility (e.g. mobile and implantable medical devices) and increased safety (e.g. vehicle-to-vehicle communications). Device makers will be driven towards networking their safety-critical devices in their bids to be innovative and stay competitive. However, with this increase in connectivity, safety-critical systems developers will also need to be concerned with security in addition to safety-critical functionality.

Developers of safety-critical embedded software can no longer consider the domain of security as being separate from the domain of functional safety. Any system interfaced to the outside world has the potential to expose security vulnerabilities. In particular, systems connected to the Internet and IoT need to be protected against specialized targeted malware attacks as well as a whole world of hackers.

The tie between security and safety is also acknowledged by safety-related process standards. Where there is a potential for security vulnerabilities, standards such as IEC61508/EN61508 (process), ISO26262 (automotive), IEC62304/EN62304 (medical), IEC62279/EN50128 (railways), and DO-178 (aerospace) require that functional safety requirements address them. Each of these process standards has requirements for function identification, specification, classification, design, development, and verification to show compliance with system safety requirements.

Security vulnerabilities in safety-critical systems are not merely academic concepts. There are published examples of safety-critical systems found to be vulnerable because of security design flaws. In 2011, security researcher Barnaby Jack devised an attack to wirelessly take control of Medtronic insulin pumps, demonstrating how such a pump could release a fatal dose of insulin. In 2014, researchers from the University of Michigan exploited a number of security flaws to take control of a networked traffic signal system deployed in the United States. In today’s networked world, Johnny down the street, or a hacker halfway around the world, is now a potential safety threat if he hacks into a medical device or the local traffic light system.

Now that the importance of security on safety-critical embedded systems has been established, this article is going to examine how to use a layered software approach to enhance security. In particular, it will explore how a least privilege separation kernel can provide security protection to a connected safety-critical embedded system, while maintaining the deterministic behaviour required of the safety-critical functionality.

A Common Design Approach

First, let’s consider how to design a safety-critical embedded system with networked monitoring. A common approach would be to take an embedded design using sensors and actuators, and simply add networking to the system. The safety-critical part would commonly be one application, and the networked part would be another. The two applications would then coordinate over an IPC mechanism. Figure 1 shows at a high level how a simple safety-critical embedded application might be made into a network monitored safety-critical application.

Figure 1 - Basic and common networked embedded systems

From a security standpoint, the basic safety-critical system’s isolation makes it more secure. However, hackers can attack the network monitored system via the Internet and exploit weaknesses in the networked application, TCP/IP stack, device drivers, or the OS itself. Each of these could provide a large attack surface. Furthermore, once an attacker gets into the system through the network, the entire system is potentially exposed. Typical RTOS architectures that link applications with the kernel to produce a single binary executable have no separation between user space and kernel space. Once a hacker gets into such a system, everything is exposed and open to exploitation.

Using a layered software approach

A layered software approach is a pragmatic way of addressing security concerns. However, safety-critical embedded systems also require real-time deterministic behaviour. Real-time deterministic behaviour, or determinism, requires a bounded response time to events. If the system loses its bounded response time, it loses its determinism. In the layered software approach, it is important to ensure that determinism is maintained while security is enhanced. Choosing the wrong software layers can compromise both determinism and security.
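The difference between a fast system and a deterministic one can be shown with a small sketch. The deadline and latency samples below are made up for illustration and are not measurements from any real system. The point is that determinism is judged on the worst case, not the average.

```python
from statistics import mean

DEADLINE_US = 200  # hypothetical bound on response time, in microseconds

def meets_deadline(latencies_us, deadline_us):
    """True only if every observed response time is within the bound."""
    return max(latencies_us) <= deadline_us

# Made-up samples for illustration only.
steady_samples = [110, 120, 118, 125, 119]
bursty_samples = [60, 65, 58, 300, 62]  # fast on average, with one long stall

print(mean(bursty_samples) < mean(steady_samples))  # True: better average
print(meets_deadline(steady_samples, DEADLINE_US))  # True
print(meets_deadline(bursty_samples, DEADLINE_US))  # False: one missed deadline
```

Note that measurements like these can only show that a bound was violated. Demonstrating that a bound always holds requires analysis of the design, not just sampling.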

For the layered software approach, hypervisors provide better security for safety-critical embedded systems. The hardware isolation and virtualization of hypervisors will potentially allow isolation of the safety-critical functionality from the rest of the system and protect it from security threats.

Hypervisors are typically classified as Type-2, Type-1, or Type-0. A Type-2 hypervisor sits on an OS and provides virtualisation for other OSes to sit on top of the hosting OS. A Type-1 hypervisor sits on top of bare metal and uses an assisting OS in its virtualization. The Type-1 hypervisor’s OS is less obvious, but still provides a dynamic memory manager, process model, dynamic scheduler, file system, network stack, device drivers, system API, application ABI, and so forth, just like the host OS for a Type-2 hypervisor. The typical desktop hypervisors are prime examples of Type-2 or Type-1 hypervisors.

A Type-0 hypervisor sits on top of bare metal, but unlike a Type-1 hypervisor, it does not use an assisting OS. A least privilege separation kernel is a Type-0 hypervisor that is used to efficiently partition the system’s resources between guests and then tightly control the data flows between them. It is designed with security and minimal performance impact as its primary objectives.

Figure 2 shows the three types of hypervisors.

Figure 2 - The 3 different types of hypervisors

The following sections examine how hypervisors might be used to protect our hypothetical network monitored safety-critical embedded system. A hypervisor can be used to separate the two sets of application functionality – safety-critical and networked monitor. One virtual machine guest will be the safety-critical system, and the other virtual machine guest will provide the remote monitoring capabilities. A data channel will be used for inter-VM communication between the two VMs. The safety-critical system can then discard its TCP/IP stack. This will simultaneously remove a large attack surface from the safety-critical system and simplify its design. Figure 3 shows how such a design might look with each of the three different types of hypervisors.
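As a rough illustration of what such a data channel could look like from the guests’ point of view, the sketch below models a fixed-capacity queue of fixed-size status messages. Everything here is an assumption made for illustration: the message layout, the names `OneWayChannel`, `send` and `receive`, and the choice to drop messages when the queue is full so that the safety-critical sender never blocks. A real hypervisor would typically back such a channel with a shared memory region it configures itself.

```python
import struct
from collections import deque

# Hypothetical message layout: sequence number, sensor id, signed reading.
STATUS_FORMAT = "<IHh"
STATUS_SIZE = struct.calcsize(STATUS_FORMAT)  # 8 bytes, no padding

class OneWayChannel:
    """Fixed-capacity queue of fixed-size messages (illustrative only)."""

    def __init__(self, capacity):
        self._slots = deque()
        self._capacity = capacity
        self.dropped = 0

    def send(self, seq, sensor_id, reading):
        """Sender side: never blocks; drops the message if the queue is full."""
        if len(self._slots) >= self._capacity:
            self.dropped += 1
            return False
        self._slots.append(struct.pack(STATUS_FORMAT, seq, sensor_id, reading))
        return True

    def receive(self):
        """Receiver side: returns the oldest message as a tuple, or None."""
        if not self._slots:
            return None
        return struct.unpack(STATUS_FORMAT, self._slots.popleft())

channel = OneWayChannel(capacity=2)
results = [channel.send(n, 7, -40 + n) for n in (1, 2, 3)]
first = channel.receive()
```

With a capacity of two, the third `send` is dropped rather than delaying the sender. The fixed message size means the receiver never has to parse variable-length input from the other guest.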

Figure 3 - System design concepts using the 3 types of hypervisors

Type-2 Hypervisor Scenario

In this scenario, because the networked monitoring system is connected to the Internet, the host OS is also connected to the Internet. A security flaw in the host OS, or one of its applications, can thus be exploited by a hacker and the very underpinnings of the safety-critical system will then be compromised. Essentially, the safety-critical system is exposed to the security vulnerabilities of the host OS and its applications – a very large potential attack surface.

Furthermore, the determinism of the safety-critical system will be lost. General purpose OSes, which are used to host Type-2 hypervisors, are non-deterministic. They are built with user experience in mind, not determinism. A general purpose OS will dedicate CPU cycles to lower priority applications in order to enhance the user experience and carry on background tasks. Deterministic systems always run the highest priority task to the exclusion of lower priority tasks. One cannot place a deterministic system on top of a non-deterministic system and expect it to retain its determinism.
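The scheduling difference can be sketched in a few lines. The task names and priorities are hypothetical, and plain round-robin is used here only as a simple stand-in for a fairness-oriented policy; real general purpose OS schedulers are considerably more sophisticated.

```python
def pick_strict_priority(ready_tasks):
    """Return the name of the ready task with the highest priority."""
    return max(ready_tasks, key=lambda task: task[1])[0]

def pick_round_robin(ready_tasks, tick):
    """Return the next task in turn, ignoring priority."""
    return ready_tasks[tick % len(ready_tasks)][0]

# (name, priority) pairs; a larger number means a higher priority.
ready = [("control_loop", 10), ("logger", 2), ("indexer", 1)]

strict = [pick_strict_priority(ready) for tick in range(6)]
fair = [pick_round_robin(ready, tick) for tick in range(6)]

print(strict.count("control_loop"))  # 6: runs on every tick
print(fair.count("control_loop"))    # 2: shares the CPU with background work
```

Under the strict policy the control loop’s access to the CPU does not depend on what else is ready to run. Under the fair policy it does, which is what makes its response time hard to bound.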

Using a Type-2 hypervisor will not work. It neither improves security nor retains the determinism our safety-critical system requires.

Type-1 Hypervisor Scenario

In the Type-1 hypervisor scenario, the Type-1 hypervisor is not exposed to the potential security vulnerabilities of host applications like a Type-2 hypervisor. This gives it a reduced attack surface, which is a plus for the Type-1 hypervisor. However, a Type-1 hypervisor still has the assisting OS’ dynamic memory manager, process model, dynamic scheduler, file system, network stack, device drivers, system API, application ABI, and other components as potential security vulnerabilities.

As far as the determinism of the safety-critical system goes, the same issues arise with a Type-1 hypervisor as with a Type-2 hypervisor. The Type-1 hypervisor’s reliance on the functionality of a general purpose OS makes it non-deterministic. The safety-critical system’s determinism is lost because it has been placed inside a non-deterministic environment.

Type-0 Hypervisor Scenario – least privilege separation kernel

Clearly, the typical Type-2 and Type-1 hypervisors cannot provide our safety-critical embedded system with the security and determinism it requires. What is needed is a hypervisor whose primary function is to efficiently partition the system’s resources (e.g. CPU cores, I/O devices, and memory) between guests and then tightly control the data flows between them. Functionality such as device drivers and networking needs to be pushed up the stack into the guest OSes, reducing the amount of privileged code in the hypervisor. A hypervisor built this way is a least privilege separation kernel.
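One way to picture static partitioning is as a configuration that can be checked before the system ever runs. The sketch below is an illustration under assumed names and values: the `Partition` type, the core numbers, device names and memory addresses are all hypothetical and do not describe any particular product. It also ignores the deliberately shared region a data channel would need.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Partition:
    name: str
    cores: frozenset
    devices: frozenset
    mem_start: int
    mem_size: int

def overlaps(a, b):
    """True if two partitions share any core, device, or memory range."""
    shared_cores = a.cores & b.cores
    shared_devices = a.devices & b.devices
    a_end = a.mem_start + a.mem_size
    b_end = b.mem_start + b.mem_size
    shared_memory = a.mem_start < b_end and b.mem_start < a_end
    return bool(shared_cores or shared_devices or shared_memory)

# Hypothetical layout: one core and one memory range per guest.
safety = Partition("safety", frozenset({0}),
                   frozenset({"sensor", "actuator"}), 0x8000_0000, 0x0100_0000)
monitor = Partition("monitor", frozenset({1}),
                    frozenset({"ethernet"}), 0x8100_0000, 0x0400_0000)
# A misconfigured guest that also claims core 0.
greedy = Partition("greedy", frozenset({0, 1}),
                   frozenset({"ethernet"}), 0x8100_0000, 0x0400_0000)

print(overlaps(safety, monitor))  # False: fully disjoint
print(overlaps(safety, greedy))   # True: both claim core 0
```

Because the assignment is fixed up front, a check like this can reject a bad layout at build or boot time instead of leaving the decision to a dynamic allocator at run time.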

In the least privilege separation kernel’s scenario, the safety-critical embedded system and networked monitoring system will again be placed within the hypervisor as separate guests. The least privilege separation kernel, being a Type-0 hypervisor, sits on top of bare metal and does not use an assisting OS for virtualization. That means there are no host OS applications, dynamic memory manager, process model, dynamic scheduler, file system, network stack, device drivers, system API, application ABI, etc., to provide possible attack surfaces.

Because the safety-critical system’s network stack has been removed, its communication to the outside world has been reduced to just its sensors, actuators, and the tightly controlled data channel. This reduces the safety-critical system’s attack surface to a bare minimum. Placing our safety-critical system inside a least privilege separation kernel isolates the safety-critical system, and thus improves the security of the design.

What happens if the networked monitoring system is hacked or attacked? Because the least privilege separation kernel partitions the hardware among its guests, the networked monitoring system’s hardware is separate from the safety-critical system’s hardware. The networked monitoring system cannot access the safety-critical system’s hardware and resources. If the networked monitoring system is hacked or attacked, the safety-critical system continues on unimpeded. In addition, since the least privilege separation kernel only partitions the system’s hardware resources and provides a tightly controlled data channel, there are no drivers, scheduler, network stacks, or other attack surfaces for the compromised monitoring system to use to get into the least privilege separation kernel. Both the least privilege separation kernel and the safety-critical system are protected against the compromised networked monitoring system.
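The containment argument can be modelled as two fixed lookups: who owns each device, and which guest-to-guest flows are permitted. The tables and names below are hypothetical, and the one-way direction of the flow (safety system to monitor) is an assumption made for this monitoring example. In a real separation kernel the equivalent rules would be enforced by hardware protection set up by the kernel, not by a lookup at run time.

```python
# Hypothetical fixed policy, decided at configuration time.
DEVICE_OWNER = {
    "sensor": "safety",
    "actuator": "safety",
    "ethernet": "monitor",
}
ALLOWED_FLOWS = frozenset({("safety", "monitor")})  # assumed one-way

def may_access(guest, device):
    """A guest may touch a device only if the device is assigned to it."""
    return DEVICE_OWNER.get(device) == guest

def flow_permitted(source, destination):
    """Data may move only along explicitly listed flows."""
    return (source, destination) in ALLOWED_FLOWS

# What a compromised monitoring guest could attempt:
print(may_access("monitor", "ethernet"))     # True: its own device
print(may_access("monitor", "actuator"))     # False: belongs to the safety guest
print(flow_permitted("monitor", "safety"))   # False: no path back
print(flow_permitted("safety", "monitor"))   # True: status data still flows out
```

Anything not listed is denied by default, so compromising the monitoring guest gives an attacker that guest’s own resources and nothing more.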

As far as determinism goes, the least privilege separation kernel does not have a host OS’ or assisting OS’ scheduler, device drivers, network stack, etc., to get in the way of the guests’ determinism. It can be both deterministic and tiny – consisting of as little as 25K lines of code. Running a safety-critical embedded system within the deterministic least privilege separation kernel can maintain the safety-critical system’s determinism.

Using a least privilege separation kernel, the safety-critical embedded system can retain its determinism and have improved security through the reduction of its attack surface.

Conclusion

A least privilege separation kernel is primarily concerned with partitioning a system’s hardware resources among guests and tightly controlling data flows between them. It isolates the guest systems into separate virtual machines so they cannot reach or interfere with each other. Furthermore, a least privilege separation kernel does not contain OS-style schedulers, kernel services, device drivers, file systems, network stacks, etc. which can be exploited to compromise a system. It can be as small as 25K lines of code, and be deterministic.

Layering safety-critical software with a least privilege separation kernel is a viable method of providing the security a connected safety-critical embedded system needs through system isolation, while maintaining the deterministic behaviour required of the safety-critical functionality.

Author Profile:
James Deutch is a product specialist at Lynx Software Technologies