The risks of oversharing

Though peak clock rates hit the glass ceiling of 5GHz almost two decades ago, silicon scaling has managed to keep delivering improved performance by using parallelism instead of raw speed.

Not surprisingly, the avionics sector has become increasingly keen to take advantage of the performance that multicore operation offers.

As with trends in the automotive industry, if the performance is available, it makes a lot of sense to consolidate the processing for lots of little tasks onto larger and often cheaper multiprocessor boards. In a market where fuel efficiency is becoming the main challenge, fewer, more powerful computers weigh less. With the integrated modular avionics (IMA) architecture, consolidating onto standardised hardware makes it easier to move tasks between units as needs change. But there is a catch with this strategy: multicore interference.

A real-time operating system (RTOS) that supports the Arinc 653 standard, which is used in most avionics computers, guarantees cycles to critical tasks by rigidly time-slicing the processor’s available cycles. But how exclusive that access really is becomes much hazier in a multicore environment. With just a single processor on the board, short of direct memory access (DMA) transfers getting in the way of cache flushes and refills, the running task can count on exclusive access to memory and I/O ports. If multiple cores are running different tasks at the same time, that is no longer the case.
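The rigid time-slicing described above can be pictured as a fixed table of windows that repeats every major frame. The Python sketch below is only an illustration of that idea, not the Arinc 653 API: the partition names, the 100 ms major frame and the window lengths are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    partition: str   # hypothetical partition name
    start_ms: int    # offset from the start of the major frame
    length_ms: int   # guaranteed duration of the window


# Hypothetical 100 ms major frame; names and durations are illustrative only.
MAJOR_FRAME_MS = 100
SCHEDULE = [
    Window("flight_control", 0, 40),
    Window("navigation", 40, 30),
    Window("maintenance", 70, 30),
]


def validate(schedule, major_frame_ms):
    """Check that windows do not overlap and fit inside the major frame."""
    end = 0
    for w in sorted(schedule, key=lambda w: w.start_ms):
        if w.start_ms < end:
            raise ValueError(f"{w.partition} overlaps the previous window")
        end = w.start_ms + w.length_ms
    if end > major_frame_ms:
        raise ValueError("schedule exceeds the major frame")


def active_partition(t_ms, schedule=SCHEDULE, major_frame_ms=MAJOR_FRAME_MS):
    """Return the partition that owns the core at time t_ms, or None if idle."""
    offset = t_ms % major_frame_ms   # the table repeats every major frame
    for w in schedule:
        if w.start_ms <= offset < w.start_ms + w.length_ms:
            return w.partition
    return None


validate(SCHEDULE, MAJOR_FRAME_MS)
```

A table like this guarantees processor time on one core. It says nothing about what the other cores are doing to shared memory or caches during the same window, which is where the multicore problem begins.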

Cache conflicts represent one form of inter-task conflict that can, potentially, force tasks to miss their deadlines. In these situations, the problem lies in a level-two or level-three cache that is shared between the cores on a SoC. If the memory accesses from two tasks running simultaneously map onto the same cache lines, they can keep evicting each other’s data, forcing much slower accesses to main memory. In applications that are dominated by memory transfers, the bus itself can be a source of conflict as most controllers are designed on a best-effort basis and do not prioritise responses, in contrast to the way that an RTOS or hypervisor might allocate processor time.
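To see how two tasks can end up fighting over the same cache lines, it helps to look at how an address selects a cache set. The Python sketch below assumes a simple modulo-indexed, set-associative cache with a 64-byte line, 2 MiB capacity and 16 ways. These figures are illustrative assumptions, and real SoCs may hash addresses differently.

```python
from collections import Counter

LINE_BYTES = 64                    # assumed cache line size
CACHE_BYTES = 2 * 1024 * 1024      # assumed 2 MiB shared cache
WAYS = 16                          # assumed associativity
NUM_SETS = CACHE_BYTES // (LINE_BYTES * WAYS)   # 2048 sets
SET_STRIDE = NUM_SETS * LINE_BYTES              # addresses this far apart share a set


def set_index(addr):
    """Cache set selected by an address under simple modulo indexing."""
    return (addr // LINE_BYTES) % NUM_SETS


def oversubscribed_sets(addresses):
    """Sets where more distinct lines compete than the cache has ways."""
    lines = {a // LINE_BYTES for a in addresses}
    per_set = Counter(line % NUM_SETS for line in lines)
    return {s: n for s, n in per_set.items() if n > WAYS}


# Two hypothetical tasks, each touching 10 lines that all land in set 0.
base = 0x1000_0000
task_a = [base + i * SET_STRIDE for i in range(10)]
task_b = [base + (100 + i) * SET_STRIDE for i in range(10)]
```

With these assumptions, each task on its own fits within the 16 ways of the set, so `oversubscribed_sets(task_a)` is empty. Run together, the two tasks ask for 20 lines in a 16-way set and start evicting each other's data.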

Measurements by Rapita Systems on an Nvidia Jetson NGX board, running the YOLO neural network for real-time visual object recognition against the company’s library of interference generators, showed that just one other task with high memory demands can more than halve the frame rate. Green Hills Software performed analyses that showed as much as an eight-fold decrease in performance with just one core interfering with another. In practice, much depends on what winds up being mixed on the same hardware. To help address the problem, the Certification Authorities Software Team (CAST) published the CAST-32A position paper in 2016 to guide avionics teams on how best to analyse the problems caused by interference as part of their airworthiness certification process.

“We have mixed opinions along with our customers. On one end of the spectrum customers view multicore as an overhyped problem, given DMA peripherals introduced interference channels well before CAST-32A was published and they will continue to develop and evaluate multicore designs in the same fashion as single core SoCs with DMA I/O devices. Others are overwhelmed with CAST-32A guidance imposing nearly unbounded assurance activities,” says Will Keegan, CTO of Lynx Software Technologies.

Some of the scenarios look like the attacks proposed to capture encryption keys from high-end processors, which has led to the manufacturers making changes to their architectures that may help designers in the safety-critical arena. The Meltdown attack, for example, relies on software designed to repeatedly evict cache lines used by a victim task and so has clear parallels to the kinds of interference tests used to probe avionics behaviour.

A hardware solution?

“Given that multicore interference is a hardware issue, certainly the best theoretical answer is to address it in hardware,” says Richard Jaenicke, director of marketing for safety-critical systems at Green Hills Software.

Keegan points to changes made by Intel and Arm recently to improve their memory and cache partitioning features, some of which have been introduced for automotive designs, as favourable for avionics users. But Jaenicke says the value of the partitioning features in the newer multicore offerings may be limited in aerospace and still make it difficult to achieve certification to standards like DO-178C.

“The airworthiness market is not sufficient in size to drive a full hardware solution to multicore interference, so some software solution will be needed for the foreseeable future,” Jaenicke adds.

Depending on their interference tolerance, avionics integrators have a number of software-mitigation techniques to which they can turn. The first step is to evaluate the problem.

Although one option is to test the actual application code, this may not uncover serious problems that may occur rarely, such as a batch process interfering with a real-time task. Instead, the same kinds of benchmarks that Green Hills and Rapita used to demonstrate the problem can be used to underpin extensive tests to determine how vulnerable different data channels on the target board are to interference.

“Our interference generating libraries have dozens of different interference patterns to determine the true worst-case execution time, and our customers can select subsets of those to match the types of interference they predict or measure in their multicore system,” says Jaenicke.

Interference mitigation is then a matter of working out which bits of firmware need to change to minimise the risk of real-time tasks missing their deadlines. “There is no one-size-fits-all mitigation for interference channels,” says Nick Bowles, head of marketing at Rapita.

Some methods attempt to remove the problem at source. “Assigning critical tasks to exclusive execution windows where other cores run idle during the critical task periods, temporarily turning multicore processors into single core, is the most common method we come across,” explains Keegan.
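The exclusive-window approach Keegan describes can be checked mechanically against a schedule table. The Python sketch below is a minimal illustration under stated assumptions: the per-core slot table, the task names and the helper functions are all hypothetical and do not come from any vendor's tooling.

```python
IDLE = None

# Hypothetical schedule: one list of time slots per core.
SCHEDULE = {
    0: ["flight_ctrl", "flight_ctrl", "logging", "logging"],
    1: [IDLE, IDLE, "video", "video"],
    2: [IDLE, IDLE, "maintenance", IDLE],
}
CRITICAL = {"flight_ctrl"}


def exclusivity_violations(schedule, critical):
    """Return (slot, core, task) for anything running alongside a critical task."""
    n_slots = len(next(iter(schedule.values())))
    violations = []
    for slot in range(n_slots):
        busy = [(core, tasks[slot]) for core, tasks in sorted(schedule.items())
                if tasks[slot] is not IDLE]
        crit = [core for core, task in busy if task in critical]
        if crit:
            owner = crit[0]   # the core that holds the exclusive window
            violations.extend((slot, c, t) for c, t in busy if c != owner)
    return violations


def utilisation(schedule):
    """Fraction of core-slots that do useful work."""
    slots = [t for tasks in schedule.values() for t in tasks]
    return sum(t is not IDLE for t in slots) / len(slots)
```

In this example table the check passes, but only 7 of the 12 core-slots do any work: the idle slots on cores 1 and 2 are the price paid for the exclusive window.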

Reducing the platform to a single running core clearly limits performance and the benefits of using multicore. With its Integrity-178 tuMP RTOS, Green Hills offers the ability to control the bandwidth used by low-criticality tasks to avoid them starving more important processes of memory bandwidth. This often goes hand in hand with actively managing cache partitioning so that the important tasks do not risk having their cache lines evicted by other tasks.

“Allocation of bandwidth from the CPU to the on-chip interconnect is effective for all types of resource accesses through the interconnect. Architectures with a shared cache between the CPU core and the on-chip interconnect benefit from cache-specific mitigation techniques, such as cache partitioning, with bandwidth allocation addressing all the other types of interference,” Jaenicke says.

Keegan notes, “Publications on dynamic memory bandwidth allocation and throttling in the operating system have demonstrated sound benchmarks in bounding latency response times of application memory access.” But he adds that the published approaches can rely on hardware features for debug and telemetry that may be unqualified, and which do not necessarily pick up on contention from DMA transfers initiated by I/O controllers.
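The general idea behind bandwidth throttling can be sketched as a per-core budget that is topped up at the start of every regulation period. The Python model below is a simplified illustration, not the Green Hills implementation or any specific published scheme. The class, the core names and the byte budgets are hypothetical, and a real regulator would typically count memory traffic with hardware performance counters rather than explicit calls.

```python
class BandwidthRegulator:
    """Per-core memory budget, replenished every regulation period."""

    def __init__(self, budgets_bytes):
        self.budgets = dict(budgets_bytes)          # core -> bytes allowed per period
        self.used = {core: 0 for core in self.budgets}

    def new_period(self):
        """Called by a periodic timer: replenish every core's budget."""
        for core in self.used:
            self.used[core] = 0

    def request(self, core, nbytes):
        """Return True if the access fits the remaining budget, else False (throttle)."""
        if self.used[core] + nbytes > self.budgets[core]:
            return False
        self.used[core] += nbytes
        return True


# Hypothetical budgets: the critical core gets most of the bandwidth.
reg = BandwidthRegulator({"critical": 8000, "best_effort": 2000})
granted = [reg.request("best_effort", 1000) for _ in range(3)]
# The third request exceeds the 2000-byte budget, so the core would stall
# until new_period() is called.
```

Note that a model like this only sees traffic that is attributed to a core. DMA transfers started by I/O controllers would bypass it, which is the gap Keegan highlights.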

Where the critical application is focused on actuation or receiving sensor data, Keegan sees designers choosing highly conservative policies, right up to enforcing single-task execution when the high-criticality task is running. “For general purpose vehicle and mission platforms hosting adaptive and evolving workloads, we see customers holding out to leverage the latest hardware partitioning features,” he adds.

A further complication lies in the way the IMA systems are increasingly being used in a network, with readings from sensors collected on one module being passed to other modules in the airframe. One piece of good news for implementors is that the networks being used often provide better guarantees for high-priority traffic using time-triggered protocols. Keegan says the time-triggered task model has shown promising results.

“However, there is not enough public information on how to properly implement drivers for network stacks to bind deterministic network policies with multicore platforms that need to buffer and forward packets to applications across every core. Given how complex operating-system drivers, network stacks and middleware can be, and the variety of network device integration approaches with software platforms, it is fair to say there is still a bigger picture problem than multicore interference channels that needs to be addressed to get high performance real-time distributed systems to synchronise optimally,” Keegan says.