Multicore processing for embedded systems


Dual cores have been available in several application spaces for some time. To keep delivering the kind of performance growth Moore predicted in the 1960s, adding cores appears today to be the only direction to go. They bring with them significant challenges and problems, but also major advantages and benefits.
There are two basic software paradigms in the market today: AMP and SMP. In SMP the burden of task allocation lies with the OS; in AMP the allocation is done by the software engineer at the outset of the project.
The SH7205 is the latest SuperH device from Renesas. It contains two SH2A cores and provides a new solution to an old problem by boosting the performance available in the embedded space.

Despite their diverse origins, two enduring topics that still arouse debate and discussion in the engineering community (or at least the engineering circles I move in) are Dilbert comics and Moore’s law. A discussion of the latest challenge facing our friend from the comic strips would no doubt be entertaining, but today’s topic is multi-core, and it has more to do with Gordon Moore than with Dogbert, Wally and friends. In 1965 Gordon Moore observed that the number of transistors on an integrated circuit was doubling roughly every year, a trend later restated in the popular form of a doubling of performance every 18 months or so. The law has hit many stumbling blocks over the years, but engineering ingenuity has always found a way to keep the performance improvements coming. Changes in lithography, the introduction of newer materials for gate dielectrics and the use of copper for interconnects are just a few of the manufacturing advances that have kept the Moore story alive. It has, however, long been acknowledged that manufacturing technology can only go so far. In the world of supercomputing, where performance is pushed to the limit, parallelism has long been seen as the only way to achieve the required performance, both in terms of instruction-level parallelism and in terms of multiple CPUs running multiple concurrent algorithms.

What is Multi-core?

This article focuses on “multi-core” architectures: single integrated circuits containing more than one core. The supercomputers mentioned above often use “multiple CPU” architectures built from more than one IC. The theory is broadly similar, but the focus here is on systems with a single IC. Broadly speaking, multi-core architectures split into two camps: symmetric multiprocessing (SMP) and asymmetric multiprocessing (AMP).

SMP is probably the simpler concept to grasp, and it is what most users imagine when the subject of multiple cores arises. It is typified by a single “all powerful” operating system that is system-aware and uses dynamic load-balancing algorithms to allocate tasks or functions to cores according to how heavily each core is loaded at that moment. This has the huge advantage of allowing users to take existing application software written for a single-core system and gain an immediate performance increase (a minimal sketch of this programming model appears at the end of this section).

AMP systems are quite different. Each core runs its own independent OS. These OSs need not be the same, and in fact it is not necessary for every core to run an OS at all. The cores are essentially unaware of each other. Such a system does, however, require more changes to the application software than an SMP system, because there is no “all powerful” OS to manage the core loading for the user.

The uses of Multi-core

Mobile communications, both handsets and base stations, have required the performance of dual- or multi-core processors for some time now, and the solutions there are fairly standard. The PC market can broadly be regarded as a “multiple CPU” architecture, with one processor dedicated to the main processing, another dedicated to graphics processing, and so on. The first dual-core processors hit the mainstream PC CPU market around 2005 and are now being widely adopted. In the embedded processing market, however, there are still very few spaces where dual-core or multi-core devices are actually used today. This is about to change.
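To make the SMP model concrete, here is a minimal sketch in plain C using POSIX threads, chosen purely for illustration and not tied to any particular embedded OS; the task names and the busy-loop workload are hypothetical placeholders. The application simply creates one thread per task and leaves placement to the scheduler.

#include <pthread.h>
#include <stdio.h>

static void *worker(void *arg)
{
    const char *name = arg;
    unsigned long sum = 0;

    /* Stand-in for a compute-bound task; under SMP the scheduler may
       run the two workers on different cores at the same time. */
    for (unsigned long i = 0; i < 100000000UL; i++)
        sum += i;

    printf("%s finished (checksum %lu)\n", name, sum);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;

    /* One thread per task; the SMP OS decides which core runs which. */
    pthread_create(&t1, NULL, worker, (void *)"task A");
    pthread_create(&t2, NULL, worker, (void *)"task B");
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}

The same source runs unchanged on one core or two; only the elapsed time changes, which is the “immediate” gain described above.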
What are the advantages and challenges?

Clearly the most obvious advantage of a dual- or multi-core system is the increase in performance. Ostensibly, a dual-core system gives the user twice the performance at the same clock frequency. That is not always completely true, and it depends on the choice of AMP or SMP, but some increase in performance is certainly always the case. One offshoot of the extra performance, often overlooked in a first analysis but probably even more relevant to the embedded space, is power consumption. Doubling the throughput of a device means it can do twice as much work at a given clock frequency, but it equally means the same amount of work can be done at half the clock frequency. Since frequency and power consumption scale approximately linearly, an additional core can help to roughly halve the power consumption of the device.

One problem that cannot easily be circumvented by either multiprocessing approach (SMP or AMP) is resource conflict, and software designers, compiler writers and system architects all have to be aware of it. If core A is waiting for the Ethernet peripheral to be released before it can, for example, send a message on the Ethernet bus and update a value in RAM, while core B is waiting for that value in RAM to be updated before it will release the Ethernet peripheral, we have a problem. Such a deadlock is much harder to untangle in an SMP system than in an AMP system, but in either case it has catastrophic consequences for the application (a minimal sketch of this kind of deadlock appears at the end of this section).

The SMP approach brings its own set of advantages. As already stated, the chief one is that the performance increase is immediate: the software writer relies on the OS to do all of the allocation and load balancing and does not need to change his or her coding style significantly. On the other hand, SMP systems have some significant drawbacks. The first and foremost is the complexity, and hence cost, of the OS. Several OSs permit SMP implementations, but very few are aimed at real-time applications or the embedded space. Non-deterministic behaviour is probably the most significant problem for embedded work: the longest possible response delay of an SMP system is difficult to calculate because the load-balancing algorithm is very complex and the chance of the system entering an unexpected state is relatively high. In embedded applications where real-time response and predictable timing behaviour are mandatory, this is by no means trivial. A related consequence of the system being able to enter one of many different states is that fault tolerance is difficult to build in and faults are hard to reproduce. Imagine for a moment that a device with four cores has a fault in one core, such that running application “A” directly after application “F” produces an erroneous result. Finding and reproducing that fault becomes exponentially more difficult as the degree of parallelism grows. The final factor against the SMP approach is scalability. The algorithms the OS uses to balance the loading of the separate cores are not trivial, and although opinions differ, the consensus appears to be that a system reaches its maximum achievable performance at around eight cores; adding a ninth adds enough complexity to the OS that it actually lowers overall system performance.
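The resource-conflict scenario described above can be sketched in a few lines. The example below is plain C with POSIX threads standing in for the two cores and two mutexes standing in for the Ethernet peripheral and the shared value in RAM; none of this is a real driver API, it simply reproduces the circular wait. Because the two “cores” take the locks in opposite orders, the program hangs.

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/* Two locks standing in for the shared resources in the scenario above:
   the Ethernet peripheral and a value held in shared RAM. */
static pthread_mutex_t eth_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t ram_lock = PTHREAD_MUTEX_INITIALIZER;

/* "Core A": owns the RAM value, then waits for the Ethernet peripheral. */
static void *core_a(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&ram_lock);
    sleep(1);                       /* let "core B" grab its first lock  */
    pthread_mutex_lock(&eth_lock);  /* blocks forever: B will not let go */
    puts("core A: sent message and updated RAM");
    pthread_mutex_unlock(&eth_lock);
    pthread_mutex_unlock(&ram_lock);
    return NULL;
}

/* "Core B": owns the Ethernet peripheral, then waits for the RAM value. */
static void *core_b(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&eth_lock);
    sleep(1);                       /* let "core A" grab its first lock  */
    pthread_mutex_lock(&ram_lock);  /* blocks forever: A will not let go */
    puts("core B: released Ethernet after RAM update");
    pthread_mutex_unlock(&ram_lock);
    pthread_mutex_unlock(&eth_lock);
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, core_a, NULL);
    pthread_create(&b, NULL, core_b, NULL);
    pthread_join(a, NULL);          /* never returns once the deadlock hits */
    pthread_join(b, NULL);
    return 0;
}

The usual cure is to agree a single system-wide locking order (or add timeouts) at design time, which is exactly the kind of whole-system awareness being asked of designers, compiler writers and architects here.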
AMP systems have the advantage that no operating-system changes are necessarily required: standard operating systems can be used. Because each core is essentially unaware of the existence of the others, the only additional consideration is how the cores will communicate. This is usually achieved by allocating a shared section of memory to which both cores have access (a minimal mailbox sketch appears at the end of this section). In an AMP system each task or application runs on a dedicated core, which can lead to very limited performance improvement if the tasks are inappropriately allocated. There is no dynamic allocation of tasks, so once the system has been set up there is no way to re-allocate them short of completely reprogramming the device. The AMP approach therefore requires that serious thought be given at the outset to which tasks and applications will run on each core. In return, much of the burden of an SMP system is lifted: an AMP system scales quite well as long as the tasks can be separated into distinct sections, it is more deterministic, and faults are much more reproducible, which also makes debugging easier.

Implementations for the future

In the embedded space there is no doubt: dual cores are a new thing. Every time groups of engineers gather to discuss the topic, new ideas emerge about what becomes possible outside the bounds of the standard single-core world.

One idea is to keep a standard core architecture but use a second core to enhance the safety of the system. Software safety standards are a hot topic in all areas of the market, and concern over the trustworthiness of software will continue to escalate. A secondary core can check critical parameters and run a similar, or the same, algorithm as the primary core, so that safety-critical decisions are verified before they are executed. For example, before a motor is started, the MCU’s decision to start it can be double-checked to ensure that the software is operating correctly. The secondary core can also check other parameters in real time: the software stack in RAM can be monitored to verify that it is not approaching an overflow condition. As long as the second core knows how large the stack is permitted to be, it can check it without interrupting the standard program flow; only if the stack is getting close to overflow does any action need to be taken, and in most cases execution simply continues as normal (a minimal stack-check sketch appears below). The future will bring surprises, that much is certain, and a secondary core can help to absorb them.

A second core can also be used in the embedded space for “peripheral emulation” or “peripheral enhancement”. A typical example is servicing ADC readings: the secondary core takes the readings from an ADC, calculates the average and saves the result in an area of RAM where the main CPU can access it, which enhances the functionality of the ADC and creates a much more usable solution (a sketch of this also follows below). New peripherals can even be created from scratch in software, meaning that a solution with a second core has a significant amount of flexibility to adapt to future requirements without impacting the already deterministic performance of the main controller.
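First, the shared-memory handover. The sketch below is a simplified, seqlock-style mailbox in plain C; the struct, its field names and the idea of placing it at an agreed address are assumptions for illustration rather than a description of any particular device, and a real dual-core part may additionally need memory barriers or a hardware semaphore. The whole exchange is simulated in one program so it can be compiled and run as-is.

#include <stdint.h>
#include <stdio.h>

/* A minimal "mailbox" in the shared RAM region both cores can see.
   On real silicon this struct would sit at an agreed (hypothetical)
   address; volatile stops the compiler caching the fields. */
struct mailbox {
    volatile uint32_t seq;     /* bumped before and after each update */
    volatile uint32_t value;   /* the datum being handed over         */
};

static struct mailbox mbox;    /* stands in for the shared memory window */

/* Writer side (e.g. the secondary core). */
static void mailbox_post(uint32_t value)
{
    mbox.seq++;                /* odd: update in progress   */
    mbox.value = value;
    mbox.seq++;                /* even again: update complete */
}

/* Reader side (e.g. the main core): retries if it caught a torn update. */
static uint32_t mailbox_read(void)
{
    uint32_t before, after, value;
    do {
        before = mbox.seq;
        value  = mbox.value;
        after  = mbox.seq;
    } while (before != after || (before & 1u));
    return value;
}

int main(void)
{
    mailbox_post(42);
    printf("main core read %u from the mailbox\n", (unsigned)mailbox_read());
    return 0;
}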
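Second, the stack monitoring idea. The sketch assumes the start-up code fills the main core’s stack with a known pattern; the sizes, the pattern and the simulated stack array are all hypothetical placeholders. The checking core then simply counts how much of the pattern is still intact to estimate the remaining headroom, without ever interrupting the main core.

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define STACK_SIZE    1024u    /* hypothetical stack size              */
#define FILL_PATTERN  0xA5u    /* pattern written at start-up          */
#define WARN_MARGIN   64u      /* warn when headroom drops below this  */

static uint8_t main_stack[STACK_SIZE];   /* stands in for the real stack */

/* Count how many bytes at the unused end of the stack still hold the
   fill pattern; that is the headroom the main core has not consumed. */
static size_t stack_headroom(const uint8_t *stack, size_t size)
{
    size_t free_bytes = 0;
    while (free_bytes < size && stack[free_bytes] == FILL_PATTERN)
        free_bytes++;
    return free_bytes;
}

int main(void)
{
    /* Pretend start-up code filled the stack and the main core has since
       used the top 900 bytes of it (the stack grows downwards here). */
    for (size_t i = 0; i < STACK_SIZE; i++)
        main_stack[i] = FILL_PATTERN;
    for (size_t i = 900; i < STACK_SIZE; i++)
        main_stack[i] = 0x00;

    size_t headroom = stack_headroom(main_stack, STACK_SIZE);
    if (headroom < WARN_MARGIN)
        puts("stack dangerously close to overflow - take action");
    else
        printf("stack OK, %zu bytes of headroom\n", headroom);
    return 0;
}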
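Finally, the ADC-averaging example of peripheral enhancement. The adc_read() function below is a hypothetical stand-in for whatever register access the real part uses, and the shared result variable stands in for the agreed area of RAM; in a real system the result could be published through a mailbox like the one shown earlier so the main core never sees a half-written value.

#include <stdint.h>
#include <stdio.h>

#define ADC_SAMPLES 8u

/* Hypothetical stand-in for reading the ADC data register; on real
   hardware this would be a memory-mapped register access. */
static uint16_t adc_read(void)
{
    static uint16_t fake = 500;
    return fake += 3;          /* synthetic ramp so the sketch runs on a PC */
}

/* Result slot in the shared RAM window; the main core only ever sees a
   fully averaged value, never the raw samples. */
static volatile uint16_t adc_filtered;

/* Loop body for the secondary core: gather a burst of samples, average
   them, and publish the result for the main core to pick up. */
static void adc_service(void)
{
    uint32_t sum = 0;
    for (unsigned i = 0; i < ADC_SAMPLES; i++)
        sum += adc_read();
    adc_filtered = (uint16_t)(sum / ADC_SAMPLES);
}

int main(void)
{
    adc_service();
    printf("filtered ADC value: %u\n", (unsigned)adc_filtered);
    return 0;
}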
In the area of industrial control, many applications have both a real-time aspect and a GUI aspect. Such systems obviously lend themselves to an AMP arrangement, with one core running an RTOS dedicated to industrial control or motor control and the second core running a more “GUI friendly” OS such as uClinux and the like. Interest in this area is growing fast, and it will surely not be long before such systems become standard in the market.

The newest dual-core solution

The latest SuperH device from Renesas is the SH7205. It contains two SH2A superscalar cores, both with integrated floating-point units. Each core runs at up to 200MHz and delivers 480 DMIPS, giving a theoretical performance of nearly 1000 DMIPS. The device is aimed at consumer and industrial applications. As embedded operating-system developers adapt to the new architecture, it is expected that customers will first adopt an AMP arrangement, using either two real-time operating systems or one real-time OS alongside a non-real-time one, before SMP-ready OSs become available. The device is equipped with a USB 2.0 (host and function) controller, numerous standard communications channels and a simple 2D graphics engine with digital video input, which makes it suitable for both audio and visualisation applications. On the other side, it provides motor-control timers and ADCs, making it equally suitable for inverter applications. The combination of these two peripheral sets in one device means it can be aimed at the aforementioned industrial control applications, which include both a GUI and a real-time aspect.