In recent years, multicore devices have dominated embedded processor product announcements. Most of these have been based on symmetric multiprocessing, which offers higher processing performance and bandwidth. In contrast, system reliability can be enhanced using dual redundant 'lock-step' architectures, in which two cores execute the same code so that their results can be compared. These offerings have been harnessed for automotive safety-critical applications.

Now joining the ranks of symmetric dual and quad core embedded processors are new asymmetric offerings. For example, some devices combine single or dual ARM Cortex-A9 processors with a Cortex-M3 system processor for supervisory power management functions. These products have emerged largely to meet increasingly demanding requirements for low standby power and longer battery life. Shrinking process geometries, now 40nm and smaller, have meanwhile been pushing leakage power in the wrong direction. Yet the same steady march of process improvement is also a key enabler for these low cost multicore products.

Until recently, a small processor controlling the application processor was realised at the system or board level. This typically meant the late addition of an 8 or 16bit microcontroller to handle power control, housekeeping and communication duties while the main processor was in sleep mode or switched off completely. Integrating full 32bit processors at the chip level now allows designers to offload more of the system level monitoring duties from the main processor or SoC. The next stage in this progression is the downward scaling of asymmetric architectures into the fast growing 32bit microcontroller space.

Asymmetric cores in the MCU market

Dual core architectures can be an effective way to handle more tasks in a tightly constrained real time application environment. A new generation of tiny 32bit cores, such as ARM's 12k gate Cortex-M0 processor, can now be coupled with more powerful 32bit processors that can include enhanced DSP capabilities. Combining a small control/communication processor with a higher performance processor on a single die provides a complete MCU+DSP solution built around one architecture and one development environment.
The impact of these dual core MCUs will be felt not just in reduced cost and complexity but also at the development stage. There, the entire application can be visualised at the software and device level and then optimised for the multicore environment. Integration then only requires retargeting code between the on-chip resources, within a single integrated development environment covering both MCU and DSP functions. Tiny cores can also be assigned to sub-tasks, specific events and background monitoring tasks, such as managing multiple serial communications channels. Alternatively, when the main processor is used in more demanding streaming DSP applications, the secondary processor can manage foreground tasks directly. These include high priority Ethernet and USB network traffic or the most demanding system timing. The added flexibility and system performance of these asymmetric dual core chips can benefit many industrial applications, from power conversion and lighting to motor control.

Realising virtual peripherals

Partitioning system functionality will play a key role in the development of asymmetric dual core systems. Another major opportunity in these new implementations is to use 32bit sub-processors for the traditional 'soft' peripheral concept, in which peripherals are implemented in firmware. Because this firmware no longer burdens the main processor and its key application tasks, system designers can realise many 'virtual' peripherals and functions with performance to spare. They can also have much greater confidence in system response times and overall integrity. This comes through access to an ever expanding library of functions proven and tested on the same processor core and architecture used in existing low cost, entry level 32bit MCU families. For more demanding real time tasks, NXP has recognised that these implementations also need to be coupled with a new generation of user configurable hardware and peripherals offering finer grained timing control.
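As an illustration of a 'soft' peripheral, the sketch below implements the transmit side of a UART in firmware, using 8 data bits, no parity and one stop bit. It assumes a timer on the sub-processor calls `soft_uart_tick()` once per bit period at the chosen baud rate, and that `write_pin` drives a GPIO. All names here are hypothetical. A host-side `pin_log_write` stand-in records the line levels so the logic can be checked without hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical GPIO hook: on hardware this would set the TX pin level. */
typedef void (*pin_write_fn)(void *ctx, int level);

typedef struct {
    uint16_t shift;         /* frame bits still to send; LSB goes out next */
    uint8_t bits_left;      /* 0 means the transmitter is idle */
    pin_write_fn write_pin;
    void *ctx;
} soft_uart_tx_t;

/* 8N1 frame, LSB first: bit 0 = start (0), bits 1-8 = data, bit 9 = stop (1). */
uint16_t uart_frame_8n1(uint8_t byte) {
    return (uint16_t)((1u << 9) | ((unsigned)byte << 1));
}

/* Queue one byte; returns 0 if a previous frame is still being sent. */
int soft_uart_start(soft_uart_tx_t *u, uint8_t byte) {
    if (u->bits_left != 0) return 0;
    u->shift = uart_frame_8n1(byte);
    u->bits_left = 10;
    return 1;
}

/* Call once per bit period, e.g. from a timer interrupt at the baud rate. */
void soft_uart_tick(soft_uart_tx_t *u) {
    assert(u->write_pin != NULL);
    if (u->bits_left == 0) {
        u->write_pin(u->ctx, 1); /* idle line is held high */
        return;
    }
    u->write_pin(u->ctx, (int)(u->shift & 1u));
    u->shift >>= 1;
    u->bits_left--;
}

/* Host-side stand-in for the GPIO: records every level written. */
typedef struct { int level[16]; size_t n; } pin_log_t;

void pin_log_write(void *ctx, int level) {
    pin_log_t *rec = ctx;
    if (rec->n < 16) rec->level[rec->n++] = level;
}
```

Each tick does a fixed, small amount of work, so the sub-processor's timing budget is easy to reason about. That predictability is the main reason to move such firmware off the main core.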
Specific applications can be created using new development tools that convert logic, state and timing requirements into conventional high level C code. This code can then be downloaded and installed on the sub-system processor domain. Developers can now offer highly differentiated, application specific functionality by combining 32bit sub-processors with dedicated hardware and firmware. Entire peripheral arrays and subsystems can be managed either independently or in tight synchronisation with the main system processor.

Examples of such flexible, user configured peripherals appearing in new NXP microcontrollers include state configurable timer arrays and intelligent DMA handlers. In addition, configurable serial-parallel I/O cells can be used individually to implement a variety of serial communications protocols. Used together as an I/O array, they allow complex pattern generation and sequencing. Dedicated event handling and prioritisation logic coordinates all peripheral events and states and generates I/O events, DMA triggers and CPU interrupts. Complex timer array functionality can be realised using configurable state machines that link events and states. This enables sophisticated, high performance closed-loop I/O sequencing and timing control in hardware. Typical examples of demanding peripheral functionality include custom multi-channel serial communications, such as 7.1 channel I2S audio processing and transmission. They also include new serial interfaces for external devices such as quad SPI flash memories.

Reducing system power

Asymmetric multicore architectures can implement comprehensive power management with independent control of each power domain. This combines performance and flexibility with great potential for power optimisation. Various stages of system control remain available even while the main processor and sub-processors remain in standby or sleep modes.
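The staged power control described above can be expressed as a table-driven state machine, in the same spirit as the configurable state machines used to link events and states in timer arrays. The sketch below assumes a hypothetical supervisor running on the sub-processor with three illustrative power stages and three events. The state, event and action names are assumptions, not any real device's power modes. Each call to `pwr_step()` looks up the transition for the current state and event, updates the state, and returns the action the supervisor should take, such as waking the main core.

```c
#include <assert.h>

/* Hypothetical power stages; a real part defines its own modes. */
typedef enum { PWR_RUN, PWR_MAIN_SLEEP, PWR_DEEP_STANDBY, PWR_NUM_STATES } pwr_state_t;

/* Hypothetical events seen by the supervisor core. */
typedef enum { EV_IDLE_TIMEOUT, EV_IO_ACTIVITY, EV_HEAVY_WORK, EV_NUM_EVENTS } pwr_event_t;

/* What the supervisor should do when a transition fires. */
typedef enum {
    ACT_NONE,
    ACT_SLEEP_MAIN,     /* put the main core to sleep */
    ACT_WAKE_MAIN,      /* main core is needed */
    ACT_ENTER_STANDBY,  /* drop the sub-system into its lowest-power state */
    ACT_HANDLE_LOCALLY  /* sub-processor deals with the event on its own */
} pwr_action_t;

typedef struct { pwr_state_t next; pwr_action_t action; } pwr_transition_t;

/* Policy table: [current state][event] -> next state and action. */
static const pwr_transition_t pwr_table[PWR_NUM_STATES][EV_NUM_EVENTS] = {
    [PWR_RUN] = {
        [EV_IDLE_TIMEOUT] = { PWR_MAIN_SLEEP, ACT_SLEEP_MAIN },
        [EV_IO_ACTIVITY]  = { PWR_RUN, ACT_NONE },
        [EV_HEAVY_WORK]   = { PWR_RUN, ACT_NONE },
    },
    [PWR_MAIN_SLEEP] = {
        [EV_IDLE_TIMEOUT] = { PWR_DEEP_STANDBY, ACT_ENTER_STANDBY },
        [EV_IO_ACTIVITY]  = { PWR_MAIN_SLEEP, ACT_HANDLE_LOCALLY },
        [EV_HEAVY_WORK]   = { PWR_RUN, ACT_WAKE_MAIN },
    },
    [PWR_DEEP_STANDBY] = {
        [EV_IDLE_TIMEOUT] = { PWR_DEEP_STANDBY, ACT_NONE },
        [EV_IO_ACTIVITY]  = { PWR_MAIN_SLEEP, ACT_HANDLE_LOCALLY },
        [EV_HEAVY_WORK]   = { PWR_RUN, ACT_WAKE_MAIN },
    },
};

/* Advance the supervisor by one event and return the action to perform. */
pwr_action_t pwr_step(pwr_state_t *state, pwr_event_t ev) {
    assert((unsigned)*state < PWR_NUM_STATES && (unsigned)ev < EV_NUM_EVENTS);
    const pwr_transition_t *t = &pwr_table[*state][ev];
    *state = t->next;
    return t->action;
}
```

Keeping the policy in a const table separates it from the code that acts on it. The table can then be reviewed, or produced by a tool, on its own.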
Autonomous event handling and sub-system functions can remain in operation at extremely low currents, running in low-voltage, low-power domains. They still respond intelligently to system requirements and can call for rapid processor intervention when required.

Software tools

ARM CoreSight debug and trace IP has been optimised for heterogeneous multicore debugging. Developing for a single architecture will also allow much faster adoption and standardisation of tools and environments than was possible with the mixed processor architectures or MCU + DSP solutions of the past. Development tools, once a major barrier to multicore implementation and acceptance, will now ease product development. Even low cost tools will integrate multicore projects seamlessly, supporting separate development flows as well as independent or simultaneous debugging. These new open-source based tools will be crucial to the success and spread of cost effective, flexible multicore MCUs in the coming years. More important still will be the large communities of developers who adopt asymmetric approaches and share knowledge, techniques, examples and code.

As process geometries shrink, semiconductor manufacturers are finding new ways to exploit the logic and memory densities they provide. One of the most interesting is the asymmetric multicore architecture, and NXP is taking full advantage of this approach in its next generation general purpose MCUs. This will enable a range of system designs that were previously possible only with custom ASICs. The benefits include lower overall system power, the ability to partition software tasks across multiple processors on one die, and the ability to implement customised soft peripherals without burdening the main application. Keep an eye out for multicore coming to an MCU near you.

Geoff Lees is the general manager of NXP