How interrupt handling is being offloaded from microcontrollers


Chris Edwards explains how interrupt handling is being offloaded from microcontrollers.

For a number of reasons, including cost, embedded microprocessors have been required to take on tasks for which they are not always well suited. Frequent interruptions from the outside world are one example. These are inevitable in any system expected to have a real time response but, as with people, frequent distractions do not help get the core job done. As the transistor count on embedded microprocessors and microcontrollers steadily increases, some of these distractions are now being offloaded in the hope of making the whole system more efficient.

Like virtualisation, interrupt and I/O offload started on mainframes as a way of improving throughput on the main processor. By relieving it of the job of dealing with many small service requests from disk and tape drives and remote terminals, the processor would only have to get involved once packets of data large enough to be worth processing were ready.

Interrupts take a lot out of a high speed processor, which is generally heavily pipelined to begin with and, if it is a superscalar design, capable of issuing more than one instruction per cycle. With one pipeline, there could be eight to ten instructions in flight at any one time that either have to be run to completion or annulled and restarted once normal execution resumes. For a two way superscalar, that doubles to 16 to 20.

The next job for the processor is to push a bunch of register values onto its stack. Although the data will probably go straight to a level one cache, there is still some memory latency to account for. It is possible to reduce the time this takes by pushing only a subset of registers onto the stack, which is what some fast interrupt modes do. But this is not entirely safe, as the interrupt service routine (ISR) could easily overwrite vital data.
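The risk of those fast, partial-save modes can be shown with a toy software model. Everything here – the register count, the size of the saved subset, the ISR itself – is purely illustrative and does not correspond to any real architecture:

```c
/* Toy model of why a partial register save is risky: the interrupted
 * thread's state is an array of "registers"; a fast interrupt mode
 * saves only the first few. If the ISR touches a register outside
 * that subset, the thread's value is silently corrupted on return. */
#include <string.h>

#define NUM_REGS   16
#define FAST_SAVE   4   /* fast mode saves only r0-r3 */

typedef struct { int r[NUM_REGS]; } cpu_state;

/* Hypothetical ISR that uses one register as scratch space. */
static void isr(cpu_state *cpu, int touch_reg)
{
    cpu->r[touch_reg] = 0xDEAD;
}

/* Take an interrupt via the fast (partial-save) path and report
 * whether the interrupted thread's registers survived intact. */
int fast_interrupt_is_safe(int isr_touches)
{
    cpu_state cpu;
    for (int i = 0; i < NUM_REGS; i++)
        cpu.r[i] = i;                     /* thread's live values */

    int saved[FAST_SAVE];
    memcpy(saved, cpu.r, sizeof saved);   /* hardware saves subset */

    isr(&cpu, isr_touches);

    memcpy(cpu.r, saved, sizeof saved);   /* hardware restores subset */

    for (int i = 0; i < NUM_REGS; i++)    /* did the thread survive? */
        if (cpu.r[i] != i)
            return 0;
    return 1;
}
```

As long as the ISR stays inside the saved subset all is well; the moment it strays outside, the interrupted thread resumes with corrupted state – which is why the full save is usually unavoidable.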
As most RTOS implementations will reschedule tasks at the end of ISRs, there is a good chance that everything will need to go onto the stack anyway, so that the thread's data can be stored properly ready for a task switch. Once the registers have been saved, the processor can finally jump to the memory location that holds the ISR and begin to do something with the peripheral that triggered the interrupt. Once the ISR has completed, and if the RTOS has not scheduled a new task in its place, normal execution can resume. In that time, hundreds or thousands of instructions could have been run, robbing the thread of throughput.

Some interrupts to the processor are vital, or else the RTOS may never get the opportunity to schedule other tasks to run. But if the processor is taking an interrupt for every byte that turns up on a serial channel, throughput can drop drastically. The main processor only really needs to know about the incoming serial data when it is time to respond to a command. This is where I/O offloading comes in.

A simple microcontroller is generally not heavily pipelined – three or fewer stages are commonly used – nor does it have many registers to save. Its throughput metric is how many interrupts it can deal with before having to involve the main processor. Because of the throughput advantages of splitting the workload in this way, we are beginning to see multicore chips appearing in which a high speed processor is coupled with one or more smaller, cheaper cores.

Very often, you do not need a full processor. One of the earliest spinoffs from the mainframe era was the direct memory access (DMA) controller (see fig 2). This is usually implemented as a simple state machine that, given start and destination addresses and a length, will, when triggered, move data automatically from one place to another.
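A DMA channel of this kind can be modelled in a few lines of C. The structure and function names below are illustrative, not any vendor's API; in real hardware the copy loop is the state machine itself, running without the CPU executing a single read or write:

```c
/* Minimal software model of a DMA channel: programmed with source,
 * destination and length, it moves the data itself when triggered. */
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const uint8_t *src;   /* start address */
    uint8_t       *dst;   /* destination address */
    size_t         len;   /* transfer length in bytes */
    int            busy;  /* set while the state machine runs */
} dma_channel;

void dma_configure(dma_channel *ch, const void *src, void *dst, size_t len)
{
    ch->src = src;
    ch->dst = dst;
    ch->len = len;
    ch->busy = 0;
}

/* The whole transfer happens "in hardware"; on a real part the CPU
 * would be free meanwhile (or, with a poor controller that cannot
 * interleave accesses, stalled until completion). */
void dma_trigger(dma_channel *ch)
{
    ch->busy = 1;
    for (size_t i = 0; i < ch->len; i++)
        ch->dst[i] = ch->src[i];
    ch->busy = 0;   /* completion would normally raise one interrupt */
}
```

The pay-off is that the processor takes a single completion interrupt per block instead of one per word moved.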
Given that it saves the main processor from executing a laborious series of back to back reads and writes, it is surprising that even some modern high speed interfaces do not bother to use it. There can also, on occasion, be baffling omissions from DMA controllers, such as a lack of support for interleaved accesses, so that once a DMA transfer is initiated the processor has to be halted until it is complete.

Hitting the main processor with a lot of interrupts can cause other problems. Interrupt latency is variable. Very often, the processor will disable interrupts while it is running one – the processor cannot afford to take an interrupt while it is in the middle of saving registers for a task or performing RTOS management operations – delaying the point at which the processor can handle subsequent interrupts. For a serial input, this interrupt jitter is unlikely to be a problem, as the data should be latched – just as long as the jitter is not so extreme that another byte turns up and overwrites it in the meantime.

But for analogue inputs, it could be crucial, as the levels are changing all the time. Sampling jitter will increase the amount of noise the software has to contend with when processing the digitised data. It is particularly problematic when parallel channels need to be compared. For this reason, some microcontrollers have acquired sequencing engines that will sample a set of analogue inputs at regular intervals and deposit the data in a circular buffer that can be picked up by the main processor when it is ready to process it.

Moore's Law works in our favour here: it provides more transistors that can be used to offload work from the main processor. As the number of available transistors increases, these sequencing engines are becoming more sophisticated. The state configurable timer on NXP's LPC4000, for example, can switch between 32 states as it monitors eight separate inputs (see fig 3).
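The sequencer-plus-circular-buffer arrangement can be sketched as follows. Channel counts, buffer sizes and function names are illustrative assumptions; the point is that the producer runs on a fixed timebase while the consumer drains the buffer whenever it is ready, with no per-sample interrupt:

```c
/* Sketch of an ADC sequencing engine depositing sample sets into a
 * circular buffer for the main processor to pick up at its leisure. */
#include <stdint.h>

#define NUM_CHANNELS 4
#define BUF_SLOTS    8   /* power of two keeps the index wrap cheap */

typedef struct {
    uint16_t sample[BUF_SLOTS][NUM_CHANNELS];
    unsigned head;   /* advanced by the sequencer */
    unsigned tail;   /* advanced by the main processor */
} adc_ring;

/* Called by the "sequencer" once per sampling interval with one
 * reading per channel; the oldest set is dropped if the CPU lags. */
void sequencer_deposit(adc_ring *rb, const uint16_t ch[NUM_CHANNELS])
{
    for (int i = 0; i < NUM_CHANNELS; i++)
        rb->sample[rb->head % BUF_SLOTS][i] = ch[i];
    rb->head++;
    if (rb->head - rb->tail > BUF_SLOTS)
        rb->tail = rb->head - BUF_SLOTS;   /* overwrite oldest set */
}

/* Main processor side: fetch the next sample set; returns 0 if the
 * buffer is empty, 1 if a set was copied out. */
int cpu_fetch(adc_ring *rb, uint16_t out[NUM_CHANNELS])
{
    if (rb->tail == rb->head)
        return 0;
    for (int i = 0; i < NUM_CHANNELS; i++)
        out[i] = rb->sample[rb->tail % BUF_SLOTS][i];
    rb->tail++;
    return 1;
}
```

Because the sequencer samples on its own fixed timebase, the channels in each set are aligned in time regardless of how late the main processor turns up – exactly the property that matters when parallel channels need to be compared.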
A separate engine on the device performs semi automatic sequencing for serial I/O to reduce the number of interrupts that even the slave ARM Cortex M0 processor core has to take when reading or writing data to the ports on behalf of the host M4 core. Another example is the analogue sequencing engine in some FPGAs made by Microsemi. This programmable state machine takes care of recording data from analogue inputs, offloading responsibility from the chip's main processor core, in this case a Cortex M3.

Although performance has traditionally provided the main impetus for I/O offloading, power is becoming a more important consideration. In systems that need to last for years on a single battery charge, such as utility meters, the usual practice is to have the bulk of the microcontroller powered down most of the time, only waking up for brief periods when there is housekeeping to be done. The power consumption largely comes down to how long you can keep the main processor shut down.

In advanced process nodes – 0.25µm and smaller – leakage is largely a function of how fast you need circuits to operate. The higher the clock speed, the lower the transistor threshold voltage needs to be and, in turn, the greater the leakage. Shutting down the core and removing the supply voltage removes leakage as a factor – so long as the core only has to operate for the bare minimum of time. As soon as it has completed a packet of work, it needs to shut down again. By devolving a lot of work to specialised I/O processors – which may be much slower, but which are designed to operate with very low leakage – overall power consumption can be kept down.

Keep the processor asleep

Not waking up a host processor is especially important if it has large on chip memories, such as caches. When the processor is powered down, the registers and caches need to be flushed to main memory.
On wakeup, every access to memory will, at first, demand the loading of entire cache lines before the processor can get anything done. A simple microcontroller that can store the bare minimum of state in low leakage memory cells is easier to wake up and put to sleep and will generally result in lower power consumption. This is one reason why companies such as Freescale and STMicroelectronics have started to put low end microcontrollers into products that, historically, would have been 'dumb'.

Freescale launched a multichip module last year aimed at pocket computers and phones that included a set of sensors, memory and a processor core. The MEMS based sensors pick up the movements of a phone, whether it is being used to play a game or just sitting in a handbag or pocket, being bumped as the user walks along. In one of today's phones, this is not a big issue; if the phone is switched off, it will simply ignore the bumps and bangs. But manufacturers want to give these devices greater situational awareness, perhaps using the motion associated with being stuck in the pocket of a walking user to act as a simple pedometer, or to disable some of the buttons so the device does not call the person at the top of the frequent callers list to play them the sound of keys jangling.

Waking up the host processor to decide what to do every time there is a sudden movement is not a good idea for battery life, so devolving some of the motion processing firmware to a microcontroller with much lower standby power consumption makes sense.

As I/O offloading becomes more common, designers will have to take more care to ensure the processors synchronise properly – a common problem in multiprocessor design. But it will result in more power efficient system design.
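As a closing illustration, the kind of decision that gets devolved to the low power sensor core might look like the sketch below. The threshold, the length of the sustain window and the function name are all illustrative assumptions, not any vendor's firmware; the point is simply that isolated bumps are absorbed locally and only sustained motion wakes the host:

```c
/* Sketch of a motion filter running on an always-on, low leakage
 * sensor core: the host is woken only after several consecutive
 * above-threshold samples, so single jolts cost nothing. */

#define MOTION_THRESHOLD  50   /* arbitrary acceleration units */
#define SUSTAIN_SAMPLES    5   /* consecutive samples before waking */

/* Feed one |acceleration| sample per tick; returns 1 when the host
 * processor should be woken, 0 while it can stay asleep. */
int motion_wake_filter(int magnitude)
{
    static int run = 0;   /* tiny state, kept in always-on memory */

    run = (magnitude > MOTION_THRESHOLD) ? run + 1 : 0;
    if (run >= SUSTAIN_SAMPLES) {
        run = 0;
        return 1;   /* sustained motion: wake the host */
    }
    return 0;       /* isolated bump: host stays asleep */
}
```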