30 August 2011
The 10 best ways to make your device drivers unreliable
Device drivers, the foundation of many embedded systems, are expected to be robust and stable while also delivering high performance.
However, there are many ways to ensure that your drivers are unreliable, some of which are more subtle and easier to fall foul of than you might expect.
* Don't initialise the device correctly
Devices today are not only complex, but they are also flexible. This means the device driver not only has to configure them to operate in the right mode, but also initialise them to make sure they're in the proper state. This can involve a lot of code, usually written after reading the device's data sheet, and interpreting it correctly.
Failure to do this can mean the device is not properly initialised and likely to behave strangely, if it works at all. Often, the problem is in the code that you did not write, rather than in the code you did. It's surprisingly easy to forget to initialise a critical part of the device, particularly if the data sheet does not spell out all the steps in a logical order or if you are 'just building a prototype' that evolves into production code.
The classic signs of this problem are drivers that work correctly after a power cycle, but not after a processor reset.
* Misconfigure the interrupt controller
Your device driver may need to configure an interrupt controller to respond to an interrupt from the device it is controlling. Getting this wrong can be the source of both obvious and not so obvious problems.
Usually, there are three things to get right when configuring a specific interrupt on an interrupt controller:
• priority (if the controller supports nested interrupts);
• type: whether the interrupt is edge or level triggered;
• polarity: whether the interrupt is asserted on a low level (or low going edge) or a high level (or high going edge).
In most cases, getting these details wrong creates an obvious problem, such as receiving no interrupts or continuous interrupts. Neither of these is too hard to diagnose and fix. But a misconfiguration doesn't always result in catastrophic failure – your driver might work, but not very well.
One project on which I worked involved diagnosing and fixing an Ethernet interface that was very sluggish and had very low throughput. Amongst other things, it turned out that the interrupt from the Ethernet controller was configured to be edge triggered, but the controller generated a level triggered interrupt: most of the time the device was interrupting to indicate that it had received packets, but the interrupt controller was not passing the interrupt to the CPU!
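One way to reduce the chance of this kind of mismatch is to make all three settings explicit for every interrupt source, rather than relying on controller defaults. The following is a minimal sketch only: the structure, the enum names and the register field layout (priority in bits 3:0, trigger type in bit 4, polarity in bit 5) are invented for illustration and do not describe any real interrupt controller.

```c
#include <stdint.h>

/* Hypothetical settings for one interrupt source. */
enum irq_trigger  { IRQ_TRIGGER_EDGE = 0, IRQ_TRIGGER_LEVEL = 1 };
enum irq_polarity { IRQ_ACTIVE_LOW   = 0, IRQ_ACTIVE_HIGH   = 1 };

struct irq_config {
    uint8_t           priority;  /* 0..15 in this invented controller */
    enum irq_trigger  trigger;   /* must match what the device drives */
    enum irq_polarity polarity;  /* must match the board wiring */
};

/*
 * Pack the settings into the (assumed) per-source configuration word:
 * bits 3:0 = priority, bit 4 = trigger type, bit 5 = polarity.
 */
uint32_t irq_config_encode(const struct irq_config *cfg)
{
    return (uint32_t)(cfg->priority & 0x0Fu)
         | ((uint32_t)cfg->trigger  << 4)
         | ((uint32_t)cfg->polarity << 5);
}
```

Keeping the trigger type and polarity next to each other in one table per board makes it easier to check each entry against the device data sheet and the schematic.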
* Acknowledge and clear interrupts wrongly
Another potential problem with interrupt handling is knowing when and how the interrupt from a device should be acknowledged and cleared, where the hardware does not do this automatically. Often, this is done by writing to an interrupt status register, but the details vary from device to device. In some cases, writing to this register will acknowledge and clear all the device's pending interrupts, which may dictate some careful software design to make sure your driver has handled them all first – otherwise you might end up 'missing' an interrupt. Whatever the mechanics of clearing the interrupt, you will need to think carefully about corner cases: what happens if the device generates an interrupt at about the same time as your driver clears the interrupt(s) it has just handled? Device data sheets vary enormously in the amount of advice they give in this area; some give example code, which can be very helpful.
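As an illustration, here is a minimal sketch of one common pattern. It assumes a device whose status register is 'write 1 to clear', which is only one of several schemes you may meet. The bit names, the counters and the function name are hypothetical, and the register is passed in as a pointer so the logic can be exercised on a host.

```c
#include <stdint.h>

/* Hypothetical interrupt bits; real positions come from the data sheet. */
#define IRQ_RX_DONE (1u << 0)
#define IRQ_TX_DONE (1u << 1)
#define IRQ_ERROR   (1u << 2)

/* Event counters, standing in for the real handling work. */
static unsigned rx_events, tx_events, err_events;

/*
 * Service one interrupt. 'status_reg' points at the (assumed)
 * write-1-to-clear interrupt status register.
 * Returns the set of interrupt bits that were serviced.
 */
uint32_t device_isr(volatile uint32_t *status_reg)
{
    /* Take a single snapshot of what is pending right now. */
    uint32_t pending = *status_reg;

    /*
     * Acknowledge only the bits in the snapshot. On write-1-to-clear
     * hardware, an interrupt that arrives after the snapshot keeps
     * its bit set and is serviced on the next pass, not lost.
     */
    *status_reg = pending;

    if (pending & IRQ_RX_DONE) rx_events++;
    if (pending & IRQ_TX_DONE) tx_events++;
    if (pending & IRQ_ERROR)   err_events++;

    return pending;
}
```

The point of the snapshot is that the driver never clears a bit it has not seen. Writing a constant 'clear everything' value is where interrupts tend to go missing.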
* Don't bother using 'volatile' for memory mapped I/O
It's really not optional: if you don't use volatile to tell the compiler that (for example) memory mapped registers can change independently of the code, your driver will not work reliably. Typically, it might be fine while you are debugging it, but once you build with optimisation enabled (or increased), it may stop working altogether or – less conveniently – fail occasionally.
Looking at the assembly code, you may see that the compiler has converted a polling loop into a simple 'if' statement, or has loaded the value being tested into a register once, before the loop, where of course it will never change. It may also have reordered a sequence of accesses to the device registers, with possibly unpredictable results.
Using volatile whenever your driver accesses memory mapped devices should be a matter of course – far better to put it in by design rather than retrofit it after tracking down a nasty bug caused by having left it out.
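A minimal sketch of the polling case, assuming a status register with a 'ready' flag in bit 0 (the bit position and the function name are illustrative, not taken from any data sheet):

```c
#include <stdbool.h>
#include <stdint.h>

#define STATUS_READY (1u << 0)  /* hypothetical 'device ready' bit */

/*
 * Poll the status register until the ready bit is set, giving up
 * after 'max_polls' reads. The 'volatile' qualifier forces a fresh
 * read of the register on every iteration; without it, the compiler
 * is free to read it once and reuse the value.
 */
bool wait_until_ready(volatile const uint32_t *status_reg,
                      uint32_t max_polls)
{
    while (max_polls-- > 0) {
        if (*status_reg & STATUS_READY) {
            return true;
        }
    }
    return false;  /* device never became ready */
}
```

The bound here is a number of register reads, not a time, so it protects against a hung device but should not be relied on for timing.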
Of course, this is not the end of the story. Even if the compiler does not reorder access to memory mapped devices, you can still get into trouble if you …
* Ignore details of the processor's bus interface
Most high performance processor architectures go to great lengths to decouple their external bus interface from execution of the instruction stream. This is because external bus cycles are normally very slow compared with the time to execute each instruction and the processor would stall if it had to wait for each bus cycle to complete before executing the next instruction.
We normally take it as a 'given' that drivers can access memory mapped registers without worrying about the effects of a data cache (the device is mapped into a non-cached region of address space). But that's not the only potential problem to contend with. Some processors allow accesses to the external bus to be queued, possibly coalesced, and maybe completed in a different order to that intended by the program. That may be fine for 'normal' data accesses, but it will play havoc with memory mapped I/O.
The key to building a reliable driver in this case is to understand the details of the bus interface and how the instruction stream can be synchronised with external bus cycles. This usually involves use of very processor specific assembly instructions, which – if you are lucky – may be wrapped in C callable form by your development tools. Examples include the 'eieio' and 'sync' instructions for PowerPC processors.
On my first encounter with the MIPS architecture, some 15 years ago, I didn't take into account the CPU write buffer. This allows the CPU to group write operations (stores) to the external memory interface to create a write burst which can occur at some time after the store instructions have apparently completed. This is further complicated by the fact that a read operation (load instruction) after a write can complete before the preceding write. Suffice it to say, my first device driver was not very reliable until I inserted calls to flush the write buffer at strategic places in the driver code.
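The general shape of the fix can be sketched as follows. This is an assumption-laden illustration: `io_barrier()` is a name invented here, the register layout is hypothetical, and the non-PowerPC branch uses the GCC/Clang builtin `__sync_synchronize()` purely as a placeholder. Whether any given instruction is sufficient for device accesses depends on the processor and bus, so check the architecture manual or use the barrier your operating system provides.

```c
#include <stdint.h>

/*
 * io_barrier(): hypothetical wrapper that orders device accesses.
 * On PowerPC this sketch uses 'eieio'; elsewhere it falls back to a
 * compiler builtin as a stand-in for the real, target-specific
 * instruction or OS call.
 */
#if defined(__powerpc__) || defined(__PPC__)
#define io_barrier() __asm__ volatile ("eieio" ::: "memory")
#else
#define io_barrier() __sync_synchronize()
#endif

/* Hypothetical device: load a data register, then issue a command. */
struct dev_regs {
    volatile uint32_t data;
    volatile uint32_t command;
};

#define CMD_START 1u

void start_transfer(struct dev_regs *regs, uint32_t value)
{
    regs->data = value;
    io_barrier();   /* data write must reach the device first */
    regs->command = CMD_START;
    io_barrier();   /* order the command before later device accesses */
}
```

Note that `volatile` alone only stops the compiler from reordering these two writes; the barrier is what constrains the processor's bus interface.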
* Ignore cache coherency and alignment when using DMA
Direct memory access (DMA) is often used to transfer data to and from high bandwidth devices, such as network interfaces and USB controllers. While it's a great way to relieve the processor of the burden of transferring data, it does present some traps for the unwary.
The main problem is cache coherency; the contents of the data cache may not always match the contents of main memory. When the processor reads from, then writes to the same location in memory, the write cycle may mean the cached copy of the data is more up to date than the copy in main memory. Since DMA hardware typically has access only to the main memory, if a DMA transfer writes this memory to an I/O device, it would transfer the wrong (stale) data. A similar problem can occur when DMA transfers data from the device to memory: if the cache happens to contain a copy of the data from the same area of memory, it will be stale as the main memory contains the latest value.
Some hardware, including many systems based on Intel processor architecture, includes 'bus snooping', which ensures cache coherency is maintained, even when DMA is used. In systems that don't have bus snooping hardware, the device driver will need to manage the cache coherency itself. This usually means flushing the relevant areas of cache to memory before using DMA to transfer data from memory to the device, and invalidating the cache after using DMA to transfer data from the device into memory.
When you are designing a DMA based driver, understanding the cache architecture and whether or not the hardware automatically maintains cache coherency is a must. If your driver is responsible for this, you also need to take care that the buffers used for DMA transfers are aligned with the cache line boundaries (typically 32 bytes), otherwise some very tricky and hard to reproduce bugs can result.
On one Pebble Bay project, we were debugging an intermittent problem in a USB transfer from a sensor that provided data at high speed from an isochronous endpoint to the host. Most of the time, this would complete correctly but, on occasions, the transfer would terminate early. We discovered the problem occurred because we had not properly aligned the buffers into which the USB host controller was transferring data: when the driver (correctly) invalidated the cache region after the DMA transfer, it corrupted a driver control variable which happened to be in the same cache line as the first data buffer.
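A minimal sketch of how the alignment half of this problem can be avoided in C11, so that no driver variable ever shares a cache line with a DMA buffer. The 32 byte line size, the payload size and all the names are assumptions for illustration; the cache flush and invalidate calls themselves are target-specific and are not shown.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size; take the real value from the CPU manual. */
#define CACHE_LINE_SIZE 32u

/* Round a byte count up to a whole number of cache lines. */
#define CACHE_ALIGN_UP(n) \
    (((n) + CACHE_LINE_SIZE - 1u) & ~(size_t)(CACHE_LINE_SIZE - 1u))

#define RX_PAYLOAD_BYTES 100u  /* hypothetical transfer size */

/*
 * A DMA buffer that starts on a cache line boundary and occupies
 * whole cache lines, so invalidating it cannot touch its neighbours.
 */
struct rx_dma_buffer {
    alignas(CACHE_LINE_SIZE) uint8_t data[CACHE_ALIGN_UP(RX_PAYLOAD_BYTES)];
};

/* Driver state with control variables on either side of the buffer. */
struct driver_state {
    uint32_t             bytes_expected;  /* control variable */
    struct rx_dma_buffer rx;              /* padded onto its own lines */
    uint32_t             bytes_received;  /* lands on a later line */
};
```

Because the alignment requirement propagates to `struct driver_state`, a static or stack instance is placed correctly by the compiler. A heap allocation would need an aligned allocator such as C11 `aligned_alloc`, since plain `malloc` does not promise cache line alignment.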
* Don't use the correct data sheet for the device
It sounds pretty obvious: why would you not use the correct data sheet for the device your driver controls? You need to be sure the data sheet you're using as a reference describes the exact version of the device on your hardware. Often, it may be easier to obtain a data sheet for a supposedly software compatible device, but you may miss crucial information if the parts in question don't behave in exactly the same way under certain conditions. Also, be sure you have a copy of the device errata notes, which can save some head scratching if the device does not behave exactly as the data sheet describes.
In one BSP development project, we had a hard time getting the main SDRAM to work reliably. After double and triple checking our settings for the processor's built-in memory controller – which were correct – we belatedly looked at the errata notes: these showed an otherwise undocumented register that had to be set to a specific value so the SDRAM interface signals were driven correctly. Once we included this within our initialisation sequence, the memory interface worked faultlessly.
* Forget about access serialisation
Device driver code is usually executed in at least two contexts (threads), often more. One of these contexts is normally an interrupt handler, which can pre-empt the other contexts. This gives plenty of scope for re-entrancy problems, where the code accesses shared resources, such as state variables, device registers and buffer memory.
Serialising access to shared resources is critical to the correct operation of a device driver, otherwise there can be subtle race conditions that can cause failure or data loss. These are often very hard to debug as the problem may only occur infrequently (although arranging to demonstrate your system to an important customer usually does the trick …).
The operating system your driver works with should provide methods for serialising access to shared resources. This could include mutexes to serialise between non-interrupt threads and interrupt locks or spin locks to serialise between non-interrupt and interrupt threads.
A good driver design will minimise the need for serialisation, but will use these mechanisms where necessary.
* Assume that task level code cannot run while an interrupt is being handled
Assuming that an interrupt service routine prevents any task level code being executed can often simplify code that uses shared resources, such as device registers and/or state variables.
While this may be true for single processor systems, it may not be true on multicore systems; this is something to be wary of if you are porting an existing driver onto a multicore system, where it may cause race conditions making the driver unreliable.
The right approach in this case is to use spin locks (or other mechanisms provided by the operating system) to serialise interrupt and task level code.
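To make the idea concrete, here is a minimal sketch of a spin lock built on C11 atomics, shared between an interrupt handler and task level code. In a real driver you would normally use the operating system's own primitive (Linux's `spin_lock_irqsave()` is one example) rather than rolling your own. All names below are hypothetical, and `irq_mask_local()` / `irq_unmask_local()` are empty placeholders for whatever your platform uses to mask interrupts on the current core.

```c
#include <stdatomic.h>
#include <stdint.h>

typedef struct { atomic_flag locked; } drv_spinlock_t;
#define DRV_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

static inline void drv_spin_lock(drv_spinlock_t *l)
{
    /* Spin until the flag was previously clear, i.e. we now own it. */
    while (atomic_flag_test_and_set_explicit(&l->locked,
                                             memory_order_acquire)) {
        /* busy-wait */
    }
}

static inline void drv_spin_unlock(drv_spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/* Placeholders: mask/unmask interrupts on the local core only. */
static inline unsigned long irq_mask_local(void) { return 0; }
static inline void irq_unmask_local(unsigned long flags) { (void)flags; }

struct dev_state {
    drv_spinlock_t lock;
    uint32_t       rx_count;  /* shared between ISR and task code */
};

/* Interrupt context: may run on another core while a task holds the lock. */
void isr_packet_received(struct dev_state *s)
{
    drv_spin_lock(&s->lock);
    s->rx_count++;
    drv_spin_unlock(&s->lock);
}

/* Task context: read and reset the count as one atomic step. */
uint32_t task_take_packets(struct dev_state *s)
{
    /* Mask local interrupts first, so the ISR cannot pre-empt us on
     * this core while we hold the lock and then spin forever. */
    unsigned long flags = irq_mask_local();
    drv_spin_lock(&s->lock);
    uint32_t n = s->rx_count;
    s->rx_count = 0;
    drv_spin_unlock(&s->lock);
    irq_unmask_local(flags);
    return n;
}
```

The spin lock protects against the handler running on a different core, while the local interrupt mask protects against it running on the same core. Both halves are needed, which is why operating systems usually combine them in a single call.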
* Use empty loops for short delays
Back to basics here! It is so tempting to use empty loops when a delay of a few microseconds is needed, but they can be the source of many problems.
For example, a reasonably good compiler may remove them altogether when optimisation is switched on (unless you pepper the code with 'volatile' keywords). More worryingly, the time it takes to execute the loop is very hard to predict accurately from first principles and may vary enormously, depending on obvious things, such as the processor's clock speed, less obvious things, such as the memory bus width and speed, and fairly subtle and obscure things, such as code and data alignment and interactions with the cache.
Even if your loop gives the right delay on the current hardware design and software build, it is an accident waiting to happen if any part of the design changes.
A more reliable approach is to use free running counter/timer hardware to generate short delays. In that case, the delay will depend only on the counter clock rate and the number of 'ticks' you choose to wait for, so it can be predicted accurately.
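A minimal sketch of such a delay, assuming a 32 bit free running up-counter. The counter is read through a function pointer so the same code can run against real hardware or a simulation. `sim_read_counter()` is a stand-in invented for this example and would be replaced by a read of the real timer register.

```c
#include <stdint.h>

typedef uint32_t (*read_counter_fn)(void);

/*
 * Busy-wait until at least 'ticks' counter ticks have elapsed.
 * The unsigned subtraction gives the right answer even when the
 * counter wraps from 0xFFFFFFFF to 0 during the delay.
 */
void delay_ticks(read_counter_fn read_counter, uint32_t ticks)
{
    uint32_t start = read_counter();
    while ((uint32_t)(read_counter() - start) < ticks) {
        /* wait */
    }
}

/* Convert microseconds to ticks, rounding up so delays are never short. */
uint32_t us_to_ticks(uint32_t us, uint32_t counter_hz)
{
    return (uint32_t)(((uint64_t)us * counter_hz + 999999u) / 1000000u);
}

/*
 * Simulated counter for host testing only: starts just below the
 * wrap point and advances 3 ticks on every read.
 */
static uint32_t sim_counter = 0xFFFFFFF0u;
static uint32_t sim_read_counter(void)
{
    sim_counter += 3u;
    return sim_counter;
}
```

A call such as `delay_ticks(sim_read_counter, us_to_ticks(10u, 1000000u))` then waits for ten ticks of a 1 MHz counter, whatever the processor clock, cache state or optimisation level happens to be.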
One case where software timing loops may be acceptable is for quick diagnosis of (and work arounds for) race conditions while debugging – so long as they get removed once a proper solution is found.
How your embedded system interacts with the real world will depend on the quality of the device drivers. It is fair to say that writing device drivers takes a clear head, a thorough understanding of the hardware and a structured approach.
Hopefully, these tips will help you to avoid the obvious pitfalls and to understand the source of problems, should they occur.
Ian Willats is managing director of Pebble Bay Consulting.