Why do multicore systems make it harder to find and diagnose bugs?


Concurrency takes on a new dimension in multicore platforms, since true parallelism comes into play and communication between threads is often achieved using shared memory. Writing a correct concurrent program is notoriously difficult, and the added complexity of multicore architectures makes it significantly harder still.

Concurrency bugs – which include race conditions, deadlocks, livelocks and resource starvation – are difficult to find when they manifest and even more difficult to diagnose. They require a new approach to verification that specifically addresses concurrency errors. The most effective way to reduce the risk of these bugs is to take a multifaceted approach that includes peer code reviews, testing and – most of all – advanced static analysis that incorporates sophisticated models for concurrency (see fig 1).

Programming language support

Since most embedded developers are relatively new to multicore programming, the risk of introducing concurrency bugs is significant. Today, C and C++ are still the most popular programming languages for embedded systems. However, one of their fundamental weaknesses is that they were not designed for concurrency. The C11 and C++11 standards introduced standardised support for multithreading. Three features were added to address concurrency: a memory model that defines the behaviour of multithreaded programs; atomic data types that can be accessed safely by concurrent threads; and several synchronisation primitives, such as locks and condition variables. Notwithstanding these improvements, the languages retain many of the core features of their ancestors that make writing multithreaded programs very hazardous.

At the same time, Java is increasingly popular with embedded developers. With 28% using it today, it is now the third most popular language for embedded systems. In contrast to C and C++, Java has always had built-in support for multithreading within the programming language syntax, source compilers and standard libraries. Additionally, Java 5 added the java.util.concurrent library, which was extended in Java 6 and Java 7 to provide extensive support for concurrent and parallel programming. Many embedded designs use a combination of C or C++ and Java.
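
As a rough illustration of the atomic data types described above, the following Java sketch has several threads increment two shared counters: a plain int field and an AtomicInteger from java.util.concurrent.atomic. The class name CounterSketch, the run method and the thread and iteration counts are illustrative assumptions, not taken from any particular product or codebase.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class CounterSketch {
    // Plain shared field: "unsafeCount++" is a read-modify-write, so updates can be lost.
    static int unsafeCount = 0;
    // Atomic counter: each increment is performed as one indivisible operation.
    static final AtomicInteger safeCount = new AtomicInteger();

    /** Runs the workers and returns {plain total, atomic total}. */
    static int[] run(int threads, int incrementsPerThread) {
        unsafeCount = 0;
        safeCount.set(0);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    unsafeCount++;               // data race on a shared field
                    safeCount.incrementAndGet(); // safe under concurrent access
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join(); // wait for every worker before reading the totals
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return new int[] { unsafeCount, safeCount.get() };
    }

    public static void main(String[] args) {
        int[] totals = run(4, 100_000);
        System.out.println("expected=400000 plain=" + totals[0] + " atomic=" + totals[1]);
    }
}
```

The atomic total always equals the number of increments requested. The plain total may fall short, but whether it does on any given run depends on scheduling, which is exactly why such a bug can survive testing.
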
Java is very popular for automotive applications, for example, because it offers an easy way to program a user interface for a touch screen display or an entertainment system. Such applications may have many layers, with safety-critical code written in C communicating with non-safety-critical Java code running on a user interface.

Static analysis tools

Perhaps the biggest challenge with concurrent programs for multicore platforms is that no amount of testing can be guaranteed to find every concurrency bug. The main source of defects in multithreaded programs is the relative order in which instructions are executed in real time. As multiple threads run, that order varies, depending on what other threads are active at the same time. If bugs are introduced through programming errors, non-deterministic interleaving can lead to unpredictable results.

The number of possible interleavings increases enormously as the number of instructions grows: a phenomenon known as combinatorial explosion. Even the smallest threads have many possible interleavings. Real-world concurrent programs have astronomical numbers of legal interleavings, so testing every interleaving is infeasible. Likewise, it is impossible to explore every potential execution path using peer code reviews or walkthroughs.

This is where advanced static analysis tools excel. They use symbolic execution engines to identify potential problems in a program without actually having to run it. Working much like compilers, they take source code as input, parse it and convert it to an Intermediate Representation (IR). Whereas a compiler would use the IR to generate object code, static analysis tools retain the IR, also called the model. Checkers then analyse the code to find common defects and policy violations, for example by traversing or querying the model and looking for particular properties or patterns that indicate defects.
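
To put a number on the combinatorial explosion of interleavings mentioned above, the following Java sketch counts them under a deliberately simplified model. It assumes that every instruction is atomic, that each thread's instructions stay in program order, and that no synchronisation restricts the ordering. Under those assumptions the count is the multinomial coefficient (n1 + ... + nk)! / (n1! ... nk!). The class and method names are illustrative.

```java
import java.math.BigInteger;

final class InterleavingCount {
    /**
     * Number of ways to interleave threads with the given instruction counts,
     * keeping each thread's own instructions in order.
     */
    static BigInteger interleavings(int... instructionsPerThread) {
        BigInteger result = BigInteger.ONE;
        int placed = 0; // instructions already merged into the schedule
        for (int n : instructionsPerThread) {
            // Multiply by C(placed + n, n), built up one factor at a time.
            // Multiplying before dividing keeps every intermediate value an integer.
            for (int i = 1; i <= n; i++) {
                placed++;
                result = result.multiply(BigInteger.valueOf(placed))
                               .divide(BigInteger.valueOf(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(interleavings(2, 2));     // two threads, 2 instructions each
        System.out.println(interleavings(10, 10));   // two threads, 10 instructions each
        System.out.println(interleavings(100, 100)); // two threads, 100 instructions each
    }
}
```

Two threads of two instructions each already have 6 possible interleavings, two threads of ten instructions have 184,756, and two threads of 100 instructions have a 59-digit number of them. Real programs have more threads and far more instructions, although synchronisation rules out some orderings.
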
Sophisticated symbolic execution techniques explore paths through a control-flow graph – a data structure representing the order in which statements are executed during a program's execution. Algorithms keep track of the abstract state of the program and know how to use that state to exclude consideration of infeasible paths. The depth of the model determines the effectiveness of the tool. That depth is based on how much knowledge of program behaviour is built in, how much of the program it can take into account at once and how accurately it reflects actual program behaviour.

Many developers take advantage of popular open source tools to find bugs in Java, including FindBugs, PMD and Checkstyle. FindBugs uses static analysis to identify hundreds of different types of potential errors in Java programs. It operates on Java bytecode, the form of instructions that the Java virtual machine executes. PMD and Checkstyle check source code for adherence to coding standards and detect bad practices. Each tool has its strengths. An important advantage of static analysis tools in general is that they can be used early in development to find bugs, even before testing begins.

Most of the static analysis tools available for Java are general-purpose and catch a range of surface-level problems. In contrast to the open source tools mentioned, there are commercial products tailored for very precise identification of concurrency problems in Java, C or C++. These tools incorporate very deep models that enable them to find concurrency problems that are often missed by other tools. Some of the most effective of these advanced static analysis tools are based on cutting-edge academic research into software concurrency behaviour. They offer advanced static analysis of C and C++ source code with whole-program interprocedural analysis and can typically handle programs with up to 10 million lines of code.
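
The path exploration over a control-flow graph described earlier in this section can be sketched in a few lines of Java. This is a toy model, not how any particular tool is implemented: it assumes an acyclic graph, boolean variables that are never reassigned along a path, and edges that are either unconditional or guarded by one variable having one value. All names (CfgPathSketch, Edge, feasiblePaths, demo) are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CfgPathSketch {
    /** Edge to a target node; if var is non-null it is taken only when var == value. */
    static final class Edge {
        final String target; final String var; final boolean value;
        Edge(String target, String var, boolean value) {
            this.target = target; this.var = var; this.value = value;
        }
    }

    private final Map<String, List<Edge>> edges = new HashMap<>();

    void addEdge(String from, String to) { addEdge(from, to, null, false); }

    void addEdge(String from, String to, String var, boolean value) {
        edges.computeIfAbsent(from, k -> new ArrayList<>()).add(new Edge(to, var, value));
    }

    /** Returns every entry-to-exit path whose branch conditions are mutually consistent. */
    List<List<String>> feasiblePaths(String entry, String exit) {
        List<List<String>> out = new ArrayList<>();
        explore(entry, exit, new ArrayList<>(), new HashMap<>(), out);
        return out;
    }

    private void explore(String node, String exit, List<String> path,
                         Map<String, Boolean> known, List<List<String>> out) {
        path.add(node);
        if (node.equals(exit)) {
            out.add(new ArrayList<>(path));
        } else {
            for (Edge e : edges.getOrDefault(node, List.of())) {
                Map<String, Boolean> next = new HashMap<>(known); // abstract state for this branch
                if (e.var != null) {
                    Boolean prior = known.get(e.var);
                    if (prior != null && prior != e.value) {
                        continue; // contradicts an earlier branch: infeasible, so prune
                    }
                    next.put(e.var, e.value);
                }
                explore(e.target, exit, path, next, out);
            }
        }
        path.remove(path.size() - 1); // backtrack
    }

    /** Models: if (locked) acquire(); work(); if (locked) release(); */
    static List<List<String>> demo() {
        CfgPathSketch cfg = new CfgPathSketch();
        cfg.addEdge("entry", "acquire", "locked", true);
        cfg.addEdge("entry", "work", "locked", false);
        cfg.addEdge("acquire", "work");
        cfg.addEdge("work", "release", "locked", true);
        cfg.addEdge("work", "exit", "locked", false);
        cfg.addEdge("release", "exit");
        return cfg.feasiblePaths("entry", "exit");
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

The demo graph has four syntactic paths from entry to exit, but only two are feasible. A checker that ignored the abstract state would also consider the path that acquires the lock and never releases it, and could report a defect that cannot occur.
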
In addition to finding race conditions and deadlocks (see fig 2), one of the commercial tools for Java identifies unpredictable results caused by incorrect use of the concurrent collection libraries provided by java.util.concurrent. It detects bad error handling or incorrect synchronisation when coordinating access to shared non-concurrent collections. It can also help diagnose performance bottlenecks caused by incorrect API usage, redundant synchronisation and unnecessary use of shared mutable state.

Since many projects will include Java and C or C++, teams will find it easier and more productive to work with tools in an integrated development environment. There are tool suites that can be used for both embedded and hosted platforms. Commercial versions offer automated workflow and powerful tools for program analysis, program inspection, program understanding and architecture visualisation.

Using an integrated development environment with targeted advanced static analysis tools (see fig 3) enables developers to discover the underlying design intentions of existing concurrent code and to recognise when new code deviates from this design. It provides early warning when new concurrency defects are first introduced and uses cutting-edge technologies to help developers identify and understand them.

Developing embedded applications for multicore platforms requires a new approach. Static analysis offers the only feasible means to explore all possible code paths for software errors in highly concurrent systems. When used in conjunction with other code quality practices such as code reviews and integration testing, advanced static analysis tools can significantly reduce the risk of field failures due to undiscovered concurrency bugs.

Paul Anderson is GrammaTech's vice president of engineering.