Code injection: a common vulnerability


As the Internet of Things develops, embedded devices are being deployed in environments where attackers can take advantage of source code level security vulnerabilities. Embedded software developers should, therefore, understand the different kinds of security vulnerabilities – and code injection in particular.

The term code injection means a regular data input to a program can be crafted to contain code, and the program can be tricked into executing that code. A code-injection defect means hackers can hijack an existing process and execute whatever code they like with the same privileges as the original process.

In many embedded systems, processes need to run with the highest privileges available, so a successful code injection attack can give complete control over the machine, along with the ability to steal data, cause the device to malfunction, recruit it as a member of a botnet or render it permanently inoperable.

The key aspects of a code injection vulnerability are:

  • the program reads data from an input channel and
  • the program treats the data as code and interprets it

It is unusual for a program to execute data as code deliberately, but it is common for data to be used to construct an object that is then executed intentionally.

Format string vulnerabilities
Most C programmers are familiar with the printf family of functions. Roughly, these take a format string followed by a list of other arguments, and that format string is interpreted as a set of instructions for rendering the remaining arguments as strings. Most users know how to write the most commonly used format specifiers (%s, %d and %f, for strings, decimal integers and floating-point values), but not many know there are other format string directives that can be abused.

Here’s one way in which the printf function is commonly misused. Unfortunately, some programmers are in the habit of printing strings as follows:

printf(str);

Although this will have the desired effect most of the time, it is wrong because that first argument to printf will be interpreted as a format string. So, if str contains any format specifiers, they will be interpreted as such. For example, if str contains ‘%d’, printf will interpret the next value in its argument list as an integer and convert it to a string. In this case, there are no more arguments, but the implementation cannot know that; all it knows is that some number of arguments to the function have been pushed on the stack.

Because there is no mechanism in the C runtime that it can use to know that there are no more arguments, printf will simply pick the next item that happens to be on the stack, interpret that as an integer and print it. It is easy to see that this can be used to print an arbitrary amount of information from the stack. If str contained ‘%d %d %d %d’, for example, then the values of the next four words on the stack would be printed.
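
To make the mechanism concrete, the sketch below counts how many arguments printf would try to fetch for a given format string. The function name count_argument_fetches is hypothetical, and the parsing is deliberately simplified: it treats every ‘%’ directive other than ‘%%’ as one fetch and ignores ‘*’ widths and positional arguments, which a real implementation handles.

```c
#include <stddef.h>

/* Hypothetical helper: estimates how many arguments printf would
 * try to read for fmt. Simplified on purpose: every '%' directive
 * other than "%%" counts as one fetch; '*' widths and positional
 * arguments are not modelled. */
size_t count_argument_fetches(const char *fmt)
{
    size_t fetches = 0;

    for (const char *p = fmt; *p != '\0'; p++) {
        if (*p != '%')
            continue;            /* ordinary character, copied to the output */
        if (p[1] == '%') {
            p++;                 /* "%%" prints a percent sign, no argument used */
            continue;
        }
        if (p[1] == '\0')
            break;               /* lone trailing '%': incomplete directive */
        fetches++;               /* one argument would be read for this directive */
    }
    return fetches;
}
```

For the string ‘%d %d %d %d’ this returns 4. When the call is printf(str) with no further arguments, any result above zero means printf would read values that the caller never passed.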

This is a code injection security vulnerability in its own right, but one might be forgiven for concluding the only damage that can be done is that it can be used to get access to data on the stack. This can be bad if sensitive data, such as a password or a certificate key, is located there, but it can turn out to be worse because an attacker can also write to arbitrary memory addresses.

The format specifier that makes this possible is ‘%n’. Normally, the corresponding argument is a pointer to an integer. As the format string is being interpreted to build up the result string, when ‘%n’ is seen, the number of bytes written so far is placed in the memory location indicated by this pointer. For example, after the printf below has completed, the value in i will be 4:

printf("1234%n", &i);
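
A minimal sketch of the same behaviour in a form that can be compiled and checked is shown here. The function name bytes_before_percent_n is hypothetical, and snprintf is used instead of printf only so that nothing is written to the terminal. Some hardened C libraries restrict or disable ‘%n’, so the result assumes a standard-conforming implementation.

```c
#include <stdio.h>

/* Hypothetical helper: formats "1234" followed by %n and returns
 * the count that %n stored. */
int bytes_before_percent_n(void)
{
    char out[16];
    int i = -1;   /* sentinel, so a %n that did nothing would be visible */

    /* The format string is a literal, so this use of %n is legitimate:
     * the matching argument really is a pointer to an int. */
    snprintf(out, sizeof out, "1234%n", &i);

    return i;     /* number of characters produced before %n was reached */
}
```

The danger described in this article arises only when the format string itself comes from an attacker, because the pointer that ‘%n’ writes through is then whatever value happens to be in the argument position.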

If there are fewer actual arguments to the function than there are format specifiers, printf will interpret whatever is on the stack as the arguments. So if the attacker can control the format string, they can write essentially arbitrary values to stack locations. Because the stack is where local variables are located, their values can be changed. If some of those variables are pointers, this gives a platform to reach other, non-stack addresses.

The really juicy targets are those giving the attacker control over program execution. If one local variable is a function pointer, then subsequent calls through that pointer can be to code of the attacker’s choice. The attack could also overwrite the return address, that is, the address of the instruction to which control is transferred when the function returns.

Avoiding code injection
The best way to avoid code injection is by design. If you can use a language where such vulnerabilities can never show up, that is best because your code is then immune by construction. Or you could design code to prohibit interfaces that may lead to these kinds of issues. Unfortunately, in embedded systems, these choices are not always feasible. Even though C is a hazardous language and riddled with vulnerabilities, it remains the language of choice for many organisations. Given that, developers should be aware of other methods of avoidance.

The two golden rules that should be followed to prevent code injection vulnerabilities are:

  • Don’t interpret data as code if you can avoid it, and
  • If you can’t avoid it, make sure you validate that the data is well formed before using it.
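
As an illustration of the second rule, suppose a format string genuinely has to come from outside the program, for example from a message catalogue. The sketch below is one possible validator, not an established API: the name format_matches_strings_only and its allow-list (only ‘%s’ and ‘%%’) are assumptions made for this example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical validator: accepts fmt only if every directive is
 * either "%%" or a plain "%s", and the number of "%s" directives
 * equals expected_args. Everything else, including %n, %d, flags,
 * widths and a trailing '%', is rejected. */
bool format_matches_strings_only(const char *fmt, size_t expected_args)
{
    size_t seen = 0;

    for (const char *p = fmt; *p != '\0'; p++) {
        if (*p != '%')
            continue;        /* ordinary text */
        p++;                 /* look at the character after '%' */
        if (*p == '%')
            continue;        /* "%%" is a literal percent sign */
        if (*p != 's')
            return false;    /* not on the allow-list (or end of string) */
        seen++;
    }
    return seen == expected_args;
}
```

A caller would reject the string unless the check passes, for instance with format_matches_strings_only(fmt, 1) before a call that supplies exactly one string argument. An allow-list is used rather than a search for known-bad directives because it fails closed when it meets something unexpected.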

To avoid the format string vulnerability, the first of these rules is most appropriate; you can write the code as follows:

printf("%s", str);

This way, the contents of str are only treated as data. This is a ‘no-brainer’, so long as you can find all the places where that change should be made. This can be tricky for large programs and especially so with libraries of third-party code.
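
One way to make the fix harder to forget is to route untrusted text through a small wrapper whose format string is a constant. The sketch below assumes a hypothetical helper named render_untrusted; it writes into a caller-supplied buffer with snprintf so that the result can be inspected.

```c
#include <stdio.h>

/* Hypothetical helper: renders untrusted text strictly as data.
 * The format string is the constant "%s"; the untrusted text is
 * only ever passed as an argument, so any '%' characters in it
 * are copied through unchanged.
 *
 * GCC and Clang can also report calls such as printf(str) when
 * the code is built with -Wformat -Wformat-security. */
int render_untrusted(char *out, size_t out_size, const char *untrusted)
{
    return snprintf(out, out_size, "%s", untrusted);
}
```

Passing the string ‘%d %n %s’ to this helper yields the same eight characters in the output buffer, with no arguments fetched and no memory written through ‘%n’.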

Testing for vulnerabilities
Testing for these kinds of vulnerabilities can be difficult; even tests that achieve very high code coverage can fail to trigger these problems. When testing for security vulnerabilities, the tester must adopt the mindset of a determined attacker. Fuzz testing can be useful, but its inputs are typically too random for it to find these problems reliably.

Static analysis can be effective at finding code injection vulnerabilities. Note that early-generation static analysis tools, such as lint and its immediate descendants, are weak at finding these because a whole-program, path-sensitive analysis is needed in order to be precise.

Advanced static analysis tools that have emerged recently are more effective – tool vendors have accumulated experience of which interfaces are hazardous and a knowledge base of what to look for and how to do so effectively.

The key technique used here is taint analysis, or hazardous information flow analysis. These tools work by first identifying sources of potentially risky data and tracking how that information flows through the code to locations where it is being used without having been validated. The best tools allow you to also visualise the flow.
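
The vocabulary of taint analysis can be illustrated with a toy model. The sketch below is only an analogy: real tools reason about the source code statically, across all paths, whereas this code tracks a flag at run time. All the names in it (tracked_str, from_input, validate_no_directives, sink_accepts) are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* Toy model: a string plus a flag recording whether it came from
 * an untrusted source and has not yet been validated. */
typedef struct {
    const char *text;
    bool tainted;
} tracked_str;

/* Source: anything read from an input channel starts out tainted. */
tracked_str from_input(const char *text)
{
    return (tracked_str){ text, true };
}

/* Validator: clears the taint only if the text contains no '%'
 * character, so it cannot act as a format directive. */
bool validate_no_directives(tracked_str *s)
{
    if (strchr(s->text, '%') != NULL)
        return false;        /* still tainted */
    s->tainted = false;
    return true;
}

/* Sink: stands for a place where the text would be used as a
 * format string. Returns true only if that use would be allowed. */
bool sink_accepts(const tracked_str *s)
{
    return !s->tainted;
}
```

A static analysis tool in effect reports every path on which a value travels from something like from_input to something like the sink without passing through a validator on the way.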

Conclusions
Code injection vulnerabilities are dangerous security issues because they can allow an attacker to disrupt the program and sometimes take complete control. Developers who want their embedded code to be secure in a potentially hostile networked environment should try hard to eliminate these vulnerabilities early in the development cycle. Stringent code inspection and the use of advanced static analysis tools are recommended.

Author profile
Dr Paul Anderson is VP of engineering with GrammaTech.