24 September 2013
Cutting overhead through preemption threshold scheduling
Real time embedded systems typically use a collection of application tasks or threads that must complete their work within real time constraints.
Real time systems generally employ preemptive scheduling to guarantee that critical threads get immediate attention and meet their deadlines.
The downside? Preemptive scheduling can bring significant context switch overhead.
There's another way – Preemption Threshold Scheduling (PTS) reduces preemption overhead, whilst enabling applications to meet real time deadlines.
Typical preemptive scheduling
When one thread is executing and a higher priority thread becomes READY to run, the rtos interrupts – or preempts – the running thread and replaces it with the higher priority thread. In this process – termed a 'context switch' –the rtos saves the context of the executing thread on its stack, retrieves the context of the new thread from the stack and loads it into the cpu registers and program counter. The context switch operation is reasonably complex, and can take from 50 to 500 cycles, depending on the rtos and the processor.
Generally, real time schedulers are preemptive – they make sure the highest priority thread that is ready to run is the one they let run; the others must wait. With preemption, a running thread is stopped so that something else can run. In preemptive scheduling, the rtos always runs the highest priority thread that is READY to run. Preemptive scheduling is commonly found in real time systems and rtoses because it provides the fastest response when a thread must run immediately upon the occurrence of an external event or before a particular deadline. While responsiveness is maximised, overhead is high, since a context switch is always required.
Preemption brings some potential problems, which the developer must avoid or handle properly, including:
• Thread starvation. A thread may never get to execute if a higher priority thread never finishes. Developers should avoid conditions where a high priority thread might end up in an endless loop or where the thread consumes an undesirable amount of cpu time, preventing other threads from getting access.
• Excessive overhead. In situations with lots of context switching, overhead can add up. This article outlines some tools that enable this overhead to be seen and measured.
• Priority inversion. Priority inversion happens when a high priority thread is waiting for a shared resource, but the resource is held by a low priority thread which cannot get time to finish its use of the resource due to preemption by a thread of intermediate priority.
Preemption threshold scheduling
PTS was developed to combat excessive overhead and unnecessary context switching. It also can be used to eliminate some instances of priority inversion.
Normally, any thread with a priority higher than the running thread can preempt it. But with PTS, a running thread can only be preempted if the preempting thread has a priority higher than the running thread's Preemption Threshold value, not its priority.
For example, consider a thread of priority 20, which would normally be preemptable by a thread of priority 19, 18, 17, 16, and so on (see fig 1). If its Preemption Threshold were set to 15, for example, only threads higher in priority than 15 (lower in number) could preempt it. So, threads in between — at priority 19, 18, 17, 16, and 15 — cannot preempt, but threads at priority 14 and higher (lower number) can.
The Preemption Threshold is optional and can be specified for any thread, all threads or no threads. If not specified, effectively, the Preemption Threshold of a thread is the same as its priority. But with PTS, a thread can prevent preemption by higher priority threads, up to some limit (the Preemption Threshold value), above which preemption will be permitted.
PTS performance benefits
To illustrate the performance benefits that can be achieved using PTS, consider a simple producer-consumer application, with one thread sending messages (thread_D) and three threads retrieving them from message queues (thread_A, thread_B, and thread_C).
To compare approaches, we'll consider two cases:
• Case 1 uses the fully-preemptive approach, with priorities 1, 2, 3, and 4 assigned to threads A, B, C, and D, respectively.
• In Case 2, we introduce Preemption Threshold to see how that can be used to reduce context switches. To do so, we assign thread_D a Preemption-Threshold of 1, meaning that it can only be preempted by a thread with priority higher than 1. In this system, no thread has priority higher than 1 (=0), hence, thread_D will not be preempted by thread A, B, or C
In Case 1 (see fig 2), Thread_D begins to send its messages to each queue, but as soon as it sends the first message, Thread_A jumps in to retrieve it. Why? Because Thread_A is higher in priority than Thread_D and Thread_A is ready to run, since the queue it was pending on now is non-empty. Once Thread_A reads its message, it again pends on an empty queue and Thread_D is resumed. Thread_D now sends a message to Thread_B, which immediately preempts Thread_D and so on through all nine messages. This completes a cycle in which nine messages were sent, nine retrieved and 18 context switches were recorded.
In Case 2 (see fig 2), Thread_D is not interrupted while it sends its messages because its preemption threshold value precludes this. Thread_D keeps sending messages until it completes its loop, at which time it relinquishes to the next highest priority thread. Note that once Thread_D relinquishes its use of the cpu, it will not resume until threads A, B, and C all are blocked, since Thread_D has priority=4, and thus cannot preempt any other thread.
The efficiency of Case 2 is significantly different from Case 1. Rather than seeing 18 context switches, we see only four context switches. Comparing context switches, we have Case 1 with 18 and Case 2 with four.
By selecting one complete cycle, as shown in fig 3, we can count the number of timer ticks in that cycle. We can see that Case 1 takes 1801 ticks, while the cycle for Case 2 is 964 ticks.
In this application, we see a significant improvement in performance and throughput using PTS, as compared with the fully preemptive scheduling used in Case 1. Context switches have been reduced by more than 75% and throughput has been increased by more than 45%.
We've seen that while a fully preemptive scheduler delivers maximum responsiveness, it can also introduce significant overhead that reduces system efficiency. PTS, on the other hand, can reduce context switches and enable increased performance.
John Carbone is vp of marketing with Express Logic.