The right priorities

David N. Kleidermacher
Vice President of Engineering
Green Hills Software Inc.
Santa Barbara, Calif.

Whether an embedded system is a handheld Internet device, a telephone switch, or a complex aircraft control setup, the real-time operating system (RTOS) running it must guarantee uninterrupted service. Fault tolerance and high availability are key metrics. Safety-critical embedded systems in aircraft, for example, once executed functions having different levels of criticality on separate, dedicated processors to ensure uptime. But the advent of more capable processors - along with pressure to slash maintenance costs, device sizes, weight, and power use - has sparked a demand for RTOSs able to run multiple programs with different safety levels, all on a single processor.

Consider a processor that runs both an in-flight entertainment system and critical flight controls. For the applications to safely coexist, the RTOS must guarantee that a failure in the entertainment program cannot, under any circumstances, disrupt the operation of the flight controls. Memory management units (MMUs) let an RTOS run applications each in their own protected address spaces, though modern commercial RTOSs typically turn off the feature when they boot. As a result, any application has direct access to the code and data of both the kernel and other system applications, a potentially dangerous situation. One that comes to mind: a single errant memory address pointer in the in-flight entertainment program could trash the flight control program or corrupt the kernel and take down the entire system. Use of hardware MMUs prevents code running in one address space from accessing the memory of another.

Access control

RTOSs use two basic types of access control for memory, mandatory and discretionary. A Unix file, for example, uses discretionary access control (in a process or thread) to modify permissions on a file so it can be used by another system process. Discretionary access controls work only as well as the applications using them, and many applications tend to have bugs. But mandatory access control should be part of an RTOS that handles critical-system objects.

Consider an aircraft sensor with its access controlled by a flight-control program. The system designer statically provides the flight-control program access to this device. No other system application can dynamically request and obtain access. Likewise, the flight-control program cannot dynamically provide access to any other system application. The kernel makes access control mandatory so it can't be side-stepped by application code.

Space and time domains

Memory protection is just one requirement of RTOSs for safety-critical systems. Malicious or careless execution of another application cannot run a critical system out of memory. Most RTOSs hold memory in a central store for task control blocks and other kernel objects. For instance, when a task running in the in-flight entertainment application creates a new task or another kernel object, the kernel carves off a chunk of memory from the central store to hold the data for this object. Ditto when a task running in the flight-control program creates a kernel object. But a faulty in-flight entertainment program could create too many kernel objects and exhaust the central store, starving the flight control program of memory and causing it to fail.

One fix uses an RTOS-based memory quota system to statically allocate physical memory to each address space. In the aircraft control example, the in-flight entertainment system gets 128 kbytes and the flight-control program, 196 kbytes. A failure of the in-flight entertainment system may exhaust its own memory but leave that of the flight-control program untouched.

Operating systems employ a scheduler program to coordinate use of shared resources. Most RTOS task schedulers are of the priority-based, preemptive type. Here the highest priority task always gets to run. Multiple tasks at the highest priority level share runtime (time slicing). Unfortunately these simple scheduling schemes can't guarantee runtime for critical tasks.

Consider a system with two tasks at the same priority level. Task A is a noncritical background task while Task B is considered critical and needs 40% runtime to do its job. A typical scheduler gives both tasks 50% of the runtime. Suppose Task A spawns a new task at the same priority level of the other two tasks. Task B now gets 33% of the runtime and cannot do its job properly. And if the code in Task A has a bug or virus, it may spawn dozens or even hundreds of "confederate" tasks, causing Task B to get a tiny fraction of runtime.

One way around the problem is to inform the scheduler of a task's maximum "weight" within the priority level. Then a task spawning another equal-priority task relinquishes part of its own weight to the new task. Assume Task A has 60% of the runtime and Task B, 40%. When Task A spawns a third task, it must give some of its runtime to the new task while Task B still gets 40%.

However, task schedulers know nothing of the application or address space in which the tasks reside. Suppose that Task B normally gets all the runtime it needs by making itself a higher priority than Task A or any other task in the in-flight entertainment application. A bug, poor design, or improper testing of Task B can lower its own priority, causing Task A to gain control of processor runtime. Similarly, Task A may raise its priority above that of Task B with the same effect.

In contrast, an address-space-level (partition) scheduler guarantees tasks having different criticality address spaces that cannot mix. Each address space has one or more windows of execution within a repeating timeline and only those tasks within the currently active address space are runnable. When there are no runnable tasks within the active partition, the partition scheduler runs background tasks in its own partition. A background task may be a low-priority diagnostic agent that runs occasionally but has no hard, real-time requirements.

Schedulability

So-called Rate Monotonic Analysis (RMA) lets designers analyze and predict system timing so RTOSs can meet hard, real-time deadlines. But for RMA to work, a designer must know how long it takes to execute specific code and any associated overhead. Overhead typically includes context switching time, the time required to execute kernel service calls, and the overhead of interrupts and interrupt handlers firing and executing. Lower context switching time implies lower overhead, more efficient use of available processing resources, and an increased likelihood of meeting deadlines.

Some interrupts are of a higher priority than others and require faster response times. An interrupt signaling the kernel to read a critical flight control sensor should be handled as soon as possible. On the other hand, a typical interrupt frequency for a scheduler timer may be 60 Hz to allow for equal-priority tasks to time slice. Most RTOS kernels disable interrupts when manipulating internal data structures during service calls. This is done so the timer scheduler interrupt cannot fire. The interrupt could trigger another task, execute a related service call, improperly access the current data structure, and boost latency for the highest-priority interrupt.

A better approach is to never disable interrupts in kernel service calls. Instead the handling of scheduler interrupts gets postponed until the kernel service call completes. This strategy depends on short kernel service calls. Longer calls must be restartable so the scheduling of events can preempt the completion of the service call. The time to get back to the scheduler may vary by a few instructions but will always be short and bounded. A kernel with such preemptible service calls always handles the highest-priority interrupt with the absolute minimal latency.

Unfortunately, the design of most RTOSs makes this approach impossible. This is because a task primarily spends time executing code but also sends and receives messages. Message transfer times will obviously vary with message size, but the RTOS should know whether transfer times are attributed to a sending task, a receiving task, or both. In fact, the kernel scheduler should treat all activities as prioritized units of execution to prevent priority inversion.

Priority inversion thwarts RMA because it relies on higher-priority tasks running before lower-priority tasks. Priority inversion happens when a high-priority task can't run because a resource, such as a mutex, it tries to get is owned by a low-priority task. The low-priority task in this case can't run and release the mutex because a medium-priority task is concurrently runnable.

A so-called priority inheritance mutex fixes the problem. Here, a high-priority task trying to take a mutex already owned by a low-priority task, signals the kernel to elevate the low-priority task to high priority so that it (the low-priority task) can execute and release the mutex. The priority reduces to normal after the mutex is released and the high-priority task runs again. In this example, the time the low-priority task holds the mutex adds to the overhead of the high-priority task. This priority elevation stops the running of a medium-priority task, which prevents inversion. The bad news is the priority inheritance scheme does not prevent chained blocking.

Suppose that a medium-priority task attempts to take a mutex owned by a low-priority task, but while the low-priority task's priority is elevated by priority inheritance, a high-priority task becomes runnable and attempts to take another mutex already owned by the medium-priority task. The high-priority task must now wait for both the low and medium-priority tasks to complete their critical sections before being able to run again. The chain of blocking critical sections can extend to include the critical sections of any tasks that might access the same mutex. System designers typically deal with the problem by computing the worst-case overhead, which results in a less efficient system and the likelihood of more missed deadlines.

Fortunately, so-called highest locker (HL) semaphores stop priority inversion and chained blocking. A HL semaphore has a preassigned priority equal to that of the highest-priority task that might try to obtain it. When a task takes an HL semaphore, it immediately elevates to the priority of the semaphore. Releasing the semaphore reverts the task to its original priority. Because of this priority elevation, no other tasks that might contend for the same semaphore can run until the semaphore is released. Any task that obtains an HL semaphore always executes its critical section to completion - no task is blocked waiting for an HL semaphore. Ideally an RTOS should provide both options because of the different semantics between HL and traditional priority-inheritance semaphores.

Green Hills Software's Integrity-178B real-time operating system runs the Avionics Multi-Function Display system in Sikorski S-92 helicopters. The system displays and manages primary flight data, navigation data, a digital map, weather radar, terrain information, and engine instruments. The system meets strict FAA Level A guidelines.

Task A is non-critical task and Task B is a critical task that needs 40% of runtime. Both tasks are at the same priority level. Conventional task schedulers can starve critical tasks when a non-critical task spawns a new task. Task schedulers that "weight" tasks within a priority level get around the problem.

Will it fly?
Certification says when
FAA Standard RTCA/DO-178B provides guidelines for developing software for airborne systems and equipment. DO-178B certification has five Levels (A through E), Level A being the most stringent. A failure in Level A software would result in a catastrophic failure condition which would prevent continued safe flight and landing of the aircraft. To pass muster, a run-time system must be certifiable to a level of criticality as high or higher than that of any program running on the processor.

Priority inversion happens when a high-priority task can't run because a resource, such as a mutex, it tries to get is owned by a low-priority task. The low-priority task in this case can't run and release the mutex because a medium-priority task is concurrently runnable.

Common operating system terms
Task -A lightweight unit of program execution. Also commonly referred to as a thread.
Address space - A heavyweight unit consisting primarily of a memory space where one or more tasks execute. A Unix operating system refers to this as a process.
Kernel - The portion of an RTOS that provides core-system services such as scheduling, task synchronization, interprocess communication, and device handling. Also referred to as the executive.
Semaphore - A hardware or software flag that indicates the status of a common resource.
Mutex - A collection of techniques for preventing conflicts among shared resources.