Microsoft Windows CE 3.0 Kernel Services: Multiprocessing and Thread Handling
Summary: The Microsoft® Windows® CE kernel uses the Microsoft Win32® process and thread model, providing embedded operating systems developers with the preemptive multitasking and priority-based thread-handling capabilities of Microsoft Windows NT® and Windows® 2000, optimized for the kernel-level security, efficient memory usage, and greater real-time performance that Windows CE-based platforms require. This white paper provides an overview of the Windows CE process and thread-handling capabilities, as they are implemented in Windows CE 3.0. (9 printed pages)
Windows CE was designed to run on a variety of microprocessor platforms and in a wide variety of custom implementations. What these Windows CE-based platforms have in common are limited resources and the need for a multithreaded, preemptive multitasking operating system. Although Windows CE exposes a subset of the same Win32 application programming interface (API) that is used on Windows-based desktop and server computers, the operating system itself was designed from the ground up to provide a highly efficient implementation of the Win32 process and thread model. In addition, Windows CE 3.0 supports the following optimizations:
- More thread priority levels
- Simplified priority-inversion handling
- Full support for nested interrupts
- Greater control over timing and scheduling
All applications that run on Windows CE consist of a process and one or more threads. A process is a single instance of a running application. Each process starts with a single thread, which is known as the "primary thread," and which provides the resources that are required to run an application. The process creates additional threads as they are needed.
A thread is the basic unit to which the Windows CE operating system allocates processor time. Each thread is responsible for executing code within a process, concurrently with other threads in that process and in other processes.
Preemptive multitasking creates the effect of simultaneously executing multiple threads. When a process has more than one thread running, the operating system switches from one thread to another, so that the threads appear to run simultaneously.
A hardware memory management unit (MMU) protects one running process from another, which allows devices to accept aftermarket customizations and applications without sacrificing reliability.
The MMU also implements a virtual memory model that lets operating system developers use available memory and code space more efficiently. Applications can be executed in read-only memory (ROM), compressed in ROM, and paged into random access memory (RAM), or they can be stored in a file system and paged into RAM at execution time.
Windows CE can run up to 32 processes at one time. A minimum of four, and more typically seven or eight, processes are created when the system starts up. These processes provide kernel, device, and file-system services, and other shell and connectivity services. Each process contains all of the resources required for its execution, including a virtual address space, executable code, and object handles. A process can create other processes, called child processes, which can inherit object handles and other access rights from the parent process.
A process is created with a single thread, often called the primary thread, but can create additional threads from any of its threads. All threads belonging to a process share its virtual address space and system resources. The number of additional threads that a process can create is limited only by the amount of RAM that is available on the device. The kernel itself can address 512 megabytes (MB) of physical memory.
The following illustration shows how memory is allocated in the Windows CE address space.
When a Windows CE-based device starts up, the operating system creates a single, 4-GB virtual address space, which all of the processes share. Part of this address space is divided into 33 "slots" of 32 MB each, one for each process's resources. As each process starts, Windows CE assigns it one of the available 32-MB slots. Slot zero is reserved for the currently running process.
When a process starts, the OS stores all of the dynamic-link libraries (DLLs), the stack, the heap, and the data section in the slot that is assigned to the process. DLLs are loaded at the top of the slot, and they are followed by the stack, the heap, and the executable (.exe) file. The bottom 64 kilobytes (KB) are always left free. DLLs are controlled by the loader, which loads all of the DLLs at the same address for each process.
In addition to assigning a slot for each process, Windows CE creates a stack for the primary thread and a heap for the process. The maximum number of threads that a process can maintain is dependent upon the amount of available memory. You can allocate additional memory outside of the 32 MB that is assigned to each slot, by using memory-mapped files or by calling the VirtualAlloc function to reserve or commit a region in the calling process's virtual address space.
Memory-mapped files are an efficient way for two or more processes to share data, especially when the processes are accessing small amounts of data, frequently, and in no particular order. One process creates a named file-mapping object that reserves a region of address space in virtual memory. Other processes can share this virtual address space by opening a mapping object by the same name. The system must synchronize read and write operations in the shared memory space, to prevent one process from overwriting data that is in use by another process, and from freeing the block while the other process is still using it. By creating a named synchronization object (as described later in the "Event Objects" section), processes can use handles to the shared object to synchronize their activities.
A thread describes a path of execution within a process. Every time the operating system creates a new process, it also creates at least one thread. The purpose of creating a thread is to make use of as much of the CPU's time as possible. For example, in many applications, it is useful to create a separate thread to handle printing tasks so that the user can continue to use the application while it is printing.
Each thread shares all of the process's resources, including its address space. In addition, each thread maintains its own stack for storing variables that are referenced in a function. The actual architecture of the stack is dependent on the microprocessor, but typically the default stack limit is 1 MB, with 2 KB reserved for overflow control. Note that in Windows CE you cannot change the stack size when a thread is created. However, you can override the stack limit at compile time.
When a thread is created, Windows CE reserves enough memory space to accommodate the maximum stack size, and allocates one page of memory to the thread. As the stack grows, the system commits virtual pages from the top down. If no memory space is available, the thread that needs the stack space is suspended until the request can be granted; the system attempts to reclaim reserved, but unused, stack memory from other threads to satisfy the request.
A thread also contains the state of the CPU registers, known as the "context," and an entry in the execution list of the system scheduler. Each thread in a process operates independently. Unless you make the threads visible to each other, they execute individually and are unaware of the other threads in a process. Threads that share common resources, however, must coordinate their work by using a method of synchronization, as described later in "Thread Synchronization."
Thread local storage (TLS) is the Win32 mechanism that allows multiple threads of a process to store data that is unique for each thread. There are several situations in which you might want a thread to maintain its own data. For example, consider a spreadsheet application that creates a new instance of the same thread each time the user opens a new spreadsheet. The DLL that provides the functions for various spreadsheet operations can use TLS to save data about the current state of each spreadsheet.
TLS uses a TLS array to save thread-specific data. When a process is created, Windows CE allocates a 64-slot array for each running process. When a DLL attaches to a process, it searches for a free slot in the array and assigns the slot to the new thread.
Threads can be in one of the following states: running, suspended, sleeping, blocked, or terminated. Multithreaded applications must avoid two threading problems: deadlocks and race conditions. A deadlock occurs when each thread is waiting for the other to complete an action. A race condition occurs when threads are sharing a value and the final result of the value depends on the scheduling order of the threads.
When all threads are in the blocked state, Windows CE enters the idle mode, which stops the CPU from executing instructions and consuming power. You can create threads that will switch from idle mode into suspend mode. To conserve power, be sure to use synchronization objects to block threads that are waiting, instead of creating a thread that polls for status. The suspended state can be controlled by the OEM and by applications.
Effective with version 3.0, Windows CE offers 256 priority levels, with 0 being the highest priority and 255 being the lowest priority. Levels 255 through 248 are defined for application threads and map to the 8 priority levels that were available in earlier versions of Windows CE.
Many of the higher priority levels (247 through 0) are assigned to real-time applications, drivers, and system processes. In order to prevent random applications from degrading the performance of the system, the OEM can restrict all of the priority levels between 247 and 0 to OEM-specified applications. For more information on the priority levels that are available to applications on your target device, you should consult the OEM.
Windows CE does not support process priority classes, which restrict the thread priorities that can be assigned in a process.
A process starts when the system scheduler gives execution control to one of its threads. The system scheduler determines which threads should run and when they should run. Threads that have a higher priority are scheduled to run first. Threads that have the same priority run in a round-robin fashion; that is, when a thread has stopped running, all of the other threads of the same priority run before the original thread can continue. Threads that have a lower priority do not run until all of the threads that have a higher priority have finished; that is, until they either yield or are blocked. If one thread is running and a thread of higher priority is unblocked, the thread of lower priority is suspended immediately, and the thread of higher priority is scheduled to run.
Threads run for a time slice, or quantum. The OEM can specify a value for the quantum or use the default value of 25 milliseconds. If, after the quantum has expired, the thread has not relinquished its time slice, the thread is suspended and another thread is scheduled to run. However, threads at the highest priority level—THREAD_PRIORITY_TIME_CRITICAL—are exempt from the quantum limit and can continue executing without interruption until they have finished, unless they are interrupted by an Interrupt Service Routine (ISR).
Thread priorities are fixed and do not change. Windows CE does not age priorities, boost foreground thread priorities, or mask interrupts based on these priority levels. However, the kernel can temporarily modify thread priorities to resolve a situation called priority inversion.
Priority inversion occurs when a high priority thread is blocked from running because a lower priority thread owns a kernel object, such as a mutex or critical section, which the high priority thread needs. For example, consider a Windows CE application that contains three threads. Thread 1 is a high-priority, currently scheduled thread, and it tries to acquire a critical section that is owned by Thread 2. Thread 2 could be in one of two states:
- It could be ready to run.
- It could be blocked already, because it is waiting on a critical section that is owned by Thread 3.
Windows CE 3.0 guarantees the handling of priority inversion only to a depth of one level. In the first case, Thread 1 blocks and Thread 2 is raised to the same priority as Thread 1, so that it can run and then release the critical section object.
In the second case, Thread 1 also blocks until Thread 2 gives up the critical section, but the priority levels of Thread 2 and 3 remain the same and they are scheduled based on a normal scheduling algorithm. Note that this is different from the way that other Windows operating systems handle priority inversion, where Thread 2, Thread 3, and any other thread that was connected in the chain, would all be raised to the same priority as Thread 1.
Whenever possible, avoid priority inversion situations, especially in real-time applications, where exact timing of a thread and control over scheduling are important.
Real-time applications use interrupts to respond to external events in a timely manner. The use of interrupts requires that an OS balance performance against ease of use. Windows CE balances these two factors by splitting interrupt processing into two steps: an interrupt service routine (ISR) and an associated interrupt service thread (IST).
To prevent the loss and delay of high-priority interrupts, Windows CE 3.0 introduces nested interrupts. The following list shows how Windows CE handles nested ISR calls:
- The kernel disables all other interrupt request (IRQ) lines at the same and lower priorities as the IRQ that invoked the current ISR call.
- If a higher-priority IRQ arrives before the current ISR completes processing, the kernel saves the current ISR state.
- The kernel calls the higher-priority ISR to handle the new request.
- When the request is completed, the kernel loads the original ISR and continues processing.
Windows CE can nest as many ISRs as the hardware is designed to support. Keep in mind, however, that each nested ISR imposes a delay on the previous ISR. When the service thread is performing a time-related action, such as an operation that must be completed within a specified interval, interrupts should be disabled until the action is completed.
Synchronization is necessary in order to avoid conflict over shared resources. You can coordinate the activities of threads by using wait functions, synchronization objects, interlocked functions, and messages.
Wait functions block or unblock a thread, based on the state of a single object, or one of several objects. These objects can be synchronization objects, such as an event or mutex object, or handles to processes and threads. Objects reside in either a signaled or nonsignaled state, based on the state of their associated processes or threads. For example, handles are signaled when their processes or threads terminate. If a thread is blocked while waiting on multiple synchronization objects, the wait ends when any one of the objects is signaled. Windows CE 3.0 does not support waiting for all objects to be signaled.
Windows CE supports three interlocked functions: InterlockedIncrement, InterlockedDecrement, and InterlockedExchange. These functions synchronize access to a shared variable by preventing a thread from being preempted while it is executing a complex (non-atomic) operation on the variable. Without such synchronization, a thread could be preempted, for example, after incrementing the value of a shared variable, but before it had checked the value. By the time the thread received its next time slice and was able to check the value, other threads could have changed the value of the variable.
Threads that belong to different processes can also use these functions as long as their variables are in shared memory.
Windows CE 3.0 supports the following synchronization objects:
- Event objects
- Mutex (mutual exclusion) objects
- Critical sections
Windows CE uses event objects to notify a thread when to perform its task or to indicate to other threads or processes that a particular event has occurred. For example, a thread that writes to a buffer can set a designated event object to the signaled state when it has finished writing. Depending on the attributes defined at its creation, the object can either automatically reset itself to the nonsignaled state immediately after being signaled, or it can remain signaled and unavailable, until it is manually reset.
An event object can be either named or unnamed, but only named events can be shared by several processes. A single thread can specify different event objects in several simultaneous overlapped operations, and the process can then wait for the state of any one of the event objects to be signaled.
A mutex (mutual exclusion) object is a synchronization object that is used to coordinate exclusive access to a resource, such as a block of memory that is shared across multiple threads. The first thread requesting a mutex object obtains ownership of it. When the thread has finished with the mutex, ownership passes to other threads on a first-come, first-served basis. The object is signaled when it is not owned by a thread, and nonsignaled when it is owned.
A mutex object differs from a critical section object in that both the object and the resource it is protecting are accessible to threads running in different processes.
Critical Section Objects
A critical section is a section of code that requires exclusive access to some set of shared data before it can be executed. A critical section object protects this code by restricting its use to one thread at a time, similar to the way interlocked functions block access to a variable while it is in use.
A thread gains access to the protected code by taking ownership of the critical section object; the object is then in a nonsignaled state. When the thread leaves the critical section it must explicitly relinquish ownership of the object, resetting its state to signaled. The critical section of code cannot be shared with other processes, and only the threads belonging to the single process can own the object.
Event, mutex, and semaphore objects can also be used in a single-process application, but critical section objects provide a slightly faster, more efficient mechanism for mutual-exclusion synchronization. Like a mutex object, a critical section object can be owned by only one thread at a time, which makes it useful for protecting a shared resource from simultaneous access. There is no guarantee about the order in which threads will obtain ownership of the critical section; however, the system is fair to all threads.
Not all resources require that access be limited to only one thread at a time. Resources that can accommodate simultaneous access by a limited number of threads can use semaphores to control access. A semaphore acts as a turnstile gate, counting the threads as they pass through the gate. When no thread is using the resource, the reference count is at its maximum value, and the state of the object is signaled. Each time a thread accesses the resource, the reference count is decremented and each time a thread releases the resource, the reference count is incremented. When the reference count reaches zero, the state of the semaphore object is nonsignaled and threads requesting the resource must sleep until another thread is finished.
Semaphores can be used across processes or within a single process and multiple processes can hold handles to the same semaphore object. An application can use a semaphore during initialization to block all threads from accessing certain protected resources. When the application finishes initialization, it releases the semaphore to permit normal access to those resources.
Threads rely on messages to initiate processing, control system resources, and communicate with the operating system and user. Windows messages originate from a variety of sources, including the operating system; user activity, such as keyboard, mouse, or touch-screen actions; and other running processes or threads.
When a message is sent to a thread, that message is placed in a message queue for that thread for eventual handling. Each thread owns a message queue that is completely independent of message queues owned by other threads. Typically a thread has a message loop that runs continuously, retrieving and handling messages. When no messages are in the queue and the thread is not engaged in any other activity, the system can suspend the thread, thereby saving CPU resources. Not every thread has a message queue; only a thread that has an associated window does.
You can use messages for control purposes, to initiate various types of processing in your application, and to pass data that uses message parameters. For example, a thread might receive a message that indicates that a touch screen has been activated, and the message parameter might indicate the x- and y-coordinates of the action of the user. In another type of message, the parameter might include a pointer or handle to a data structure, window, or other object.
For more information about how the Windows CE 3.0 kernel implements multiprocessing and thread handling, consult the OEM and see the Microsoft Windows CE version 3.0 Software Development Kit (SDK) documentation.