
MSDN Magazine

Windows CE 3.0: Enhanced Real-Time Features Provide Sophisticated Thread Handling

Paul Yao
This article assumes you're familiar with Windows CE
Download the code for this article: Yao1100.exe (51KB)
Browse the code for this article at Code Center: WinCE Thread Handling
SUMMARY: Windows CE is a small, configurable, feature-rich, real-time operating system. In Windows CE 3.0, the real-time support has been improved. This article looks at specific support for the creation of real-time systems and how it compares to the support in Windows for the desktop. The way interrupt handlers, processes, memory management, and synchronization work in Windows CE 3.0 is discussed. An extensive look at threads and thread priority, misconceptions surrounding them, and their impact on performance is included. Refinements to the Windows CE scheduler and support for nestable interrupts are also covered.
Microsoft® created Windows® CE as a small, configurable, well-connected, real-time operating system (RTOS). When you consider the built-in features (a rich multithreaded programming model, sophisticated memory management, a large set of device drivers, and a set of refined development and debugging tools), you realize that developers of nontrivial real-time systems who choose another real-time operating system must recreate features that come built into Windows CE. What Windows CE provides, then, is a small, yet feature-rich, operating system.
      This sounds paradoxical. How can an operating system be both small and rich in features? The answer is that Windows CE is highly configurable. You might only need the level of support provided by MINKERN, a 350KB build that has no graphics, no windows, and no user interface, just the basic kernel services and some network support. But with that kernel you get support for multiple processes, multithreaded programming, dynamic link libraries, and virtual memory management, as well as an incredible array of I/O device drivers and a rich set of Win32® synchronization objects. And if MINKERN doesn't provide all that you need, that's no problem at all. Microsoft provides a broad range of reference builds (such as MINSHELL, MININPUT, and MAXALL) to use as a starting point, all of which can be supplemented with any of the hundreds of components from the Windows CE Platform Builder, or from a system integrator specializing in Windows CE.
      The recently released Windows CE 3.0 improves on the real-time support of previous versions. It adds support for nestable interrupts, more thread priority levels, and refinements to the Windows CE scheduler. Developers now have total control over system scheduling, with a wide range of features that help support the creation of real-time systems.
      Of course, the mere presence of great real-time support in the operating system doesn't guarantee that every system will have specific real-time responsiveness, just as having a powerful engine in a car doesn't by itself guarantee that it will drive like an Indy 500 race car. Great real-time support in an operating system is necessary, but not sufficient for building a system that provides adequate real-time responsiveness. You also need a CPU that is fast enough to carry the load, device drivers that behave properly, and application software that uses high-priority threads dedicated to doing the time-critical work. And even then, your development team needs to clearly define its real-time goals and measure against these goals early and often.
      This article provides a developer's-eye view of the real-time support in Windows CE 3.0. For readers who are new to real-time, I'll start with a description of key terms related to real-time programming. For readers new to Windows CE, I'll discuss Windows CE-specific features that support the creation of real-time systems. As you'll see, Windows CE has features in common with desktop systems, but with its own twist. I'll conclude with detailed coverage of the Windows CE 3.0 enhancements that Microsoft provides for building real-time systems.
      Let's start with a look at some terms.

Fundamental Terms

      A real-time system is a computer system (including hardware, operating system, device drivers, and application software) that must be able to handle a specific set of activities within a specified timeframe. Often, the timing considerations are driven by the needs of some outside device. Such a device might be an external sensor that generates 10,000 interrupts per second. When an interrupt is triggered, the software might need to read and store a measurement of, say, the speed at which a conveyor belt is moving. Software in such a scenario might play a more active role in the operation of the conveyor belt, perhaps sending a signal to a motor controller to increase or decrease the speed when the current value is out of some predefined range.
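      To put a number on that sensor example, here is a rough sketch of the time budget per interrupt. It assumes the interrupts arrive evenly spaced, which a real device may not do, and the helper name InterruptPeriodUs is mine, not part of any Windows CE API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: average time between interrupts, in microseconds,
// for a device that raises `interruptsPerSecond` interrupts each second.
// Assumes evenly spaced interrupts; returns 0 for a rate of 0.
constexpr std::uint32_t InterruptPeriodUs(std::uint32_t interruptsPerSecond)
{
    return interruptsPerSecond == 0 ? 0 : 1000000u / interruptsPerSecond;
}

// The sensor in the example raises 10,000 interrupts per second, so all of
// the work for one interrupt has to fit in roughly 100 microseconds.
static_assert(InterruptPeriodUs(10000) == 100, "100 us per interrupt");
```

      Whatever the interrupt handler, the scheduler, and your own code do for each reading has to fit inside that window, which is why the rest of this discussion is framed in microseconds and milliseconds.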
      A distinction is sometimes made between hard real-time systems and soft real-time systems. A hard real-time system doesn't tolerate missed deadlines. Missing even a single deadline results in a catastrophe. Consider a high-speed, high-volume color printer that might be used to print the paper edition of a magazine like the one you are reading now. A wide roll of paper flies past at very high speeds, and four separate images (one each for black, cyan, magenta, and yellow) must be positioned and imprinted with very precise timing. Failure to place each one exactly right results in blurry images and upset readers.
      A soft real-time system tolerates some amount of failure. In a data collection system, for example, missing data points can sometimes be extrapolated from known values. One example of a soft real-time system is a multimedia application in which there is a relatively high tolerance for missing deadlines related to decompressing sound or video. Such failures might result in temporary degradation of output quality, but the presentation itself remains largely intact. (Of course, it's also easy to imagine a multimedia system that has hard real-time constraints, which certainly describes what you expect from a digital camcorder: no missed deadlines are tolerated when you are recording that once-in-a-lifetime baptism or bar mitzvah.)
      A serious real-time system implementor uses the terms milliseconds and microseconds to define specific time requirements. A millisecond (abbreviated ms) is 1/1000 (0.001) of a second. Examples of where programmers encounter milliseconds in the Win32 API include the Sleep and WaitForSingleObject functions. Sleep takes the number of milliseconds for which a thread gives up the processor. WaitForSingleObject takes a millisecond timeout that limits how long a thread waits for a synchronization object to become signaled.
      A microsecond (abbreviated as either us or µs) represents one-millionth (1/1,000,000 or 0.000001) of a second. In the past, dedicated hardware was needed to support real-time requirements with microsecond sensitivity. As CPUs have gotten faster, microsecond-level real-time requirements can be met with a heavier reliance on software than was possible in the past.
      Another important term is interrupt latency, the delay between a hardware interrupt and the start of the actual processing of the interrupt. For some, this is a key benchmark since it sets a limit on the responsiveness of a device to outside events. A core goal for Windows CE 3.0 is to provide an interrupt latency of 50 µs on Windows CE running on a 166 MHz Pentium. While this goal has been met, this number is probably not very meaningful unless you happen to want to use the exact hardware configuration that was used in the testing. Otherwise, it's best if you do your own testing on your own hardware. You'll find details of the performance testing support that Microsoft provides in the Windows CE 3.0 Platform Builder. Trying to generalize too much from this data is just like trying to generalize the results of a road race based solely on the engine in the car: the driver, the track, and the weather still play a factor in the race, just as device drivers, application software behavior, and other factors play a role in the actual interrupt latencies of a Windows CE-based system.

Factors in Real-time System Development

      When you evaluate an operating system for time-critical support, you start by identifying those features that can affect real-time performance. For my purposes, I am going to focus on these elements: interrupt handling, processes, threads, synchronization objects, and memory. (Note: this section covers features that are part of every version of Windows CE. If you're primarily interested in the real-time enhancements that are specific to Windows CE 3.0, then you might want to skip ahead to the last section of this article.)

Interrupt Handling

      Interrupt handling in Windows CE is a multistage process, as you can see in Figure 1. Device driver writers provide two pieces that play a key role in interrupt handling: a tiny, kernel-mode Interrupt Service Routine (ISR) to decide how to handle an interrupt, and a user-mode Interrupt Service Thread (IST) to provide processing when any substantial work must be done.

Figure 1 Interrupt Handling

      Device driver writers call two separate Win32 functions at driver initialization time to glue the pieces together. HookInterrupt connects specific interrupts to the ISR. This function gets called at system bootup. (Note that system bootup happens for a cold-start, or when the reset button is pushed; the boot sequence does not occur when the system wakes from the sleep state caused, for example, by hitting the power button on a handheld PC or on a Pocket PC.)
      A second initialization function, InterruptInitialize, maps interrupt IDs to event handles that are created using another Win32 function, CreateEvent. The IST is a regular Win32 thread, created by calling CreateThread, that starts running and then blocks on the event identified in the call to InterruptInitialize. It waits until the ISR tells the kernel to set the event that allows it to run. Once everything is in place, the only thing that remains is for an interrupt to occur.
      When an interrupt occurs, the Windows CE kernel responds by saving the state of currently executing user-mode code. The kernel calls the appropriate ISR to ask it to quickly decide how to handle the interrupt. An ISR is a function in the device driver that might buffer incoming bytes, but whose primary job is to decide how an interrupt should be handled. It lets the kernel know the results of its decision by means of the return value it passes back to the kernel. For example, the ISR associated with the GPS device in the Windows CE for Automotive systems returns the value of SYSINTR_GPS. The kernel uses this identifier to scan its table for the event handle of the associated IST. The kernel can then trigger the IST to start running by putting its event into a signaled state with a call to SetEvent.
      Of course, an IST doesn't have to be triggered for every interrupt. For example, the GPS driver mentioned earlier stores up location information until its buffers are full. While the buffers are filling, this ISR returns a value of SYSINTR_NOP to the kernel. Only when its buffers are full does the driver tell the kernel to trip the event and cause its IST to run and handle the buffered data.
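      The decision an ISR makes can be modeled in a few lines. What follows is a portable sketch of the buffering behavior just described, not driver code: the numeric values standing in for SYSINTR_NOP and SYSINTR_GPS, the tiny buffer size, and the names GpsIsrState, GpsIsrModel, and GpsIstDrainModel are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Illustrative values only; the real identifiers come from the platform's
// headers, and these numbers are assumptions made for this sketch.
const std::uint32_t kSysIntrNop = 0;    // stands in for SYSINTR_NOP
const std::uint32_t kSysIntrGps = 42;   // stands in for SYSINTR_GPS

const std::size_t kBufferSize = 4;      // tiny on purpose, for the demo

struct GpsIsrState {
    std::uint8_t buffer[kBufferSize];
    std::size_t  count = 0;
};

// Model of the ISR: stash one incoming byte, then tell the "kernel" whether
// the IST needs to run. It does no other work.
std::uint32_t GpsIsrModel(GpsIsrState& state, std::uint8_t incoming)
{
    if (state.count < kBufferSize)
        state.buffer[state.count++] = incoming;  // byte is dropped if already full
    // Buffer still filling: no wakeup. Buffer full: ask for the IST.
    return (state.count < kBufferSize) ? kSysIntrNop : kSysIntrGps;
}

// Model of what the IST does once it is woken: drain the buffered data.
std::size_t GpsIstDrainModel(GpsIsrState& state)
{
    std::size_t drained = state.count;
    state.count = 0;
    return drained;
}
```

      With a four-byte buffer, the first three calls to GpsIsrModel return the no-op value and the fourth returns the GPS identifier, which is the point at which the kernel would signal the IST's event.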
      There are several reasons that ISRs delegate most of the work of interrupt handling. For one thing, they run on a very small kernel stack, so the depth of function calls and the number of local variables are limited. Another constraint is the fact that there is a very limited set of system services that are available to the ISR. Perhaps the most important consideration is the impact on other parts of the system, including other interrupts and even high-priority threads. When an ISR runs, everything else that's in the system is asleep. If an ISR spends too much time handling a particular interrupt, it might affect the proper operation of other time-critical operations. For all of these reasons, an ISR delegates most of the actual work of handling the interrupt.
      The ISR delegates processing to an IST. An IST is a thread created by the device driver at driver startup, using the same Win32 function that anyone can use: CreateThread. Once started, an IST starts its workday by going to sleep (not a bad job if you can get it!). It goes to sleep by calling the WaitForSingleObject function. And there it waits like Sleeping Beauty until its Prince Charming (a drama directed by the ISR and played out by the kernel) comes along to kiss it awake.
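      The sleep-until-signaled shape of an IST can be sketched in portable C++. This is an analogy rather than Windows CE code: the AutoResetEvent class below is my stand-in for an event created with CreateEvent, its Set and Wait methods play the roles of SetEvent and WaitForSingleObject, and the passes parameter exists only so the loop can be exercised and then stop.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Portable stand-in for a Win32 auto-reset event. An analogy only, not the
// Windows CE implementation.
class AutoResetEvent {
public:
    void Set()                        // plays the role of SetEvent
    {
        std::lock_guard<std::mutex> lock(m_);
        signaled_ = true;
        cv_.notify_one();
    }
    void Wait()                       // plays the role of WaitForSingleObject
    {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return signaled_; });
        signaled_ = false;            // auto-reset: one Set releases one Wait
    }
    bool IsSignaled()
    {
        std::lock_guard<std::mutex> lock(m_);
        return signaled_;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    bool signaled_ = false;
};

// Shape of an IST body: sleep on the event, service the interrupt, repeat.
// `passes` bounds the loop for this sketch; a real IST would normally loop
// for the life of the driver.
int InterruptServiceThreadModel(AutoResetEvent& interruptEvent, int passes)
{
    int handled = 0;
    for (int i = 0; i < passes; ++i) {
        interruptEvent.Wait();        // blocks until the "kernel" sets the event
        ++handled;                    // real code would service the device here
    }
    return handled;
}
```

      The thread consumes no CPU time while it is blocked in Wait, which is what makes it cheap to dedicate one thread to each interrupt source.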

      As you may have noticed, the Win32 threading functions play a significant role in all of the interrupt code. This is in stark contrast to the separate driver interfaces that device driver writers may have worked with in the past. This driver model is not at all related to the VxD driver interface of 16-bit Windows, to the driver interfaces of Windows NT® and Windows 2000, or to the common WDM of Windows 98 and Windows 2000.

Processes

      Like desktop versions of the Windows OS, Windows CE is a multiprocess and multithreaded operating system. Unlike Windows NT and Windows 2000, Windows CE does not provide multiprocessor support. This is actually good news, since it means that the Windows CE kernel is quite a bit simpler than the multiprocessor-ready Windows NT and Windows 2000 kernel. The downside is that you cannot speed up the execution of a Pocket PC (or any other device running Windows CE) by dropping another processor in, as you can with some desktop systems.
      Another difference from desktop versions of Windows is that Windows CE limits you to 32 simultaneously executing processes (no such limit exists on desktop versions of Windows). By the way, this doesn't mean you can create 32 processes yourself. Windows CE itself creates several processes for its own uses. For example, Windows CE takes several bites from the process pie, including one process for the kernel (nk.exe), one for device drivers (device.exe), one for installable file systems (filesys.exe), and, for GUI systems, one for the graphics and user interface engine (gwes.exe). To find out about other processes that are running, you can use the Remote Process Viewer tool to check for device-specific processes, and plan your process use accordingly. Figure 2 shows several processes in the Remote Process Viewer.

Figure 2 Processes in the Remote Process Viewer

      A process in Windows CE is like a typical process found everywhere else in that it represents a unit of ownership. The most important thing that a process owns is its private address space. A process in Windows CE has a smaller address space than processes on desktop systems. Processes in Windows CE get a 32MB address space, compared to the 2048MB (2GB) process address space found on desktop versions of Windows. (Processes that need more than 32MB of memory can allocate from the pool of shared pages, using either MapViewOfFile or large VirtualAlloc requests, to augment the 32MB pool of private pages available through calls to VirtualAlloc.) Separate address spaces protect each process from careless code in other processes, contributing to the overall robustness of the system and of each application. While big computer operating systems have always had this protection, many small operating systems do not, so its presence in Windows CE is noteworthy.
      An important aspect of the Windows CE context switching mechanism is that a context switch was designed to be very fast. This is due in part to the way that process address spaces are set up. Unlike in Windows 2000, which gives each process its own set of page tables, on Windows CE all processes share a single set of page tables. The benefits include saved memory usage and faster context switch time. A process switch is much faster on Windows CE than on Windows 2000.

      This has important implications for developers of time-critical systems. It means you are free to use multiple processes that have all the benefits of separate processes but minimal interprocess context switch delays.

Threads

      When I first started looking at Windows CE, I found the presence of threads to be the most surprising and exciting feature. To my knowledge no other platform has begun life with such a powerful capability. MS-DOS®, 16-bit Windows, Apple Macintosh, and the Palm Pilot are all single-threaded. (Art history buffs will note that each of these systems, like Botticelli's Venus, is naked, entering the world without any threads.)
      Threads are the unit of scheduling on both desktop versions of Windows and Windows CE. Two of the things that a process gets when it starts running are a private address space and a single thread. A process can start additional threads by calling CreateThread and specifying a startup function. Each thread has its own stack for all the usual things that stacks are used for, including local variables and function call return addresses.
      There are a few differences between threads on desktop versions of Windows and Windows CE that are worth noting. For one thing, there are no fibers in Windows CE. Fibers are a lightweight scheduling object supported on both Windows NT and Windows 2000. Fibers provide a way to take the allotted time of a single thread and divide it up in a cooperative, nonpreemptive way for different but related activities. I used to think that fibers were only meant to help port certain special classes of code to Windows, but someone recently told me that SQL Server™ 2000 uses fibers. This makes me think that they are worth a second look (though obviously not for Windows CE-based development projects).
      Threads on Windows CE differ from threads on desktop versions of Windows in another important way: Windows CE has a much simpler thread priority model. On desktop versions of Windows, the process owns a priority class, and the priority of each thread is then calculated as a delta from the process priority. In Windows CE, priority is strictly a thread attribute and is unrelated to the process. This introduces the interesting possibility that a single process in Windows CE can hold both a thread with the very highest priority and a thread with the very lowest priority. On Windows NT and Windows 2000, the priorities of all threads in a given process are limited to a delta of two from the process base priority.
      Windows CE differs from desktop versions in the number of thread priorities. On Windows NT and Windows 2000, there are 32 possible thread priorities, with zero being the lowest priority and 31 the highest. This is in contrast to the eight priority levels on Windows CE versions 1.x and 2.x. Starting with Windows CE 3.0, 256 thread priorities are available. Another minor difference from desktop versions is that a lower priority number actually corresponds to a higher priority, so that the highest priority thread in Windows CE has a thread priority of zero, while the highest priority thread on desktop versions of Windows has the highest number.

Figure 3 Thread Priorities

      Figure 3 shows how the thread priorities for a Windows CE 1.x or Windows CE 2.x-based application running on Windows CE 3.0 map to the priorities for Windows CE 3.0. The range of available priorities for the Windows CE 1.x or Windows CE 2.x-based application is 248-255 (the bottom eight priorities), with priority 0 mapping to 248, 1 mapping to 249, and so on. The importance of this change depends on how important thread priorities are to your application. If thread priorities are important, and if you have an application written to run on Windows CE 1.x or Windows CE 2.x, you have to make a tiny change. Replace calls to SetThreadPriority with calls to the newer CeSetThreadPriority for Windows CE 3.0. This will allow you access to the broader range of Windows CE 3.0 priorities. When sharing code between Windows CE and Windows NT, you'll definitely need to have separate thread priority handling for each OS.
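      The mapping is simple enough to state as arithmetic. Here is a minimal sketch of it; the helper name LegacyToCe3Priority and the -1 error value are my own conventions, not part of the Windows CE API.

```cpp
#include <cassert>

// Windows CE 3.0 has 256 priority levels: 0 (highest) through 255 (lowest).
const int kCe3PriorityLevels = 256;
// Windows CE 1.x and 2.x had 8 levels: 0 (highest) through 7 (lowest).
const int kLegacyPriorityLevels = 8;

// Hypothetical helper: convert a legacy 0-7 priority into the Windows CE 3.0
// level it maps to (248-255). Returns -1 for out-of-range input.
int LegacyToCe3Priority(int legacyPriority)
{
    if (legacyPriority < 0 || legacyPriority >= kLegacyPriorityLevels)
        return -1;
    // The eight legacy levels occupy the bottom of the new range.
    return (kCe3PriorityLevels - kLegacyPriorityLevels) + legacyPriority;
}
```

      For example, LegacyToCe3Priority(0) yields 248 and LegacyToCe3Priority(7) yields 255. Everything from 0 through 247 is reachable only through the newer priority call.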
      Threads are important for real-time programming because they give you a way to separate time-critical work from work that is not time-critical. The scheduler runs threads with a higher priority before threads with a lower priority, and you assign a thread's priority by calling SetThreadPriority (you can access the wider range of priorities on Windows CE 3.0 by calling CeSetThreadPriority). So if you have a task (or set of tasks) that is time-critical, you can create a separate thread and assign it responsibility for that work.
      Students in my classes frequently want to know when to use threads. If you think of threads as being like people, you can start to see how creating threads is almost like hiring a new person to join your team. Some tasks, like driving a car, are meant to be done by a single person.
      But some tasks, such as playing catch with a baseball, require several people. It's just not as much fun trying to do it alone. In the same way, some things are meant to be handled by more than a single thread. The most obvious involves the asynchronous nature of outside devices. As discussed earlier in this article, each interrupt vector in Windows CE has a dedicated thread, in the form of an interrupt service thread, for interrupt handling. A second obvious example has to do with communicating with the outside world. Sending or receiving data (using a network, infrared device, or modem) is best handled with a dedicated communications thread. A program's user interface should have a dedicated user interface thread, which doesn't have many other responsibilities beyond responding to the needs of the user. And when you have time-critical work to do, this should also get delegated to a thread that does little else besides executing your critical code. It should avoid user interface work, file system work, and interacting with random communications devices (unless it is an integral part of its time-critical operation). A specialist real-time thread is what the doctor (especially Dr. GUI) orders when you are serious about any time-critical operations.
      Of course, you know that adding a new person to a team has its complications, and the same is true with threads. Adding a new person to a baseball team involves deciding what position that person can play. It also takes time to coordinate a player's style of fielding with other players on the field. The moment you add a second thread to a process, you introduce the possibility that the two threads will run into each other. Like two baseball players who collide while trying to field a ball, two threads can collide when accessing any of the myriad things that they share. To establish boundaries between the operations of different threads, the Win32 API provides synchronization objects.

Synchronization Objects

      Synchronizing the activity of threads is so important that Windows CE has always had most of the Win32 synchronization objects. All versions of Windows CE support critical sections, mutexes, and events. Support for a fourth type of synchronization object, semaphores, was added in Windows CE 3.0. This means the latest release of Windows CE has the same set of synchronization objects that current desktop versions of Windows have.
      If you've never worked in a multithreaded environment, it might not be exactly clear what synchronization is and why it is important. A simple rule is that any time two threads share anything, you must prevent conflicts using synchronization objects. So what does it mean for something to be shared? How can there be a conflict, and how do you guard against it? Figure 4 will shed some light on this. It shows BadThrd.cpp, a somewhat contrived, tiny multithreaded program with a synchronization conflict.
      When it starts running, this program creates two threads, with ThreadMainA and ThreadMainB as the entry point for each. The two threads share global variable x. So what exactly happens when this program runs?
      There are many possibilities. Perhaps the two threads will get into a deadly embrace and lock up over the variable, or perhaps each thread will be able to run through each of the loops without ever locking up. You might call this thread roulette, since, in the absence of a synchronization object to mediate this potential conflict, you might see a problem, or you might not. Perhaps you could ship this code and never encounter a problem. Or, more likely, you'll have this kind of bug lying dormant until you have to demo your code before a large group of people. Then, kablam! Everything falls apart.
      The beauty of synchronization objects is that you can get perfect safety for this kind of code with a minimum of extra programming. You just need to see the potential for a problem and actually make use of the available synchronization objects. GoodThrd.cpp completely prevents any conflict over the use of this shared resource. There are exactly six additional lines of code with which to handle the critical sections (see Figure 5).
      This code creates a global variable, cs, of type CRITICAL_SECTION, and gets it ready to use by calling InitializeCriticalSection. The conflict prevention is implemented by the calls to EnterCriticalSection and LeaveCriticalSection around the two loops. The thread that first calls EnterCriticalSection gains exclusive access to that object, and returns immediately from the function call. When the second thread calls EnterCriticalSection, it won't return from that call (it blocks) until the first thread calls LeaveCriticalSection. The addition of the GoodThrd.cpp code makes the program thread-safe, so you can go ahead and ship it without any fear of synchronization collisions.
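      For readers without the figures at hand, the same pattern can be sketched in portable C++. This is not the GoodThrd.cpp listing: std::mutex stands in for the CRITICAL_SECTION variable, lock and unlock stand in for EnterCriticalSection and LeaveCriticalSection, and the function names and loop body are assumptions of mine.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Shared state, in the spirit of the global variable x described above.
static long g_x = 0;
// std::mutex stands in for the CRITICAL_SECTION variable cs.
static std::mutex g_cs;

// Each worker updates the shared variable. The lock()/unlock() pair brackets
// the loop the way EnterCriticalSection/LeaveCriticalSection would.
void ThreadMain(int iterations)
{
    g_cs.lock();                      // like EnterCriticalSection(&cs)
    for (int i = 0; i < iterations; ++i)
        g_x = g_x + 1;                // read-modify-write on shared data
    g_cs.unlock();                    // like LeaveCriticalSection(&cs)
}

// Start two workers (in the role of ThreadMainA and ThreadMainB), wait for
// both to finish, and return the final value of the shared variable.
long RunTwoThreads(int iterationsEach)
{
    g_x = 0;
    std::thread a(ThreadMain, iterationsEach);
    std::thread b(ThreadMain, iterationsEach);
    a.join();
    b.join();
    return g_x;
}
```

      Because only one thread at a time can be inside the locked region, the result is always exactly twice the per-thread count. Remove the lock and unlock calls and the total can come up short, but only some of the time, which is the thread roulette described earlier.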
      A critical section is like a traffic light in an intersection. The use of critical sections prevents collisions between code just like the use of traffic lights prevents collisions between cars. Critical sections are used between two threads in the same process. You can synchronize the activities of two or more threads in different processes using another synchronization object, a mutex. A third type of synchronization object, the semaphore, is like a mutex with a count. A fourth type, the event, is like the starting gun at a race: events are used as a trigger, as with the event that wakes the IST discussed earlier.
      As this example shows, the use of synchronization objects is important to any multithreaded operation. Developers of real-time systems should be warned, however, that synchronization objects must be used carefully since they can cause problems when writing time-critical code. In particular, whenever a thread is in a time-critical mode, it must avoid accessing any synchronization object that might halt its execution. You should make sure that the path it takes is as free as possible from synchronization objects.
      This sounds like an easy thing to do, but there are quite a few hidden synchronization objects that can create unexpected delays. For example, the file system is protected with a mutex. For this reason, you'll want to avoid file I/O during time-critical operations. In addition, any user interface or graphical output calls may also cause a delay because each of these operations is protected by a mutex. In fact, any system call could potentially get blocked by one or another synchronization object. For this reason, you get the best throughput by having the thread avoid any system calls and focus on the actual time-critical work. And, of course, you improve the throughput of time-critical threads by giving such threads very high priorities.
      Running high-priority threads that never access a synchronization object is the ideal. But sometimes a time-critical, high-priority thread calls into a system function that must acquire a mutex to do its work. Most of the time, the mutex can be acquired right away or with little delay; that is, most of the time the system works.
      But there are times when a high-priority thread blocks on a mutex that is owned by a lower-priority thread that just doesn't release the mutex. There are several reasons for this: the lower-priority thread might just have a bug that prevents it from ever releasing the mutex. Or it might be that other threads are chewing up the available CPU cycles, preventing the lower-priority thread from doing the work it needs to do before it releases the mutex.
      This situation is called priority inversion, since the higher-priority thread has had its privileges subverted by the lower-priority thread. In effect, the higher-priority thread can make progress no faster than the lower-priority thread that it is waiting on. To fix this, the Windows CE scheduler does what the Windows NT and Windows 2000 schedulers do: it boosts the priority of the lower-priority thread to match the priority of the blocked higher-priority thread. The hope is that the lower-priority thread just needs to be able to run to release the owned synchronization object. There is no guarantee that this will help, and it is quite like pounding your fist on a snack machine that has stolen your coins. Sometimes a bit of jiggling is all that is needed to get both stuck threads running and stuck candy machines working again.
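      The boost can be modeled as a small calculation. The sketch below is a toy model of the rule as described here, not the scheduler's actual code; the ThreadModel type and the EffectivePriority function are hypothetical names. It uses Windows CE numbering, in which a smaller number means a higher priority.

```cpp
#include <cassert>
#include <vector>

// Windows CE numbering: a smaller number means a higher priority.
struct ThreadModel {
    int basePriority;   // priority assigned by the application
};

// Toy model of the boost: while a thread owns a synchronization object, it
// runs at the highest priority (smallest number) found among itself and the
// threads that are blocked waiting on that object.
int EffectivePriority(const ThreadModel& owner,
                      const std::vector<ThreadModel>& blockedWaiters)
{
    int effective = owner.basePriority;
    for (const ThreadModel& waiter : blockedWaiters)
        if (waiter.basePriority < effective)
            effective = waiter.basePriority;
    return effective;
}
```

      A thread at priority 251 that owns a mutex wanted by a thread at priority 100 would, under this model, run at 100 until it lets go. With no waiters, it stays at 251.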
      From the perspective of the operating system, this solves a common problem, and is therefore a good thing. From the point of view of time-critical processing, however, this is something of a disaster. All the fussing done by the scheduler to help the lower-priority thread is necessary, of course. But it would be better for the time-critical thread never to have to waste time jiggling the works in the first place. In short, time-critical threads should focus squarely on the work at hand. Tasks that might block should be delegated to other threads, or performed in a different manner.
      I think everyone would agree that it's a good thing for the Windows CE scheduler to help threads free themselves from what might otherwise be a logjam. But perhaps this has you worried that the scheduler somehow meddles in other places as well. Fear not. Priority inversion is the only situation that will cause the Windows CE scheduler to change the priority of a thread. Also, priority inversion affects threads that use mutexes, critical sections, and semaphores.
      It's also important to keep in mind that there are a few other ways to make your data thread-safe without having to resort to a full-blown synchronization object. You might want to consider the use of InterlockedXXX APIs for simple multithread interactions. This set of functions (InterlockedIncrement, InterlockedDecrement, and InterlockedExchange) is less expensive to use than critical sections. These three InterlockedXXX functions will also work for data shared between an ISR and an IST.
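      The idea behind the interlocked functions can be illustrated in portable C++ with std::atomic. This is an analogy, not Win32 code: fetch_add, fetch_sub, and exchange play roughly the roles of InterlockedIncrement, InterlockedDecrement, and InterlockedExchange, and the function names below are made up for the example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// std::atomic<long> stands in for a LONG that is only ever touched through
// the interlocked functions.
static std::atomic<long> g_counter(0);

// Each call bumps the shared counter atomically; no lock object is involved.
void CountUp(int iterations)
{
    for (int i = 0; i < iterations; ++i)
        g_counter.fetch_add(1);       // roughly InterlockedIncrement
}

// Run two counting threads at once and return the final total.
long RunAtomicCounters(int iterationsEach)
{
    g_counter.store(0);
    std::thread a(CountUp, iterationsEach);
    std::thread b(CountUp, iterationsEach);
    a.join();
    b.join();
    return g_counter.load();
}

// Swap in a new value and get the old one back in a single atomic step,
// roughly what InterlockedExchange does.
long SwapValue(long newValue)
{
    return g_counter.exchange(newValue);
}
```

      Unlike the critical section approach, the two threads here interleave freely and neither ever blocks, which is what makes this style attractive on a time-critical path. The tradeoff is that it only protects a single variable at a time.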

Memory

      Programmers new to Windows CE might be surprised to learn that it uses paged memory in its memory management. A typical system running Windows CE has no hard drive, so it has no support for a page file (such as pagefile.sys on Windows NT). So why does Windows CE use paged memory? The answer is to get the benefits that come with pretending you have more memory than you really have, which is what you get from paged virtual memory on any operating system.
      The RAM in a system running Windows CE is divided into two parts: an object store (storage memory) and program memory. Among other things, the object store contains a file system. Data in the object store is compressed (this is an optional setting for an OEM), so that if you have a 128KB program called pager.exe, it might take up only 64KB of object store space. When this program is running, individual parts can be brought into program memory, a process known as paging in. Of course, without a page file there is no way to page anything out. But even without a paging file, the presence of paged memory support helps to optimize the use of physical memory. Code is discarded when memory is low and will be paged back in when needed.
      While demand paging reduces memory usage, a page fault during a time-critical operation can create unwanted delays. Developers of time-critical systems want to avoid paging activity when a critical deadline must be met. This is a bit of a problem, since the default behavior of all code in EXEs and DLLs is to load on access. All device drivers are DLLs, which means they are loaded into memory with a call to LoadLibrary. Windows CE has an alternative function, LoadDriver, which is the same as LoadLibrary, except that it guarantees that all pages are loaded and locked into memory.
      I mentioned at the beginning of this article that Windows CE is a configurable operating system, and this applies to the virtual memory system as well. You can, for example, turn off demand paging for the entire system by creating an entry (ROMFLAGS = 0x001) in the config.bib file. While you gain the benefit of avoiding delays caused by page loading, you also lose the benefits of demand paging. In particular, every application and DLL (including device drivers) is loaded entirely into memory before it starts running. The downside, then, is that startup times for both applications and device drivers increase a bit. Used judiciously, however, this feature could help remove the memory paging issue from the real-time performance equation.

      While support for demand-loading does make Windows CE quite a modern operating system, the drawback, as just described, is the delay caused by paging. Of course, Windows CE is also an embedded operating system. As such, executables can reside in ROM and, when uncompressed, such executables can execute in place. All versions of Windows CE support execute-in-place; Windows CE 3.0 adds support for multiple execute-in-place (XIP) regions.

Real-time Support

      The enhancements that Microsoft introduced with Windows CE 3.0 to improve real-time performance can be divided into two major categories: interrupt handling and thread scheduling.
      The kernel itself was significantly rewritten to improve the handling of preemption and to reduce the time when the kernel is non-preemptible. Support was also added for nestable interrupts, that is, interruptible ISRs. In previous versions of Windows CE, an ISR could not be interrupted, which meant that the possible delay for any given ISR to start was equal to the running time of the slowest ISR. As I'll discuss shortly, in Windows CE 3.0 ISRs can be interrupted.

      The second major improvement is in the area of thread scheduling. Incidentally, thread scheduling improvements help interrupt handling, since an IST must be scheduled like any other thread. And, of course, thread scheduling is important because real-time system developers like the convenience of creating both multithreaded and multiprocess solutions. You don't want the scheduler to take too long to switch between threads, or it might cause your code to miss important deadlines. You also want to make sure that you have the features you need to implement your threads in the most efficient manner. After I explain the nestable interrupts, I'll explain each of the six key improvements for thread scheduling: new thread priorities, the addition of support for setting thread quantum, improved handling of priority inversion, improved timer granularity, the addition of semaphores, and better support for critical sections. The new real-time functions in Windows CE 3.0 are described in Figure 6.

Nestable Interrupts

      Windows CE 3.0 adds support for interrupt nesting. To be more precise, low-priority ISRs are preemptible by higher-priority ISRs. When one ISR is running and a higher-priority interrupt occurs, Windows CE 3.0 does the same thing that any preemptive context switcher would do: it saves the context of the lower-priority ISR and allows the higher-priority ISR to begin executing. When the higher-priority ISR is completed, the state of the lower-priority ISR is restored and it continues running. In a sense, with Windows CE 3.0 ISRs become more the equal of threads: threads have priority and are preemptible. Windows CE has always supported the prioritization of ISRs. With Windows CE 3.0, ISRs are preemptible, with the benefit that higher-priority ISRs gain more consistent response times.

      There are two drawbacks to this new design. Obviously, lower-priority ISRs can take longer to complete, because they can now be preempted. But then good system design is all about balancing finite resources between competing demands. The second drawback, which affects all ISRs, is that ISRs on Windows CE 3.0 run a tiny bit slower than comparable ISRs on Windows CE 2.x. The reason is that the system needs to save more state before calling individual ISRs, so the average start time of all ISRs suffers a bit. The good news, though, is that while average start times increase slightly, maximum start times decrease, particularly for the highest-priority ISRs.
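
      The effect on maximum start times can be illustrated with a deliberately simplified model. It is a back-of-the-envelope sketch, not a description of the kernel: it assumes that at most one other ISR is in the way when an interrupt is raised, it ignores the extra state-saving overhead, and the struct, the function names, and the convention that 0 is the highest ISR priority are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one ISR. */
struct isr_info {
    int priority;      /* 0 = highest (a convention of this sketch) */
    unsigned run_us;   /* worst-case running time in microseconds */
};

/* Without nesting, whichever ISR is running must finish first, so the
   longest of the other ISRs bounds the wait. */
static unsigned worst_wait_non_nested(const struct isr_info *isrs,
                                      size_t n, size_t self)
{
    unsigned worst = 0;
    assert(self < n);
    for (size_t i = 0; i < n; i++)
        if (i != self && isrs[i].run_us > worst)
            worst = isrs[i].run_us;
    return worst;
}

/* With nesting, only an ISR of equal or higher priority can hold this one
   off; anything lower is simply preempted. */
static unsigned worst_wait_nested(const struct isr_info *isrs,
                                  size_t n, size_t self)
{
    unsigned worst = 0;
    assert(self < n);
    for (size_t i = 0; i < n; i++)
        if (i != self &&
            isrs[i].priority <= isrs[self].priority &&
            isrs[i].run_us > worst)
            worst = isrs[i].run_us;
    return worst;
}
```

      With three ISRs that run for 20, 50, and 300 microseconds in descending priority, the model gives the highest-priority ISR a worst-case wait of 300 microseconds without nesting and none with it, while the lowest-priority ISR sees no improvement at all.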

More Thread Priorities

      Windows CE 3.0 now provides 256 thread-priority levels. This compares to 32 priority levels on Windows NT and Windows 2000, and eight priority levels on earlier versions of Windows CE. One of the betas of Windows CE 3.0 shipped with 32 thread priorities, but the message from beta testers was clear: even more priority levels were needed.
      A large number of thread priorities gives system developers a lot of flexibility in controlling the scheduler. In particular, you can create a large number of threads and assign each one a unique priority level. When a thread has a priority level all to itself, it effectively becomes a run-to-completion thread. What's more important is that the scheduler, which always runs the highest-priority thread that is ready, will schedule the specific threads that the system designer has designated as having the highest priority.
      To access the new thread priorities, a pair of new Win32 APIs must be used: CeSetThreadPriority and CeGetThreadPriority. Possible priority values range from 0 (the highest priority) to 255. Programs written for older versions of Windows CE access thread priorities by calling SetThreadPriority and GetThreadPriority. When running on Windows CE 3.0, these functions use the same range of priority values as on previous versions of Windows CE, from 0 (highest priority) to 7. These older priorities map to the eight lowest priority values (from 248 to 255) in the new priority scheme.
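
      The relationship between the two priority ranges is simple arithmetic, sketched below in plain C. The helper names and constants are hypothetical, and the sketch assumes the mapping is a straight, order-preserving offset (old priority 0 becomes 248, old priority 7 becomes 255). Returning -1 for a priority with no legacy equivalent is a convention of the sketch, not something the older APIs are documented here to do.

```c
#include <assert.h>

enum {
    CE3_PRIORITY_LEVELS    = 256,
    LEGACY_PRIORITY_LEVELS = 8,
    /* First CE 3.0 priority that corresponds to a legacy priority: 248. */
    LEGACY_BASE = CE3_PRIORITY_LEVELS - LEGACY_PRIORITY_LEVELS
};

/* Map a legacy priority (0..7, 0 = highest) onto the 0..255 scale. */
static int legacy_to_ce3_priority(int legacy)
{
    assert(legacy >= 0 && legacy < LEGACY_PRIORITY_LEVELS);
    return LEGACY_BASE + legacy;
}

/* Map a 0..255 priority back to the legacy scale, or return -1 when the
   value lies above the legacy range (sketch convention). */
static int ce3_to_legacy_priority(int ce3)
{
    assert(ce3 >= 0 && ce3 < CE3_PRIORITY_LEVELS);
    return ce3 >= LEGACY_BASE ? ce3 - LEGACY_BASE : -1;
}
```

      The practical consequence is that the 248 levels from 0 through 247 all rank above anything an older program can request through SetThreadPriority.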

      An important difference between Windows CE 3.0 and Windows CE 2.x has to do with the behavior of the highest-priority thread. On Windows CE 2.x, the highest-priority thread does not get preempted. Instead, it is a run-to-completion thread. This is no longer true on Windows CE 3.0, and the default behavior for all threads, high priority and low priority alike, is the same: when other threads of the same priority are in a ready-to-run state, a thread is preempted after its time-slice has expired. And higher-priority threads always preempt lower-priority threads. Run-to-completion support is still available on Windows CE 3.0, and it has been generalized to be available for threads of any priority. Setting the duration of a thread's time-slice, also known as the thread quantum, is the next subject I'll discuss.

New Feature: Thread Quantum

      Windows CE 3.0 allows the thread quantum, or time-slice duration, to be set on a per-thread basis. The thread quantum is the amount of time that a thread can run before it will be interrupted to allow another thread of its same priority to run. The thread quantum can be set to zero, which means that it will be a run-to-completion thread. Such threads can be interrupted by threads with a higher priority, but they aren't interruptible by threads at their same priority level. Another use of thread quantum would be to give one thread more processing time than other threads of its same priority. The effect would be to allow the privileged thread a bit more time to work, while still allowing those threads that share its priority regular intervals of CPU time.
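
      A small simulation shows how per-thread quantums play out among threads that share one priority level. This is a toy round-robin model in plain C, not the Windows CE scheduler: it assumes that every thread is always ready to run and that no higher-priority thread or interrupt intervenes, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct sim_thread {
    unsigned quantum_ms;    /* 0 = run to completion */
    unsigned remaining_ms;  /* CPU time the thread still needs */
    unsigned finished_at;   /* filled in by the simulation */
};

/* Round-robin among threads of equal priority, honoring each thread's own
   quantum. A quantum of zero lets the thread run until its work is done. */
static void simulate_round_robin(struct sim_thread *threads, size_t n)
{
    unsigned now = 0;
    size_t unfinished = n;

    for (size_t i = 0; i < n; i++)
        assert(threads[i].remaining_ms > 0);

    while (unfinished > 0) {
        for (size_t i = 0; i < n; i++) {
            struct sim_thread *t = &threads[i];
            unsigned slice;

            if (t->remaining_ms == 0)
                continue;                   /* already finished */
            slice = t->remaining_ms;
            if (t->quantum_ms != 0 && t->quantum_ms < slice)
                slice = t->quantum_ms;      /* preempted at end of quantum */
            now += slice;
            t->remaining_ms -= slice;
            if (t->remaining_ms == 0) {
                t->finished_at = now;
                unfinished--;
            }
        }
    }
}
```

      Give thread A a 10-millisecond quantum and 30 milliseconds of work, thread B a zero quantum and 20 milliseconds of work, and thread C a 20-millisecond quantum and 30 milliseconds of work, and B finishes first even though it started second, because nothing at its own priority can interrupt it.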

      A pair of new Win32 functions has been defined to support thread quantum: CeSetThreadQuantum and CeGetThreadQuantum. The quantum that these two functions set and retrieve is expressed in milliseconds. The default thread quantum is set in the OEM Adaptation Layer (OAL) through a kernel global variable called dwDefaultThreadQuantum. The default value for this quantum is 100 milliseconds, but a system developer can set the default value to any value except zero in the OEMInit function. (By contrast, the default thread quantum on earlier versions of Windows CE was 25 milliseconds.) If the kernel detects a value of zero, the default of 100 milliseconds is used instead. A value that is too low causes the system to spend too much time in the scheduler, while a value that is too high can cause unacceptable delays before a waiting thread is scheduled again.
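
      The two rules in the preceding paragraph can be written down in a few lines. This is a sketch of the arithmetic only, with hypothetical names; it does not touch dwDefaultThreadQuantum or any kernel code. The second function assumes the simplest possible case, in which every other ready thread at the same priority uses its full quantum before the waiting thread runs again.

```c
#include <assert.h>

#define FALLBACK_QUANTUM_MS 100u

/* A default of zero is rejected and replaced by 100 milliseconds. */
static unsigned effective_default_quantum(unsigned configured_ms)
{
    return configured_ms == 0 ? FALLBACK_QUANTUM_MS : configured_ms;
}

/* Longest time a ready thread waits for its next turn when `peers` other
   threads share its priority and each consumes a full quantum. */
static unsigned worst_case_wait_ms(unsigned quantum_ms, unsigned peers)
{
    assert(quantum_ms > 0);
    return quantum_ms * peers;
}
```

      With three peers, a 100-millisecond quantum means a wait of up to 300 milliseconds between turns, compared with 75 milliseconds at the older 25-millisecond setting. That is the tradeoff to weigh against the extra time spent in the scheduler.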

Improved Handling of Priority Inversion

      As described earlier in this article, priority inversion describes a situation in which a high-priority thread gets stuck waiting to obtain ownership of an object, such as a mutex, that is owned by a lower-priority thread. The lower-priority thread gets an instant priority boost in the hope that it will quickly finish its work and release the mutex, allowing the higher-priority thread to take ownership and get on with its real work.
      On previous versions of Windows CE, priority inversion was a pretty complicated affair. You might say that earlier versions of Windows CE supported nested priority inversion, because the priorities of multiple threads could get a boost. Consider four threads of descending priority, A, B, C, and D, each blocked on a different mutex, mutA, mutB, mutC, and mutD. If a fifth thread, thread E (even lower priority) owned mutD, then each of the threads in this chain would get a priority boost. The problem with this approach is that it made the scheduler in Windows CE a bit more complicated than it needed to be.

      With Windows CE 3.0, priority inversion support is quite a bit simpler. The scheduler looks ahead only one level, and not the five in the previous example. The net effect is that the scheduler doesn't burn CPU cycles on poorly designed threads. Instead, it keeps the road clear for those threads that are tuned to focus on their real work during time-critical operations.
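
      The difference between the two approaches can be modeled in a few lines of C. This is a toy model under stated assumptions, not the kernel's algorithm: the struct and function are hypothetical, and it assumes a chain in which A waits on a mutex owned by B, B on one owned by C, C on one owned by D, and D on one owned by E. The max_depth parameter controls how many links of that chain receive the waiter's priority.

```c
#include <assert.h>
#include <stddef.h>

#define NO_THREAD ((size_t)-1)

struct toy_thread {
    int priority;       /* 0 = highest, as with CeSetThreadPriority */
    size_t blocked_by;  /* index of the thread that owns the mutex this
                           thread waits on, or NO_THREAD if not blocked */
};

/* Walk the chain of owners that starts at `waiter`, lending the waiter's
   priority to each owner visited, but following at most max_depth links.
   Returns the number of owners visited. */
static size_t boost_owners(struct toy_thread *threads, size_t n,
                           size_t waiter, size_t max_depth)
{
    size_t visited = 0;
    size_t current;
    int inherited;

    assert(waiter < n);
    inherited = threads[waiter].priority;
    current = threads[waiter].blocked_by;

    /* The `visited < n` bound keeps a cyclic (deadlocked) chain from
       looping forever. */
    while (current != NO_THREAD && visited < max_depth && visited < n) {
        assert(current < n);
        if (threads[current].priority > inherited)
            threads[current].priority = inherited;  /* boost */
        visited++;
        current = threads[current].blocked_by;
    }
    return visited;
}
```

      Called with a depth of 1, the function boosts only thread B; called with a depth of 4 or more, it boosts B, C, D, and E, which is the nested behavior described for earlier versions.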

Improved Timer Granularity

      Another improvement made for Windows CE 3.0 concerns the granularity of timers. In earlier versions of Windows CE, the granularity of timers was equal to the time-slice thread quantum, which defaulted to 25 milliseconds (but which was settable in the OAL, just as it is in Windows CE 3.0). In Windows CE 3.0, timer granularity and thread quantum have been logically separated from each other. The timer granularity is now one millisecond, instead of 25 milliseconds (or more).
      Timers appear in several places in the Win32 API, including in the Sleep function and in the WaitForSingleObject function. A thread can call Sleep to give up the rest of the time slice, with a request to be awakened after the passage of a specified amount of time. The unit of measure requested by the Sleep function is milliseconds. But it does you little good to request a 15-millisecond wake-up call, and then have the actual wait be 25 milliseconds. Such was the experience for system builders with Windows CE 2.x. But with Windows CE 3.0, sleeping threads have a much better chance of being awakened after the specified time has passed. The same level of granularity exists for the timeout in the WaitForSingleObject function.
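
      The arithmetic behind that 15-versus-25 example is a round-up to the next timer tick. The helper below is a hypothetical sketch that assumes the sleep begins exactly on a tick boundary and that the thread can be awakened only on a tick; as noted elsewhere in this article, the real wait can be longer still.

```c
#include <assert.h>

/* Earliest tick-aligned wake-up, in milliseconds, for a sleep request of
   requested_ms on a system whose timer ticks every tick_ms. */
static unsigned wake_delay_ms(unsigned requested_ms, unsigned tick_ms)
{
    assert(tick_ms > 0);
    return ((requested_ms + tick_ms - 1) / tick_ms) * tick_ms;
}
```

      Under these assumptions, wake_delay_ms(15, 25) evaluates to 25 and wake_delay_ms(15, 1) evaluates to 15.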
      While Sleep has gotten more precise, it's important to keep in mind that it does not have 100 percent precision, and it can't ever match the precision of a dedicated hardware timer. A high-priority thread might get scheduled to run, or an interrupt might occur whose ISR delays your sleeping thread. System builders need to keep these limitations in mind, and add support for a dedicated hardware timer if they need absolute, completely repeatable results.

      Incidentally, if you are new to programming in Win32, you might discover the SetTimer function and wonder whether it provides a good way to meet precise timing needs. Well, wonder no more: it doesn't. This function, which has been around since the dawn of Windows, is heavily tied to the message-based user interface. The timing provided by this function is more suitable for user interface issues (such as performing a "Rebooting in 10 seconds" countdown) than for the precise needs of real-time system development. While Sleep has become more accurate, the architects of Windows CE have not forgotten about battery life. New additions to the OEM Adaptation Layer allow the system tick to be reprogrammed during idle periods. This allows the system to stay in an idle state for longer periods of time, conserving battery life.

New Feature: Semaphores

      Windows CE 3.0 adds support for semaphores, and in doing so makes the set of available synchronization objects in Windows CE match those found in desktop versions of Windows. A semaphore is roughly like a mutex with a count. (Or a mutex is like a semaphore with a count of 1.) Semaphores let you set a limit to the number of threads that have access to a particular resource.
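
      The counting behavior is easy to see in a toy model. The code below is plain C bookkeeping with hypothetical names; it is not thread-safe and is not the Win32 semaphore object, which you would create with CreateSemaphore and release with ReleaseSemaphore. It assumes a non-blocking acquire, comparable to a wait with a zero timeout, and it refuses a release that would push the count past the maximum.

```c
#include <assert.h>
#include <stddef.h>

struct toy_semaphore {
    long count;      /* slots currently available */
    long max_count;  /* upper limit fixed at creation */
};

static void toy_sem_init(struct toy_semaphore *s, long initial, long max)
{
    assert(s != NULL && max > 0 && initial >= 0 && initial <= max);
    s->count = initial;
    s->max_count = max;
}

/* Take one slot if any is free. Returns 1 on success, 0 if none is left. */
static int toy_sem_try_acquire(struct toy_semaphore *s)
{
    if (s->count == 0)
        return 0;
    s->count--;
    return 1;
}

/* Give one slot back. Returns 0, leaving the count unchanged, if that
   would exceed the maximum. */
static int toy_sem_release(struct toy_semaphore *s)
{
    if (s->count == s->max_count)
        return 0;
    s->count++;
    return 1;
}
```

      Initialized with a count and maximum of 2, the model admits two holders and turns away a third until one of them releases. Initialized with a count and maximum of 1, it behaves like the mutex in the comparison above.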

Critical Section Improvements

      Critical sections are a bit different from other synchronization objects. For one thing, they are private to a process, while other synchronization objects provide interprocess support. Also, all other synchronization objects are accessed with a call to WaitForSingleObject, which lets you specify a timeout. You could, for example, wait up to 100 milliseconds to acquire a mutex, and abandon the attempt if it takes longer. With the comparable critical section function, EnterCriticalSection, you cannot specify a timeout. With this function, when a thread tries to acquire a critical section, it must wait for as long as it takes.

      Windows CE 3.0 introduces a new function, TryEnterCriticalSection, which never blocks on another thread. Instead, this function returns TRUE when you have successfully acquired the critical section, and FALSE when it is already owned. This provides a safe way to use critical sections without having to worry that a thread will block during a time-critical operation.
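
      The usage pattern looks like the following sketch. It does not call TryEnterCriticalSection; it substitutes a C11 atomic_flag as a stand-in lock so that the example compiles off-device, and all of the names (toy_lock, toy_try_enter, toy_leave, sample_log, record_sample) are hypothetical. One difference worth noting is that this stand-in is not recursive, whereas a thread that already owns a real critical section can enter it again.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for a critical section, for illustration only. */
struct toy_lock {
    atomic_flag held;
};

/* Returns true if the lock was free and is now ours; never blocks. */
static bool toy_try_enter(struct toy_lock *lock)
{
    /* test_and_set returns the previous state: false means it was free. */
    return !atomic_flag_test_and_set(&lock->held);
}

static void toy_leave(struct toy_lock *lock)
{
    atomic_flag_clear(&lock->held);
}

struct sample_log {
    struct toy_lock lock;
    int stored;             /* protected by lock */
};

static void sample_log_init(struct sample_log *log)
{
    assert(log != NULL);
    atomic_flag_clear(&log->lock.held);
    log->stored = 0;
}

/* Time-critical path: store a sample only if the log is free right now.
   Returns false when the caller should retry later instead of waiting. */
static bool record_sample(struct sample_log *log)
{
    assert(log != NULL);
    if (!toy_try_enter(&log->lock))
        return false;       /* someone else holds it; do not block */
    log->stored++;
    toy_leave(&log->lock);
    return true;
}
```

      The point of the pattern is in record_sample: the time-critical caller always gets an answer immediately and decides for itself what to do when the resource is busy.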

Measuring System Responsiveness

      System developers who are serious about system responsiveness know that if you want a specific degree of responsiveness, you have to find a way to measure it. The Platform Builder for Windows CE 3.0 offers help with two tools: ILTIMING and OSBENCH.
      ILTIMING measures the interrupt latency on a system that has the right hooks in place. What this means is that you won't be able to measure the latency on systems like the Pocket PC and the H/PC Pro, which don't have these hooks enabled. But if you are putting together your own Windows CE-based system, you can enable the hooks and get the timing information.
      OSBENCH provides a wide range of timing information for each of the four basic kernel synchronization objects (critical sections, mutexes, semaphores, and events), and some comparisons of interprocess and intraprocess timings. The version that I ran had 33 different tests, some of which involved showing how priority inversion really slows things down.
      The complete source code to both tools is available in the Platform Builder, to allow you to fine-tune them to your specific timing test requirements.

      In addition to ILTIMING and OSBENCH, Windows CE 3.0 comes with Event Tracking. Event Tracking enables an OEM to monitor the system's state to determine how well an application is running. The data enables OEMs to spot potential deadlocks and to detect priority inversion cases.

Conclusion

      Windows CE has always provided the basic set of features to serve as the engine for hard real-time systems. With Windows CE 3.0, Microsoft has upped the ante by giving both the system builder and the application programmer more control over the handling of device-driver interrupts as well as the scheduling of individual threads. In particular, if you need features that are only crudely provided by other real-time operating systems, then maybe it's time to take Windows CE 3.0 out for a test drive.
For related articles see:
https://www.microsoft.com/msj/0799/wincekernel/wincekernel.htm
For background information see:
Inside Microsoft Windows CE by John Murray (Microsoft Press, 1998)
https://msdn.microsoft.com/library/techart/gettingstarted30.htm

Paul Yao is a Windows programming guru, Windows CE expert, and co-author of the first book published on Windows programming. His company specializes in training programmers in Windows and Windows-related technologies. Visit his Web site at https://www.paulyao.com.

From the November 2000 issue of MSDN Magazine.