A real-time system is a computer system (including hardware, operating system, device drivers, and application software) that must be able to handle a specific set of activities within a specified timeframe. Often, the timing considerations are driven by the needs of some outside device. Such a device might be an external sensor that generates 10,000 interrupts per second. When an interrupt is triggered, the software might need to read and store a measurement of, say, the speed at which a conveyor belt is moving. Software in such a scenario might play a more active role in the operation of the conveyor belt, perhaps sending a signal to a motor controller to increase or decrease the speed when the current value is out of some predefined range.
A distinction is sometimes made between hard real-time systems and soft real-time systems. A hard real-time system doesn't tolerate missed deadlines. Missing even a single deadline results in a catastrophe. Consider a high-speed, high-volume color printer that might be used to print the paper edition of a magazine like the one you are reading now. A wide roll of paper flies past at very high speeds, and four separate images (one each for black, cyan, magenta, and yellow) must be positioned and imprinted with very precise timing. Failure to place each one exactly right results in blurry images and upset readers.
A soft real-time system tolerates some amount of failure. In a data collection system, for example, missing data points can sometimes be extrapolated from known values. One example of a soft real-time system is a multimedia application in which there is a relatively high tolerance for missing deadlines related to decompressing sound or video. Such failures might result in temporary degradation of output quality, but the presentation itself remains largely intact.
(Of course, it's also easy to imagine a multimedia system that has hard real-time constraints, which certainly describes what you expect from a digital camcorder: no missed deadlines are tolerated when you are recording that once-in-a-lifetime baptism or bar mitzvah.)
A serious real-time system implementor uses the terms milliseconds and microseconds to define specific time requirements. A millisecond (abbreviated ms) is 1/1000 (0.001) of a second. Examples of where programmers encounter milliseconds in the Win32 API include the Sleep and WaitForSingleObject functions. With the Sleep function, the millisecond value is the amount of time a thread should yield to the scheduler. The WaitForSingleObject function uses its millisecond parameter to specify a timeout for a wait on a synchronization object.
A microsecond (abbreviated as either us or µs) represents one-millionth (1/1,000,000 or 0.000001) of a second. In the past, dedicated hardware was needed to support real-time requirements with microsecond sensitivity. As CPUs have gotten faster, microsecond-level real-time requirements can be met with a heavier reliance on software than was possible in the past.
Another important term is interrupt latency, the delay between a hardware interrupt and the start of the actual processing of the interrupt. For some, this is a key benchmark, since it sets a limit on the responsiveness of a device to outside events. A core goal for Windows CE 3.0 is to provide an interrupt latency of 50 µs on Windows CE running on a 166 MHz Pentium. While this goal has been met, this number is probably not very meaningful unless you happen to want to use the exact hardware configuration that was used in the testing. Otherwise, it's best if you do your own testing on your own hardware. You'll find details of the performance testing support that Microsoft provides in the Windows CE 3.0 Platform Builder.
Trying to generalize too much from this data is just like trying to generalize the results of a road race based solely on the engine in the car: the driver, the track, and the weather still play a factor in the race, just as device drivers, application software behavior, and other factors play a role in the actual interrupt latencies of a Windows CE-based system.
When you evaluate an operating system for time-critical support, you start by identifying those features that can affect real-time performance. For my purposes, I am going to focus on these elements: interrupt handling, processes, threads, synchronization objects, and memory. (Note: this section covers features that are part of every version of Windows CE. If you're primarily interested in the real-time enhancements that are specific to Windows CE 3.0, then you might want to skip ahead to the last section of this article.)
Interrupt handling in Windows CE is a multistage process, as you can see in Figure 1. Device driver writers provide two pieces that play a key role in interrupt handling: a tiny, kernel-mode Interrupt Service Routine (ISR) to decide how to handle an interrupt, and a user-mode Interrupt Service Thread (IST) to provide processing when any substantial work must be done.
Figure 1 Interrupt Handling
Device driver writers call two separate Win32 functions at driver initialization time to glue the pieces together. HookInterrupt connects specific interrupts to the ISR. This function gets called at system bootup. (Note that system bootup happens on a cold start or when the reset button is pushed; the boot sequence does not occur when the system wakes from the sleep state caused, for example, by hitting the power button on a handheld PC or on a Pocket PC.) A second initialization function, InterruptInitialize, maps interrupt IDs to event handles that are created using another Win32 function, CreateEvent. The IST is a regular Win32 thread, created by calling CreateThread, that starts running and then blocks on the event identified in the call to InterruptInitialize. It waits until the ISR tells the kernel to set the event that allows it to run.
Once everything is in place, the only thing that remains is for an interrupt to occur. When an interrupt occurs, the Windows CE kernel responds by saving the state of currently executing user-mode code. The kernel calls the appropriate ISR to ask it to quickly decide how to handle the interrupt. An ISR is a function in the device driver that might buffer incoming bytes, but whose primary job is to decide how an interrupt should be handled. It lets the kernel know the results of its decision by means of the return value it passes back to the kernel. For example, the ISR associated with the GPS device in the Windows CE for Automotive systems returns the value of SYSINTR_GPS. The kernel uses this identifier to scan its table for the event handle of the associated IST. The kernel can then trigger the IST to start running by putting its event into a signaled state with a call to SetEvent.
Of course, an IST doesn't have to be triggered for every interrupt. For example, the GPS driver mentioned earlier stores up location information until its buffers are full.
While the buffers are filling, this ISR returns a value of SYSINTR_NOP to the kernel. Only when its buffers are full does the driver tell the kernel to trip the event and cause its IST to run and handle the buffered data.
There are several reasons that ISRs delegate most of the work of interrupt handling. For one thing, they run on a very small kernel stack, so the depth of function calls and the number of local variables are limited. Another constraint is the fact that there is a very limited set of system services that are available to the ISR. Perhaps the most important consideration is the impact on other parts of the system, including other interrupts and even high-priority threads. When an ISR runs, everything else that's in the system is asleep. If an ISR spends too much time handling a particular interrupt, it might affect the proper operation of other time-critical operations. For all of these reasons, an ISR delegates most of the actual work of handling the interrupt to an IST.
An IST is a thread created by the device driver at driver startup, using the same Win32 function that anyone can use: CreateThread. Once started, an IST starts its workday by going to sleep (not a bad job if you can get it!). It goes to sleep by calling the WaitForSingleObject function. And there it waits like Sleeping Beauty until its Prince Charming (a drama directed by the ISR and played out by the kernel) comes along to kiss it awake.
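To make the ISR-to-IST handoff concrete, here is a minimal simulation in portable C++ rather than real driver code. Everything in it is an assumption made for illustration: the SIM_SYSINTR_* values, the four-sample buffer, and the SimulatedDriver type are hypothetical, and a std::condition_variable stands in for the Win32 event that the kernel would set on the driver's behalf.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-ins for the values an ISR returns to the kernel.
enum SimSysIntr { SIM_SYSINTR_NOP, SIM_SYSINTR_DEVICE };

const std::size_t kBufferDepth = 4;  // assumed buffer size, for illustration only

struct SimulatedDriver {
    std::mutex lock;
    std::condition_variable event;  // plays the role of the IST's Win32 event
    bool signaled = false;
    bool shutdown = false;
    std::vector<int> buffer;        // filled by the simulated ISR
    std::vector<int> processed;     // filled by the simulated IST

    // "ISR": buffer one sample and ask for the IST only when the buffer is full.
    SimSysIntr isr(int sample) {
        std::lock_guard<std::mutex> guard(lock);
        buffer.push_back(sample);
        return buffer.size() >= kBufferDepth ? SIM_SYSINTR_DEVICE : SIM_SYSINTR_NOP;
    }

    // "Kernel": call the ISR, then set the event if the ISR asked for its IST.
    void dispatch(int sample) {
        if (isr(sample) == SIM_SYSINTR_DEVICE) {
            { std::lock_guard<std::mutex> guard(lock); signaled = true; }
            event.notify_one();
        }
    }

    // "IST": sleep on the event, then drain whatever the ISR has buffered.
    void ist() {
        std::unique_lock<std::mutex> guard(lock);
        for (;;) {
            event.wait(guard, [this] { return signaled || shutdown; });
            processed.insert(processed.end(), buffer.begin(), buffer.end());
            buffer.clear();
            signaled = false;
            if (shutdown) return;
        }
    }
};

// Feed a number of simulated interrupts through the driver and return
// the samples in the order the IST handled them.
std::vector<int> run_simulation(int interrupts) {
    assert(interrupts >= 0);
    SimulatedDriver driver;
    std::thread ist_thread(&SimulatedDriver::ist, &driver);
    for (int i = 0; i < interrupts; ++i) driver.dispatch(i);
    { std::lock_guard<std::mutex> guard(driver.lock); driver.shutdown = true; }
    driver.event.notify_one();
    ist_thread.join();
    return driver.processed;
}
```

Calling run_simulation(10) pushes ten samples through the simulated ISR. The IST wakes only when the buffer fills (or at shutdown, when it drains the remainder), which mirrors the SYSINTR_NOP behavior described above.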
As you may have noticed, the Win32 threading functions play a significant role in all of the interrupt code. This is in stark contrast to the separate driver interfaces that device driver writers may have worked with in the past. This driver model is not at all related to the VxD driver interface of 16-bit Windows, to the driver interfaces of Windows NT® and Windows 2000, or to the common WDM of Windows 98 and Windows 2000.
Figure 2 Processes in the Remote Process Viewer
A process in Windows CE is like a typical process found everywhere else in that it represents a unit of ownership. The most important thing that a process owns is its private address space. A process in Windows CE has a smaller address space than processes on desktop systems. Processes in Windows CE get a 32MB address space, compared to the 2048MB (2GB) process address space found on desktop versions of Windows. (Processes that need more than 32MB of memory can allocate from the pool of shared pages, using either MapViewOfFile or large VirtualAllocs, to augment the 32MB pool of private pages available through calls to VirtualAlloc.) Separate address spaces protect each process from careless code in other processes, contributing to the overall robustness of the system and of each application. While big computer operating systems have always had this protection, many small operating systems do not, so its presence in Windows CE is noteworthy.
An important aspect of the Windows CE context switching mechanism is that a context switch was designed to be very fast. This is due in part to the way that process address spaces are set up. Unlike in Windows 2000, which gives each process its own set of page tables, on Windows CE all processes share a single set of page tables. The benefits include saved memory usage and faster context switch time. A process switch is much faster on Windows CE than on Windows 2000.
This has important implications for developers of time-critical systems. It means you are free to use multiple processes that have all the benefits of separate processes but minimal interprocess context switch delays.
Figure 3 Thread Priorities
Figure 3 shows how the thread priorities for a Windows CE 1.x or Windows CE 2.x-based application running on Windows CE 3.0 map to the priorities for Windows CE 3.0. The range of available priorities for the Windows CE 1.x or Windows CE 2.x-based application is 248-255 (the bottom eight priorities), with priority 0 mapping to 248, 1 mapping to 249, and so on.
The importance of this change depends on how important thread priorities are to your application. If thread priorities are important, and if you have an application written to run on Windows CE 1.x or Windows CE 2.x, you have to make a tiny change. Replace calls to SetThreadPriority with calls to the newer CeSetThreadPriority for Windows CE 3.0. This will allow you access to the broader range of Windows CE 3.0 priorities. When sharing code between Windows CE and Windows NT, you'll definitely need to have separate thread priority handling for each OS.
Threads are important for real-time programming because they give you a way to separate time-critical work from work that is not time-critical. The scheduler allows threads with a higher priority to run before threads with a lower priority. You assign a priority by calling SetThreadPriority (you can access the wider range of priorities on Windows CE 3.0 by calling CeSetThreadPriority). So if you have a task (or set of tasks) that is time-critical, you can create a separate thread and assign it responsibility for that work.
Students in my classes frequently want to know when to use threads. If you think of threads as being like people, you can start to see how creating threads is almost like hiring a new person to join your team. Some tasks, like driving a car, are meant to be done by a single person. But some tasks, such as playing catch with a baseball, require several people. It's just not as much fun trying to do it alone. In the same way, some things are meant to be handled by more than a single thread. The most obvious involves the asynchronous nature of outside devices.
As discussed earlier in this article, each interrupt vector in Windows CE has a dedicated thread, in the form of an interrupt service thread, for interrupt handling. A second obvious example has to do with communicating with the outside world. Sending or receiving data (using a network, infrared device, or modem) is best handled with a dedicated communications thread. A program's user interface should have a dedicated user interface thread, which doesn't have many other responsibilities beyond responding to the needs of the user. And when you have time-critical work to do, this should also get delegated to a thread that does little else besides executing your critical code. It should avoid user interface work, file system work, and interacting with random communications devices (unless it is an integral part of its time-critical operation). A specialist real-time thread is what the doctor (especially Dr. GUI) orders when you are serious about any time-critical operations.
Of course, you know that adding a new person to a team has its complications, and the same is true with threads. Adding a new person to a baseball team involves deciding what position that person can play. It also takes time to coordinate a player's style of fielding with other players on the field. The moment you add a second thread to a process, you introduce the possibility that the two threads will run into each other. Like two baseball players who collide while trying to field a ball, two threads can collide when accessing any of the myriad things that they share. To establish boundaries between the operations of different threads, the Win32 API provides synchronization objects.
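Before moving on to synchronization, the legacy priority mapping described with Figure 3 can be written down as a one-line rule. This is only a sketch of the arithmetic stated in the text (0 maps to 248, 1 maps to 249, and so on); the function and constant names are hypothetical and are not part of the Windows CE API.

```cpp
#include <cassert>

// Values taken from the mapping described with Figure 3.
const int kLegacyLevels = 8;    // Windows CE 1.x/2.x priorities 0 through 7
const int kLegacyBase   = 248;  // first of the bottom eight Windows CE 3.0 priorities

// Hypothetical helper: where a Windows CE 1.x/2.x priority lands on Windows CE 3.0.
int map_legacy_priority(int legacy_priority) {
    assert(legacy_priority >= 0 && legacy_priority < kLegacyLevels);
    return kLegacyBase + legacy_priority;
}
```

The point of the sketch is that an unmodified application can reach only priorities 248 through 255 on Windows CE 3.0, which is why a time-critical application needs CeSetThreadPriority to reach the rest of the range.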
Synchronizing the activity of threads is so important that Windows CE has always had most of the Win32 synchronization objects. All versions of Windows CE support critical sections, mutexes, and events. Support for a fourth type of synchronization object, semaphores, was added in Windows CE 3.0. This means the latest release of Windows CE has the same set of synchronization objects that current desktop versions of Windows have.
If you've never worked in a multithreaded environment, it might not be exactly clear what synchronization is and why it is important. A simple rule is that any time two threads share anything, you must prevent conflicts using synchronization objects. So what does it mean for something to be shared? How can there be a conflict, and how do you guard against it? Figure 4 will shed some light on this. It shows BadThrd.cpp, a somewhat contrived, tiny multithreaded program with a synchronization conflict. When it starts running, this program creates two threads, with ThreadMainA and ThreadMainB as the entry point for each. The two threads share global variable x.
So what exactly happens when this program runs? There are many possibilities. Perhaps the two threads will get into a deadly embrace and lock up over the variable, or perhaps each thread will be able to run through each of the loops without ever locking up. You might call this thread roulette, since, in the absence of a synchronization object to mediate this potential conflict, you might see a problem, or you might not. Perhaps you could ship this code and never encounter a problem. Or, more likely, you'll have this kind of bug lying dormant until you have to demo your code before a large group of people. Then, kablam! Everything falls apart.
The beauty of synchronization objects is that you can get perfect safety for this kind of code with a minimum of extra programming. You just need to see the potential for a problem and actually make use of the available synchronization objects.
GoodThrd.cpp completely prevents any conflict over the use of this shared resource. There are exactly six additional lines of code with which to handle the critical sections (see Figure 5). This code creates a global variable, cs, of type CRITICAL_SECTION, and gets it ready to use by calling InitializeCriticalSection. The conflict prevention is implemented by the calls to EnterCriticalSection and LeaveCriticalSection around the two loops. The thread that first calls EnterCriticalSection gains exclusive access to that object, and returns immediately from the function call. When the second thread calls EnterCriticalSection, it won't return from that call (it blocks) until the first thread calls LeaveCriticalSection. The additions in GoodThrd.cpp make the program thread-safe, so you can go ahead and ship it without any fear of synchronization collisions.
A critical section is like a traffic light in an intersection. The use of critical sections prevents collisions between code just like the use of traffic lights prevents collisions between cars. Critical sections are used between two threads in the same process. You can synchronize the activities of two or more threads in different processes using another synchronization object, a mutex. A third type of synchronization object, the semaphore, is like a mutex with a count. A fourth type, the event, is like the starting gun at a race: it is used as a trigger, as with the IST discussed earlier.
As this example shows, the use of synchronization objects is important to any multithreaded operation. Developers of real-time systems should be warned, however, that synchronization objects must be used carefully since they can cause problems when writing time-critical code. In particular, whenever a thread is in a time-critical mode, it must avoid accessing any synchronization object that might halt its execution. You should make sure that the path it takes is as free as possible from synchronization objects.
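Since the listings in Figures 4 and 5 are not reproduced here, the following is a rough portable sketch of the same idea, not the article's GoodThrd.cpp. It assumes two threads that each run a loop over a shared global x, and it uses std::mutex and std::lock_guard as stand-ins for CRITICAL_SECTION, EnterCriticalSection, and LeaveCriticalSection. The function names are hypothetical.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

std::mutex cs;  // stand-in for the global CRITICAL_SECTION named cs
long x = 0;     // the shared global variable

// Entry point used by both threads: the whole loop runs inside the lock,
// matching the "enter before the loop, leave after it" pattern in the text.
void thread_main(int iterations) {
    std::lock_guard<std::mutex> guard(cs);  // enter; leaves automatically on return
    for (int i = 0; i < iterations; ++i) {
        ++x;
    }
}

// Run two threads over the shared variable and return its final value.
long run_good_threads(int iterations) {
    assert(iterations >= 0);
    x = 0;
    std::thread a(thread_main, iterations);
    std::thread b(thread_main, iterations);
    a.join();
    b.join();
    return x;
}
```

With the lock in place, whichever thread arrives second simply waits until the first has finished its loop, so the final value of x is always twice the iteration count. Remove the lock_guard line and you are back to playing thread roulette.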
This sounds like an easy thing to do, but there are quite a few hidden synchronization objects that can create unexpected delays. For example, the file system is protected with a mutex. For this reason, you'll want to avoid file I/O during time-critical operations. In addition, any user interface or graphical output calls may also cause a delay because each of these operations is protected by a mutex. In fact, any system call could potentially get blocked by one or another synchronization object. For this reason, you get the best throughput by having the thread avoid any system calls and focus on the actual time-critical work. And, of course, you improve the throughput of time-critical threads by giving such threads very high priorities.
Running high-priority threads that never access a synchronization object is the ideal. But sometimes a time-critical, high-priority thread calls into a system function that must acquire a mutex to do its work. Most of the time, the mutex can be acquired right away or with little delay; that is, most of the time the system works. But there are times when a high-priority thread blocks on a mutex that is owned by a lower-priority thread that just doesn't release the mutex. There are several reasons for this: the lower-priority thread might just have a bug that prevents it from ever releasing the mutex. Or it might be that other threads are chewing up the available CPU cycles, preventing the lower-priority thread from doing the work it needs to do before it releases the mutex. This situation is called priority inversion, since the higher-priority thread has had its privileges subverted by the lower-priority thread. In effect, the priorities of the two threads are the same. To fix this, the Windows CE scheduler does what the Windows NT and Windows 2000 schedulers do: it boosts the priority of the lower-priority thread to the priority of the higher-priority blocked thread.
The hope is that the lower-priority thread just needs to be able to run to release the owned synchronization object. There is no guarantee that this will help; it is quite like pounding your fist on a snack machine that has stolen your coins. Sometimes a bit of jiggling is all that is needed to get both stuck threads running and stuck candy machines working again.
From the perspective of the operating system, this solves a common problem, and is therefore a good thing. From the point of view of time-critical processing, however, this is something of a disaster. All the fussing done by the scheduler to help the lower-priority thread is necessary, of course. But it would be better for the time-critical thread never to have to waste time jiggling the works in the first place. In short, time-critical threads should focus squarely on the work at hand. Tasks that might block should be delegated to other threads, or performed in a different manner.
I think everyone would agree that it's a good thing for the Windows CE scheduler to help threads free themselves from what might otherwise be a logjam. But perhaps this has you worried that the scheduler somehow meddles in other places as well. Fear not. Priority inversion is the only situation that will cause the Windows CE scheduler to change the priority of a thread. Also, priority inversion affects threads that use mutexes, critical sections, and semaphores.
It's also important to keep in mind that there are a few other ways to make your data thread-safe without having to resort to a full-blown synchronization object. You might want to consider the use of InterlockedXXX APIs for simple multithread interactions. This set of functions (InterlockedIncrement, InterlockedDecrement, and InterlockedExchange) is less expensive to use than critical sections. These three InterlockedXXX functions will also work for data shared between an ISR and an IST.
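To show the lock-free style that the InterlockedXXX functions allow, here is a small portable sketch. It uses std::atomic as a stand-in: fetch_add plays the role of InterlockedIncrement and exchange plays the role of InterlockedExchange. The counter and function names are hypothetical, and this is an analogy to the Win32 calls rather than the calls themselves.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<long> event_count(0);  // shared counter with no lock around it

// Each call bumps the counter atomically, much as InterlockedIncrement would.
void count_events(int events) {
    for (int i = 0; i < events; ++i) {
        event_count.fetch_add(1);
    }
}

// Two threads count concurrently; the total is then read and reset in one
// atomic step, much as InterlockedExchange would do.
long count_from_two_threads(int events_per_thread) {
    assert(events_per_thread >= 0);
    event_count.store(0);
    std::thread a(count_events, events_per_thread);
    std::thread b(count_events, events_per_thread);
    a.join();
    b.join();
    return event_count.exchange(0);  // returns the old value, leaves 0 behind
}
```

Because no thread ever blocks here, there is no lock owner for a time-critical thread to wait on, which is what makes this approach attractive for simple shared counters and flags.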
Programmers new to Windows CE might be surprised to learn that it uses paged memory in its memory management. A typical system running Windows CE has no hard drive, so it has no support for a page file (such as pagefile.sys on Windows NT). So why does Windows CE use paged memory? The answer is to get the benefits that come with pretending you have more memory than you really have, which is what you get from paged virtual memory on any operating system.
The RAM in a system running Windows CE is divided into two parts: an object store (storage memory) and program memory. Among other things, the object store contains a file system. Data in the object store is compressed (this is an optional setting for an OEM), so that if you have a 128KB program called pager.exe, it might take up only 64KB of object store space. When this program is running, individual parts can be brought into program memory, a process known as paging-in. Of course, without a page file there is no way to page anything out. But even without a paging file, the presence of paged memory support helps to optimize the use of physical memory. Code is discarded when memory is low and will be paged back in when needed.
While demand paging reduces memory usage, a page fault during a time-critical operation can create unwanted delays. Developers of time-critical systems want to avoid paging activity when a critical deadline must be met. This is a bit of a problem, since the default behavior of all code in EXEs and DLLs is to load on access. All device drivers are DLLs, which means they are loaded into memory with a call to LoadLibrary. Windows CE has an alternative function, LoadDriver, which is the same as LoadLibrary, except that it guarantees that all pages are loaded and locked into memory.
I mentioned at the beginning of this article that Windows CE is a configurable operating system, and this applies to the virtual memory system as well.
You can, for example, turn off demand paging for the entire system by creating an entry (ROMFLAGS = 0x001) in the config.bib file. While you gain the benefit of avoiding delays caused by page loading, you also lose the benefits of demand paging. In particular, every application and DLL (including device drivers) is loaded entirely into memory before it starts running. The downside, then, is that startup times for both applications and device drivers increase a bit. Used judiciously, however, this feature could help remove the memory paging issue from the real-time performance equation.
While support for demand-loading does make Windows CE quite a modern operating system, the drawbacks, as just described, are the delays caused by paging. Of course, Windows CE is also an embedded operating system. As such, executables can reside in ROM and, when uncompressed, such executables can execute in place. All versions of Windows CE support execute-in-place; Windows CE 3.0 adds support for multiple execute-in-place (XIP) regions.
The second major improvement is in the area of thread scheduling. Incidentally, thread scheduling improvements help interrupt handling, since an IST must be scheduled like any other thread. And, of course, thread scheduling is important because real-time system developers like the convenience of creating both multithreaded and multiprocess solutions. You don't want the scheduler to take too long to switch between threads, or it might cause your code to miss important deadlines. You also want to make sure that you have the features you need to implement your threads in the most efficient manner. After I explain the nestable interrupts, I'll explain each of the six key improvements for thread scheduling: new thread priorities, the addition of support for setting thread quantum, improved handling of priority inversion, improved timer granularity, the addition of semaphores, and better support for critical sections. The new real-time functions in Windows CE 3.0 are described in Figure 6.
There are two drawbacks to this new design. Obviously, lower-priority ISRs run more slowly. But then good system design is all about balancing finite resources between competing demands. The second drawback, which affects all ISRs, is that ISRs on Windows CE 3.0 run a tiny bit slower than comparable ISRs on Windows CE 2.x. The reason is that the system needs to save more state before calling individual ISRs, so the average start time of all ISRs suffers a bit. The good news, though, is that while average start times increase slightly, maximum start times decrease, particularly for the highest-priority ISRs.
An important difference between Windows CE 3.0 and Windows CE 2.x has to do with the behavior of the highest-priority thread. On Windows CE 2.x, the highest-priority thread does not get preempted. Instead, it is a run-to-completion thread. This is no longer true on Windows CE 3.0, and the default behavior for all threads, high priority and low priority, is the same: when other threads of the same priority are in a ready-to-run state, a thread is preempted after its time-slice has expired. And higher-priority threads always preempt lower-priority threads. Run-to-completion support is still available on Windows CE 3.0, and it has been generalized to be available for threads of any priority. Setting the duration of a thread's time-slice, also known as the thread quantum, is the next subject I'll discuss.
A pair of new Win32 functions has been defined to support thread quantum: CeSetThreadQuantum and CeGetThreadQuantum. Both functions express the quantum in milliseconds. The default thread quantum is set in the OEM Adaptation Layer (OAL) through a kernel global variable called dwDefaultThreadQuantum. The default value for this quantum is 100 milliseconds, but a system developer can set the default value to any value except zero in the OEMInit function. (By contrast, the default thread scheduling value on earlier versions of Windows CE was 25 milliseconds.) If the kernel detects a value of zero, the default of 100 milliseconds is used instead. A value that is too low causes the system to spend too much time in the scheduler, while a value that is too high can cause unacceptable delays between the times when a given thread is scheduled.
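The rule for the system-wide default can be modeled in a few lines. This is only a sketch of the behavior described above, not kernel code: the function and constant names are hypothetical, and the parameter stands for whatever value an OEM stored in dwDefaultThreadQuantum.

```cpp
#include <cassert>

const unsigned kFallbackQuantumMs = 100;  // the default named in the text

// Hypothetical model: the default quantum the kernel ends up using,
// given the value an OEM assigned in OEMInit.
unsigned effective_default_quantum_ms(unsigned oem_value_ms) {
    unsigned result = (oem_value_ms == 0) ? kFallbackQuantumMs : oem_value_ms;
    assert(result != 0);  // the default is never allowed to be zero
    return result;
}
```

So an OEM that wants the older 25-millisecond behavior gets it by storing 25, while an OEM that stores 0 (or nothing at all) ends up with 100 milliseconds.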
With Windows CE 3.0, priority inversion support is quite a bit simpler. The scheduler looks ahead only one level, and not the five in the previous example. The net effect is that the scheduler doesn't burn CPU cycles on poorly designed threads. Instead, it keeps the road clear for those threads that are tuned to focus on their real work during time-critical operations.
Incidentally, if you are new to programming in Win32, you might discover the SetTimer function and wonder whether it provides a good way to meet precise timing needs. Well, wonder no more: it's not. This function, which has been around since the dawn of Windows, is heavily tied to the message-based user interface. The timing provided by this function is more suitable for user interface issues (such as performing a "Rebooting in 10 seconds" countdown) than for the precise needs of real-time system development.
While the Sleep function has become more accurate, the architects of Windows CE have not forgotten about battery life. New additions to the OEM Adaptation Layer allow the system tick to be reprogrammed during idle periods. This allows the system to stay in an idle state, conserving battery life for longer periods of time.
Windows CE 3.0 adds support for semaphores, and in doing so makes the set of available synchronization objects in Windows CE match those found in desktop versions of Windows. A semaphore is roughly like a mutex with a count. (Or a mutex is like a semaphore with a count of 1.) Semaphores let you set a limit to the number of threads that have access to a particular resource.
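To see the "mutex with a count" idea in action, here is a portable sketch that limits how many threads can use a resource at once. It does not use the Win32 semaphore functions: the CountingSemaphore class is a hypothetical stand-in built from a std::mutex and a std::condition_variable, and the thread and limit counts are arbitrary.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a Win32 semaphore: a counter guarded by a lock.
class CountingSemaphore {
public:
    explicit CountingSemaphore(int initial) : count_(initial) {}
    void acquire() {  // wait until a slot is free, then take it
        std::unique_lock<std::mutex> guard(lock_);
        available_.wait(guard, [this] { return count_ > 0; });
        --count_;
    }
    void release() {  // give the slot back and wake one waiter
        { std::lock_guard<std::mutex> guard(lock_); ++count_; }
        available_.notify_one();
    }
private:
    std::mutex lock_;
    std::condition_variable available_;
    int count_;
};

// Start `threads` workers that all want the resource, allow only `limit`
// of them in at a time, and report the largest number seen inside at once.
int peak_concurrent_users(int threads, int limit) {
    assert(threads >= 0 && limit > 0);
    CountingSemaphore slots(limit);
    std::mutex stats_lock;
    int active = 0;
    int peak = 0;
    std::vector<std::thread> workers;
    for (int i = 0; i < threads; ++i) {
        workers.emplace_back([&] {
            slots.acquire();
            { std::lock_guard<std::mutex> g(stats_lock); ++active; peak = std::max(peak, active); }
            std::this_thread::yield();  // pretend to use the resource
            { std::lock_guard<std::mutex> g(stats_lock); --active; }
            slots.release();
        });
    }
    for (std::thread& worker : workers) worker.join();
    return peak;
}
```

However the threads happen to be scheduled, the peak never exceeds the limit passed to the semaphore. With a limit of 1, the semaphore behaves like a mutex.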
Windows CE 3.0 introduces a new function, TryEnterCriticalSection, which never blocks on another thread. Instead, this function returns TRUE when you have successfully acquired the critical section, and FALSE when it is already owned. This provides a safe way to use critical sections without having to worry that a thread will block during a time-critical operation.
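The try-and-move-on pattern looks something like the following portable sketch. It uses std::mutex::try_lock as a stand-in for TryEnterCriticalSection; the function names are hypothetical. One difference worth keeping in mind is that a std::mutex is not re-entrant the way a Win32 critical section is, so the sketch makes the attempt from a second thread.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

std::mutex section;  // stand-in for a CRITICAL_SECTION

// Time-critical step: update the value only if the lock is free right now.
// Returns false instead of blocking when another thread owns the lock.
bool try_time_critical_step(int& value) {
    if (!section.try_lock()) {
        return false;  // already owned: skip or defer the work
    }
    ++value;
    section.unlock();
    return true;
}

// Demonstration: the calling thread owns the lock while a second thread
// makes its attempt, so the attempt returns at once without doing the work.
bool attempt_while_lock_is_held() {
    int value = 0;
    bool acquired = true;
    section.lock();
    std::thread worker([&] { acquired = try_time_critical_step(value); });
    worker.join();
    section.unlock();
    assert(value == (acquired ? 1 : 0));
    return acquired;
}
```

The time-critical thread never stalls here; when the attempt fails, it is up to your design to decide whether to retry later or to hand the work to a less urgent thread.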
In addition to ILTIMING and OSBENCH, Windows CE 3.0 comes with Event Tracking. Event Tracking enables an OEM to monitor the system's state to determine how well an application is running. The data enables OEMs to spot potential deadlocks and to detect priority inversion cases.
From the November 2000 issue of MSDN Magazine.