Key Benefits of the I/O APIC
Updated: December 4, 2001
Both end users and platform designers have been plagued by a shortage of interrupt resources when additional hardware devices and functionality are added to a system.
This problem is due in part to the increasing number of devices supported per platform, and is rooted in the history of PC-architected systems. Interrupts have been most commonly handled using the same basic methodology as that used by the original PC-AT architecture--in other words, using legacy "chained" 8259 Programmable Interrupt Controller (PIC) devices.
Since the initial release of Microsoft Windows NT 3.1, Windows NT has supported an improved interrupt handling architecture, the Advanced Programmable Interrupt Controller (APIC). However, the benefits of the APIC are not broadly understood. This article presents some of the key benefits of the APIC to explain why APIC should be part of all future platform architectures, rather than being restricted to the current common deployments in servers and professional workstations.
The information in this paper applies for Microsoft Windows 2000 and later versions.
What is APIC? APIC is a distributed set of devices that make up an interrupt controller. In current implementations, each part of a system is connected by an APIC bus. One part of the system, the "local APIC," delivers interrupts to a specific processor; for example, a machine with three processors must have three local APICs. Intel has been building local APICs into their processors since the Pentium P54c in 1994. Computers built with genuine Intel processors already contain part of the APIC system.
The other important part of the system is the I/O APIC. There can be as many as eight I/O APICs on a system; they collect interrupt signals from I/O devices and send messages to the local APICs when those devices need to interrupt. Each I/O APIC has an implementation-defined number of interrupt inputs (or IRQs). Intel's past and current I/O APICs typically have 24 inputs--others can have as many as 64. And some machines have several I/O APICs, each with a number of inputs, amounting to hundreds of IRQs in the machine that are available for device interrupts.
However, without an I/O APIC in the system, the local APICs are useless. In such a situation, Windows 2000 has to revert to using the 8259 PIC.
Windows 2000 Synchronization Primitives. Windows 2000 was built to support multiprocessing. The writers of the operating system code assumed that multiple processors might simultaneously run any single piece of code. Because of this possibility, each processor must not corrupt data that is currently being modified by another processor.
Many older operating systems, particularly those that added support for multiple processors late in their lifecycles, solve this problem by allowing only one processor to execute code in the operating system kernel. This leads to poor scaling. Windows 2000 works to keep all the processors doing useful work by putting "locks" around key data structures. There are several types of locks: mutexes, semaphores, spin locks, and so on; each lock is used to increase the efficiency of a specific situation.
At the core of each of these different locks is the concept of the Interrupt Request Level (IRQL). IRQL is a scheme of increasing priorities. Threads can only run on a processor when that processor is running at passive level. The thread dispatcher needs to run at a higher priority than threads themselves. It runs at dispatch level, as do Deferred Procedure Calls (DPCs), which are pieces of code that device drivers use for most of their work. At device level, only device interrupt service routines and higher priority interrupt code can run. At high level, the processor cannot be preempted. (There are other levels in between these examples, but they are beyond the scope of the discussion.) Each lock has an associated IRQL; when one processor is waiting for another processor to release a lock, it can still do higher-priority work.
Because Windows 2000 is designed to run with multiple processors, there are hundreds of locks in the system. Taking each lock involves raising IRQL to the level of the lock and then lowering it. Receiving an interrupt from a device also causes a transition to and from a higher IRQL. As a result, each processor raises and lowers IRQL hundreds of times per second.
The basic architecture of Windows 2000 does not change when there is only a single processor in the system. The processor changes IRQL the same number of times as it would with multiple processors in the machine.
The IRQL priority scheme is enforced in hardware by writing a new mask into the machine's interrupt controller at each transition. The rest of this discussion explains why the APIC is better than the 8259 PIC, even in single-processor systems.
Sharing Interrupts Is Bad
Edge-triggered Interrupts. With any interrupt controller, an edge-triggered interrupt is a one-time event. There is no notion of feedback. The operating system never really knows when it has handled the situation that caused the interrupt; it can only know that an event happened sometime in the recent past. Therefore, the only rational response to an edge-triggered interrupt is to run all the Interrupt Service Routines (ISRs) associated with that vector once, with the hope that this will resolve it.
This situation is especially problematic when dealing with hardware that doesn't give any real indication of why it interrupted, a common occurrence among today's edge-triggered devices. The result is that the operating system can miss interrupts delivered in the interval between when an interrupt is first taken and when it is acknowledged.
With an 8259 PIC interrupt controller, the situation is even worse. The 8259 is inherently unreliable, particularly when coupled with an actual ISA bus. The operating system software will see a number of spurious interrupts, some of which show up on different vectors than the original signal.
Level-triggered Interrupts. The protocol of a level-triggered interrupt is as follows:
1. The device asserts its interrupt line and holds it asserted until it is serviced.
2. The operating system takes the interrupt and services the device, which then deasserts its line.
3. The operating system acknowledges (ACKs) the interrupt.
4. If any device is still asserting the line, the interrupt controller immediately delivers another interrupt.
This is good for sharing, because it confirms that the operating system handled the interrupting device. The algorithm is: take the interrupt, call each ISR on the chain in turn until one of them claims the interrupt (returns TRUE), and then ACK.
If multiple devices interrupt, the operating system will find the first one, run its ISR, and then ACK the interrupt. Then the operating system will immediately take another interrupt on that vector. At this point, the operating system runs through all the ISRs until it finds the second device, runs its ISR, and ACKs the interrupt again.
An excellent example of the problem case occurs when twelve devices are all chained on one vector, as is common for a docked laptop. Every time the operating system takes an interrupt, it must run as many as twelve ISRs before it begins to handle the condition that caused the interrupt.
Even assuming that every device in the chain is well designed and can determine within a few I/O cycles whether it is generating an interrupt, a huge interrupt latency occurs, as each interrupt causes the operating system to touch up to twelve different pieces of hardware.
Unfortunately, the reality of the modern PC market is that many devices cause problems. Most hardware is designed with the mindset that the ISR will handle all the work of the interrupt.
Because of this approach, designers may overlook the need for good hardware synchronization in their chip designs; this has the side effect of pushing the problem into software. When this happens, the result is device drivers that do most of their real work in their ISR. This dramatically lengthens the time the operating system spends running the ISR chain, causing the operating system to run for relatively long periods with interrupts off and no threads or deferred procedure calls (DPCs) running. Imagine trying to do real-time video manipulation when every interrupt causes a 500-microsecond delay in processing.
The previous examples assume that all device drivers are well behaved. However, if even one device driver returns FALSE from its ISR when its device is actually interrupting, the system hangs: no ISR services the device, so its line stays asserted, the ACK causes the interrupt controller to generate another interrupt, and the operating system runs the ISR chain forever.
APICs provide more interrupt resources, thus greatly reducing, if not removing, the need to share interrupts among hardware devices.
PIC (8259) Hardware Is Slow
The PIC interrupt controller has a built-in hardware priority scheme that is not appropriate for machines running operating systems based on Windows NT Technology. To address this problem, a different hardware priority scheme is used by the operating system.
When the operating system raises or lowers IRQL, a new mask is written into the 8259 that enables only the interrupts allowed at this IRQL. Therefore, raising or lowering IRQL causes either two "out" instructions or software simulation of one sort or another. Each of these I/O instructions causes bus cycles that must make it all the way to the South Bridge and back.
The Compaq Alpha interrupt controller has software levels that are used in Windows NT and Windows 2000 to cause DPC and APC interrupts. On Intel Architecture platforms using a PIC, this must be simulated because there is no hardware to cause the interrupts. Therefore, when the operating system drops below DISPATCH_LEVEL, the operating system must check to see whether a DPC has been queued. If it has, it must simulate an interrupt.
With an APIC device, the operating system can queue a DPC at any time by sending itself an interrupt at a priority that matches DISPATCH_LEVEL. Then, whenever it lowers the IRQL below DISPATCH_LEVEL, the DPC fires in hardware with no software intervention at all.
Furthermore, raising or lowering IRQL on an APIC is just a matter of adjusting the Task Priority Register in the local APIC. Because this is just a "mov" instruction that adjusts a register that is actually internal to the processor, it can happen much more quickly than multiple writes to the South Bridge. Keep in mind that every time anything is synchronized using any of the Win32 or Windows NT native synchronization primitives, the IRQL is changed at least twice.
APICs, therefore, provide better speed in interrupt handling.
PIC Hardware Can't Do Multiple Processors
The 8259 PIC has no mechanism for delivering interrupts to more than one processor, so it cannot be used in machines with multiple processors.
APIC Provides Superior Interrupt Architecture
APIC solves all the problems described earlier in this article. Interrupt latency can be drastically reduced by assigning a unique vector to each device. Operating systems can speed up all of their synchronization code. As an industry, we can increase debuggability and stability just by shortening the chains of ISRs.
Windows 95/98 both have a design requirement to support Microsoft MS-DOS device drivers. Because MS-DOS device drivers may assume that they can write directly to the 8259 PIC and its associated IDT entries, the APIC is, unfortunately, unsupportable on these operating systems.
Call to action for APICs: