PCI Multi-Level Rebalance in Windows Vista
Updated: November 25, 2003
The Windows Vista operating system will include support for resource rebalancing at multiple levels across a PCI hierarchy. Windows Vista multi-level rebalancing supports systems that use multiple PCI-to-PCI bridge levels, including CardBus bridges, and also enhances support for hot-plug technologies. However, multi-level rebalancing can expose problems in device drivers that do not handle Plug and Play I/O request packets (IRPs) correctly.
This paper discusses potential driver problems and Windows Vista multi-level rebalancing support, and also provides implementation details for driver developers.
On Windows 2000, Windows XP, and Windows Server 2003, when the system BIOS does not configure the resource windows of a PCI-to-PCI bridge during the power-on self-test (POST), the operating system assigns static default resource ranges to the bridge without considering the requirements of the devices behind it.
Once the resource windows are configured with these static ranges, they cannot be changed at run time. Because Windows allocates the ranges without examining the devices behind the bridge, the operating system cannot ensure that they will adequately support all of those devices. If the default allocation is too small for the combined resource requirements of the devices behind the bridge, some of the devices will not work properly: they might fail to start, and Device Manager will show them with problem code 12, "insufficient resources."
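To make this failure concrete, here is a minimal C sketch (a user-mode model, not Windows code) that checks whether the memory BARs of the devices behind a bridge fit in a fixed-size window. It assumes each BAR is a naturally aligned power of two and that the window base is aligned to the largest BAR; `window_fits`, `cmp_desc`, and the sizes in the usage note are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* qsort comparator: larger BAR sizes sort first. */
static int cmp_desc(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x < y) - (x > y);
}

/* Returns true if all BARs fit in a window of window_size bytes whose base
 * is assumed to be aligned to the largest BAR. Note: sorts bars in place.
 * Placing the largest BARs first keeps every placement naturally aligned
 * without gaps, so the check reduces to a running sum. */
static bool window_fits(uint64_t window_size, uint64_t *bars, size_t n)
{
    uint64_t offset = 0;
    qsort(bars, n, sizeof bars[0], cmp_desc);
    for (size_t i = 0; i < n; i++) {
        assert(bars[i] != 0 && (bars[i] & (bars[i] - 1)) == 0); /* power of two */
        assert(offset % bars[i] == 0);  /* alignment holds by construction */
        offset += bars[i];
        if (offset > window_size)
            return false;  /* in the scenario above: a device fails with code 12 */
    }
    return true;
}
```

For example, with a hypothetical 1 MB static window and devices requesting 1 MB, 512 KB, and 256 KB, `window_fits(0x100000, bars, 3)` returns false; a 2 MB window would accommodate all three.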
Insufficient bridge resources occur when the platform BIOS cannot assign appropriate PCI bridge resource windows during POST. Systems that support hot-plug PCI devices are particularly problematic. When a PCI device can be hot-plugged behind a PCI bridge at run time, the BIOS cannot determine during POST how large the bridge resource window must be to accommodate that device. Additionally, a PCI bridge itself might be hot-plugged in certain situations, for example in generic docking solutions that connect through CardBus adapters, or in the I/O expansion chassis often used in large server systems.
Systems with deep bridge hierarchies can also cause problems. Assigning a complete address map to every PCI bridge in such a system might be beyond the capabilities of the system BIOS, so the BIOS might leave some bridges unconfigured. Those bridges are then assigned default resource-window ranges and are susceptible to the same resource problems.
The emergence of PCI Express exacerbates these scenarios. PCI Express uses bridge devices to represent its ports, which makes complex bridge hierarchies more prevalent. Additionally, PCI Express hot-plug will be widely used in desktop and workstation client systems, so hot-plug scenarios will no longer be limited to large servers.
PCI Multi-Level Rebalance in Windows Vista
Windows Vista will include support for rebalancing resources at multiple levels across a PCI hierarchy. PCI multi-level rebalance will enable Windows to support systems with several levels of PCI-to-PCI bridges, including CardBus bridges, and will also enhance support for hot-plug technologies. It supports dynamic reconfiguration, at run time, of the location and length of the resource windows that PCI-to-PCI bridges forward. As a result, a PCI bridge whose resource windows are not fully consumed by the devices behind it might lose resources to another bridge that needs additional resources.
With PCI multi-level rebalancing, Windows Vista systems will have greater control and flexibility over device resource assignments, and fewer devices will fail to be configured because of insufficient resources. With multi-level rebalance support, the operating system can aggressively reclaim resources from bridges and then redistribute them among bridges that are not configured by the BIOS.
Multi-level rebalancing in Windows Vista will reduce the need to allocate large resource apertures in advance for devices that might appear at run time. Instead, Windows Vista will allocate smaller apertures sized to current needs, and then perform a resource rebalance when additional resources are required.
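The following C sketch illustrates reclaiming and redistributing window space under simplifying assumptions: none of the bridges is boot-configured, each child bridge's memory window is sized to the smallest multiple of the 1 MB PCI-to-PCI bridge memory-window granularity that covers its devices, and all siblings share one parent aperture of fixed size. `struct bridge`, `size_window`, and `rebalance` are hypothetical names, and this is not the algorithm used by the Windows resource arbiters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* PCI-to-PCI bridge memory windows are allocated in 1 MB units. */
#define BRIDGE_MEM_GRANULARITY 0x100000ULL

struct bridge {
    uint64_t need;    /* total bytes required by devices behind the bridge */
    uint64_t window;  /* current window size */
};

/* Smallest granularity-aligned window that covers `need` bytes. */
static uint64_t size_window(uint64_t need)
{
    return (need + BRIDGE_MEM_GRANULARITY - 1) / BRIDGE_MEM_GRANULARITY
           * BRIDGE_MEM_GRANULARITY;
}

/* Resize every sibling bridge to its minimal window, so space given up by an
 * over-provisioned bridge becomes available to one that needs more. Returns
 * false, leaving all windows unchanged, if the parent aperture cannot
 * satisfy the combined requirements. */
static bool rebalance(struct bridge *b, size_t n, uint64_t parent_aperture)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += size_window(b[i].need);
    if (total > parent_aperture)
        return false;
    for (size_t i = 0; i < n; i++) {
        b[i].window = size_window(b[i].need);
        assert(b[i].window >= b[i].need);
    }
    return true;
}
```

For example, if one bridge holds a 6 MB window but needs only 1 MB while its sibling's need grows to 3.5 MB, `rebalance` within an 8 MB parent aperture shrinks the first window to 1 MB and grows the second to 4 MB. The all-or-nothing check mirrors the idea that a rebalance either finds a complete new configuration or leaves the existing one in place.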
Multi-level rebalance has system-wide implications for devices that reside behind bridges, and for BIOS configuration of bridges and devices. For multi-level rebalance support in Windows to work successfully, system device drivers and platform firmware must be designed and tested with the appropriate level of support. The following sections describe best practices for driver and BIOS developers to realize effective system-wide support for multi-level PCI resource rebalancing.
Multi-Level Rebalance Device Driver Requirements
Windows device drivers must be carefully designed and thoroughly tested to properly support Windows Driver Model (WDM) stop semantics. A stop request is an operational pause: the driver takes the steps necessary to suspend device operation until the assignment of new resources is complete.
Note: The rebalance process must remain transparent to applications and end users. End users might experience a temporary delay in a device’s operation, but data must never be lost.
Several events can trigger a resource rebalance. For example, the system might fail to assign resources to a device behind a bridge whose resource windows are not wide enough to satisfy the request. A device might also report at run time that its resource requirements have changed. Windows might also determine that a PCI bridge was boot-configured with a resource window larger than necessary; in that case, Windows reduces the size of the window and releases the associated resources.
Handling Stop Semantics
When it initiates a rebalance, the kernel Plug and Play manager sends an IRP_MN_QUERY_STOP_DEVICE request to determine whether the drivers for a device can stop the device and release its hardware resources. Only devices whose drivers succeed the Query Stop request can participate in the rebalance. If the drivers for a child device fail this request, the parent device cannot be stopped, and the rebalance attempt fails.
After all devices involved in the rebalance have succeeded the Query Stop request, the kernel Plug and Play manager computes new resource assignments. If it can find a new configuration, it sends an IRP_MN_STOP_DEVICE request to each device that succeeded the IRP_MN_QUERY_STOP_DEVICE request.
Note: Drivers based on the port and miniport driver model, such as Videoprt.sys and NDIS, might present these IRP sequences to class miniport drivers differently. For more information, see "Layered Driver Architecture" in "Getting Started with Windows Drivers," and the introduction to the port class model for the specific device type, such as audio or storage, in the Microsoft Windows Driver Development Kit (DDK).
Typical driver actions during the stop request should:
Note: Stop requests cannot be failed by drivers.
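Stop semantics are often implemented as a small per-device state machine. The user-mode C model below is a hedged sketch, not WDM code: `enum dev_state`, `struct device`, and the `on_*` handlers are hypothetical, and a real driver receives these events as IRP_MN_QUERY_STOP_DEVICE, IRP_MN_STOP_DEVICE, and IRP_MN_CANCEL_STOP_DEVICE dispatch calls. It shows new I/O being held while a stop is pending, a stop handler that never fails, and cancel returning the device to the started state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum dev_state { DEV_STARTED, DEV_STOP_PENDING, DEV_STOPPED };

struct device {
    enum dev_state state;
    int outstanding_io;  /* requests currently being processed by hardware */
    int held_io;         /* requests queued while a stop is pending or in effect */
    bool has_resources;
};

/* New I/O is processed only in the started state; otherwise it is held. */
static void on_io(struct device *d)
{
    if (d->state == DEV_STARTED)
        d->outstanding_io++;
    else
        d->held_io++;
}

static void on_io_complete(struct device *d)
{
    assert(d->outstanding_io > 0);
    d->outstanding_io--;
}

/* Query Stop: a driver that cannot pause its device would return false here,
 * vetoing the rebalance. This model always agrees and starts holding I/O. */
static bool on_query_stop(struct device *d)
{
    d->state = DEV_STOP_PENDING;
    return true;
}

/* Stop has no failure path. This model requires in-progress I/O to have
 * drained first; a real driver waits for it before touching resources. */
static void on_stop(struct device *d)
{
    assert(d->state == DEV_STOP_PENDING);
    assert(d->outstanding_io == 0);
    d->has_resources = false;  /* release hardware resources */
    d->state = DEV_STOPPED;
}

/* Cancel Stop: undo only a pending stop, then resume the held requests. */
static void on_cancel_stop(struct device *d)
{
    if (d->state != DEV_STOP_PENDING)
        return;  /* nothing to undo */
    d->state = DEV_STARTED;
    d->outstanding_io += d->held_io;
    d->held_io = 0;
}
```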
Handling Restart Requests
Once the kernel Plug and Play resource arbiters commit the new resource assignments, the kernel Plug and Play manager sends an IRP_MN_START_DEVICE request to restart each device that was stopped during the rebalance. This start request contains the new resources assigned to the device.
Device drivers must be prepared to handle this second IRP_MN_START_DEVICE request, which carries new resource assignments. Watch for drivers that continue to use their old resources after the restart. Additionally, drivers should initialize I/O queues, internal state machines, or hardware logic only on the first IRP_MN_START_DEVICE request.
If a driver fails to restart the device after the resources are rebalanced, the kernel Plug and Play manager sends remove IRPs to the device stack (on Windows 2000 and later systems). The kernel Plug and Play manager first sends an IRP_MN_SURPRISE_REMOVAL request. Then the Plug and Play manager sends an IRP_MN_REMOVE_DEVICE request, but only after all open handles to the device are closed.
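A minimal C model of a start handler that tolerates a second start follows. `struct start_resources`, `struct driver_ctx`, and `on_start` are hypothetical; a real WDM driver reads its assigned resources from the IRP_MN_START_DEVICE request rather than from a plain struct. The sketch illustrates two points: one-time software setup runs only on the first start, and the resources from every start request replace the previous ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the resources carried by a start request. */
struct start_resources {
    uint64_t mmio_base;
    uint32_t mmio_length;
};

struct driver_ctx {
    bool started_once;
    int one_time_init_count;  /* how often queues/state machines were set up */
    uint64_t mmio_base;       /* register range currently in use */
    uint32_t mmio_length;
};

/* Handles every start request, first or subsequent. Returns false on
 * failure; after a failed restart the PnP manager proceeds with removal. */
static bool on_start(struct driver_ctx *c, const struct start_resources *r)
{
    if (r->mmio_length == 0)
        return false;  /* no usable register range was assigned */

    if (!c->started_once) {
        /* One-time software setup: I/O queues, internal state machines. */
        c->one_time_init_count++;
        c->started_once = true;
    }

    /* Always adopt the resources from this request; never keep using the
     * range that was assigned before the rebalance. */
    c->mmio_base = r->mmio_base;
    c->mmio_length = r->mmio_length;
    return true;
}
```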
Handling Cancel Requests
If the rebalance cannot succeed, the kernel Plug and Play manager cancels the pending Query Stop by sending an IRP_MN_CANCEL_STOP_DEVICE request. In response, the drivers for the device should return it to the started state and resume processing I/O requests for it.
The kernel Plug and Play manager cancels a pending Query Stop for a device stack in two cases: when a driver in that stack failed the request, or when the overall rebalance operation failed. Even when only one driver in a stack failed the query, the Plug and Play manager sends IRP_MN_CANCEL_STOP_DEVICE to the stack, because any drivers attached above the failing driver have already placed the device in the stop-pending state. When IRP_MN_CANCEL_STOP_DEVICE succeeds, the drivers should have returned the device to the started state.
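From the Plug and Play manager's point of view, the Query Stop, Stop, Start, and Cancel Stop sequence described in the last three sections behaves like a two-phase commit. The C model below is a hedged sketch: `struct pnp_device`, `run_rebalance`, and the integer `resource` field are hypothetical stand-ins, and the real manager also deals with device hierarchies, resource arbiters, and removal after a failed restart, none of which is modeled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum pnp_state { PNP_STARTED, PNP_STOP_PENDING, PNP_STOPPED };

struct pnp_device {
    bool can_stop;         /* how the device's drivers answer the Query Stop */
    enum pnp_state state;
    int resource;          /* stand-in for the device's assigned resources */
};

/* Returns true if the rebalance was committed. Passing new_resources == NULL
 * models "no new configuration could be computed". */
static bool run_rebalance(struct pnp_device *d, size_t n, const int *new_resources)
{
    bool all_ok = true;

    /* Phase 1: Query Stop. Succeeding devices enter the stop-pending state;
     * the first failure vetoes the whole rebalance. */
    for (size_t i = 0; i < n && all_ok; i++) {
        if (d[i].can_stop)
            d[i].state = PNP_STOP_PENDING;
        else
            all_ok = false;
    }

    if (!all_ok || new_resources == NULL) {
        /* Cancel Stop: every stop-pending device returns to the started
         * state and keeps its old resources. */
        for (size_t i = 0; i < n; i++)
            if (d[i].state == PNP_STOP_PENDING)
                d[i].state = PNP_STARTED;
        return false;
    }

    /* Phase 2: stop every device (stop cannot fail), then restart each one
     * with the resources from its new start request. */
    for (size_t i = 0; i < n; i++)
        d[i].state = PNP_STOPPED;
    for (size_t i = 0; i < n; i++) {
        assert(d[i].state == PNP_STOPPED);
        d[i].resource = new_resources[i];
        d[i].state = PNP_STARTED;
    }
    return true;
}
```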
Testing Drivers for Rebalance Scenarios
Until driver developers have access to Windows Vista builds that support multi-level rebalancing, the following strategies can be used to exercise relevant rebalance code paths.
Note: For details on implementing and responding to Plug and Play IRPs, see "Handling IRPs" in the Windows DDK.
Plug and Play Driver Test
The Windows DDK includes a Plug and Play Driver test tool called Pnpdtest.exe. Driver writers can test various Plug and Play-related code paths in their driver and user-mode components with Pnpdtest.exe.
Pnpdtest.exe requires drivers to handle almost all of the Plug and Play IRPs. Three specific areas (Removal, Rebalance, and Surprise Removal) can be tested separately or together (for example, in a stress test). The tool combines user-mode API calls, made by the test application, with kernel-mode API calls, made by an upper-filter driver. To test rebalance, select the target device and click the Test Rebalance button. The rebalance test exercises the IRP_MN_QUERY_STOP_DEVICE, IRP_MN_STOP_DEVICE, and IRP_MN_CANCEL_STOP_DEVICE requests. For more information about using Pnpdtest.exe, see the DDK documentation at %WinDDK_dir%\tools\pnpdtest\readme.htm.
Adding a Test Interface to Your Driver
You can implement a test interface in your driver that, when called by a user-mode test application, calls the IoInvalidateDeviceState routine. In response, the kernel Plug and Play manager sends your device stack an IRP_MN_QUERY_PNP_DEVICE_STATE request. Your driver should then set Irp->IoStatus.Status to STATUS_SUCCESS and set Irp->IoStatus.Information to a PNP_DEVICE_STATE bitmask with the PNP_DEVICE_FAILED and PNP_DEVICE_RESOURCE_REQUIREMENTS_CHANGED flags set (PNP_DEVICE_FAILED | PNP_DEVICE_RESOURCE_REQUIREMENTS_CHANGED). The kernel Plug and Play manager will then send your device stack an IRP_MN_QUERY_STOP_DEVICE request, which lets you test your Query Stop logic.
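A hypothetical version of this test hook, modeled in user-mode C, is shown below. The `MODEL_PNP_DEVICE_*` constants are local stand-ins assumed to match the PNP_DEVICE_FAILED and PNP_DEVICE_RESOURCE_REQUIREMENTS_CHANGED values in wdm.h (real driver code should use the header macros), and `struct test_ctx`, `request_test_rebalance`, and `query_pnp_device_state` are invented names. Clearing the request after one report is a design choice of this sketch, so that a single test call triggers a single rebalance.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins; assumed to match the wdm.h values. Use the real
 * PNP_DEVICE_* macros from wdm.h in driver code. */
#define MODEL_PNP_DEVICE_FAILED                        0x00000004u
#define MODEL_PNP_DEVICE_RESOURCE_REQUIREMENTS_CHANGED 0x00000010u

struct test_ctx {
    bool rebalance_requested;  /* set by the hypothetical test interface */
};

/* Called by the test interface just before the driver calls
 * IoInvalidateDeviceState. */
static void request_test_rebalance(struct test_ctx *t)
{
    t->rebalance_requested = true;
}

/* Returns the value the driver would store in Irp->IoStatus.Information
 * (with Irp->IoStatus.Status set to STATUS_SUCCESS) while handling
 * IRP_MN_QUERY_PNP_DEVICE_STATE. Bits already in `current` are preserved;
 * the test request is consumed so later queries report a healthy device. */
static uint32_t query_pnp_device_state(struct test_ctx *t, uint32_t current)
{
    if (t->rebalance_requested) {
        t->rebalance_requested = false;
        current |= MODEL_PNP_DEVICE_FAILED |
                   MODEL_PNP_DEVICE_RESOURCE_REQUIREMENTS_CHANGED;
    }
    return current;
}
```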
The kernel Plug and Play manager then sends an IRP_MN_FILTER_RESOURCE_REQUIREMENTS request to the device stack. If the function driver handles this IRP, it must set a completion routine and handle the IRP on its way back up the device stack. In this completion routine, add a small memory resource requirement (for example, 1 byte) to the device’s resource requirements list, and make Irp->IoStatus.Information point to an IO_RESOURCE_REQUIREMENTS_LIST that contains the hardware resource requirements modified for test purposes. Next, the kernel Plug and Play manager sends an IRP_MN_STOP_DEVICE request, which exercises your stop code. Finally, it sends an IRP_MN_START_DEVICE request, which tests how well your driver handles a second start IRP.
Note: Before passing IRP_MN_START_DEVICE down to the device’s PDO, remove the 1-byte test memory resource from the resource list.
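The add-then-strip bookkeeping for the 1-byte test resource can be sketched with a flat descriptor list. This is a simplification: the real IO_RESOURCE_REQUIREMENTS_LIST and CM_RESOURCE_LIST structures are nested and variable-length, and `struct res_list`, `add_test_requirement`, and `strip_test_resource` are hypothetical. The sketch also assumes the device has no genuine 1-byte memory resource, so the test entry can be recognized by its type and length.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical resource descriptor and list. */
enum res_type { RES_PORT, RES_MEMORY, RES_INTERRUPT };

struct res_desc {
    enum res_type type;
    uint32_t length;
};

#define MAX_DESC 16
#define TEST_MEM_LENGTH 1u  /* the 1-byte test requirement */

struct res_list {
    size_t count;
    struct res_desc desc[MAX_DESC];
};

/* Completion-routine step for IRP_MN_FILTER_RESOURCE_REQUIREMENTS:
 * append a small memory requirement so the requirements change. */
static bool add_test_requirement(struct res_list *l)
{
    if (l->count == MAX_DESC)
        return false;
    l->desc[l->count].type = RES_MEMORY;
    l->desc[l->count].length = TEST_MEM_LENGTH;
    l->count++;
    return true;
}

/* Before IRP_MN_START_DEVICE is passed down, remove the test range from the
 * assigned-resource list, shifting later entries down. */
static bool strip_test_resource(struct res_list *l)
{
    for (size_t i = 0; i < l->count; i++) {
        if (l->desc[i].type == RES_MEMORY && l->desc[i].length == TEST_MEM_LENGTH) {
            for (size_t j = i + 1; j < l->count; j++)
                l->desc[j - 1] = l->desc[j];
            l->count--;
            return true;
        }
    }
    return false;  /* no test entry present */
}
```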
BIOS Implementation Notes
Windows Vista is designed to preserve the resource allocations of a bridge configured by the BIOS. Windows can expand such a bridge’s resource window as part of a rebalance, but it will not reduce the window, because the BIOS might have assigned that particular window size for a valid reason. For example, the BIOS might know that a particular bridge exposes many hot-pluggable slots, and that it must assign sufficient resources to support devices hot-plugged into them.
However, if the BIOS configures many PCI bridges, the operating system’s multi-level rebalance algorithm will lack the flexibility it needs and might not work properly. This is particularly problematic for I/O resource windows, because I/O port resources are highly constrained.
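A little arithmetic shows why I/O space is the tight spot. Assuming the 64 KB x86 I/O port space and the usual 4 KB granularity of a PCI-to-PCI bridge I/O window, at most 16 non-overlapping bridge I/O windows can exist at once, and every window the operating system cannot move or shrink reduces that number. `io_windows_left` is a hypothetical helper, and it assumes the reserved space is 4 KB-aligned and unfragmented.

```c
#include <assert.h>
#include <stdint.h>

#define IO_SPACE_SIZE     0x10000u  /* x86 I/O port space: 64 KB */
#define BRIDGE_IO_GRANULE 0x1000u   /* assumed bridge I/O window granularity: 4 KB */

/* How many additional minimum-size bridge I/O windows still fit after
 * `reserved` bytes are already claimed (for example, by BIOS-configured
 * bridges and legacy ranges). */
static uint32_t io_windows_left(uint32_t reserved)
{
    assert(reserved <= IO_SPACE_SIZE);
    return (IO_SPACE_SIZE - reserved) / BRIDGE_IO_GRANULE;
}
```

For example, if the legacy range 0x0000 to 0x0FFF is treated as reserved and the BIOS has fixed four more 4 KB windows, `io_windows_left(0x5000)` leaves room for only 11 more.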
Windows Vista will attempt to adjust all other bridges before it rebalances any resources associated with boot-configured bridges. Because a Query Stop request sent to a boot device is likely to fail, the system BIOS should configure only those bridges that have boot devices behind them. Such BIOS configuration is often necessary for Windows Server 2003 and earlier operating systems; however, it limits the resource assignments available to all resource consumers.
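One way to express this preference is to order candidate bridges so that boot-configured ones are considered last. The C sketch below is hypothetical (`struct bridge_info`, `rebalance_order`, and `order_for_rebalance` are invented names) and says nothing about how the Windows Vista arbiter is actually implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct bridge_info {
    int id;
    bool boot_configured;  /* window set up by the BIOS during POST */
};

/* Bridges not configured by the BIOS sort first, boot-configured ones last.
 * Ties are broken by id so the resulting order is deterministic. */
static int rebalance_order(const void *a, const void *b)
{
    const struct bridge_info *x = a, *y = b;
    if (x->boot_configured != y->boot_configured)
        return x->boot_configured ? 1 : -1;
    return (x->id > y->id) - (x->id < y->id);
}

static void order_for_rebalance(struct bridge_info *b, size_t n)
{
    qsort(b, n, sizeof b[0], rebalance_order);
}
```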
The system BIOS should configure each PCI bridge that has boot devices behind it, after also evaluating the requirements of all non-boot devices behind that bridge. Because the operating system will most likely be unable to enlarge such a bridge’s window (doing so would require stopping the boot device), the system BIOS should also verify that the window provides sufficient resources for all devices behind the bridge.
When the system BIOS configures multiple PCI bridges, the operating system’s ability to execute multi-level rebalance successfully is compromised. However, on existing operating systems that lack multi-level rebalance support, configuring multiple bridges is essential to avoid devices that fail to start because of insufficient resources. System BIOS developers must balance these considerations and apply the appropriate policy for the target system.
Call to Action and Resources
Call to Action
Microsoft Windows Driver Development Kit (DDK)
Windows WDK Kernel-Mode Driver Architecture Plug and Play