Queuing a work item: Once is enough!

A work item is a structure of type IO_WORKITEM that is associated with a worker routine and a context area. A driver adds a work item to the system work queue; a system worker thread later removes the item and runs the associated routine at PASSIVE_LEVEL. However, queuing a work item that is already in the work queue can corrupt the queue and cause hard-to-detect problems in your driver.

How can queuing a work item corrupt the work queue?

Internally, the system maintains the queued work items in a doubly-linked list. When you call IoQueueWorkItem (or ExQueueWorkItem) to add an item to the queue, the system checks the links. On a checked build, the system asserts if the item is already linked into the list. On a free build, however, the item is added to the list, the function returns a success status, and the list becomes corrupted. At some later point, the corrupted list can cause serious and unpredictable problems. (In Windows Vista and Windows Server 2008, attempting to queue the work item a second time will cause bug check 0xE4, WORKER_INVALID.)
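To see how a second insert can corrupt a doubly linked list, consider the following user-mode C model. It is not the kernel's code. The system's internal work queue is modeled here as a plain circular doubly linked list shaped like LIST_ENTRY, and ListEntry, insert_tail_list, remove_head_list, and drain_after_double_queue are hypothetical names. Queuing item A, then item B, then A again while A is still linked leaves the forward links skipping B entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* User-mode stand-in for a LIST_ENTRY-style doubly linked list node. */
typedef struct ListEntry {
    struct ListEntry *Flink;
    struct ListEntry *Blink;
} ListEntry;

static void init_list_head(ListEntry *head) { head->Flink = head->Blink = head; }
static bool is_list_empty(const ListEntry *head) { return head->Flink == head; }

/* Plain tail insert: nothing checks whether the entry is already linked. */
static void insert_tail_list(ListEntry *head, ListEntry *entry) {
    ListEntry *last = head->Blink;
    entry->Flink = head;
    entry->Blink = last;
    last->Flink = entry;
    head->Blink = entry;
}

static ListEntry *remove_head_list(ListEntry *head) {
    ListEntry *entry = head->Flink;
    head->Flink = entry->Flink;
    entry->Flink->Blink = head;
    return entry;
}

/* Queue A, then B, then A again while A is still linked. Drain the list
   (bounded, in case the links form a cycle) and report what came out. */
static int drain_after_double_queue(bool *b_was_dequeued) {
    ListEntry queue, item_a, item_b;
    int dequeued = 0;

    init_list_head(&queue);
    insert_tail_list(&queue, &item_a);
    insert_tail_list(&queue, &item_b);
    insert_tail_list(&queue, &item_a);   /* the bug: item_a is already queued */

    *b_was_dequeued = false;
    while (!is_list_empty(&queue) && dequeued < 16) {
        ListEntry *entry = remove_head_list(&queue);
        if (entry == &item_b)
            *b_was_dequeued = true;
        dequeued++;
    }
    return dequeued;
}
```

Calling drain_after_double_queue returns 1 and leaves b_was_dequeued false: in this model, item B silently drops out of the queue, so its routine would never run, and no error is reported.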

If you're familiar with deferred procedure call (DPC) queues, this behavior might surprise you because you might expect both types of queues to operate in the same way. However, DPC queues operate differently from work queues. If you try to queue a DPC that is already in the DPC queue, KeInsertQueueDpc simply returns FALSE. It does not queue the DPC a second time, and the DPC runs only once.
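The DPC queue's contract can be sketched as an insert that refuses an item that is already queued. This is a hypothetical single-threaded model of that contract only (Queue, Item, insert_once, and run_all are invented names). It does not show how KeInsertQueueDpc is implemented, and a real multiprocessor implementation would also need a lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Item {
    struct Item *next;
    bool inserted;   /* true while the item is in the queue */
    int runs;        /* how many times the item's routine has run */
} Item;

typedef struct {
    Item *head;
    Item *tail;
} Queue;

/* Insert at the tail unless the item is already queued; in that case do
   nothing and return false, the same contract KeInsertQueueDpc has. */
static bool insert_once(Queue *q, Item *item) {
    if (item->inserted)
        return false;
    item->inserted = true;
    item->next = NULL;
    if (q->tail != NULL)
        q->tail->next = item;
    else
        q->head = item;
    q->tail = item;
    return true;
}

/* Remove and "run" every queued item, oldest first. */
static void run_all(Queue *q) {
    while (q->head != NULL) {
        Item *item = q->head;
        q->head = item->next;
        if (q->head == NULL)
            q->tail = NULL;
        item->inserted = false;   /* cleared on removal, so it can be queued again */
        item->runs++;
    }
}
```

Inserting the same item twice before run_all returns false the second time, and the item runs once; after it has run, it can be inserted again.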

When can this problem occur?

If your driver queues the same work item each time a given driver routine runs, it might attempt to queue the work item again when it is still in the queue. The most likely scenario involves queuing the work item from a routine that runs at DISPATCH_LEVEL, although it can happen with work items that are queued at PASSIVE_LEVEL as well.

For example, a DPC is sometimes called repeatedly over a short period of time while a driver handles a series of interrupts. DPCs run at DISPATCH_LEVEL and typically queue work items to perform tasks later at PASSIVE_LEVEL. However, on a single-processor machine, the system does not drop IRQL to PASSIVE_LEVEL while any DPCs are waiting in the queue to run at DISPATCH_LEVEL (or while any other work remains at DISPATCH_LEVEL). Therefore, if the DPC has already been queued a second time, it runs again before a system worker thread has the opportunity to dequeue and run the work item.

On a multiprocessor system, the problem is less likely, though still possible. As long as the DPC queue is not empty, the system schedules DPCs to run on the next available processor.

Consequently, on either single-processor or multiprocessor hardware, the DPC could run more than once before the system worker thread runs at all. If so, the original work item remains in the queue, and the queue becomes corrupted when the DPC tries to queue it again.
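The single-processor ordering can be illustrated with a small deterministic model. It is a hypothetical simulation, not kernel code: Cpu, naive_dpc, and run_until_idle are invented names, and each unit of dpcs_pending stands for a separate DPC run (for example, because the next interrupt arrived after the previous run had started). All DISPATCH_LEVEL work drains before the PASSIVE_LEVEL worker runs, so a DPC that queues the work item on every run queues it twice.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one processor. */
typedef struct {
    int  dpcs_pending;       /* separate DPC runs waiting at DISPATCH_LEVEL */
    bool work_item_queued;   /* is the work item in the system work queue? */
    int  double_queues;      /* times the item was queued while still queued */
    int  work_item_runs;
} Cpu;

/* A naive DPC that queues the same work item every time it runs. */
static void naive_dpc(Cpu *cpu) {
    if (cpu->work_item_queued)
        cpu->double_queues++;   /* the real queue would be corrupted here */
    cpu->work_item_queued = true;
}

/* Drain all DISPATCH_LEVEL work first; only then does IRQL drop to
   PASSIVE_LEVEL so the worker thread can dequeue and run the item. */
static void run_until_idle(Cpu *cpu) {
    while (cpu->dpcs_pending > 0) {
        cpu->dpcs_pending--;
        naive_dpc(cpu);
    }
    if (cpu->work_item_queued) {
        cpu->work_item_queued = false;   /* the system dequeues before running */
        cpu->work_item_runs++;
    }
}
```

With two DPC runs pending, run_until_idle records one double queue and the work item runs only once; with a single DPC run, no double queue occurs.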

To run a work item, the system removes it from the queue and resets the links. After the work item has been removed from the queue, you can safely queue it again. Keep in mind that if you queue the work item again while its routine is still running, the second instance of the routine could run concurrently on another processor, or in a different worker thread on the same processor. You must protect any shared, writable data that the work item routine uses.

Preventing work-item queue corruption in your driver

Because this error depends on timing, hardware activity, and transient conditions, detecting it in drivers can be quite difficult. One clue is a bug check that is caused by an invalid address when the function to dequeue the work item is on the stack and the worker routine has already freed the work item. The best approach, however, is to code your driver to prevent the problem.

To avoid corrupting the queue, the driver should keep track of whether the work item is in the worker queue. However, a driver cannot use an event or a similar synchronization mechanism to determine when the worker routine has started, because a queuing routine that runs at DISPATCH_LEVEL cannot wait for a nonzero period. Performing a zero-length wait in a loop does not work either. On a single-processor system, a zero-length wait causes a deadlock because the worker thread cannot run at PASSIVE_LEVEL until the queuing routine has finished running at DISPATCH_LEVEL.

Instead, the driver should use a technique like the following:

  • The driver maintains a list of work for the worker routine.
  • The worker routine performs all the work in the list each time it runs.
  • When new work arrives, the driver queues the work item only if the list was previously empty.
  • If the queuing routine requires data that the worker routine has processed, the driver can either create a context each time it queues the work item or maintain a history of changes by using a circular buffer or similar technique.
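The steps above can be sketched in user-mode C. This is a model under stated assumptions rather than driver code: a C11 atomic_flag spin loop stands in for a kernel spin lock, an intrusive singly linked list stands in for the driver's work list, and WorkList, WorkNode, post_work, and worker_routine are hypothetical names. post_work decides under the lock whether the list was empty and counts where a driver would call IoQueueWorkItem; worker_routine detaches everything that is pending in one step and processes it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct WorkNode {
    struct WorkNode *next;
    int value;                 /* hypothetical payload */
} WorkNode;

typedef struct {
    atomic_flag lock;          /* stands in for a KSPIN_LOCK */
    WorkNode *head, *tail;     /* pending work, oldest first */
    int queue_calls;           /* times the work item would have been queued */
} WorkList;

#define WORKLIST_INIT { ATOMIC_FLAG_INIT, NULL, NULL, 0 }

static void worklist_lock(WorkList *w) {
    while (atomic_flag_test_and_set_explicit(&w->lock, memory_order_acquire)) {
        /* spin */
    }
}

static void worklist_unlock(WorkList *w) {
    atomic_flag_clear_explicit(&w->lock, memory_order_release);
}

/* Producer side (for example, a DPC). Returns true if this call queued
   the work item, which happens only when the list was empty. */
static bool post_work(WorkList *w, WorkNode *node) {
    bool was_empty;
    node->next = NULL;
    worklist_lock(w);
    was_empty = (w->head == NULL);
    if (was_empty)
        w->head = node;
    else
        w->tail->next = node;
    w->tail = node;
    if (was_empty)
        w->queue_calls++;      /* a driver would call IoQueueWorkItem here */
    worklist_unlock(w);
    return was_empty;
}

/* Worker routine: take the whole list in one step, then process the batch.
   It deliberately never looks at the list again. Returns items processed. */
static int worker_routine(WorkList *w, int *sum) {
    WorkNode *batch;
    int processed = 0;
    worklist_lock(w);
    batch = w->head;
    w->head = w->tail = NULL;
    worklist_unlock(w);
    for (; batch != NULL; batch = batch->next) {
        *sum += batch->value;  /* stand-in for the real PASSIVE_LEVEL work */
        processed++;
    }
    return processed;
}
```

Detaching the whole list in one step is what makes the "list was empty" test safe: once a run has taken its batch it never touches the list again, so an older run cannot empty the list while a newly queued work item is still waiting. Two runs can still overlap, because a new run may start while an old one is processing its batch, so the processing step must protect any shared data.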

The SMB battery sample from the Windows DDK shows how to implement this technique. The device raises an alarm in response to certain conditions, such as a low battery, and the driver maintains a list of pending alarms. Each time the driver is notified of an alarm, the SmbBattAlarm function adds a record to the list. The worker routine (SmbBattWorkerThread) reads the alarm records from the list and processes them sequentially until the list is empty. The driver does not queue the work item every time it is notified of an alarm. Instead, it queues the work item only if the list was empty when the new alarm arrived.

The following functions are excerpted from the sample. The complete sample is installed with the Windows DDK at src\wdm\acpi\smbbatt\smbbatt.c.

VOID
SmbBattAlarm (
    IN PVOID    Context,
    IN UCHAR    Address,
    IN USHORT   Data
    )
{
    PSMB_ALARM_ENTRY        newAlarmEntry;
    ULONG                   compState;
    PSMB_BATT_SUBSYSTEM     subsystemExt  = 
                              (PSMB_BATT_SUBSYSTEM) Context;
    //
    // Allocate a new list structure from non-paged pool.
    //
    newAlarmEntry = ExAllocatePoolWithTag (NonPagedPool, 
          sizeof (SMB_ALARM_ENTRY), 'StaB');
    if (!newAlarmEntry) {
        BattPrint (BAT_ERROR, ("SmbBattAlarm:  couldn't "
                   "allocate alarm structure\n"));
        return;
    }
    newAlarmEntry->Data     = Data;
    newAlarmEntry->Address  = Address;
    //
    // Add this alarm to the alarm list
    //
    ExInterlockedInsertTailList(
        &subsystemExt->AlarmList,
        &newAlarmEntry->Alarms,
        &subsystemExt->AlarmListLock
    );
    //
    // Add 1 to the WorkerActive value.
    // If this is the first pending alarm, queue a work item.
    //
    if (InterlockedIncrement(&subsystemExt->WorkerActive) == 1) {
        IoQueueWorkItem (subsystemExt->WorkerThread,
                         SmbBattWorkerThread, DelayedWorkQueue,
                         subsystemExt);
    }
    BattPrint(BAT_TRACE, ("SmbBattAlarm: EXITING\n"));
}

VOID
SmbBattWorkerThread (
    IN PDEVICE_OBJECT   Fdo,
    IN PVOID            Context
    )
{
    PSMB_ALARM_ENTRY        alarmEntry;
    PLIST_ENTRY             nextEntry;
    ULONG                   selectorState;
    ULONG                   batteryIndex;
    BOOLEAN                 charging = FALSE;
    BOOLEAN                 acOn = FALSE;
    PSMB_BATT_SUBSYSTEM     subsystemExt = 
                               (PSMB_BATT_SUBSYSTEM) Context;
    do {
        //
        // Retrieve the next alarm. WorkerActive is nonzero, so the
        // list holds at least one unprocessed record; the count is
        // decremented at the bottom of the loop.
        //
        nextEntry = ExInterlockedRemoveHeadList(
                        &subsystemExt->AlarmList,
                        &subsystemExt->AlarmListLock
                    );
        ASSERT (nextEntry != NULL);
        alarmEntry = CONTAINING_RECORD (nextEntry, 
                     SMB_ALARM_ENTRY, Alarms);
        //
        // Code here handles the specific alarm conditions.
        // It's been deleted for this tip because it's 
        // device-specific.
        //
        . . . 
        //
        // Free the alarm structure
        //
        ExFreePool (alarmEntry);
    } while (InterlockedDecrement 
            (&subsystemExt->WorkerActive) != 0);
    BattPrint(BAT_TRACE, ("SmbBattWorkerThread: EXITING\n"));
}

The SmbBattAlarm function is called when a new alarm arrives. This function creates a record that describes the alarm, adds it to the alarm list, and then increments the WorkerActive variable, which counts the alarms that have been added to the list but not yet fully processed. SmbBattWorkerThread decrements this variable after it processes each record and keeps looping until the count reaches zero. Therefore, when the count goes from zero to one, the driver can be certain that the work item is not in the queue, and only that transition queues it. The driver uses InterlockedXxx functions to increment and decrement WorkerActive because both SmbBattAlarm and SmbBattWorkerThread change this value, possibly at the same time on different processors. In this example, SmbBattAlarm does not use data about the alarms that SmbBattWorkerThread has processed, so neither a circular buffer nor a context is required.
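The counting scheme can be checked with a hypothetical single-threaded model of the two routines. It is not the sample itself: alarm_arrives, worker_loop_iteration, system_starts_worker, and the global counters are invented names, C11 atomics mirror the sample's InterlockedIncrement and InterlockedDecrement calls, and the "system queue" is reduced to a flag that counts any attempt to queue the item while it is still queued. The model also adds each record before counting it, as the sample does, so the worker never finds the list empty while the count is nonzero.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_long worker_active = 0;  /* plays the role of WorkerActive */
static int pending_alarms = 0;         /* stands in for AlarmList */
static int processed_alarms = 0;

/* Stand-in for the system work queue holding this one work item. */
static bool work_item_queued = false;
static int double_queue_errors = 0;

static void queue_work_item(void) {
    if (work_item_queued)
        double_queue_errors++;         /* the failure this article warns about */
    work_item_queued = true;
}

/* The system removes the work item from its queue before running the routine. */
static void system_starts_worker(void) {
    assert(work_item_queued);
    work_item_queued = false;
}

/* Like SmbBattAlarm: add the record, then count it. InterlockedIncrement
   returns the new value, atomic_fetch_add the old one, so "old == 0"
   matches the sample's "== 1" test. Returns true if it queued the item. */
static bool alarm_arrives(void) {
    pending_alarms++;
    if (atomic_fetch_add(&worker_active, 1) == 0) {
        queue_work_item();
        return true;
    }
    return false;
}

/* One pass of the SmbBattWorkerThread loop: remove and process one record,
   then decrement. Returns true if the loop would continue (new value != 0). */
static bool worker_loop_iteration(void) {
    assert(pending_alarms > 0);        /* mirrors the sample's ASSERT */
    pending_alarms--;
    processed_alarms++;
    return atomic_fetch_sub(&worker_active, 1) != 1;
}
```

Driving the model through two alarms, a third alarm that arrives while the worker loop is running, and a fourth after the loop exits queues the work item exactly twice (once per zero-to-one transition) and never while it is already queued.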

What should you do?

  • Keep track of whether the work item has been queued.
  • Design worker routines to perform all available work each time they run.
  • If your driver repeatedly queues a work item from a DPC (or other DISPATCH_LEVEL routine), do not use events or similar synchronization mechanisms to wait for the work item to complete.
  • If the work item and the routine that queues it share writable data, synchronize access to ensure that both routines operate on the correct data.
  • Allocate a context or maintain a history of changes if the queuing routine requires data that the work item has already processed.
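For the last recommendation, a history of processed results might look like the following hypothetical sketch: a fixed-size circular buffer that the worker routine appends to and the queuing routine reads. A C11 atomic_flag spin loop stands in for the spin lock a driver would use (a KSPIN_LOCK acquired with KeAcquireSpinLock, which raises IRQL); History, history_record, history_get, and HISTORY_SIZE are invented names, and the buffer simply overwrites its oldest record when full.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define HISTORY_SIZE 8   /* hypothetical capacity; the oldest record is overwritten */

typedef struct {
    atomic_flag lock;            /* stands in for a kernel spin lock */
    int entries[HISTORY_SIZE];   /* results the worker routine has produced */
    size_t next;                 /* slot for the next record */
    size_t total;                /* number of records ever written */
} History;

#define HISTORY_INIT { ATOMIC_FLAG_INIT, { 0 }, 0, 0 }

static void history_lock(History *h) {
    while (atomic_flag_test_and_set_explicit(&h->lock, memory_order_acquire)) {
        /* spin */
    }
}

static void history_unlock(History *h) {
    atomic_flag_clear_explicit(&h->lock, memory_order_release);
}

/* Worker routine side: record a result after processing an item. */
static void history_record(History *h, int value) {
    history_lock(h);
    h->entries[h->next] = value;
    h->next = (h->next + 1) % HISTORY_SIZE;
    h->total++;
    history_unlock(h);
}

/* Queuing routine side: read the result recorded `age` records ago
   (0 = most recent). Returns false if that record is not available. */
static bool history_get(History *h, size_t age, int *value) {
    bool found = false;
    history_lock(h);
    size_t available = h->total < HISTORY_SIZE ? h->total : HISTORY_SIZE;
    if (age < available) {
        *value = h->entries[(h->next + HISTORY_SIZE - 1 - age) % HISTORY_SIZE];
        found = true;
    }
    history_unlock(h);
    return found;
}
```

After recording 1 through 10, history_get with age 0 yields 10, age 7 yields 3, and age 8 is no longer available, because the buffer keeps only the last eight records.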
