October 2009

Volume 24 Number 10

Event Tracing for Windows - Core Instrumentation Events in Windows 7, Part 2

By Dr. Insung Park, Alex Bendetov | October 2009

Welcome back for the second part of our two-part article series: Core Instrumentation Events in Windows 7. In the first article, we presented a high-level overview of the Event Tracing for Windows (ETW) technology and core OS instrumentation. We also discussed tool support to obtain and consume OS events. In this second part, we continue to provide more details about the events from various subcomponents in the core OS. We also explain how the different system events can be combined to produce a comprehensive picture of system behavior, which we demonstrate by using a set of Windows PowerShell scripts.

Disk, File, File Details and Driver Events

From a program's perspective, operations such as opening, reading, or writing files are the way to access the contents on the disk. Due to optimizations such as caching and prefetching, not all file IO requests result in immediate disk access. Furthermore, file contents may be scattered across disks, and certain disk devices support mirroring and striping, and so on. For such cases, reading one block of data from a file translates into multiple accesses to one or more disks. The events for file and disk access account for file IO start, file IO completion, disk access start, disk access end, split IO, driver activities and file (name to unique key) maps.

A request from a user application to access a file and the corresponding completion of that request back to the user application travels through a stack of multiple components. In the Windows IO system, IO operations are tracked by an entity called an IO Request Packet (IRP). A user-initiated IO operation is turned into an IRP when it enters the IO Manager. As an IRP traverses a chain of components, each component performs necessary tasks to process the request, updates the IRP and passes it on, if necessary, to the component that will handle the request next. When all requirements of the IO request are satisfied (in a simple case, a requested block of a file is retrieved from a disk), registered completion routines are called to perform any additional processing of the data, and the requested data is returned to the user application.

At a higher layer in the core IO system, File IO events record the operations issued by an application. File IO events include the following types: Create, Read, Write, Flush, Rename, Delete, Cleanup, Close, Set Information, Query Information, Directory Enumeration, and Directory Change Notification. Operations such as Create, Read, Write, Flush, Rename and Delete are straightforward, and they contain data items such as file key, IO request packet (IRP) pointer, block size, and offset into the file, as necessary. Set Information and Query Information events indicate that file attributes were set or queried. A Cleanup event is logged when the last handle to the file is closed. A Close event specifies that a file object is being freed. Directory Enumeration and Directory Change Notification events are logged when a directory is enumerated or a directory change notification is sent out to registered listeners, respectively. File IO events are logged to ETW when the operation is requested. Those that are interested in the completion and duration of the file IO operations can enable File IO Completion events, which can be correlated to the original File IO events through IRP pointer. File IO Completion events record IRP pointer and return status.

Disk events are logged at a lower level in the IO stack, and they contain disk-access-specific information. Read and Write operations generate Disk Read and Write events containing disk number, transfer size, byte offset to the address being accessed, IRP pointer, and response time of the access. Flush events record disk flush operations. Unlike File IO events that are logged at the beginning of operations, Disk IO events are logged at the IO completion time. Users have the option to collect additional Disk IO Init events for all Disk IO events (ReadInit, WriteInit and FlushInit events). As mentioned earlier, not all File IO events have matching Disk IO events, if for instance the requested content is already available in the cache or a write to disk operation is buffered. Split IO events indicate that IO requests have been split into multiple disk IO requests due to the underlying mirroring disk hardware. Users without such hardware will not see Split IO events even if they enable them. It maps the original parent IRP into multiple child IRPs.

Disk IO, File IO and Split IO events contain unique file keys created for open files. This file key can be used to track related IO operations within the IO system. However, the actual file name for the file operation is not available in any File or Disk IO events. To resolve the name of the files, File Details events are needed. All open files are enumerated to record their file keys and names. In a simulated state machine, file objects are tracked in terms of file keys, to record file IO requests and actual disk accesses, and then names are updated in the objects when File Details events are encountered. For a historical reason, File Keys in Disk IO and File Details events are named FileObject. Most File IO events contain both file object and file key.

Driver events indicate activities in drivers, which, depending on the device type, may or may not overlap with disk IO activities. Driver events may be of interest to users familiar with the Windows Driver Model (WDM). The driver instrumentation adds events around driver IO function calls and completion routines. Driver events contain driver data such as file key, IRP pointer, and routine addresses (major and minor function and completion routine), as appropriate for individual event types.

IO events usually result in a very large volume of events, which may require increasing the number and/or size of the buffers for the kernel session (-nb option in logman). Also, IO events are useful in analyzing file usages, disk access patterns and driver activities. However, the process and thread id values of the IO events, with the exception of Disk IO events, are not valid. To correlate these activities correctly to the originating thread and thus to the process, one needs to consider tracking Context Switch events.

Network Events

Network events are logged when network activities occur. Network events from the kernel session are emitted at TCP/IP and UDP/IP layers. TCP/IP events are logged when the following actions take place: Send, Receive, Disconnect, Retransmit, Reconnect, Copy, and Fail. They all contain the packet size, source and destination IP addresses and ports, except for Fail events since no such information is available. Also, the originating process id for Send type events and the target process id for Receive type events are included because often these network operations are not performed under the originating/receiving process context. This means that Network events can be attributed to a process, but not to a thread. UDP/IP events are either Send or Receive events, which include the data items listed above. Finally, each operation type (Send, Receive and so on) corresponds to a pair of events: one for IPV4 protocol and the other for IPV6. Events for IPV6 protocol were added in Windows Vista.

Registry Events

Most registry operations are instrumented with ETW, and these events can be attributed to processes and threads through process and thread ids in the event header. The Registry event payload does not contain a full registry key name. Instead, it includes a unique key, noted as Key Control Block (KCB), for each open registry key. At the end of a kernel session, rundown events are logged that map these keys to full registry names. To resolve registry names, one needs to use a similar technique utilized for file name resolution, in which these map events are used to update registry objects in the state machine. Registry events have been useful in analyzing access patterns and identifying redundant accesses for optimization. The Registry event payload contains the return status of the operation, which can be used to monitor registry operation failures and troubleshoot possible application problems.

Figure 1 Get-ETW-PID-List Script for Printing Process Table

<####################
 # Get-ETW-PID-List
 # \@brief Takes in an XML stream and prints the list of process names
 # with PIDs that were detected in the incoming stream.
 ####################>
function Get-ETW-PID-List([Switch] $print) {
    begin {
        $pidlist = new-object System.Collections.Hashtable
        $procid = 0
        $procname = ""
    }
    process {
        $xmldata = $_
        
        # For each Process event, grab process id and binary name 
        # and add to the list. 
        foreach ($e in $xmldata.Events.Event) {
            if ($e.RenderingInfo.EventName.InnerText -eq "Process") {
                foreach ($d in $e.EventData.Data) {
                    # Grab process id.
                    if ($d.Name -eq "ProcessId") {
                       $procid = $d.InnerText
                    # Grab the process name.
                    } elseif ($d.Name -eq "ImageFileName") {
                        $procname = $d.InnerText
                    }
                }
                if (!$pidlist.Contains($procid)) {
                    $pidlist.add($procid, $procname) | Out-Null
                }
            }
        }
    }
    end {
        remove-variable xmldata
        if ($print) {
            "{0,-29}| PID" -f "Binary Name"
            "-------------------------------------"
            foreach ($item in $pidlist.Keys) {
                "{0,-30}({1})" -f $pidlist.$item,$item
            }
        } else {
            return $pidlist
        }
    }
}

Figure 2 Output of Get-ETW-PID-List Script

PS C:\tests> $xmldata | Get-ETW-PID-List -print
Binary Name                  | PID
-------------------------------------
dwm.exe                       (2776)
powershell.exe                (2384)
svchost.exe                   (708)
notepad.exe                   (4052)
iexplore.exe                  (4284)
...
iexplore.exe                  (4072)
svchost.exe                   (3832)
smss.exe                      (244)
System                        (4)
spoolsv.exe                   (1436)
Idle                          (0)

Sample-Based Profile Events

Sample-Based Profile events (Profile events, hereafter) allow tracking where CPUs spend their time. Profile events can be used to compute a good approximation of CPU utilization. When Profile events are enabled, the Windows kernel turns on profiling interrupts, which causes Profile events to be logged from each processor at a fixed rate (the default is 1000 times a second on each processor). It should be noted that in low-power mode on some machines, Profile events may be turned off temporarily. The Profile event payload consists of the thread id of the thread currently running on that processor and the value of the instruction pointer register at the time of the profiling interrupt. In Windows 7, the Profile event payload also contains flags needed to identify execution context (thread/DPC/ISR).

Figure 3 Get-ETW-PID-Info Script for Printing Process Details

<####################
 # Get-ETW-PID-Info 
 # \@brief Retrieves various information about a particular process id
 ####################>
function Get-ETW-PID-Info([Int] $p, [switch] $i, [switch] $t) {
    begin {
        $threadlist = New-Object System.Collections.ArrayList
        $imagelist = New-Object System.Collections.ArrayList
        $procname = ""
    }
    process {
        $xmldata = $_
        
        $sievedxml = $($xmldata | Get-ETW-PID-Events $p).$p
        foreach ($e in $sievedxml.Events.Event) {
            if ($e.RenderingInfo.EventName.InnerText -eq "Process") {
                foreach ($d in $e.EventData.Data) {
                    # Grab the process binary name 
                    if ($d.Name -eq "ImageFileName") {
                        $procname = $d.InnerText
                    }
                }
            }
            if ($e.RenderingInfo.EventName.InnerText -eq "Image") {
                foreach ($d in $e.EventData.Data) {
                    # Grab the loaded image name and add it to the list
                    if ($d.Name -eq "FileName") {
                        if (!$imagelist.contains($d.InnerText)) {
                            $imagelist.add($d.InnerText) | Out-Null
                        }
                    }
                }
            }
            if ($e.RenderingInfo.EventName.InnerText -eq "Thread") {
                foreach ($d in $e.EventData.Data) {
                    # Grab thread id and add it to the list
                    if ($d.Name -eq "TThreadId") {
                        $tid = $d.InnerText
                        if (!$threadlist.contains($tid)) {
                            $threadlist.add($tid) | Out-Null
                        }
                    }
                }
            }
        }
    }
    end {
        "Process Name: $procname"
        if ($t) {
            "Thread List: ($($threadlist.Count))"
            $threadlist | Sort -descending
        } else {
            "Thread Count: $($threadlist.Count)"
        }
        if ($i) {
            "Image List ($($imagelist.Count)):"
            $imagelist | Sort
        } else {
            "Images: $($imagelist.Count)"
        }
    }
}

CPU utilization can be approximated with Profile events, which is the percentage of Profile events with thread ids other than that of the idle thread. CPU usage distribution per process requires one more step of tracking thread ids in the Profile event payloads to individual processes. If a state machine is available with Process and Thread events as discussed in Part 1 of this two-part article series, it is straightforward to generate a per-process CPU usage report. It is also possible to trace CPU usage to a loaded module through Image Load events. Comparing the instruction pointer with the address range of loaded modules results in the location of instruction pointers to a binary image, and thus to a profile of CPU usage per module. If binary symbols are available, one can obtain function names from instruction pointers using the DbgHelp library. With call-stacks enabled for Profile events, one can even deduce how the function is invoked. This is very helpful when the instruction pointer points to a frequently used library function.

Ready Thread Events

There are several reasons why the system will switch out a running thread. One of the most common reasons is that it needs to wait for other threads to finish related tasks before it can continue. Such dependencies in execution surface as threads being blocked from execution while waiting for an object to be set, such as event, semaphore, timer and so on. The Windows OS scheduler (also known as dispatcher) keeps track of threads becoming unblocked by another thread, a DPC, or a timer. Dependencies between components and threads are not readily seen and predicted during development. When unforeseen delays take place, it becomes difficult to track it to the root cause of the problem.

Ready Thread (or Dispatcher) events were added to help diagnose such problems. When a thread is unblocked (or readied) and placed in the dispatcher ready queue, a Ready Thread event is logged, whose payload contains the thread id of the readied thread. By following up the chain of Ready Thread events, one can track the chain of execution dependency as it unfolds. Context Switch events, presented above, indicate when the readied thread actually is scheduled to run. Call-stacks enabled for Ready Thread events may lead to the point in the code responsible for unblocking of the waiting thread. Ready Thread events are newly available in Windows 7.

System Call Events

System Call events denote entries into and exits out of Windows core OS system calls. System calls are the interface into the Windows kernel, consisting of a number of APIs. This instrumentation was added to monitor system calls made by user mode applications and kernel mode components. There are two types of System Call events. A System Call Enter event indicates an invocation of a system call and logs the address of the routine corresponding to the invoked system service. A System Call Exit event indicates an exit from a system call, and its payload contains the return value from the API call. System Call events are useful for statistical analysis on system call activities and latencies, but like some of the IO and Memory events, they are logged without current process and thread id information in the header. To correlate system call activities with processes and threads, one must collect Context Switch events at the same time and, during state machine simulation, track the current thread running on the CPUs using the CPU id in the event header of the System Call events, as described in Part 1 of this two-part article series.

Advanced Local Procedure Call Events

Local Procedure Call (LPC) has been an efficient local Inter-Process Communication (IPC) mechanism on Windows platforms for years. Windows Vista provided a more efficient and secure means for IPC needs, called Advanced Local Procedure Call (ALPC). ALPC is also used as the transport mechanism for local Remote Procedure Call (RPC). The ALPC component is instrumented with ETW, emitting Send, Receive, Wait for New Reply, Wait for New Message and Unwait events.

System Configuration Events

When the kernel session ends, ETW logs several events that describe the system configuration of the machine on which the events are collected. In many performance analysis scenarios, knowing the underlying hardware and software configuration helps greatly in understanding the system behavior, especially when the machine on which the events are analyzed is different from the machine where the events were collected. These System Configuration events provide information on CPU, graphics cards, network cards, PnP devices, IDE devices, physical and logical disk configuration, and services. The System Configuration event payload varies depending on the device or configuration that it describes but typically contains description strings and key specifications.

Figure 4 Output from Get-ETW-PID-Info Script

PS C:\tests> $xml | Get-ETW-PID-Info 4052 -i
Process Name: notepad.exe
Thread Count: 2
Image List (31):
\Device\HarddiskVolume2\Windows\System32\advapi32.dll
\Device\HarddiskVolume2\Windows\System32\dwmapi.dll
\Device\HarddiskVolume2\Windows\System32\gdi32.dll
\Device\HarddiskVolume2\Windows\System32\imm32.dll
\Device\HarddiskVolume2\Windows\System32\kernel32.dll
\Device\HarddiskVolume2\Windows\System32\KernelBase.dll
...
\Device\HarddiskVolume2\Windows\System32\version.dll
\Device\HarddiskVolume2\Windows\System32\winspool.drv

Figure 5 Get-ETW-Top-VMops Script for Printing Top n Processes Invoking the Most VM Operations

<####################
 # Get-ETW-Top-VMops
 # \@brief Gets the top $num processes ranked by the number of virtual 
   memory events
 ####################>
function Get-ETW-Top-VMops ([Int] $num) {
    begin {
        $hold = @{}
    }
    process {
        $xmldata = $_
        # Get a list of the PIDs
        $list = $xmldata | Get-ETW-PID-List
       
        # Iterate through each PID
        $eventxml = $xmldata | Get-ETW-PID-Events $list.Keys
        foreach ($pd in $list.Keys) {
            $vmops = 0
            # Count the number of VM events
            foreach ($e in $eventxml.$pd.Events.Event) {
                [String] $type = $e.RenderingInfo.EventName.InnerText
                [String] $opcode = $e.RenderingInfo.Opcode
                if ($type -eq "PageFault") {
                    if ($opcode -eq "VirtualAlloc" –or 
                      $opcode -eq "VirtualFree") {
                        $ vmops++
                    }
                }
            }
            $hold.$pd = $vmops
        }
    }
    end {
        "{0,-20}|{1,-7}| VM Event Count" -f "Binar"," PID"
        "--------------------------------------------------------"
        
        $j = 0
        foreach ($e in ($hold.GetEnumerator() | Sort Value  -Descending)) {
            if ($j -lt $num) {
                $key = $e.Key
                "{0,-20} ({1,-6} {2}" -f $list.$key,"$($e.Key))",$e.Value
            } else {
                return
            }
            $j++
        }
    }
}

Simple Core OS Event Analysis Samples

In this section, we present simple scripts to demonstrate a few basic accounting techniques on some of the OS events introduced previously. We will use Windows Powershell scripts for simplicity, but the underlying algorithms can be adopted in applications for more efficient processing. Once an XML dump is created by the tracerpt tool, one can import the events as objects in Windows Powershell using the following commands:

> $xmldata = New-Object System.Xml.XmlDocument
> $xmldata.Load(<xml dump file>)

The first Windows Powershell script in Figure 1 simply prints all the processes in the log file by scanning for Process events. It updates a hash table with a process id and name pair and prints out the table at the end if the –print option is specified. By default, it passes along the array of pairs to a pipe so that other scripts can use its output. Please note that we skipped conventional handling of arguments, comments and error checks to simplify the sample scripts. We also assume process and thread ids are not recycled during the kernel session in this script, but they do get recycled in reality. Additional codes are needed for correct processing.

The output of this script, if the xml dump contains Process events, is as shown in Figure 2. The goal of the next script is to build a basic state machine and, given a process ID, to print the number of threads and a list of loaded images in that process. If the –i or –t options are given, the script prints the names of loaded images and the ids of threads in the process, instead of the counts of images and threads. Figure 3 shows the script Get-ETW-PID-Info written for this functionality. This script calls another script called Get-ETW-PID-Events (that we do not show here) that picks up only the events relevant to the given process ID.

Looking for the notepad process from Figure 2, we get the following output from the Get-ETW-PID-Info script. Figure 4 lists all the modules loaded by notepad.exe.

Finally, we are going to print a table with top n processes with the most virtual memory operations. In this script called Get-ETW-Top-VMops, shown in Figure 5, we run Get-ETW-PID-List to construct a list of process IDs. Then we filter events by each process ID and count VM events. Last, we sort and print the table with top n processes.

The output from this script with top 10 processes with the most virtual memory operations is given in Figure 6. Note that two instances of logman.exe show up in the table, one for starting and the other for stopping the kernel session.

Figure 6 The Output From Get-ETW-Top-VMops

PS C:\tests> $xml | Get-ETW-Top-Vmops 10
Binary              | PID   | VM Event Count
--------------------------------------------------------
svchost.exe          (536)   486
iexplore.exe         (3184)  333
svchost.exe          (1508)  206
logman.exe           (2836)  98
svchost.exe          (892)   37
sidebar.exe          (3836)  37
logman.exe           (1192)  19
svchost.exe          (2052)  18
audiodg.exe          (7048)  18
services.exe         (480)   13

Enhance Your Tool Capability

Windows 7 features hundreds of event providers from various components. In this two-part article series, we have presented the set of core OS ETW events available on Windows 7 and the analysis techniques that we have used for many years. Individual events indicate certain activities in the core OS, but if combined through context-sensitive analysis methods, they can be used to produce meaningful reports that provide insights into patterns and anomalies in resource usage. Moreover, we know of quite a few cases in which effective and careful examination of a subset of these events resulted in great findings in specialized fields of study. Applications instrumented with ETW can benefit even further from precise correlation of application and OS activities; for instance, if an application emits events indicating a beginning and an end of an important application activity, core OS events collected at the same time contains accurate OS resource usage information attributed to that activity. Management and performance tool developers can take advantage of the core system events and various analysis techniques to enhance their tool capability, which will in turn benefit IT workers. Application developers may be able to diagnose application problems better if such analysis is incorporated in their environments. We hope that our two-part article series will lead to the promotion of sound engineering practice, greater software quality and better user experiences.


Dr. Insung Parkis a development manager for the Windows Instrumentation and Diagnosis Platform Team. He has published a dozen papers on performance analysis, request tracking, instrumentation technology, and programming methodology and support. His e-mail address is insungp@microsoft.com.

Alex Bendetovis a development lead for the Windows Instrumentation and Diagnosis Platform Team. He works on both Event Tracing for Windows and the Performance Counter technologies. He can be reached at alexbe@microsoft.com.