December 2011

Volume 26 Number 12

Sysinternals ProcDump v4.0 - Writing a Plug-in for Sysinternals ProcDump v4.0

By Andrew Richards | December 2011

You’ve been up all night installing the latest upgrade of your mission-critical application and everything has gone perfectly. And then it happens—the application hangs, just as everyone starts to arrive at work. At times like this, you need to cut your losses, accept that the release is a failure, gather relevant evidence as fast as possible, and then start that ever-important rollback plan.

Capturing a memory dump of an application at times like this is a common troubleshooting tactic, whether for hang, crash or performance reasons. Most dump capture tools take an all-or-nothing approach: they give you everything (full dumps) or very little (mini dumps). The mini dumps are usually so small that fruitful debug analysis isn’t possible because the heaps are missing. Full dumps have always been preferred, but they’re rarely an option anymore. Ever-increasing memory means that a full dump can take 15, 30 or even 60 minutes. Moreover, the dump files are becoming so big that they can’t easily be moved around for analysis, even when compressed.

Last year, Sysinternals ProcDump v3.0 introduced the MiniPlus switch (-mp) to address the size issue for native applications. This creates a dump that’s somewhere between a mini dump and a full dump in size. The MiniPlus switch’s memory inclusion decisions are based on a multitude of heuristic algorithms that consider memory type, memory protection, allocation size, region size and stack contents. Depending on the layout of the target application, the dump file can be 50 percent to 95 percent smaller than a full dump. More important, the dump is as functional as a full dump for most analysis tasks. When the MiniPlus switch is applied to the Microsoft Exchange 2010 Information Store running with 48GB of memory, the result is a 1GB–2GB dump file (a 95 percent reduction).

Mark Russinovich and I have been working on a new release of ProcDump that now lets you make the memory inclusion decisions. Sysinternals ProcDump v4.0 exposes the same API that MiniPlus uses internally as an external DLL-based plug-in via the -d switch.

In this article, I’m going to dissect how Sysinternals ProcDump v4.0 works by building a series of sample applications that expand on each other, implementing more and more of the ProcDump functionality. By delving into how ProcDump works under the covers, I’ll show you how you can write a plug-in that interacts with ProcDump and the underlying DbgHelp API.

The code download contains the sample applications and also a collection of applications that crash in various ways (so you can test your code). The MiniDump05 sample has all of the APIs implemented as a standalone application. The MiniDump06 sample implements the MiniDump05 sample as a plug-in for Sysinternals ProcDump v4.0.

Terminology

It’s easy to get all the terms associated with dump collection confused—the term “Mini” is used a lot. There is the MiniDump file format, Mini and MiniPlus dump contents, and the MiniDumpWriteDump and MiniDumpCallback functions.

Windows supports the MiniDump file format via DbgHelp.dll. A MiniDump dump file (*.dmp) is a container that supports partial or complete capture of the memory in a user mode or kernel mode target. The file format supports the use of “streams” to store additional metadata (comments, process statistics and so forth). The file format’s name is derived from the requirement to support the capture of a minimal amount of data. The DbgHelp API MiniDumpWriteDump and MiniDumpCallback functions are prefixed with MiniDump to match the file format they produce.

Mini, MiniPlus and Full are used to describe the different amounts of content in the dump files. Mini is the smallest (minimal) and includes the process environment block (PEB), thread environment blocks (TEBs), partial stacks, loaded modules and data segments. Mark and I coined MiniPlus to describe the contents of a Sysinternals ProcDump -mp capture; it includes the contents of a Mini dump, plus memory heuristically deemed important. And a Full dump (procdump.exe -ma) includes the entire virtual address space of the process, regardless of whether the memory is paged in to RAM.

MiniDumpWriteDump Function

To capture a process in MiniDump file format to a file, you call the DbgHelp MiniDumpWriteDump function. The function requires a handle to the target process (with PROCESS_QUERY_INFORMATION and PROCESS_VM_READ access), the PID of the target process, a handle to a file (with FILE_GENERIC_WRITE access), a bitmask of MINIDUMP_TYPE flags, and three optional parameters: an Exception Information structure (used to include an exception context record); a User Stream Information structure (commonly used to include a comment in the dump via the CommentStreamA/W MINIDUMP_STREAM_TYPE types); and a Callback Information structure (used to modify what’s captured during the call):

BOOL WINAPI MiniDumpWriteDump(
  __in  HANDLE hProcess,
  __in  DWORD ProcessId,
  __in  HANDLE hFile,
  __in  MINIDUMP_TYPE DumpType,
  __in  PMINIDUMP_EXCEPTION_INFORMATION ExceptionParam,
  __in  PMINIDUMP_USER_STREAM_INFORMATION UserStreamParam,
  __in  PMINIDUMP_CALLBACK_INFORMATION CallbackParam
);

The MiniDump01 sample application (see Figure 1) shows how you call MiniDumpWriteDump to take a Mini dump without any of the optional parameters. It starts by checking the command-line arguments for a PID and then calls OpenProcess to get a process handle of the target. It then calls CreateFile to get a file handle. (Note that MiniDumpWriteDump supports any I/O target.) The file has an ISO date/time-based filename for uniqueness and chronological sorting: C:\dumps\minidump_YYYY-MM-DD_HH-MM-SS-MS.dmp. The directory is hardcoded to C:\dumps to ensure write access. This is required when doing post mortem debugging because the current folder (for example, System32) might not be writable.

Figure 1 MiniDump01.cpp

// MiniDump01.cpp : Capture a hang dump.
//
#include "stdafx.h"
#include <windows.h>
#include <dbghelp.h>
int WriteDump(HANDLE hProcess, DWORD dwProcessId, HANDLE hFile, MINIDUMP_TYPE miniDumpType);
int _tmain(int argc, TCHAR* argv[])
{
  int nResult = -1;
  HANDLE hProcess = INVALID_HANDLE_VALUE;
  DWORD dwProcessId = 0;
  HANDLE hFile = INVALID_HANDLE_VALUE;
  MINIDUMP_TYPE miniDumpType;
  // DbgHelp v5.2
  miniDumpType = (MINIDUMP_TYPE) (MiniDumpNormal | MiniDumpWithProcessThreadData |
    MiniDumpWithDataSegs | MiniDumpWithHandleData);
  // DbgHelp v6.3 - Passing unsupported flags to a lower version of DbgHelp
  // does not cause any issues
  miniDumpType = (MINIDUMP_TYPE) (miniDumpType | MiniDumpWithFullMemoryInfo |
    MiniDumpWithThreadInfo);
  if ((argc == 2) && (_stscanf_s(argv[1], _T("%ld"), &dwProcessId) == 1))
  {
    // Generate the filename (ISO format)
    SYSTEMTIME systemTime;
    GetLocalTime(&systemTime);
    TCHAR szFilename[64];
    _stprintf_s(szFilename, 64,
      _T("c:\\dumps\\minidump_%04d-%02d-%02d_%02d-%02d-%02d-%03d.dmp"),
        systemTime.wYear, systemTime.wMonth, systemTime.wDay,
        systemTime.wHour, systemTime.wMinute, systemTime.wSecond,
        systemTime.wMilliseconds);
    // Create the folder and file
    CreateDirectory(_T("c:\\dumps"), NULL);
    if ((hFile = CreateFile(szFilename, GENERIC_WRITE, 0, NULL, CREATE_ALWAYS,
      FILE_ATTRIBUTE_NORMAL, NULL)) == INVALID_HANDLE_VALUE)
    {
      _tprintf(_T("Unable to open '%s' for write (Error: %08x)\n"), szFilename,
        GetLastError());
      nResult = 2;
    }
    // Open the process
    else if ((hProcess = OpenProcess(PROCESS_ALL_ACCESS, FALSE, dwProcessId)) == NULL)
    {
      _tprintf(_T("Unable to open process %ld (Error: %08x)\n"), dwProcessId,
        GetLastError());
      nResult = 3;
    }
    // Take a hang dump
    else
    {
      nResult = WriteDump(hProcess, dwProcessId, hFile, miniDumpType);
    }
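    // Only close handles that were actually opened
    if (hFile != INVALID_HANDLE_VALUE) CloseHandle(hFile);
    if ((hProcess != NULL) && (hProcess != INVALID_HANDLE_VALUE)) CloseHandle(hProcess);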
    if (nResult == 0)
    {
      _tprintf(_T("Dump Created - '%s'\n"), szFilename);
    }
    else
    {
      DeleteFile(szFilename);
    }
  }
  else
  {
    _tprintf(_T("Usage: %s <pid>\n"), argv[0]);
    nResult = 1;
  }
  return nResult;
}
int WriteDump(HANDLE hProcess, DWORD dwProcessId, HANDLE hFile, MINIDUMP_TYPE miniDumpType)
{
  if (!MiniDumpWriteDump(hProcess, dwProcessId, hFile, miniDumpType, NULL, NULL, NULL))
  {
    _tprintf(_T("Failed to create hang dump (Error: %08x)\n"), GetLastError());
    return 11;
  }
  return 0;
}

The DumpType parameter is a MINIDUMP_TYPE-based bitmask that causes the inclusion (or exclusion) of particular types of memory. The MINIDUMP_TYPE flags are quite powerful and allow you to direct the capture of lots of memory regions without the need for additional coding via a callback. The options used by the MiniDump01 sample are the same as used by ProcDump. They create a (Mini) dump that can be used to summarize a process.

The DumpType always has the MiniDumpNormal flag present because it has a value of 0x00000000. The DumpType used includes every stack (MiniDumpNormal), all PEB and TEB information (MiniDumpWithProcessThreadData), the loaded module information plus any globals (MiniDumpWithDataSegs), all handle information (MiniDumpWithHandleData), all memory region information (MiniDumpWithFullMemoryInfo) and all thread time and affinity information (MiniDumpWithThreadInfo). With these flags, the dump created is a rich version of a Mini dump but it’s still quite small (less than 30MB even for the biggest of applications). Example debugger commands supported by these MINIDUMP_TYPE flags are listed in Figure 2.

Figure 2 Debugger Commands

MINIDUMP_TYPE                  Debugger Commands
MiniDumpNormal                 knL99
MiniDumpWithProcessThreadData  !peb, !teb
MiniDumpWithDataSegs           lm, dt <global>
MiniDumpWithHandleData         !handle, !cs
MiniDumpWithFullMemoryInfo     !address
MiniDumpWithThreadInfo         !runaway
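
As a rough illustration of how this bitmask is composed, the following portable C++ sketch mirrors the relevant flag values from the MINIDUMP_TYPE enumeration in dbghelp.h, copied as constants so that it compiles without the Windows SDK. The sketch namespace and the BuildSampleDumpType and HasFlag helpers are hypothetical names introduced only for this example; they are not part of DbgHelp or of the article's samples.

```cpp
#include <cstdint>

namespace sketch {
// Values mirrored from the MINIDUMP_TYPE enumeration in dbghelp.h.
// On Windows, include <dbghelp.h> and use the real enumeration instead.
constexpr std::uint32_t MiniDumpNormal                = 0x00000000;
constexpr std::uint32_t MiniDumpWithDataSegs          = 0x00000001;
constexpr std::uint32_t MiniDumpWithHandleData        = 0x00000004;
constexpr std::uint32_t MiniDumpWithProcessThreadData = 0x00000100;
constexpr std::uint32_t MiniDumpWithFullMemoryInfo    = 0x00000800;
constexpr std::uint32_t MiniDumpWithThreadInfo        = 0x00001000;

// Hypothetical helper: the flag combination used by the MiniDump01 sample.
constexpr std::uint32_t BuildSampleDumpType() {
  return MiniDumpNormal | MiniDumpWithProcessThreadData |
         MiniDumpWithDataSegs | MiniDumpWithHandleData |
         MiniDumpWithFullMemoryInfo | MiniDumpWithThreadInfo;
}

// Hypothetical helper: true when every bit of 'flag' is set in 'dumpType'.
constexpr bool HasFlag(std::uint32_t dumpType, std::uint32_t flag) {
  return (dumpType & flag) == flag;
}
}  // namespace sketch
```

Because MiniDumpNormal is zero, HasFlag reports it as present for every DumpType value, which is why the flag is described as always being present.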

When using MiniDumpWriteDump, the dump taken will match the architecture of the capture program, not the target, so use a 32-bit version of your capture program when capturing a 32-bit process, and a 64-bit version of your capture program when capturing a 64-bit process. If you need to debug “Windows 32-bit on Windows 64-bit” (WOW64), you should take a 64-bit dump of the 32-bit process.

If you don’t match the architecture (by accident or on purpose), you’ll have to change the effective machine (.effmach x86) in the debugger to gain access to the 32-bit stacks in a 64-bit dump. Note that lots of debugger extensions fail in this scenario.

Exception Context Record

Microsoft Support engineers use the terms “hang dump” and “crash dump.” When they ask for a crash dump, they want a dump with an exception context record. When they ask for a hang dump, they (usually) mean a series of dumps without one. A dump with exception information isn’t always from the time of a crash, though; it can be from any time. The exception information is just a means to provide additional data in a dump. The user stream information is similar to the exception information in this respect.

An exception context record is the combination of a CONTEXT structure (the CPU registers) and an EXCEPTION_RECORD structure (the exception code, the instruction’s address and so on). If you include an exception context record in the dump and run .ecxr, the current debugger context (thread and register state) is set to the instruction that raised the exception (see Figure 3).

Figure 3 Changing Context to the Exception Context Record

This dump file has an exception of interest stored in it.

The stored exception information can be accessed via .ecxr.

(174c.6f8): Access violation - code c0000005 (first/second chance not available)
eax=00000000 ebx=001df788 ecx=75ba31e7 edx=00000000 esi=00000002 edi=00000000
eip=77c7014d esp=001df738 ebp=001df7d4 iopl=0         nv up ei pl zr na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000246
ntdll!NtWaitForMultipleObjects+0x15:
77c7014d 83c404          add     esp,4
0:000> .ecxr
eax=00000000 ebx=00000000 ecx=75ba31e7 edx=00000000 esi=00000001 edi=003b3374
eip=003b100d esp=001dfdbc ebp=001dfdfc iopl=0         nv up ei pl zr na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010246
CrashAV_x86!wmain+0xd:
003b100d 8900            mov     dword ptr [eax],eax  ds:002b:00000000=????????

To support .ecxr, the optional MINIDUMP_EXCEPTION_INFORMATION structure needs to be passed to MiniDumpWriteDump. You can get exception information at run time or at post mortem.

Runtime Exceptions

If you implement a debugger event loop, exception information is passed to you when the exception occurs. The debugger event loop will receive an EXCEPTION_DEBUG_EVENT structure for breakpoints, first chance exceptions and second chance exceptions.

The MiniDump02 sample application shows how to call MiniDumpWriteDump from within a debugger event loop so that the second chance exception’s context record is included in the dump (equivalent to “procdump.exe -e”). The sample runs this logic when you pass it the -e switch. Because the code is quite long, the pseudo code for the application is shown in Figure 4. Refer to this article’s code download for the full source code.

Figure 4 MiniDump02 Pseudo Code

Function Main
Begin
  Check Command Line Arguments
  CreateFile(c:\dumps\minidump_YYYY-MM-DD_HH-MM-SS-MS.dmp)
  OpenProcess(PID)
  If "-e" Then
    DebugEventDump
    TerminateProcess(Process)
  Else
    WriteDump(NULL)
  CloseHandle(Process)
  CloseHandle(File)
End
Function WriteDump(Optional Exception Context Record)
Begin
  MiniDumpWriteDump(Optional Exception Context Record)
End
Function DebugEventDump
Begin
  DebugActiveProcess(PID)
  While (Not Done)
  Begin
    WaitForDebugEvent
    Switch (Debug Event Code)
    Begin
    Case EXCEPTION_DEBUG_EVENT
      If EXCEPTION_BREAKPOINT
        ContinueDebugEvent(DBG_CONTINUE)
      Else If "First Chance Exception"
        ContinueDebugEvent(DBG_EXCEPTION_NOT_HANDLED)
      Else "Second Chance Exception"
        OpenThread(Debug Event Thread ID)
        GetThreadContext
        WriteDump(Exception Context Record)
        CloseHandle(Thread)
        Done = True
    Case EXIT_PROCESS_DEBUG_EVENT
      ContinueDebugEvent(DBG_CONTINUE)
      Done = True
    Case CREATE_PROCESS_DEBUG_EVENT
      CloseHandle(CreateProcessInfo.hFile)
      ContinueDebugEvent(DBG_CONTINUE)
    Case LOAD_DLL_DEBUG_EVENT
      CloseHandle(LoadDll.hFile)
      ContinueDebugEvent(DBG_CONTINUE)
    Default
      ContinueDebugEvent(DBG_CONTINUE)
    End Switch
  End While
  DebugActiveProcessStop(PID)
End

The application starts by checking the command-line arguments for a PID. Next it calls OpenProcess to get a process handle of the target, and then it calls CreateFile to get a file handle. If the -e switch is missing it takes a hang dump as before. If the -e switch is present, the application attaches to the target (as a debugger) using DebugActiveProcess. In a while loop, it waits for a DEBUG_EVENT structure to be returned by WaitForDebugEvent. The switch statement uses the dwDebugEventCode member of the DEBUG_EVENT structure. After the dump has been taken, or the process has ended, DebugActiveProcessStop is called to detach from the target.

When the event code is EXCEPTION_DEBUG_EVENT, the DEBUG_EVENT structure’s u.Exception member (an EXCEPTION_DEBUG_INFO structure) contains the exception record and a first-chance indicator. If the exception record is a breakpoint, it’s handled locally by calling ContinueDebugEvent with DBG_CONTINUE. If the exception is a first-chance exception, it isn’t handled, so that it can turn into a second-chance exception (if the target doesn’t have a handler). To do this, ContinueDebugEvent is called with DBG_EXCEPTION_NOT_HANDLED. The remaining scenario is a second-chance exception. Using the DEBUG_EVENT structure’s dwThreadId, OpenThread is called to get a handle to the thread with the exception. The thread handle is used with GetThreadContext to populate the required CONTEXT structure. (A word of caution here: the CONTEXT structure has grown in size over the years as additional registers have been added to processors. If a later OS increases the size of the CONTEXT structure, you’ll need to recompile this code.) The obtained CONTEXT structure and the EXCEPTION_RECORD from the DEBUG_EVENT are used to populate an EXCEPTION_POINTERS structure, and this is used to populate a MINIDUMP_EXCEPTION_INFORMATION structure. This structure is passed to the application’s WriteDump function for use with MiniDumpWriteDump.
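
The chain of structures described above can be sketched as follows. This is a minimal illustration, not the MiniDump02 source: ExceptionCapture and FillExceptionCapture are hypothetical names, and the stand-in type definitions in the #else branch are simplified assumptions that exist only so the sketch compiles off Windows. On Windows the real definitions from windows.h and dbghelp.h are used, and the CONTEXT passed in would come from GetThreadContext.

```cpp
#ifdef _WIN32
#include <windows.h>
#include <dbghelp.h>
#else
// Simplified stand-ins (assumptions) so the sketch compiles without the SDK.
#include <cstdint>
typedef std::uint32_t DWORD;
typedef int BOOL;
#define FALSE 0
struct EXCEPTION_RECORD { DWORD ExceptionCode; void* ExceptionAddress; };
struct CONTEXT { DWORD ContextFlags; };
struct EXCEPTION_POINTERS {
  EXCEPTION_RECORD* ExceptionRecord;
  CONTEXT* ContextRecord;
};
struct MINIDUMP_EXCEPTION_INFORMATION {
  DWORD ThreadId;
  EXCEPTION_POINTERS* ExceptionPointers;
  BOOL ClientPointers;
};
struct EXCEPTION_DEBUG_INFO { EXCEPTION_RECORD ExceptionRecord; DWORD dwFirstChance; };
struct DEBUG_EVENT {
  DWORD dwDebugEventCode;
  DWORD dwProcessId;
  DWORD dwThreadId;
  union { EXCEPTION_DEBUG_INFO Exception; } u;
};
#endif

namespace sketch {
// Hypothetical holder that keeps every structure alive together. It must
// outlive the MiniDumpWriteDump call and must not be copied afterwards,
// because 'pointers' and 'info' refer to members of this same object.
struct ExceptionCapture {
  EXCEPTION_RECORD record;
  CONTEXT context;
  EXCEPTION_POINTERS pointers;
  MINIDUMP_EXCEPTION_INFORMATION info;
};

// Builds the MINIDUMP_EXCEPTION_INFORMATION for a second-chance exception
// from the debug event and the faulting thread's context.
inline void FillExceptionCapture(const DEBUG_EVENT& debugEvent,
                                 const CONTEXT& threadContext,
                                 ExceptionCapture* capture) {
  capture->record = debugEvent.u.Exception.ExceptionRecord;
  capture->context = threadContext;
  capture->pointers.ExceptionRecord = &capture->record;
  capture->pointers.ContextRecord = &capture->context;
  capture->info.ThreadId = debugEvent.dwThreadId;
  capture->info.ExceptionPointers = &capture->pointers;
  // FALSE: the pointers refer to the debugger's own address space.
  capture->info.ClientPointers = FALSE;
}
}  // namespace sketch
```

The address of the info member is what would be passed as the ExceptionParam argument of MiniDumpWriteDump.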

The EXIT_PROCESS_DEBUG_EVENT is handled specifically for the scenario where the target ends before an exception occurs. ContinueDebugEvent is called with DBG_CONTINUE to acknowledge this event and the while loop is exited.

The CREATE_PROCESS_DEBUG_EVENT and LOAD_DLL_DEBUG_EVENT events are handled specifically because each passes a file HANDLE that needs to be closed. Both cases then call ContinueDebugEvent with DBG_CONTINUE.

The Default case handles all other events by calling ContinueDebugEvent with DBG_CONTINUE to continue execution and close the handle passed.
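
The per-event decisions in Figure 4 can be condensed into a single pure function, which makes the branching easier to test in isolation. The following is a portable sketch under stated assumptions: the numeric constants are mirrored from the Windows SDK headers, while the Action structure and the DecideAction function are hypothetical names that do not appear in the MiniDump02 sample.

```cpp
#include <cstdint>

namespace sketch {
// Values mirrored from the Windows SDK headers.
constexpr std::uint32_t kExceptionDebugEvent     = 1;           // EXCEPTION_DEBUG_EVENT
constexpr std::uint32_t kCreateProcessDebugEvent = 3;           // CREATE_PROCESS_DEBUG_EVENT
constexpr std::uint32_t kExitProcessDebugEvent   = 5;           // EXIT_PROCESS_DEBUG_EVENT
constexpr std::uint32_t kLoadDllDebugEvent       = 6;           // LOAD_DLL_DEBUG_EVENT
constexpr std::uint32_t kExceptionBreakpoint     = 0x80000003;  // EXCEPTION_BREAKPOINT
constexpr std::uint32_t kDbgContinue             = 0x00010002;  // DBG_CONTINUE
constexpr std::uint32_t kDbgExceptionNotHandled  = 0x80010001;  // DBG_EXCEPTION_NOT_HANDLED

// Hypothetical summary of what the loop should do for one debug event.
struct Action {
  bool takeDump;                 // call WriteDump with the exception context record
  bool closeFileHandle;          // close the hFile handle supplied with the event
  bool done;                     // leave the while loop
  bool callContinue;             // call ContinueDebugEvent
  std::uint32_t continueStatus;  // status to pass when callContinue is true
};

inline Action DecideAction(std::uint32_t eventCode,
                           std::uint32_t exceptionCode,
                           bool firstChance) {
  Action action = {false, false, false, true, kDbgContinue};
  switch (eventCode) {
    case kExceptionDebugEvent:
      if (exceptionCode == kExceptionBreakpoint) {
        // Breakpoints are handled locally with DBG_CONTINUE.
      } else if (firstChance) {
        // Let the target's own handlers see the exception first.
        action.continueStatus = kDbgExceptionNotHandled;
      } else {
        // Second chance: capture the dump and stop, as in Figure 4.
        action.takeDump = true;
        action.done = true;
        action.callContinue = false;
      }
      break;
    case kExitProcessDebugEvent:
      action.done = true;
      break;
    case kCreateProcessDebugEvent:
    case kLoadDllDebugEvent:
      action.closeFileHandle = true;
      break;
    default:
      break;  // every other event simply continues
  }
  return action;
}
}  // namespace sketch
```

In a real loop, the inputs would come from the dwDebugEventCode member and the u.Exception member of the DEBUG_EVENT structure returned by WaitForDebugEvent.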

Post Mortem Exceptions

Windows Vista introduced a third parameter to the post mortem debugger command line to support the passing of exception information. To receive the third parameter, you need to have a Debugger value (in the AeDebug key) that includes three substitutions. The three values are: Process ID (%ld), Event ID (%ld) and JIT Address (%p). The JIT Address is the address of a JIT_DEBUG_INFO structure in the target’s address space. Windows Error Reporting (WER) allocates this memory in the target’s address space when WER is invoked as the result of an unhandled exception. It fills in the JIT_DEBUG_INFO structure, invokes the post mortem debugger (passing the address of the allocation), and then frees the memory after the post mortem debugger ends.

To determine the exception context record, the post mortem application reads the JIT_DEBUG_INFO structure from the target’s address space. The structure has the address of a CONTEXT structure and EXCEPTION_RECORD structure in the target’s address space. These two structures are read from the target’s address space into local copies, and the addresses of the local copies are stored in the EXCEPTION_POINTERS structure. The ClientPointers member of the MINIDUMP_EXCEPTION_INFORMATION structure is set to FALSE to indicate that the EXCEPTION_POINTERS structure (and its members) is in the local address space.

The MiniDump03 sample application shows how to implement the JIT_DEBUG_INFO support (see Figure 5).

Figure 5 MiniDump03 – JIT_DEBUG_INFO Handler

int JitInfoDump(HANDLE hProcess, DWORD dwProcessId, HANDLE hFile, 
  MINIDUMP_TYPE miniDumpType, ULONG64 ulJitInfoAddr)
{
    JIT_DEBUG_INFO jitDebugInfo;
    MINIDUMP_EXCEPTION_INFORMATION exceptionInfo = {0};
    EXCEPTION_POINTERS exceptionPointers = {0};
    EXCEPTION_RECORD exceptionRecord = {0};
    CONTEXT contextRecord = {0};
    SIZE_T numberOfBytesRead;

    // As we are the same bitness as the target - the structure sizes will match 
    if (ReadProcessMemory(hProcess, (void*)ulJitInfoAddr, 
      &jitDebugInfo, sizeof(jitDebugInfo), &numberOfBytesRead) && 
      (numberOfBytesRead == sizeof(jitDebugInfo)))
    {
        if (ReadProcessMemory(hProcess, (void*)jitDebugInfo.lpExceptionRecord, 
            &exceptionRecord, sizeof(exceptionRecord), &numberOfBytesRead) && 
            (numberOfBytesRead == sizeof(exceptionRecord)))
        {
            if (ReadProcessMemory(hProcess, (void*)jitDebugInfo.lpContextRecord, 
              &contextRecord, sizeof(contextRecord), &numberOfBytesRead) && 
              (numberOfBytesRead == sizeof(contextRecord)))
            {
                exceptionPointers.ExceptionRecord = &exceptionRecord;
                exceptionPointers.ContextRecord = &contextRecord;
                exceptionInfo.ThreadId = jitDebugInfo.dwThreadID;
                exceptionInfo.ExceptionPointers = &exceptionPointers;
                exceptionInfo.ClientPointers = FALSE;

                return WriteDump(hProcess, dwProcessId, hFile, miniDumpType, &exceptionInfo);
            }
        }
    }
    return WriteDump(hProcess, dwProcessId, hFile, miniDumpType, NULL);
}

When an application is invoked as the post mortem debugger, the application has two choices on how it will end the debugging. It can simply exit or it can trigger the event handle passed as the second parameter. The event allows you to release the target and then start post-analysis. In the MiniDump03 example, JIT_DEBUG_INFO is handled for Vista+ and SetEvent is called after the dump has been taken.

// Post Mortem (AeDebug) dump - No JIT - pre Vista
else if ((argc == 3) && (_stscanf_s(argv[2], _T("%ld"), &hEvent) == 1))
{
    nResult = WriteDump(hProcess, dwProcessId, hFile, miniDumpType, NULL);
    SetEvent(hEvent);
}
// Post Mortem (AeDebug) dump - JIT_DEBUG_INFO - Vista+
else if ((argc == 4) && (_stscanf_s(argv[2], _T("%ld"), &hEvent) == 1) && 
    (_stscanf_s(argv[3], _T("%p"), &ulJitInfoAddr) == 1))
{
    nResult = JitInfoDump(hProcess, dwProcessId, hFile, miniDumpType, ulJitInfoAddr);
    SetEvent(hEvent);
}

To replace the post mortem debugger with your own application, you point the Debugger value of the AeDebug keys to the post mortem application with the appropriate architecture. By using the matching application, you remove the need to adjust the effective machine (.effmach) in the debugger.

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\AeDebug
Debugger (REG_SZ) = "C:\dumps\minidump03_x64.exe %ld %ld %p"

HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Windows NT\CurrentVersion\AeDebug
Debugger (REG_SZ) = "C:\dumps\minidump03_x86.exe %ld %ld %p"
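
To make the command-line contract concrete, here is a portable sketch of parsing the values that the Debugger string substitutes: the process ID and event value as decimal numbers (%ld) and the JIT address as hexadecimal (%p). It uses narrow strings and the C runtime's strtoull rather than the TCHAR and _stscanf_s calls of the sample, and the PostMortemArgs, ParseUnsigned and ParsePostMortemArgs names are hypothetical.

```cpp
#include <cstdint>
#include <cstdlib>

namespace sketch {
// Hypothetical result of parsing "<exe> <pid> <event> [<jit address>]".
struct PostMortemArgs {
  bool valid;
  std::uint32_t processId;
  std::uintptr_t eventHandle;
  bool hasJitInfo;               // true only when the third value was supplied
  std::uint64_t jitInfoAddress;  // address in the target's address space
};

// Parses the whole string in the given base; rejects empty or partial input.
inline bool ParseUnsigned(const char* text, int base, std::uint64_t* value) {
  if (text == nullptr || *text == '\0') return false;
  char* end = nullptr;
  *value = std::strtoull(text, &end, base);
  return *end == '\0';
}

inline PostMortemArgs ParsePostMortemArgs(int argc, const char* const argv[]) {
  PostMortemArgs args = {false, 0, 0, false, 0};
  std::uint64_t pid = 0, eventValue = 0, jitAddress = 0;
  if (argc != 3 && argc != 4) return args;
  if (!ParseUnsigned(argv[1], 10, &pid)) return args;
  if (!ParseUnsigned(argv[2], 10, &eventValue)) return args;
  if (argc == 4) {
    // %p is written as hexadecimal digits.
    if (!ParseUnsigned(argv[3], 16, &jitAddress)) return args;
    args.hasJitInfo = true;
    args.jitInfoAddress = jitAddress;
  }
  args.processId = static_cast<std::uint32_t>(pid);
  args.eventHandle = static_cast<std::uintptr_t>(eventValue);
  args.valid = true;
  return args;
}
}  // namespace sketch
```

A caller would take the hang-dump path when hasJitInfo is false and read the JIT_DEBUG_INFO structure from jitInfoAddress otherwise, signaling the event in both cases once the dump is written.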

MiniDumpCallback Function

So far, the dump taken contains memory that the DumpType parameter indicated to include (and optionally an exception context record). By implementing a MiniDumpCallback function prototype, we can add not only additional memory regions but also some error handling. I’ll describe later how you can implement solely the MiniDumpCallback function prototype for use with Sysinternals ProcDump v4.0.

There are currently 16 callback types that allow you to control multiple aspects of the dumping, such as the memory included for modules and threads, memory itself, the ability to cancel a dump that’s in progress, the control of the dump file I/O and the handling of errors.

The switch statement in my template code (see Figure 6) includes all the callback types roughly in the call order of their first invocation. Some callbacks can be called more than once and can subsequently occur out of sequence. Equally, there’s no contract for the order and thus it could change in future releases.

Figure 6 Template Implementation of the MiniDumpCallback Function Prototype

BOOL CALLBACK MiniDumpCallbackRoutine(
  __in     PVOID CallbackParam,
  __in     const PMINIDUMP_CALLBACK_INPUT CallbackInput,
  __inout  PMINIDUMP_CALLBACK_OUTPUT CallbackOutput
)
{    // Callback supported in Windows 2003 SP1 unless indicated
  // Switch statement is in call order
  switch (CallbackInput->CallbackType)
  {
  case IoStartCallback:  //  Available in Vista/Win2008
    break;
  case SecondaryFlagsCallback:  //  Available in Vista/Win2008
    break;
  case CancelCallback:
    break;
  case IncludeThreadCallback:
    break;
  case IncludeModuleCallback:
    break;
  case ModuleCallback:
    break;
  case ThreadCallback:
    break;
  case ThreadExCallback:
    break;
  case MemoryCallback:
    break;
  case RemoveMemoryCallback:
    break;
  case WriteKernelMinidumpCallback:
    break;
  case KernelMinidumpStatusCallback:
    break;
  case IncludeVmRegionCallback:  //  Available in Vista/Win2008
    break;
  case IoWriteAllCallback:  //  Available in Vista/Win2008
    break;
  case IoFinishCallback:  //  Available in Vista/Win2008
    break;
  case ReadMemoryFailureCallback:  // Available in Vista/Win2008
    break;
  }
  return TRUE;
}

The MiniDump04 sample contains a callback that does two things: it includes the entire stack contents, and it ignores read failures. This example uses ThreadCallback and MemoryCallback to include the entire stack, and the ReadMemoryFailureCallback to ignore the read failures.

To invoke the callback, the optional MINIDUMP_CALLBACK_INFORMATION structure is passed to the MiniDumpWriteDump function. The structure’s CallbackRoutine member is used to point to the MiniDumpCallback function implemented (MiniDumpCallbackRoutine in my template and sample). The CallbackParam member is a VOID* pointer that allows you to retain context between callback calls. My WriteDump function from the MiniDump04 sample is in Figure 7.

Figure 7 WriteDump from the MiniDump04 Example

int WriteDump(HANDLE hProcess, DWORD dwProcessId, HANDLE hFile, MINIDUMP_TYPE miniDumpType, PMINIDUMP_EXCEPTION_INFORMATION pExceptionParam)
{
  MemoryInfoNode* pRootMemoryInfoNode = NULL;
  MINIDUMP_CALLBACK_INFORMATION callbackInfo;
  callbackInfo.CallbackParam = &pRootMemoryInfoNode;
  callbackInfo.CallbackRoutine = MiniDumpCallbackRoutine;
  if (!MiniDumpWriteDump(hProcess, dwProcessId, hFile, miniDumpType, pExceptionParam, NULL, &callbackInfo))
  {
    _tprintf(_T("Failed to create hang dump (Error: %08x)\n"), GetLastError());
    while (pRootMemoryInfoNode)
    {    // If there was an error, we'll need to cleanup here
      MemoryInfoNode* pNode = pRootMemoryInfoNode;
      pRootMemoryInfoNode = pNode->pNext;
      delete pNode;
    }
    return 11;
  }
  return 0;
}

The structure I have defined (MemoryInfoNode) to retain context between calls is a linked list node that contains an address and a size, like so:

struct MemoryInfoNode
{
  MemoryInfoNode* pNext;
  ULONG64 ulBase;
  ULONG ulSize;
};

Entire Stack

When you use the MiniDumpWithProcessThreadData flag in the DumpType parameter, the contents of each stack are included from the stack base to the current stack pointer. My MiniDumpCallbackRoutine function implemented in the MiniDump04 sample augments this by including the remainder of the stack. By including the entire stack, you might be able to determine the origin of a stack trash.

A stack trash is when a buffer overflow occurs to a stack-based buffer. The buffer overflow writes over the stack return address and causes the execution of the “ret” opcode to POP the buffer contents as an instruction pointer, instead of the instruction pointer PUSHed by the “call” opcode. This results in execution of an invalid memory address, or even worse, the execution of a random piece of code.

When a stack trash occurs, the memory below (remember, stacks grow downward) the current stack pointer will still contain the unmodified stack data. With this data, you can determine the contents of the buffer and—most of the time—the functions that were called to generate the contents of the buffer.

If you compare the memory above the stack limit in a dump taken with and without the additional stack memory, you will see that the memory is displayed as missing (the ? symbol) in the normal dump, but is included in the dump made with the callback (see Figure 8).

Figure 8 Stack Contents Comparison

0:003> !teb
TEB at 000007fffffd8000
  ExceptionList:        0000000000000000
  StackBase:            000000001b4b0000
  StackLimit:           000000001b4af000
  SubSystemTib:         0000000000000000
...
// No Callback
0:003> dp poi($teb+10) L6
00000000`1b4af000  ????????`???????? ????????`????????
00000000`1b4af010  ????????`???????? ????????`????????
00000000`1b4af020  ????????`???????? ????????`????????
// Callback
0:003> dp poi($teb+10) L6
00000000`1b4af000  00000000`00000000 00000000`00000000
00000000`1b4af010  00000000`00000000 00000000`00000000
00000000`1b4af020  00000000`00000000 00000000`00000000

The first part of the “entire stack” code is the handling of the ThreadCallback callback type (see Figure 9). This callback type is called once per thread in the process. The callback is passed a MINIDUMP_THREAD_CALLBACK structure via the CallbackInput parameter. The structure includes a StackBase member that’s the stack base of the thread. The StackEnd member is the current stack pointer (esp/rsp for x86/x64 respectively). The structure doesn’t include the stack limit (part of the Thread Environment Block).

Figure 9 ThreadCallback Is Used to Collect the Stack Region of Each Thread

case ThreadCallback:
{    // We aren't passed the StackLimit so we use a 1MB offset from StackBase
  MemoryInfoNode** ppRoot = (MemoryInfoNode**)CallbackParam;
  if (ppRoot)
  {
    MemoryInfoNode* pNode = new MemoryInfoNode;
    pNode->ulBase = CallbackInput->Thread.StackBase - 0x100000;
    pNode->ulSize = 0x100000; // 1MB
    pNode->pNext = *ppRoot;
    *ppRoot = pNode;
  }
}
break;

The example takes a simplistic approach and assumes that the stack is 1MB in size—this is the default for most applications. In the case where this is below the stack pointer, the DumpType parameter will cause the memory to be included. In the case where the stack is larger than 1MB, a partial stack will be included. And in the case where the stack is smaller than 1MB, additional data will just be included. Note that if the memory range requested by the callback spans a free region or overlaps another inclusion, no error occurs.
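
The arithmetic behind this assumption is small enough to sketch on its own. The following portable C++ is an illustration only: Region, AssumedStackRegion and UncoveredStackBytes are hypothetical names, and the guard against wrapping below address zero is an addition that the Figure 9 code does not perform.

```cpp
#include <cstdint>

namespace sketch {
struct Region {
  std::uint64_t base;  // lowest address of the region
  std::uint32_t size;  // length in bytes
};

constexpr std::uint32_t kAssumedStackSize = 0x100000;  // 1MB

// The stack base is the highest address of the stack, so the assumed
// region ends at stackBase and starts assumedSize bytes below it.
inline Region AssumedStackRegion(std::uint64_t stackBase,
                                 std::uint32_t assumedSize = kAssumedStackSize) {
  // Added guard (not in Figure 9): never wrap below address zero.
  if (stackBase < assumedSize) {
    assumedSize = static_cast<std::uint32_t>(stackBase);
  }
  Region region;
  region.base = stackBase - assumedSize;
  region.size = assumedSize;
  return region;
}

// Bytes between a known lowest stack address and the start of the assumed
// region, i.e. the part of a larger stack that this approach would miss.
inline std::uint64_t UncoveredStackBytes(const Region& region,
                                         std::uint64_t lowestStackAddress) {
  return lowestStackAddress < region.base ? region.base - lowestStackAddress : 0;
}
}  // namespace sketch
```

For the thread shown in Figure 8 (StackBase 000000001b4b0000), the assumed region starts at 000000001b3b0000, which comfortably covers the StackLimit of 000000001b4af000. A thread created with a 4MB stack would leave 3MB uncovered.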

The StackBase and the 1MB offset are recorded in a new instantiation of the MemoryInfoNode structure I’ve defined. The new instantiation is added to the front of the linked list that’s passed to the callback via the CallbackParam argument. After the multiple invocations of ThreadCallback, the linked list contains multiple nodes of additional memory to include.

The last part of the “entire stack” code is the handling of the MemoryCallback callback type (see Figure 10). MemoryCallback is called continuously while you return TRUE from the callback and provide a nonzero value for the MemoryBase and MemorySize members of the MINIDUMP_CALLBACK_OUTPUT structure.

Figure 10 MemoryCallback Is Continually Called While Returning a Stack Region

case MemoryCallback:
{    // Remove the root node and return its members in the callback
  MemoryInfoNode** ppRoot = (MemoryInfoNode**)CallbackParam;
  if (ppRoot && *ppRoot)
  {
    MemoryInfoNode* pNode = *ppRoot;
    *ppRoot = pNode->pNext;
    CallbackOutput->MemoryBase = pNode->ulBase;
    CallbackOutput->MemorySize = pNode->ulSize;
    delete pNode;
  }
}
break;

The code sets the values on the CallbackOutput parameter and then deletes the node from the linked list. After multiple invocations of MemoryCallback, the linked list will contain no more nodes, and zero values are returned to end the MemoryCallback invocation. Note that the MemoryBase and MemorySize members are set to zero when passed; you just need to return TRUE.
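
The producer and consumer halves of this hand-off can be exercised without DbgHelp. The sketch below reuses the MemoryInfoNode layout with fixed-width standard types in place of ULONG64 and ULONG; PushRegion and PopRegion are hypothetical helper names that correspond to the bodies of the ThreadCallback and MemoryCallback cases respectively.

```cpp
#include <cstdint>

namespace sketch {
struct MemoryInfoNode {
  MemoryInfoNode* pNext;
  std::uint64_t ulBase;
  std::uint32_t ulSize;
};

// ThreadCallback side: add a region to the front of the list.
inline void PushRegion(MemoryInfoNode** ppRoot, std::uint64_t base,
                       std::uint32_t size) {
  MemoryInfoNode* pNode = new MemoryInfoNode;
  pNode->ulBase = base;
  pNode->ulSize = size;
  pNode->pNext = *ppRoot;
  *ppRoot = pNode;
}

// MemoryCallback side: remove the front node and report its region.
// When the list is empty the outputs stay zero, which is the signal
// that stops further MemoryCallback invocations.
inline bool PopRegion(MemoryInfoNode** ppRoot, std::uint64_t* pBase,
                      std::uint32_t* pSize) {
  *pBase = 0;
  *pSize = 0;
  if (*ppRoot == nullptr) return false;
  MemoryInfoNode* pNode = *ppRoot;
  *ppRoot = pNode->pNext;
  *pBase = pNode->ulBase;
  *pSize = pNode->ulSize;
  delete pNode;
  return true;
}
}  // namespace sketch
```

Because nodes are added to and removed from the front, regions are reported in the reverse of the order in which the threads were enumerated. The order should not matter for inclusion purposes.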

You can use the MemoryListStream area of the .dumpdebug command output to see all of the memory regions in the dump (note that adjacent blocks might be merged). See Figure 11.

Figure 11 The .dumpdebug Command Output

0:000> .dumpdebug
----- User Mini Dump Analysis
...
Stream 3: type MemoryListStream (5), size 00000194, RVA 00000E86
  25 memory ranges
  range#    RVA      Address      Size
       0 0000101A    7efdb000   00005000
       1 0000601A    001d6000   00009734
       2 0000F74E    00010000   00021000
       3 0003074E    003b0f8d   00000100
       4 0003084E    003b3000   00001000
...

Read Memory Failure

The last piece of code is quite simple (see Figure 12). It sets the Status member of the MINIDUMP_CALLBACK_OUTPUT structure to S_OK to signal that it’s OK to omit a region of memory that could not be read during the capture.

Figure 12 ReadMemoryFailureCallback Is Called on Read Failures

case ReadMemoryFailureCallback:  // DbgHelp.dll v6.5; Available in Vista/Win2008
  {    //  There has been a failure to read memory. Set Status to S_OK to ignore it.
    CallbackOutput->Status = S_OK;
  }
  break;

In this simple implementation, the callback implements the same functionality as the MiniDumpIgnoreInaccessibleMemory flag. The ReadMemoryFailureCallback callback is passed the offset, number of bytes and the failure code via the MINIDUMP_READ_MEMORY_FAILURE_CALLBACK structure in the CallbackInput parameter. In a more complex callback, you could use this information to determine if the memory is critical to dump analysis, and whether the dump should be aborted.
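
As a rough illustration of such a "more complex" decision, the sketch below checks whether the failed read overlaps a region you consider essential. It is a portable model only: ReadFailure mirrors the Offset, Bytes and FailureStatus members mentioned above, while CriticalRange, Overlaps and DecideStatus are hypothetical names. It also assumes that leaving a non-S_OK value in Status lets the failure stand instead of being ignored.

```cpp
#include <cstdint>
#include <vector>

using HResult = std::int32_t;
constexpr HResult kOk = 0;  // S_OK

// Stand-in for the fields of MINIDUMP_READ_MEMORY_FAILURE_CALLBACK.
struct ReadFailure {
    std::uint64_t Offset;         // start of the memory that could not be read
    std::uint32_t Bytes;          // number of bytes that could not be read
    HResult       FailureStatus;  // why the read failed
};

// Hypothetical description of memory the analysis cannot do without.
struct CriticalRange {
    std::uint64_t Base;
    std::uint64_t Size;
};

// True when the half-open ranges [Offset, Offset+Bytes) and [Base, Base+Size) intersect.
bool Overlaps(const ReadFailure& f, const CriticalRange& r) {
    return f.Offset < r.Base + r.Size && r.Base < f.Offset + f.Bytes;
}

// Returns the value to store in CallbackOutput->Status.
HResult DecideStatus(const ReadFailure& f, const std::vector<CriticalRange>& critical) {
    for (const CriticalRange& r : critical) {
        if (Overlaps(f, r))
            return f.FailureStatus;  // essential memory is missing: keep the error
    }
    return kOk;  // nothing essential was lost: ignore the failure
}
```

In a real callback, the list of critical ranges would come from whatever domain knowledge you have about the target, such as the regions collected earlier in ThreadCallback.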

Dissecting Memory

So how do you know what you can and can’t afford to remove from a dump? Sysinternals VMMap is a great way to see what an application’s memory looks like. If you use Sysinternals VMMap on a managed process, you’ll notice that there are allocations associated with the garbage-collected (GC) heap, and allocations associated with the application’s images and mapped files. It is the GC heap that you need in a dump of a managed process because the Son of Strike (SOS) debugger extension requires intact data structures from within the GC heap to interpret the dump.

To determine the location of the GC heap, you could take an exacting approach by starting a Debugging Engine (DbgEng) session against the target using DebugCreate and IDebugClient::AttachProcess. With this debugging session, you could load the SOS debugger extension and run commands to return the heap information (this is an example of using domain knowledge).

Alternatively, you could use heuristics. You include all regions that have a memory type of Private (MEM_PRIVATE) or a Protection of Read/Write (PAGE_READWRITE or PAGE_EXECUTE_READWRITE). This collects more memory than absolutely necessary but still makes a significant saving by excluding the application itself. The MiniDump05 sample takes this approach (see Figure 13) by replacing the MiniDump04 sample’s thread stack code with a once-only VirtualQueryEx loop in the ThreadCallback callback (the new logic still causes the entire stack to be included as before). It then uses the same MemoryCallback code used in the MiniDump04 sample to include the memory in the dump.

Figure 13  MiniDump05—ThreadCallback Is Used Once to Collect the Memory Regions

case ThreadCallback:
{    // Collect all of the committed MEM_PRIVATE and R/W memory
  MemoryInfoNode** ppRoot = (MemoryInfoNode**)CallbackParam;
  if (ppRoot && !*ppRoot)    // Only do this once
  {
    MEMORY_BASIC_INFORMATION mbi;
    ULONG64 ulAddress = 0;
    SIZE_T dwSize = VirtualQueryEx(CallbackInput->ProcessHandle, (void*)ulAddress, &mbi, sizeof(MEMORY_BASIC_INFORMATION));
    while (dwSize == sizeof(MEMORY_BASIC_INFORMATION))
    {
      if ((mbi.State == MEM_COMMIT) &&
        ((mbi.Type == MEM_PRIVATE) || (mbi.Protect == PAGE_READWRITE) || (mbi.Protect == PAGE_EXECUTE_READWRITE)))
      {
        MemoryInfoNode* pNode = new MemoryInfoNode;
        pNode->ulBase = (ULONG64)mbi.BaseAddress;
        pNode->ulSize = (ULONG)mbi.RegionSize;
        pNode->pNext = *ppRoot;
        *ppRoot = pNode;
      }
      // Loop
      ulAddress = (ULONG64)mbi.BaseAddress + mbi.RegionSize;
      dwSize = VirtualQueryEx(CallbackInput->ProcessHandle, (void*)ulAddress, &mbi, sizeof(MEMORY_BASIC_INFORMATION));
    }
  }
}
break;
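
The inclusion test inside the loop can also be pulled out into a small predicate, which makes the heuristic easy to exercise on its own. This is a sketch under stated assumptions: RegionInfo and ShouldInclude are hypothetical names, and the constants are copies of the Windows SDK values so that the example builds without Windows headers.

```cpp
#include <cstdint>

// Values copied from the Windows SDK headers.
constexpr std::uint32_t kMemCommit            = 0x1000;     // MEM_COMMIT
constexpr std::uint32_t kMemPrivate           = 0x20000;    // MEM_PRIVATE
constexpr std::uint32_t kMemImage             = 0x1000000;  // MEM_IMAGE
constexpr std::uint32_t kPageReadWrite        = 0x04;       // PAGE_READWRITE
constexpr std::uint32_t kPageExecuteReadWrite = 0x40;       // PAGE_EXECUTE_READWRITE

// Stand-in for the three MEMORY_BASIC_INFORMATION fields the heuristic reads.
struct RegionInfo {
    std::uint32_t State;
    std::uint32_t Type;
    std::uint32_t Protect;
};

// Same test as Figure 13: committed, and either private or read/write.
bool ShouldInclude(const RegionInfo& r) {
    if (r.State != kMemCommit)
        return false;
    return r.Type == kMemPrivate ||
           r.Protect == kPageReadWrite ||
           r.Protect == kPageExecuteReadWrite;
}
```

One thing to keep in mind when refining a test like this is that the Protect value can carry modifier bits such as PAGE_GUARD in addition to the base protection, so an exact equality comparison does not match those regions by protection alone.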

Mapped Memory Image Files

You may be wondering how you can debug a dump with image regions (MEM_IMAGE) missing. For example: How would you see the code that’s executing? The answer is a bit out-of-the-box. When the debugger needs to access a missing image region in a non-Full dump, it obtains the data from the image file instead. It finds the image file by using the module load path, the PDB’s original image file location, or the .sympath/.exepath search paths. If you run lmvm <module>, you’ll see a “Mapped memory image file” line indicating that the file has been mapped into the dump, like so:

0:000> lmvm CrashAV_x86
start    end        module name
003b0000 003b6000   CrashAV_x86   (deferred)            
  Mapped memory image file: C:\dumps\CrashAV_x86.exe
  Image path: C:\dumps\CrashAV_x86.exe
  Image name: CrashAV_x86.exe
...

Relying on the “Mapped memory image file” ability of the debugger is a great approach to keeping the size of dumps small. It works particularly well with native applications, because the binaries compiled are the ones used and are therefore available on your internal build server (and pointed to by the PDBs). With managed applications, the JIT compilation on the remote customer’s computer complicates this. If you want to debug a managed application dump from another computer, you’ll need to copy the binaries (as well as the dumps) locally. This is still a savings because multiple dumps can be taken quickly, and then a single (large) application image file collection can be done without outage. To simplify the file collection, you could use the ModuleCallback to write out a script that collects the modules (files) referenced in the dump.
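
A collection script of that kind could be generated along the following lines. This is only a sketch: BuildCollectScript is a hypothetical helper, the module paths are assumed to have been gathered already (for example, from the path passed to each ModuleCallback), and narrow strings are used for portability even though the callback supplies wide-character paths.

```cpp
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Builds a batch script that copies each distinct module file to destDir.
std::string BuildCollectScript(const std::vector<std::string>& modulePaths,
                               const std::string& destDir) {
    std::set<std::string> seen;  // a module may be reported more than once
    std::ostringstream script;
    script << "@echo off\n";
    for (const std::string& path : modulePaths) {
        if (path.empty() || !seen.insert(path).second)
            continue;  // skip blanks and duplicates
        script << "copy /y \"" << path << "\" \"" << destDir << "\"\n";
    }
    return script.str();
}
```

The returned text can be written next to the dump file and run later, once the outage is over, to gather the binaries in a single pass.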

Plug Me In!

Changing the standalone application to use Sysinternals ProcDump v4.0 makes your life much easier. You no longer have to implement all of the code associated with calling MiniDumpWriteDump, and, more important, all of the code to trigger the dump at the right time. You just need to implement a MiniDumpCallback function and export it as MiniDumpCallbackRoutine in a DLL.

The MiniDump06 sample (see Figure 14) includes the callback code from MiniDump05 with just a few modifications.

Figure 14  MiniDumpCallbackRoutine Changed to Use a Global Instead of CallbackParam

MemoryInfoNode* g_pRootMemoryInfoNode = NULL;
...
case IncludeThreadCallback:
{
  while (g_pRootMemoryInfoNode)
  {    //Unexpected cleanup required
    MemoryInfoNode* pNode = g_pRootMemoryInfoNode;
    g_pRootMemoryInfoNode = pNode->pNext;
    delete pNode;
  }
}
break;
...
case ThreadCallback:
{    // Collect all of committed MEM_PRIVATE and R/W memory
  if (!g_pRootMemoryInfoNode)    // Only do this once
  {
...
    pNode->pNext = g_pRootMemoryInfoNode;
    g_pRootMemoryInfoNode = pNode;
...
  }
}
break;
...
case MemoryCallback:
{    // Remove the root node and return its members in the callback
  if (g_pRootMemoryInfoNode)
  {
    MemoryInfoNode* pNode = g_pRootMemoryInfoNode;
    g_pRootMemoryInfoNode = pNode->pNext;
    CallbackOutput->MemoryBase = pNode->ulBase;
    CallbackOutput->MemorySize = pNode->ulSize;
    delete pNode;
  }
}
break;

The new MiniDump06 project compiles the callback code as a DLL. The project exports the MiniDumpCallbackRoutine (case sensitive) using a DEF file:

LIBRARY    "MiniDump06"
EXPORTS
  MiniDumpCallbackRoutine   @1

Since ProcDump passes a CallbackParam value of NULL, the function needs to use a global variable instead to track its progress via my MemoryInfoNode structure. On the first IncludeThreadCallback, there’s new code to reset (delete) the global variable if set from a prior capture. This replaces the code that was implemented after a failed MiniDumpWriteDump call in my WriteDump function.

To use your DLL with ProcDump, you specify the -d switch followed by the name of your DLL that matches the architecture of the capture. The -d switch is available when taking Mini dumps (no switch) and Full (-ma) dumps; it isn’t available when taking MiniPlus (-mp) dumps:

procdump.exe -d MiniDump06_x64.dll notepad.exe
procdump.exe -ma -d MiniDump06_x64.dll notepad.exe

Note that the callback is invoked with different callback types than the ones described in my samples when a Full dump (-ma) is being taken (refer to the MSDN Library for the documentation). The MiniDumpWriteDump function treats the dump as a Full dump when the DumpType parameter contains MiniDumpWithFullMemory.

Sysinternals ProcDump (procdump.exe) is a 32-bit application that extracts from itself the 64-bit version (procdump64.exe) when required. After procdump64.exe has been extracted and launched by procdump.exe, procdump64.exe will load your (64-bit) DLL. As such, debugging your 64-bit DLL is quite tricky because the launched application is not the desired target. The easiest thing to do to support debugging your 64-bit DLL is to copy the temporary procdump64.exe to another folder and then debug using the copy. In this way, no extraction will occur and your DLL will be loaded in the application you launch from within the debugger (Visual Studio, for example).

Break!

Determining the origin of crashes and hangs is not easy when you can afford only a Mini dump. Making a dump file with additional key information solves this without incurring the expense of a Full dump.

If you’re interested in implementing your own dump application or DLL, I suggest you investigate the effectiveness of the Sysinternals ProcDump, WER and AdPlus utilities first—don’t reinvent the wheel.

When writing a callback, make sure you spend the time to understand the exact layout of memory within your application. Take Sysinternals VMMap snapshots and use dumps of the application to delve into the details. Making smaller and smaller dumps is an iterative approach. Start including/excluding obvious areas, and then refine your algorithm. You may need to use both heuristic and domain knowledge approaches to get you to your goal. You can help your decision making by modifying the target application. For example, you could use well-known (and unique) allocation sizes for each type of memory use. The important thing is to think creatively when deciding how to determine what memory is required, in both the target application and the dumping application.
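
To make the allocation-size idea concrete, here is a minimal sketch of how a callback might recognize regions by size. Everything in it is hypothetical: the RegionKind names, the two sizes and the helper functions are invented for illustration, and it assumes the target application allocates each kind of buffer as its own region so that the region size reported for it matches the chosen value.

```cpp
#include <cstdint>

// Hypothetical kinds of memory the target application tags by size.
enum class RegionKind { Unknown, SessionState, RequestLog };

// Hypothetical "well-known" sizes, chosen as unusual multiples of 64KB.
constexpr std::uint64_t kSessionStateSize = 0x130000;  // 19 * 64KB
constexpr std::uint64_t kRequestLogSize   = 0x270000;  // 39 * 64KB

// Maps a region size back to the kind of data the application stored there.
RegionKind ClassifyBySize(std::uint64_t regionSize) {
    if (regionSize == kSessionStateSize) return RegionKind::SessionState;
    if (regionSize == kRequestLogSize)   return RegionKind::RequestLog;
    return RegionKind::Unknown;
}

// Example policy: keep session state, drop the bulky request log.
bool KeepRegion(std::uint64_t regionSize) {
    return ClassifyBySize(regionSize) == RegionKind::SessionState;
}
```

A check like this would sit alongside the heuristic tests rather than replace them, because unrelated regions of the same size are still possible.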

If you’re a kernel developer and are interested in callbacks, there’s a similar kernel-based mechanism; refer to the BugCheckCallback Routine documentation on msdn.com.


Andrew Richards is a Microsoft senior escalation engineer for Windows OEM. He has a passion for support tools, and is continually creating debugger extensions and callbacks, and applications that simplify the job of support engineers. He can be contacted at andrew.richards@microsoft.com.

Thanks to the following technical experts for reviewing this article: Drew Bliss and Mark Russinovich