From the March 2002 issue of MSDN Magazine

Article
10/23/2019

MSDN Magazine

Windows 2000 Loader

What Goes On Inside Windows 2000: Solving the Mysteries of the Loader

Russ Osterlund

This article assumes you're familiar with Win32 APIs and C++

Level of Difficulty 1 2 3

Download the code for this article: Loader.exe (76KB)

SUMMARY DLLs are a cornerstone of the Windows operating system. Every day they quietly perform their magic, while programmers take them for granted. But for anyone who's ever stopped to think about how the DLLs on their system are loaded by the operating system, the whole process can seem like a great mystery. This article explores DLL loading and exposes what really goes on inside the Windows 2000 loader. Knowing how DLLs are loaded and where, and how the loader keeps track of them really comes in handy when debugging your applications. Here that process is explained in detail.

ver since I first encountered a definition of dynamic link libraries in a description of the then-new operating system OS/2, the idea of DLLs has always fascinated me. This beautifully simple concept of modules that could be loaded and unloaded as needed with well-defined interfaces that outlined routines written beforehand and, perhaps by other programmers, was a powerful jolt to me because I was more accustomed to statically linked code in mainframe or MS-DOS® programs. And, like many others new to programming for Windows®, the first utility I built enumerated DLLs that were already loaded into the system in order to demonstrate this concept at work. Now, even with the Windows world changing at a frenetic pace, employing COM interfaces and their ActiveX® components, and moving toward common language runtimes with their assemblies of managed code, the humble DLL remains at the center of things, providing services to the system on an as-needed basis.
      During this long association with DLLs, I accepted their loading by the operating system as if it were magic and never truly appreciated the amount of work required by LoadLibrary and its variations. This article is an attempt to rectify that oversight by looking inside NTDLL.DLL. Since I do not have access to the source files, much of what I discuss here falls under that nebulous category of undocumented information and is therefore subject to change or obsolescence in future releases of the operating system.
      The details that I'll cover are based on an examination of the binaries available when I wrote this article, Windows 2000 Professional (Build 2195: Service Pack 1). Access to a properly installed set of debug symbol files, .DBG and .PDB dated July 9, 2000, and working with a suitable debugger will make the information easier to understand.
      This article can be seen as a precursor to Matt Pietrek's Under The Hood column in the September 1999 issue of MSJ, where Matt writes pseudocode for the operation of LdrpRunInitializeRoutines for Windows NT® 4.0 SP3 and describes how a library is initialized and when DllMain gets called. Note that I will refer to this column frequently.
      My discussion will begin with a brief look at LoadLibrary, starting with LdrLoadDll, and will conclude when LdrpRunInitializeRoutines is invoked. While trying to follow the execution path needed to load a simple DLL using a debugger, you can easily become confused by the numerous unconditional jump statements and lost in the recursion common in the later stages of DLL loading, so I'll guide you carefully through the call to LoadLibrary.
      Note that all code modules mentioned in this article can be found at the link at the top of this article.

All Paths Lead to LoadLibraryExW

      There are several ways to get to LoadLibraryExW. For example, LoadTypeLibEx and CoLoadLibrary in the COM universe eventually call LoadLibraryExW. The two most familiar routes to LoadLibraryExW are LoadLibraryA and LoadLibraryW. All you need to do is specify one parameter—the name of the DLL—and you are on your way.
      But if you examine a disassembly of LoadLibraryA and LoadLibraryW, you will discover that they are merely thin wrappers around the more versatile LoadLibraryExA and LoadLibraryExW APIs, respectively. With LoadLibraryA, there is a curious test for the DLL twain_32.dll, but normally two zeroes are passed as the second and third parameters to LoadLibraryExA before continuing. LoadLibraryW is even more direct; the two zeroes are pushed onto the stack and the code moves directly onto LoadLibraryExW. These paths merge with an examination of LoadLibraryExA. There is a call to a helper routine to convert the DLL's name into a Unicode string before the code proceeds onto LoadLibraryExW.
      LoadLibraryExW is a fairly involved routine which must decide between at least four different variations on the type of DLL-loading that the program wants to perform. In addition to the first parameter (which contains the name of the DLL), and the second parameter (which must be NULL according to the SDK documentation), there is a flag parameter that specifies the action to take when loading the module. You can ignore the flag value LOAD_LIBRARY_AS_DATAFILE, since it does not lead to the desired goal: LdrLoadDll, an API exported by NTDLL.DLL. The remaining valid values—0, DONT_RESOLVE_DLL_REFERENCES, and LOAD_WITH_ALTERED_SEARCH_PATH—all eventually find their way to LdrLoadDll.
      It is interesting to note that there is no sanity check for the flag parameter that returns something like STATUS_INVALID_PARAMETER if values other than the documented ones are passed to LoadLibraryExW. For example, plugging in a reasonable value such as 4 results in a normal DLL load. In any case, the alternate paths taken with DONT_RESOLVE_DLL_REFERENCES or LOAD_WITH_ALTERED_SEARCH_PATH will be noted later in this article. For now, I will concentrate on the normal case in which dwFlags has a value of 0.
      There's one more API exported from NTDLL.DLL that leads to LdrLoadDll, but I have found neither documentation for it nor any references to it in existing DLLs. The name of the API is LoadOle32Export. It accepts two parameters, the image base for Ole32.DLL and the name of the routine to find, and it returns the address to the requested function. Whether this is a mysterious secret path or some kind of holdover from past versions of the operating system is uncertain.

APIs Exported by NTDLL.DLL

Figure 1 lists the exported APIs in NTDLL.DLL beginning with the prefix "Ldr", and Figure 2 lists the internal routines beginning with "Ldrp". The main difference between the names for the LdrXXX and LdrpXXX routines is that the "p" indicates that these functions are private—hidden from the outside world. As you can see by examining the lists, familiar APIs such as GetProcAddress are wrappers around NTDLL exports like LdrGetProcedureAddress. In fact, many APIs in Kernel32 that are familiar to the SDK programmer, such as GetProcAddress, ExitProcess, ExpandEnvironmentStrings, and CreateSemaphore, pass the real work along to an NTDLL surrogate (LdrGetProcedureAddress, NtTerminateProcess, RtlExpandEnvironmentStrings_U, and NtCreateSemaphore, respectively).
Now, let's take a close look at LdrLoadDll using DumpBin or my disassembler, PEBrowse (available from https://www.smidgeonsoft.com).

  0X77F889A9  PUSH      0X1                       
  

  0X77F889AB  PUSH      DWORD PTR [ESP+0X14]      
  

  0X77F889AF  PUSH      DWORD PTR [ESP+0X14]      
  

  0X77F889B3  PUSH      DWORD PTR [ESP+0X14]      
  

  0X77F889B7  PUSH      DWORD PTR [ESP+0X14]      
  

  0X77F889BB  CALL      0X77F887E0          ; SYM:LdrpLoadDll 
  

  0X77F889C0  RET       0X10

Do not be deceived by the apparent simplicity of this routine, which looks like merely a wrapper around the internal procedure, LdrpLoadDll (that I'll discuss in the next section). The first parameter points to a wide-string representation of the search path. The second parameter will hold a DWORD with the value of 2 if DONT_RESOLVE_DLL_REFERENCES was specified in the call to LoadLibraryExW. The third parameter contains a pointer to a Unicode string structure that encases the name of the DLL that is to be loaded. The fourth item is an output parameter and will receive the address at which the module was loaded when LdrpLoadDll has finished its work.
But what does the fifth parameter, hardcoded with a value of 1, mean? As you will see later, LdrpLoadDll can be called recursively in LdrpSnapThunk. Here, the value means normal processing, but later you will see that a value of 0 means process forwarded APIs.

LdrpLoadDll

      I have provided a small project in the code download for this article that contains a main executable, ingenuously named Test, and three DLLs: TestDll, Forwarder, and Forwarded. The DLLs demonstrate different variations that illustrate some of the scenarios LdrpLoadDll will commonly encounter. See the Readme.txt file in the code download for important information about setting environment variables and predefined breakpoints.
      Figure 3 shows some of the internal loader routines you will bump into when you pass one of the documented flags to LoadLibraryExW. If you concentrate for a moment on the typical situation (#1 in Figure 3), you will see that there are six subroutines called directly by LdrpLoadDll: LdrpCheckForLoadedDll, LdrpMapDll, LdrpWalkImportDescriptor, LdrpUpdateLoadCount, LdrpRunInitializeRoutines, and LdrpClearLoadInProgress. (I'll discuss the first four subroutines later in this article.) LdrpRunInitializeRoutines has already been described in Matt Pietrek's column, so I won't go into it here. LdrpClearLoadInProgress is briefly mentioned in that column as well.
      Let's take a high-level look at the steps taken by LdrpLoadDll, which occur as follows:

Check to see if the module is already loaded.
- Map the module and supporting information into memory.
  - Walk the module's import descriptor table (that is, find out what other modules this one is adding).
  - Update the module's load count as well as any others brought in by this DLL.
  - Initialize the module.
  - Clear some sort of flag, indicating that the load has finished.

      Looking more closely at the routines listed, you will notice that LdrpCheckForLoadedDll can be and is called from several locations. It is therefore reasonable to assume that this is a helper function. Also, note that LdrpSnapThunk and LdrpUpdateLoadCount, as well as the previously mentioned case for LdrpLoadDll, are all candidates for recursion.
      The code for LdrpLoadDll.cpp contains pseudocode for the operations found in this and the other loader routines. My pseudocode is not a copy of the actual code for LdrpLoadDll and should not be compiled; it represents intelligent guesswork obtained by studying a disassembly of the routine. It also shows how much information can be brought together about any section of code just by taking the time to study the available binaries. The pseudocode by no means covers every detail.
      In many cases, the variable names were assigned based on their role in the code. Others were taken from parameter descriptions found in Windows NT/2000 Native API Reference by Gary Nebbett (New Riders Publishing, 2000). Take a look at this book to see what takes place inside NTDLL.DLL. It provides a reference for the Nt/Zw routines—the so-called native APIs—and where possible relates these routines back to their Win32® counterparts.
      Now I'll describe in some detail what happens when your code loads the typical DLL, then I'll throw in a few options for variety. LdrpLoadDll first sets up a __try/__except block, then checks the flag, LdrpInLdrInit. If the flag is turned on, then it sets up a critical section block to prevent updates in the data structures that it will be referencing and modifying. Next, it checks on the length of the incoming DLL's name against the maximum, 532 bytes or 266-wide characters, which is close to the better-known constant MAX_PATH and its value of 260. If the length of the name exceeds the maximum value, then the routine quits with a return code of STATUS_NAME_TOO_LONG.
      At this point, LdrpLoadDll attempts to locate the position of the period character (dot) in the module's name. If the dot is missing, then the routine appends the string ".dll" to the end of the module's name, assuming this does not exceed the maximum size. There are two items to note here: first, you do not need to pass the .dll extension in your calls to LoadLibrary, and second, some of the features, such as the API-forwarding discussed later in this article, will not work with any extension other than .dll. (For a preview, try the following: change the output name of the Forwarded project to "Forwarded.cpl" and run the test program, making certain that the GetProcAddress stuff is included. You'll see that the GetProcAddress call fails and the messagebox that should display "Hello World!" will not appear!)
      The test for ShowSnaps and enabling this handy debugging aide on your system were explained in Matt's column. ShowSnaps occupies the second bit position in the Registry value GlobalFlag, located at HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager. When enabled, this provides feedback in a debugger's output window about actions the loader is taking. If you decide not to enable "snaps" on your system, then the text strings appearing throughout the code will provide useful diagnostic hints. (In the download is an ASCII text file, LdrSnap(Forwarder).TXT, that includes the snap outputs produced by loading the Forwarder test DLL. Later in the article you will find figures showing this information for Forwarded.DLL and TestDll.DLL.) LdrpLoadDll then constructs a Unicode string of the module's name that will be used by the first subroutine, LdrpCheckForLoadedDll.

LdrpCheckForLoadedDll

      The pseudocode for this routine can be found in LdrpCheckForLoadedDll.cpp. You'll see that the code checks all of the possible lists for loaded DLLs and if the initial search fails, the routine tries the alternative. Following all of the paths in this routine is rather complicated, but the essence of this routine is rather simple to understand. There are two places to start a search: an optimization based on a hash table found inside of NTDLL.DLL, and walking the module list maintained inside the process's environment block (PEB). The definition of the module list and its location in memory will follow shortly. LdrpLoadDll, the calling routine, starts the search by setting the UseLdrpHashTable parameter to 0. Later on, I'll show you a similar case in which the parameter is set to 1 and the hash table is used. If the DllName parameter does not contain a path, the comparison using RtlEqualUnicodeString is made on the simple file name entries.
      There is a small wrinkle in the search option when the hash table is not used, however. If the incoming DllName contains a path, LdrpCheckForLoadedDll calls on RtlDosSearchPath_U to validate the path, then examines the entries in the module list that contain fully qualified file names. In case you were wondering how the Microsoft® .NET Common Language Runtime (CLR) supports side-by-side execution of different versions of the same assembly, this information should give you a head start. If the module is found in the PEB list but contains a load count of 0, LdrpCheckForLoadedDll forces the load by issuing a sequence of commands similar to what will be seen in LdrpMapDll. But describing this sequence now would be jumping too far ahead.
      If the search has succeeded and the loading DLL is already part of the current process, then LdrpLoadDll has a few remaining details to take care of. The loaded count for this image will be incremented if the module is a DLL and if the loaded count field has not been set to -1 (which seems to mean "do not update the load count.") The module flag will also be cleared of its load-pending bit, which was set by the routine LdrpUpdateLoadCount. LdrpLoadDll will then leave the critical section block and set the ImageBase parameter to the address where the DLL was loaded.
      More than likely, though, this is not the quick exit you are looking for—unless you are wasting machine cycles by using LoadLibrary to return an HINSTANCE of an already loaded module. Otherwise, GetModuleHandle is a better alternative. Instead, let's see what happens when the search fails.

LdrpMapDll

      Now that LdrpLoadDll knows that the requested DLL has not already been loaded into the process, it calls upon its second helper routine, LdrpMapDll, to perform the duties of finding the DLL's housing (the actual file), loading the DLL into memory (but not initializing it), and creating and adding a structure that I have tagged, MODULEITEM, to the PEB's module list. First, orient yourself in Figure 3 and become familiar with the support routines you can see in LdrpMapDll, (LdrpCheckForKnownDll, LdrpResolveDllName, LdrpCreateDllSection, LdrpAllocateDataTableEntry, LdrpFetchAddressOfEntryPoint, and LdrpInsertMemoryTableEntry). In the file LdrpMapDll.cpp, you will find the pseudocode that shows the fine points of what actually takes place here.
      LdrpCheckForKnownDll checks to see if the loading DLL can be located in the directory specified in the Unicode string, LdrpKnownDllPath, located at 0x77FCE008. Examining this memory location with a debugger reveals that this variable contains the value "C:\WINNT\SYSTEM32" (or something similar, depending upon how your system is configured).
      If you start up WinObj (found at https://www.sysinternals.com) or my utility, NtObjects (found at https://www.smidgeonsoft.com), and select View | Executive Objects, you will find an object directory called KnownDlls. Under this directory, you will see the item KnownDllPath (a symbolic-link object) that contains the value for your system. LdrpCheckForKnownDll allocates two Unicode strings to hold the fully qualified DLL file name and the file name by itself. By calling NtOpenSection with the fully qualified file name, a FileExists operation is performed and LdrpCheckForKnownDll returns a section handle if the DLL is known. Otherwise, the two Unicode strings are freed and LdrpMapDll must call on the services of LdrpResolveDllName and LdrpCreateDllSection.
      LdrpResolveDllName returns two Unicode strings, the loading DLL's file name and its fully qualified file name. If a search path was not provided, the routine uses the path found at the hardcoded address 0X77FCE30C in NTDLL.DLL, which points to a default search path. RtlDosSearchPath_U, though, performs the real work in this routine. If RtlDosSearchPath_U can find the loading DLL, it returns the length of the path where it resolved the DLL's name.
      If you try to load a DLL that cannot be found in the search path, the search returns 0. LdrpMapDll then responds to this result and returns with the status code STATUS_DLL_NOT_FOUND, which leads to a quick exit from LdrpLoadDll. If you change the test program so that it tries to load a DLL with the name "bogus," you can check this for yourself.
      Assuming all is well and the loading DLL has been found, LdrpCreateDllSection must take over and create the all important section handle. Since sections are classified as kernel objects, the path needs to be converted into something that the Executive Object Manager can understand. This is where the routine RtlDosPathNameToNtPathName comes into play. It takes a fully qualified file name "C:\Projects\LoadLibrary\Debug\TestDll.dll" and returns something like "\??\C:\Projects\LoadLibrary\Debug\TestDll.dll." If the file name cannot be interpreted, then the routine returns STATUS_OBJECT_PATH_SYNTAX_BAD and ends execution.
      LdrpCreateDllSection is just a thin wrapper around the so-called native API, NtCreateSection. First, a file handle is obtained with NtOpenFile. If that call fails, then a return code of STATUS_INVALID_IMAGE_FORMAT is generated. Otherwise, the Object Manager creates the section handle and the file handle is closed via NtClose.
      Now that LdrpMapDll has the section handle, it can actually load the DLL into the process's address. The DLL is brought in as a memory-mapped file through the services of NtMapViewOfSection. First, the defaults are set for the base address and the size of the mapping object—with the latter, a value of 0 indicates to the system that the entire section object will be mapped. Then, a field in the PEB reserved for subsystem calls is loaded with the value of the fully qualified file name.
      Those of you who have written debug loops and have handled the debug event LOAD_DLL_DEBUG_EVENT may be interested to know that the lpImageName field of the LOAD_DLL_DEBUG_INFO structure is filled with the contents of this field, which is reserved for subsystem calls in the process that's being debugged.
      Now, NtMapViewOfSection is called. This triggers the notification of the loading DLLs as seen in debuggers like the one in Visual C++®. LdrpMapDll restores the previous value of the PEB's subsystem data field and checks on the success of the operation. If the DLL has been successfully mapped, LdrpMapDll now possesses an actual memory address with which it can work. How NtMapViewOfSection returns the image base when no hint was passed to it is a question that could be resolved with further exploration.

PEB Load List

      After a sanity check on the image now mapped into memory, LdrpMapDll continues with some bookkeeping. Earlier I mentioned a data structure I named MODULEITEM that is created and added to the process's PEB. In the code for LdrpLoadDll.h you will find a reconstruction of this structure. The job of LdrpAllocateDataTableEntry is to allocate this object and initialize the ImageBase field using three values: the HMODULE handle returned by NtMapViewOfSection, the ImageSize field using the SizeOfImage value found in the portable executable (PE) file's optional header, and the TimeDateStamp from the equivalent field in the PE file header. If the memory allocation fails, the routine returns a NULL pointer. LdrpMapDll then removes the file mapping, closes the section handle, and fails, returning STATUS_NO_MEMORY.
      Assuming that all is well, the LoadCount field in the newly built ModuleItem is zeroed out and ModuleFlags gets initialized. (LdrpLoadDll.h provides the possible values for this field.) The two Unicode string fields containing the full path to the DLL and the file name are filled in. Then a call placed to the small helper routine, LdrpFetchAddressOfEntryPoint, inserts the structure's EntryPoint field.
      With the ModuleItem partially initialized, LdrpMapDll calls LdrpInsertMemoryTableEntry to insert the entry into the process's module list located in the PEB. At offset 0x0C in the PEB, which can usually be found at the address of 0x7FFDF000, you will find the pointer to a structure I have named MODULELISTHEADER. Windows NT and Windows 2000 maintain three doubly linked lists that describe the load, memory, and initialization order for the modules in a process.
      If you have ever used the PSAPI routines or the newly available ToolHelp32 functions in Windows 2000 to enumerate the modules loaded in a process's address space, you should be having an epiphany. You have an alternate method to gather this information, in three different flavors, by using ReadProcessMemory and these undocumented structures. These structures have remained stable from Windows NT 4.0 through Windows 2000, and through the betas for Windows XP that were available when I did the research for this article. LdrpInsertMemoryTableEntry adds the ModuleItem structure to the end of the load and memory chains and fixes up the links appropriately. In my experience, I have found that the load and memory chains possess the same order; only the initialization list varies with its own order of the modules. See Matt Pietrek's column for more information on the initialization chain.
      Returning to LdrpMapDll, you can now see the ModuleFlags field receiving some additional attention. If the executable is not IMAGE_FILE_LARGE_ADDRESS_AWARE and it is not of type IMAGE_DLL, the EntryPoint field gets zeroed out. LdrpMapDll tests the return value from NtMapViewOfSection to determine if the image was loaded at its preferred image base. Unfortunately, scenarios in which your DLL is parked in a different location is beyond the scope of this discussion, but now you know the way, so you can investigate this phenomenon on your own.
      Finally, after a validation of the image on multiprocessor systems, LdrpMapDll closes the section handle obtained from LdrpCreateDllSection and returns with the results of its work. The validation only proceeds if your DLL contains data in the directory IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG and the LockPrefixTable field in this directory is nonzero. (NTDLL and Kernel32 contain the IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG directory.) You can use PEBrowse to examine this directory, and those of you fortunate enough to have a multiprocessor machine can continue your own exploration down this path.

LdrpWalkImportDescriptor

      You may think that the route through LdrpLoadDll has been relatively straight and uncomplicated so far. But there still may be a maze of passages ahead. Your attempt to load LdrpLoadDll may have generated the need for additional modules, and this is where LdrpWalkImportDescriptor comes in. (In order to better understand my pseudocode, it will help to have some knowledge of the PE header definitions found in WinNt.h, especially those relevant to imports and exports. You can take a look at "Inside Windows: An In-Depth Look into the Win32 Portable Executable File Format" by Matt Pietrek in the February 2002 issue of MSDN Magazine.
      Your typical module load takes you through the twists and turns of LdrpWalkImportDescriptor. However, if you specify DONT_RESOLVE_DLL_REFERENCES in your call to LoadLibraryExW, then you will avoid the upcoming maze. You should read the SDK documentation carefully to make certain that this is what you want. There is also a mechanism, which I'll explain later, to help you avoid the loops and recursion that are part of LdrpSnapIAT and LdrpSnapThunk. But if you don't take either of these alternatives, then you need to know what happens with LdrpWalkImportDescriptor.
      Orient yourself once again by reexamining Figure 3, locating LdrpWalkImportDescriptor. LdrpWalkImportDescriptor has two subroutines: LdrpLoadImportModule and LdrpSnapIAT. This does not seem so bad, but one tip-off that this code will soon become interesting is that there are four nesting levels in the routines for LdrpSnapIAT. The number and depth of nested functions is one metric that indicates the complexity of code. You should take note that recursion is possible in not one, but two locations in LdrpSnapIAT. You may recall that in the section on APIs exported by NTDLL.DLL I mentioned the apparent simplicity of the call to LdrpLoadDll and the fifth parameter that took a 0 or a 1. LdrpSnapIAT can also be recursive inside LdrpGetProcedureAddress. Finally, to make things even more complex than they already were, it's possible that a typical DLL may import other modules that start a cascade of additional library loads. The loader will need to loop through each module, checking to see if it needs to be loaded and then checking its dependencies.
      With that in mind, let's take a look at the pseudocode found in LdrpWalkImportDescriptor.cpp. (If you are following along with the debugger, change the test program to load the Forwarded.DLL module and restart the debugger.) Execution starts with two calls to RtlImageDirectoryEntryToData to locate the Bound Imports Descriptor and the regular Import Descriptor tables. For the moment, ignore the call for that bound import thing except to notice that the code checks for its presence first. (I'll discuss binding later.) In Forwarded.DLL, LdrpWalkImportDescriptor detects two imported modules, User32.DLL and Kernel32.DLL, and now calls upon LdrpLoadImportModule for assistance.
      LdrpLoadImportModule constructs a Unicode string for each DLL found in the import table and then employs LdrpCheckForLoadedDll, using the hash table in NTDLL that was mentioned earlier to see if they have already been loaded. Note that the call here is made with only the file name (no fully qualified path) and the process's search path. If you have ever had your application complain that it cannot find a DLL, you should realize that it may not be the loading module's fault. Check to see that all of its dependent modules can be found. If a module is found and already loaded, life is good because there is one less module to worry about. If it has not been loaded, then you've found an instance of recursion. A call to LdrpMapDll to bring the DLL into the process' address space is followed by a call to LdrpWalkImportDescriptor. Now you're in the middle of the twisting mazes I mentioned.

The IAT and Forwarded API Processing

      Once LdrpWalkImportDescriptor knows that the module is in memory (either through a call to LoadLibraryExW way back at the beginning or via the LdrpMapDll call in LdrpLoadImportModule), the next step is to examine each and every API referenced in Forwarded's imported module list with the call to LdrpSnapIAT. Why is this necessary? There are two reasons. First, you want to replace the placeholders in the Import Address Table (IAT) with real entry points. Second, you need to locate and process APIs that may have been forwarded on to another DLL. (Forwarding is one technique Microsoft employs to expose a common Win32 API set and to hide the low-level differences between the Windows NT and Windows 9x platforms.)
      You may have observed either in the debug output strings or by carefully stepping through the loader code that at some point during Kernel32 processing, a check on NTDLL.DLL was made. Since just about any DLL with some executable code contains references to Kernel32, you may wonder why this is happening at all. You already know that NTDLL.DLL is loaded into every process. The answer is that Kernel32 contains APIs that are "forwarded" to NTDLL. Refer to Figure 4 for a complete list of these APIs for Kernel32 (version 5.0.2195.1600).
      You may also notice that the list contains functions that are common requirements for just about any kind of serious programming. Remember, LdrpLoadDll needs to load every module referenced by the loading DLL, including those "hidden" references contained in forwarded APIs. Thus, a reasonable conclusion from all of this is that an import table walk will take place every time you load a DLL—at least for Kernel32's forwarded APIs and for any additional forwarded APIs you may have decided to include in your application!
      To allow you to experiment with this concept, I have included Forwarder.DLL and Forwarded.DLL in the download, as I've mentioned previously. The code for Forwarder.DLL is extremely simple—a DllMain with an export pragma. If you run DumpBin or PEBrowse on this module, you will see that it exports only one routine, and that the routine is marked with a designation that it is forwarded on to Forwarded.ShowMessage.
      Now, turn your attention to Forwarded.DLL and take a look at what it exports. You will see a typical DLL with a DllMain and the single API, ShowMessage. However, if you changed the test program to load Forwarder and then observed what happened in the debugger, you might have wondered why no reference to Forwarded appeared. You will encounter this confusion unless you also included the GetProcAddress statement in your compile. Make certain that you include the GetProcAddress line and you will now see that both DLLs are loaded. You're observing another form of delay-load occurring and, perhaps, another example of the optimization that has been done on the loading engine in Windows.
      Returning to LdrpWalkImportDescriptor, you'll find that it tests several items before calling LdrpSnapIAT. (For the purposes of following the next few steps, change the sample back to use Forwarded.DLL.) If you look up Section 6.4.1 of the Portable Executable Specification (found on the MSDN Library CD under Specifications), you will read that the time/date stamp field of an import directory will contain a value of 0 unless the DLL has been bound. One of the fields tested is the time/date stamp field, which is 0 in Forwarded.DLL, so let's step into LdrpSnapIAT.
      LdrpSnapIAT wraps its execution around a __try/__except block first before locating the IAT in Forwarded.DLL, and then hunts for the export directory in the module that Forwarded is attempting to load (assume it is Kernel32 for now). It then changes the memory protection on the IAT of Forwarded.DLL to PAGE_READWRITE and proceeds to examine each entry in the IAT. (If you are able to examine the protection for this chunk of memory, you will see that it is normally PAGE_READONLY for your executables.) Going a bit further, you'll encounter LdrpSnapThunk.
      LdrpSnapThunk requires an ordinal to locate an entry point and to determine whether or not the API is forwarded. If the hint value in Forwarded.DLL's import directory is correct, you can use that (generally, I have found this not to be the correct value). Otherwise, LdrpSnapThunk calls on the services of the helper routine, LdrpNameToOrdinal, to look up the correct value. Observe that LdrpNameToOrdinal uses a binary search on the export table to quickly locate the ordinal—more optimization in the loader—and note that the table must be sorted in alphabetical order for the search to work.
      Now that you have an ordinal, you can look up the entry point for the API in Kernel32. LdrpSnapThunk first plugs the loading module's IAT entry with an address derived from the export table for Kernel32; see Section 6.4.4 of the PE specifications (which can be found on the October 2001 MSDN CD under Specifications | Microsoft Portable Executable and Common Object File Format) for more information. (This explains why the page protection was changed in LdrpSnapIAT.)
      Section 6.3.2 of the PE specifications says:

If the address (the entry-point) is not within the export section (as defined by the address and length indicated in the Optional Header), the field is an Export RVA: an actual address in code or data. Otherwise, the field is a Forwarder RVA, which names a symbol in another DLL.

      So now you can finally decide whether or not the API has been forwarded. In the vast majority of cases, the API is not forwarded, but let's assume you are looking at Forwarded's reference to HeapAlloc. (Check the math first. Kernel32's image base (0x77e80000) + HeapAlloc's entry point (0x0005b658) is 0x77edb658, which is inside the range for the export table, 0x77ed5c20 to 0x77edb770.)
      LdrpSnapThunk now proceeds to break apart the forwarded reference for HeapAlloc, which will have the format NTDLL.RtlAllocateHeap, and then calls LdrpLoadDll to obtain NTDLL's image base—hmm, this looks like you're back at the beginning. But note that the fifth parameter is passed with a value of 0. Also note that the DLL name that was constructed before making the call to LdrpLoadDll lacks the .DLL extension. Fortunately, the call to LdrpLoadDll will succeed when LdrpCheckForLoadedDll figures out that NTDLL.DLL is already loaded.
      But do you remember that experiment where I changed the extension for Forwarded to .cpl? Try this again and you will see that LdrpLoadDll now fails on the LdrpMapDll call with a STATUS_DLL_NOT_FOUND return code. Now I have an explanation for the earlier results. With the module name out of the way, LdrpSnapThunk grabs the API name, RtlAllocateHeap, and forges on to LdrpGetProcedureAddress.
      At this point, I am getting close to the end of this forwarded API processing, I promise. LdrpGetProcedureAddress is another routine that wraps its processing around a __try/__except block. The routine determines what type of API information it has been handed, either an ordinal if the API name parameter is null, or the name itself. The test is needed in order to properly set up parameters for an upcoming call to LdrpSnapThunk. But wait a moment. Didn't LdrpSnapThunk just bring us to this point? The two parameters are a pointer to the API's entry in the IAT and an overloaded item that contains either the image base of the loading DLL or a pointer to an IAT entry. If the flag, LdrpInLdrInit, is turned on, the process's critical section is entered. And now let's really dive in deep and step into LdrpCheckforLoadedDllHandle.
      Fortunately, the functionality here is pretty simple to describe and understand. I need a MODULEITEM before I can continue. LdrpCheckforLoadedDllHandle first examines a handle cache residing at LdrpLoadedDllHandleCache to see if the image base there is the same as its input parameter, hDll. If not, the routine perseveres by walking the LoadOrder list, searching for the MODULEITEM whose image base matches hDll and whose linkage in the memory order list has been established. Once it finds an entry matching these criteria, it updates the cache with it and hands back to LdrpGetProcedureAddress the MODULEITEM that it found, or returns a 0 to indicate failure.
      With a MODULEITEM in hand, LdrpGetProcedureAddress now calls an old friend, RtlImageDirectoryEntryToData, to locate the item's export directory and starts that twisty, recursive call to LdrpSnapThunk. What is going on here? Why is this call necessary? The answer, in part, is that this new API itself may be forwarded! I am not aware of such a situation in Windows 2000, but the possibility certainly exists. Happily, HeapAlloc's processing ends with RtlAllocateHeap inside of NTDLL, and LdrpSnapThunk returns an IAT entry with the entry point to this API. LdrpGetProcedureAddress frees up any work areas it might have created, exits the critical section (if it was acquired), and returns. Whew!
      Next, LdrpSnapThunk checks the return code and returns STATUS_ENTRYPOINT_NOT_FOUND if the API was not found. Otherwise, it replaces the entry in the IAT with the API's entry point and continues on. Study Section 6.4.4 in the PE specifications and especially the references to binding for a more complete picture of what is happening.
      Now let's return to LdrpSnapIAT and move on to the next imported API in Kernel32 (or break from the loop if the LdrpSnapThunk call failed). Once all of the entries are processed in Kernel32's import table, LdrpSnapIAT restores the memory protection it changed at the beginning of its work, calls NtFlushInstructionCache to force a cache refresh on the memory block containing the IAT, and returns back to LdrpWalkImportDescriptor. The cache refresh might be a little surprising, but in many executables the IAT can be found in the .text section where code is found. If LdrpWalkImportDescriptor does not flush the memory block containing the updated IAT, then all of the previous work will have been for naught because the processor may continue to use the old version of the memory block. (For more information read the SDK documentation for the Kernel32 API FlushInstructionCache, which is just a thin wrapper around NtFlushInstructionCache.)
      If you want to see the results of RunTime Binding via LdrpSnapIAT and LdrpSnapThunk on the IAT for Forwarded.DLL (SP1), take a look at Figure 5.

Bound DLL Processing

You may recall that LdrpWalkImportDescriptor tested for the existence of two directories or descriptors, the regular Import Descriptor and something called the Bound Imports Descriptor, and tried to use the Bound Imports Descriptor first if it was available. Also, LdrpSnapIAT examined the time/date stamp in the Import Directory Table for a special value of -1 before moving on to LdrpSnapThunk. There just may be an alternative to all of this import table munging by pre-binding your DLL. Now is the time to change my test project to load TestDll and see what happens in the Windows 2000 loader with a module that has been bound ahead of time.
If you have not created the environment variable "MSSdk" as I mentioned in the readme.txt file, and set it equal to the root directory where your Platform SDK is located, do so now and rebuild TestDll (a post-link step should kick in that performs the binding operation). Or, from a command line you can enter and run the following command:

  bind -u testdll.dll

      When you examine the resulting executable, you should see that a new directory in the optional header array has been filled: the slot corresponding to IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT. Launching the result in the debugger, you will also observe that the tests for the Bound Imports Descriptor will succeed. Try the test without a call to GetProcAddress on "fnTestDll" and you will see that when LdrpWalkImportDescriptor issues its call to LdrpLoadImportModule, the check for an already loaded module (Kernel32.DLL) will succeed and that means you can avoid the nasty bumps and turns that made the code very complex earlier in the discussion of LdrpWalkImportDescriptor. My DLL loads faster because there is no looping through the APIs I imported from Kernel32 since the fix-ups have already been done (including the forwarded references). I feel like I've found the Holy Grail.
      But this old cup loses some of its shine when I change the sample to call fnTestDll. Before continuing, see if you can foretell why you will be walking that twisty maze again. The reason is that Forwarded.DLL, the module that contains the real code for my forwarded API, fnTestDll, was not itself bound. Run the bind utility on Forwarded.DLL and the brilliance of my newfound treasure returns. The moral of this little exercise is that in order to gain the full measure of efficiency that pre-binding a module provides, make certain that all the subordinate modules have been bound, too.
      There is a slight downside to pre-binding a DLL, though. What happens when the next version of the operating system appears with a new version of Kernel32 and new locations for the exported functions? Or consider the consequences of another module loading and occupying the slot reserved in memory for that DLL you have bound your executable to. Your bindings, the hardcoded addresses in the IAT, will be incorrect and considered stale by the loader. Under these conditions, the binding is effectively ignored, the LdrpWalkImportDescriptor processing takes place, and you are no better off than you were before.
      On the other hand, if you can manage to keep your DLLs in sync with the current versions of the system DLLs and any others you may use, you should see an improvement in your module loads. As the SDK documentation states: "You can minimize load time by using Bind to bypass this lookup." (For more discussion concerning BIND.EXE and other load issues, see Under the Hood in the May 2000 issue of MSDN Magazine. Also, note that the system DLLs, Kernel32, GDI32, User32, AdvApi32, and so on have been pre-bound.) Figure 6 shows the results of pre-binding using the SDK Bind utility on the IAT for TestDll.DLL (SP1).

LdrpUpdateLoadCount

      If you take stock of what you have seen and learned so far, you will realize that the first three parts in LdrpLoadDll's processing have been completed. The last part of LdrpLoadDll to explore involves an update to module reference counts. That is the job for LdrpUpdateLoadCount.
      LdrpUpdateLoadCount is a dual-purpose routine; it is called when the DLL is both loading and unloading. It attempts to walk either the Bound Imports table or the Imports table, and it will recurse on itself for any subordinate modules. The result is code that will likely be difficult for you to follow, but LdrpUpdateLoadCount.cpp contains my attempt to write pseudocode for this procedure. Distilling the essence of the pseudocode leaves the following: LdrpUpdateLoadCount walks through either the Bound Imports Descriptor or the Imports Descriptor looking for imported modules using LdrpCheckForLoadedDll and the NTDLL hash table. If the module was newly loaded by a LoadLibraryExW call, then LdrpUpdateLoadCount updates its reference count and walks its tables for any imports.
      You can easily imagine a tree structure that describes the relationships between DLLs and their imports, and LdrpUpdateLoadCount must walk the tree completely to update everyone's reference count correctly. Some modules enter the process with a reference count of -1 and are skipped by this update. I leave here a question for future exploration: why do some DLLs have a reference count of -1 and the others contain an actual count?
      Longtime readers of Under the Hood may recall a handy utility Matt Pietrek wrote named NukeDll. Back in the ancient days of 16-bit Windows 3.x, an application that General Protection Faulted had the nasty habit of leaving wreckage strewn about in memory in the form of orphaned Dlls—their reference counts never reached 0 and were not released from the common address space that was part of Windows. Using NukeDll, you could nuke these orphans out of existence and free precious resources. The need for a utility like that has been virtually eliminated with Windows NT and its successors because of the compartmentalization imposed upon processes by the operating system. Still, the reference count is important; your application could be loading and unloading modules but leaking HINSTANCE's because of a mismatch in the LoadLibrary/FreeLibrary pairs. But this will only hurt your own buggy application and you will only chomp through your own resources; other processes in Windows NT and Windows 2000 will be isolated from this behavior.
      Once LdrpUpdateLoadCount has finished its work and has returned, LdrpLoadDll checks the return code from LdrpWalkImportDescriptor. If the code is STATUS_SUCCESS, processing continues on to DLL initialization (which was described in Matt Pietrek's September 1999 Under the Hood column) and is followed by leaving the process's critical section. But if there was a problem in LdrpWalkImportDescriptor, then LdrpLoadDll must back out all of the work done up until now, mostly by invoking LdrpUnloadDll. An image base is sent back to LoadLibraryExW in the form of an HMODULE.

The LOAD_WITH_ALTERED_SEARCH_PATH Option

There is one scenario that I have not described yet, and that is when a call to LoadLibraryExW is made with dwFlags equal to LOAD_WITH_ALTERED_SEARCH_PATH. Now that you have grown somewhat accustomed to wandering around DLLs, you might want to experiment on your own with this small wrinkle. Change the Test sample program to issue this version of a LoadLibraryExW call and pay particular attention to the first parameter for LdrpLoadDll and note any differences.
You might also want to create two copies of TestDll, storing one copy in your \TEMP directory, and observing what happens. With a minimum amount of effort you will be able to manufacture a situation where two copies of TestDll are loaded, one from the current working directory and the second from the temp directory. That would demonstrate how the .NET CLR support for side-by-side execution of different versions of the same assembly might be implemented (although this side-by-side capability has been available since at least Windows NT 4.0).

What You've Learned

      Next time the debugger displays a DLL load notification, you will know with some degree of confidence the state that the module is in: it has been mapped into memory, it has not been added to the PEB's housekeeping area, and, most importantly, it has not been initialized yet.
      You also have learned that the PEB contains not one, but three lists enumerating loaded modules in load, memory, and initialization order. (There are also many other fields in the PEB worthy of examination since they lead to other vital pieces of information on your process.) As Matt Pietrek has pointed out, the order of the DLLs you see displayed inside the debugger is not the order in which DLLs are initialized, as many people mistakenly believe.
      Probably the most important fact to hold onto is that a simple call to LoadLibrary results in many more things occurring under the covers than might initially be apparent. The loader must examine each and every API that DLL imports from other DLLs in order to calculate a real address in memory and perhaps load additional DLLs and check to see if an API may have been forwarded on to another procedure housed in another DLL.
      A loading DLL may bring in additional modules where the process just described will be repeated over and over again.
      The overhead that all of this processing brings to your application may be reduced by investigating the use of the SDK utility, BIND.EXE. The loader still checks the reference to each DLL contained in your program, but as long as the entries are not stale (in other words, the entries are still correct), the address calculation and forwarded API processing will be safely bypassed.
      Finally, you have seen that DLLs are reference-counted, just as they were in the ancient Windows 3.x days. Although this count does not have the same systemwide effect that it once had, a DLL that you are trying to manage dynamically will still produce resource leaks if you have not properly matched up FreeLibrary calls with each LoadLibrary call.

Conclusion

You should prepare yourself for additional trips into NTDLL.DLL during future debugging sessions because, like LoadLibraryEx, many Kernel32 APIs lead inevitably to undocumented routines that reside in NTDLL. If you end your investigation prematurely because of a reluctance to enter this uncharted territory, you may miss the real cause of your bug or, at least, a better understanding of your problem. I plan to maintain the pseudocode at my Web site, https://www.smidgeonsoft.com, so if you find any errors or improvements, please pass them along and I will incorporate them into the code listings.

For background information see:
Under the Hood -- MSJ, February 1998
Under the Hood -- MSJ, June 1998.

Russ Osterlund is a software engineer in the BoundsChecker group with Compuware/NuMega Labs. He, his wife, and their cat are new residents in the state of New Hampshire. He can be reached at RussellOsterlund@cs.com or at his Web site, https://www.smidgeonsoft.com.