Mixed DLL Loading Problem
Summary: Applications using mixed DLLs, which contain both native code and Microsoft Intermediate Language (MSIL) code, created with the Visual C++® .NET and Visual C++ .NET 2003 compilers can encounter deadlock scenarios under some circumstances while being loaded. This document explains the problem in detail, describes the solutions that are expected in the next version of the Visual C++ .NET compiler and the common language runtime (runtime), and provides information about workarounds for the Visual C++ .NET 2002 and the Visual C++ .NET 2003 toolsets. All developers who build DLLs using the /clr compiler option of Visual C++ .NET should read this document.
This article describes the mixed DLL loading problem in detail. Users might be affected by this problem if they are using Visual C++ .NET 2002 or Visual C++ .NET 2003 with mixed DLLs. This paper explains the managed/native code model and describes the scenarios under which mixed DLLs are generated. It describes DllMain restrictions, the rules about legal operations inside DLL entry points, and then the scenarios under which managed code running inside DllMain could violate the DllMain restrictions. In addition, this article covers the specific problems that arise with the current mixed DLL loading algorithm, and describes the new mixed DLL loading algorithm, which is expected in the next version of the compiler and runtime and will fix the loading problem. Finally, this paper provides information about a workaround using the Visual C++ .NET 2002 and Visual C++ .NET 2003 tools.
Before delving into the technical reasons behind the mixed DLL loading problem, it is important to understand mixed DLLs and the scenarios under which they are generated. The Visual C++ .NET compiler is capable of generating both native code, for example, x86 machine instructions, and managed code (MSIL). The compiler or the user can make the decision about which type of code to generate on a per-function basis. That is, a single DLL or EXE can contain some native code functions and some managed code functions.
A native image, DLL or EXE, is an image where all functions are implemented in native code. This is the only type of image that could be generated by the Visual C++ compiler prior to Visual C++ .NET. Native images remain the most common type of image for C++ users. They are loaded by the OS loader and execute directly on the hardware with some OS intervention. Native images continue to work as they always have. The information in this article does not apply to native images.
A pure MSIL image, DLL or EXE, is an image where all functions are implemented in MSIL. The Visual C# .NET and Visual Basic .NET compilers can only generate pure MSIL images. Furthermore, it is possible to generate pure MSIL images using the Visual C++ .NET compiler under certain circumstances. For more information and guidelines on constructing a pure image using the Visual C++ .NET compiler with the /clr compiler option, see "Producing Verifiable Components with Managed Extensions for C++" in the Visual Studio .NET 2003 documentation. Pure MSIL images are loaded by the common language runtime loader and execute on top of the runtime. Pure MSIL images are not affected by the mixed DLL loading problem. The information in this article does not apply to pure MSIL images.
A mixed image, DLL or EXE, is an image in which at least one function is implemented in native code and at least one function is implemented in MSIL. When compiling with the /clr compiler option in Visual C++ .NET 2002 and Visual C++ .NET 2003, mixed images will usually be generated. Since the OS does not understand managed code and the runtime does not understand native code, the OS and runtime must work together to load and execute mixed images. This interoperation leads to some complications and is the source of the mixed DLL loading problem.
Note: While mixed EXEs have some limitations, they are unaffected by the mixed DLL loading problem. This article applies only to mixed DLLs, not to mixed EXEs.
The DllMain entry point function is intended to perform only simple initialization and termination tasks. Doing more than simple initialization and termination can create deadlocks and circular dependencies. This restriction stems from the fact that the DllMain entry point function runs while holding a locking mechanism for the OS loader. This mechanism ensures that code inside the DLL cannot run before the DLL has been initialized. Furthermore, the OS loader lock prevents multiple threads or processes from attempting to load DLLs at the same time, which could corrupt global data structures used during the loading process. In order to help customers avoid problems with their DllMain functions, the MSDN library documents those operations that can and cannot be performed safely inside a DLL entry point. For more information, see DllMain.
The following operations are specifically identified as being safe to perform inside a DllMain function:
- Initializing statics and globals.
- Calling functions in Kernel32.dll. This is always safe since Kernel32.dll must be loaded by the time DllMain is called.
- Creating synchronization objects such as critical sections and mutexes. For more information, see Synchronization Objects.
- Accessing Thread Local Storage (TLS).
The following operations are specifically identified as being unsafe inside a DllMain function under most circumstances:
- Calling the LoadLibrary, LoadLibraryEx, or FreeLibrary functions directly or indirectly.
- Calling the registry functions.
- Calling imported functions other than those located in Kernel32.dll.
- Communicating with other threads or processes.
The rules for DllMain functions are not yet enforced by the system. Instead, the OS will attempt to detect a deadlock or dependency loop and terminate the process if the rules are broken and something goes wrong. Usually, the OS will not detect the deadlock, and the process will hang.
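The circular wait behind these rules can be reproduced in a small portable model. The sketch below is not Win32 code: it assumes that a std::timed_mutex is an adequate stand-in for the OS loader lock, and the names g_loaderLock, WorkerNeedsLoader, UnsafeDllMainModel, and DeferredWorkModel are hypothetical. A timeout is used so that the stuck worker can be reported instead of hanging the process.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>

// Stand-in for the OS loader lock. Assumption: a timed mutex is enough to
// model "only one thread at a time may be inside the loader".
static std::timed_mutex g_loaderLock;

// Models work that must pass through the loader before it can proceed.
// Returns true if the lock was obtained within the timeout.
static bool WorkerNeedsLoader(std::chrono::milliseconds timeout)
{
    assert(timeout.count() > 0);
    if (!g_loaderLock.try_lock_for(timeout))
        return false;   // without the timeout this wait would never end
    g_loaderLock.unlock();
    return true;
}

// Starts the worker on its own thread and waits for it to finish.
static bool RunWorkerAndWait()
{
    bool workerRan = false;
    std::thread worker([&workerRan] {
        workerRan = WorkerNeedsLoader(std::chrono::milliseconds(100));
    });
    worker.join();
    return workerRan;
}

// Breaks the rules: waits on another thread while the lock is held.
// The worker waits for the lock and the lock holder waits for the worker.
bool UnsafeDllMainModel()
{
    std::lock_guard<std::timed_mutex> held(g_loaderLock);
    return RunWorkerAndWait();
}

// Follows the rules: only simple work under the lock, the wait comes later.
bool DeferredWorkModel()
{
    {
        std::lock_guard<std::timed_mutex> held(g_loaderLock);
        // simple initialization only
    }
    return RunWorkerAndWait();
}
```

In this model UnsafeDllMainModel() reports that the worker never obtained the lock, while DeferredWorkModel() lets the same worker complete. With a real loader lock there is no timeout, so the first case would simply hang.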
It is never safe to run managed code inside DllMain. This means that it is not safe for DllMain to be implemented in MSIL, nor is it safe for DllMain to directly or indirectly call a function that is implemented in MSIL. If managed code is run inside DllMain, deadlock is a possibility.
There are several circumstances under which the runtime must perform operations that are not permitted under DllMain in order to guarantee correct semantics. The following two sections provide specific examples of these circumstances.
Accessing Unloaded Types
Managed code does not require modules or assemblies to be explicitly loaded. The runtime handles this under the covers. Whenever a type is accessed, the runtime checks to see whether the assembly containing that type has been loaded into the running AppDomain. If it has, the type is used as specified. If it has not yet been loaded, the runtime automatically calls LoadLibrary on the DLL assembly containing the desired type. Once the assembly has been loaded and initialized, the desired type is free to use. Under most circumstances, this automatic library loading serves as a convenience. When running managed code inside DllMain, though, automatic library loading has the potential to violate DllMain rules.
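The following sketch models why on-demand loading is harmless in ordinary code but becomes a rule violation inside DllMain. It is a simplified model under stated assumptions, not a runtime API: RuntimeModel, EnterDllMain, LeaveDllMain, and AccessType are hypothetical names, and the model throws an exception where the real runtime would issue the load and risk a deadlock.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical model of on-demand assembly loading.
class RuntimeModel {
public:
    // Mark that the calling code is (or is no longer) inside DllMain,
    // that is, running with the loader lock held.
    void EnterDllMain() { inDllMain_ = true; }
    void LeaveDllMain() { inDllMain_ = false; }

    // Accessing a type loads its assembly on first use.
    // Returns true if this access triggered a load.
    bool AccessType(const std::string& assembly)
    {
        if (loaded_.count(assembly) != 0)
            return false;   // already loaded: no call into the loader
        if (inDllMain_)
            throw std::logic_error("implicit load inside DllMain: " + assembly);
        loaded_.insert(assembly);
        return true;
    }

private:
    std::set<std::string> loaded_;
    bool inDllMain_ = false;
};
```

In the model, touching a type whose assembly is already loaded is harmless even inside DllMain, while the first access to a type from an unloaded assembly is flagged. Whether a given access triggers a load therefore depends on what happened to be loaded earlier.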
Garbage Collection
The runtime provides a garbage collection mechanism that frees programmers from the burden of manual memory management. There are a variety of situations under which preemptive garbage collection can cause deadlock while loading mixed DLLs. For example, if garbage collection is triggered while a thread holding the OS loader lock is in preemptive GC mode, that thread will be suspended. If any other thread is running in cooperative mode and attempts to take the OS loader lock, deadlock will result. Furthermore, the garbage collector will attempt to take the OS loader lock itself under some circumstances, such as when the garbage collector establishes write-watch protection over pages or monitors the OS memory load. If executing during DllMain, this will also cause a guaranteed deadlock. There are also cases in multiprocessor systems with multiple garbage collector threads where deadlock can be an issue during DllMain. In general, it is unsafe to invoke the garbage collector during DllMain or at any time when the OS loader lock is held.
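One way to reason about the case in which the garbage collector itself needs the OS loader lock is as a wait-for graph. The sketch below is a simplified model, not runtime code: the node labels, WaitGraph, HasDeadlock, and CollectionUnderLoaderLock are hypothetical, and each participant is assumed to wait on at most one other.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// waitsFor[x] == y means "x cannot proceed until y does".
using WaitGraph = std::map<std::string, std::string>;

// Follows the chain of waits from 'start'. Returns true if the chain
// runs into a cycle, false if it reaches someone who is free to run.
bool HasDeadlock(const WaitGraph& graph, const std::string& start)
{
    std::set<std::string> seen;
    std::string current = start;
    while (seen.insert(current).second) {
        WaitGraph::const_iterator it = graph.find(current);
        if (it == graph.end())
            return false;   // this participant waits on nobody
        current = it->second;
    }
    return true;            // a participant was visited twice: circular wait
}

// Scenario: a collection starts while a thread is inside DllMain.
// The thread is suspended until the collection ends, and the collector
// wants the loader lock that the suspended thread still holds.
WaitGraph CollectionUnderLoaderLock()
{
    WaitGraph graph;
    graph["DllMain thread"] = "garbage collector";
    graph["garbage collector"] = "DllMain thread";
    return graph;
}
```

If the collector does not need the loader lock, the second edge disappears and the chain ends at a participant that is free to run, so no deadlock is reported.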
Unfortunately, under the current implementation, the scenarios described above can sometimes occur inside DllMain when loading mixed DLLs. The loading algorithm employed by mixed images in order to force both managed and native code to run together creates a situation where mixed DLLs that have not been linked with the /noentry option can perform unsafe operations under OS loader lock inside the DLL entry point. Furthermore, even DLLs linked with the /noentry option may deadlock on versions 1.0 and 1.1 of the common language runtime. DLLs linked with the /noentry option should not deadlock on the next version of the runtime.
When the Visual C++ .NET compiler is used to compile a DLL, using /clr but not using /noentry, an unmanaged DllMain entry point will always be generated. This compiler-generated entry point will call the runtime startup code, which loads MSCOREE.dll and MSCORWKS.dll. The runtime startup code then calls DllMainCRTStartup if the C Runtime is used, which is very common. DllMainCRTStartup initializes user code and CRT statics and globals. Then, DllMainCRTStartup calls the user-provided DllMain, if it exists. In mixed DLLs, the user-provided DllMain is usually compiled to MSIL, which means that managed code will run under the loader lock, causing deadlock possibilities as described in the sections Accessing Unloaded Types and Garbage Collection in this article. Even if the user-provided DllMain is implemented in native code (for example, if it was surrounded by #pragma unmanaged in the source), managed code is still guaranteed to have run during this process, because some of the stubs and thunks that are called during the loading process are implemented in managed code.
Consequently, due to the way the loader works in versions 1.0 and 1.1 of the common language runtime, deadlock situations are always possible with mixed DLLs, even though they are often quite rare in practice. The worst part of this is that mixed DLLs that happen to work on most systems can begin deadlocking if the system is stressed, the image is signed (since the security functionality requires more managed code to run during assembly load), hooks are installed into the system, or the behavior of the runtime changes through service packs or new releases. In summary, this is a serious problem that must be addressed for all mixed DLLs.
Moreover, the implementation of versions 1.0 and 1.1 of the common language runtime loader can encounter deadlock situations, even when the /noentry linker option is specified, such as when unmanaged DLL exports or unmanaged VTable Fixups (VTFixups) are part of a mixed DLL. These problems should be fixed in the next version of the runtime, but there is currently nothing that can be done to completely eliminate the deadlock risks on versions 1.0 and 1.1 of the common language runtime associated with unmanaged exports and unmanaged VTFixups with Visual C++ .NET 2002 or Visual C++ .NET 2003. These specific deadlock risks should not exist when running the same images on the next version of the runtime. Consequently, Microsoft recommends leaving unmanaged exports and unmanaged VTFixups in images unless a deadlock situation is actually experienced.
Also note that due to the larger amount of code that is run when an assembly is signed using the .NET assembly signing technology, the chance of deadlock under all of the scenarios previously described is amplified when a mixed DLL is signed.
Unfortunately, the breadth and depth of changes required to the common language runtime, CRT, Visual C++ compiler and linker prevented Microsoft from fully fixing the problem in the Visual C++ .NET 2003 release. Fixing this problem for Visual C++ .NET 2003 would have risked destabilizing other pieces of core managed functionality. Instead, the Visual C++ team developed a solution that can fix this problem in most cases for mixed DLLs built with Visual C++ .NET 2002 and Visual C++ .NET 2003 and running on the next version of the common language runtime, but that requires the programmer to change some code. The following paragraphs describe the solution that addresses all of the mixed DLL loading problems and is expected to be implemented in the next release of Visual C++ .NET (after Visual Studio .NET 2003) and the next version of the common language runtime (after version 1.1).
In particular, the common language runtime is adding a new load time event that signals the loading of a module into an application domain. This new event is similar to the native DLL_PROCESS_ATTACH event. When a module is loaded, the common language runtime will check the module for a .cctor method in the global scope. The global .cctor is the managed module initializer. This initializer runs just after the native DllMain (in other words, outside of loader lock) but before any managed code is run or managed data is accessed from that module. The semantics of the module .cctor are very similar to those of class .cctors and are defined in the ECMA C# and Common Language Infrastructure Standards.
The goal of this solution is to separate managed and native initialization so that only the core native pieces that must be run under the OS loader lock will actually run there. The managed initialization and non-core native initialization will be run inside the module initializer, outside loader lock but before the module is used. Note that the module .cctor is being brought before the ECMA committee for standardization, though Visual C++ will most likely be the only Microsoft language to support it in the post Visual Studio .NET 2003 release.
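The intended ordering can be sketched with a small single-threaded model. This is an illustration under stated assumptions, not the runtime's implementation: MixedModuleModel and its members are hypothetical names, and a boolean flag stands in for the OS loader lock.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical single-threaded model of a mixed module's two-phase setup.
class MixedModuleModel {
public:
    // Native entry point: only core native initialization runs here,
    // while the (modeled) loader lock is held.
    void NativeDllMain()
    {
        loaderLockHeld_ = true;
        log_.push_back("native init");
        loaderLockHeld_ = false;
    }

    // Any managed entry into the module. The module initializer runs
    // once, before the first managed code from this module executes.
    void CallManagedFunction()
    {
        EnsureModuleInitialized();
        log_.push_back("managed function");
    }

    const std::vector<std::string>& Log() const { return log_; }

private:
    void EnsureModuleInitialized()
    {
        if (moduleInitialized_)
            return;
        assert(!loaderLockHeld_);   // must run outside the loader lock
        log_.push_back("module initializer");
        moduleInitialized_ = true;
    }

    std::vector<std::string> log_;
    bool loaderLockHeld_ = false;
    bool moduleInitialized_ = false;
};
```

Calling NativeDllMain() and then CallManagedFunction() twice leaves four log entries: the native initialization, the module initializer, and the two managed calls. The initializer appears exactly once, after the native entry point has returned and before the first managed call.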
In addition to providing the managed module initializer mechanism to fix the loader lock problem in newly compiled images, this solution also provides checks to prevent the common language runtime from executing unsafe images that may have been built with old tools. The next version of the common language runtime will have the ability to immediately terminate DLLs that attempt to execute managed code inside the OS loader lock. This mode would replace non-deterministic and unsafe behavior with a deterministic failure that provides enough information to fix the problem. If the application developer is unable to address the problem using Visual C++ .NET 2003 tools or obtain the next version of the compiler, a process-wide compatibility mode will be available to permit unsafe mixed DLLs to execute. Both the compatibility and unsafe termination modes can be enabled through the XML configuration file for the application. Furthermore, applications that will require unsafe mixed DLLs to execute could also specify a specific version of the runtime (such as version 1.1) on which to run using the XML configuration file for the application.
Microsoft has prepared a Knowledge Base article that fully describes the steps that must be taken to ensure that a mixed DLL built with the Visual C++ .NET 2003 toolset can be safely run with the corrected runtime. The Knowledge Base article is available at http://support.microsoft.com/?id=814472. Briefly, the solution requires that the programmer link all mixed DLLs with /noentry and manually initialize DLLs as appropriate. Helper functions are provided to simplify the manual initialization process. Note that the Knowledge Base article describes the steps that are necessary to create a mixed DLL using the Visual Studio .NET 2002 and Visual Studio .NET 2003 tools that should be safe on the next version of the common language runtime. On the 1.0 and 1.1 versions of the common language runtime, a slight chance of deadlock remains even if these guidelines are followed. Further KB articles will be released at product launch that describe ways in which to work around some of the remaining deadlock possibilities. See the Knowledge Base article for more information.
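The general shape of manual initialization can be sketched in portable C++. This is only a model of the pattern: the mylib namespace and the functions EnsureInit, Terminate, IsInitialized, and Square are hypothetical, and the actual helper functions and required steps are those given in the Knowledge Base article, which this sketch does not reproduce.

```cpp
#include <cassert>
#include <mutex>

namespace mylib {   // hypothetical library name

static std::mutex g_stateMutex;
static bool g_initialized = false;
static int g_squares[4] = {0, 0, 0, 0};   // state an entry point would normally set up

// Explicit initialization that callers invoke before first use.
// Idempotent, so repeated calls and calls from several threads are harmless.
void EnsureInit()
{
    std::lock_guard<std::mutex> lock(g_stateMutex);
    if (g_initialized)
        return;
    for (int i = 0; i < 4; ++i)
        g_squares[i] = i * i;
    g_initialized = true;
}

// Explicit teardown that callers invoke when they are done with the library.
void Terminate()
{
    std::lock_guard<std::mutex> lock(g_stateMutex);
    g_initialized = false;
}

bool IsInitialized()
{
    std::lock_guard<std::mutex> lock(g_stateMutex);
    return g_initialized;
}

// An ordinary library function that relies on the initialized state.
int Square(int i)
{
    assert(i >= 0 && i < 4);
    std::lock_guard<std::mutex> lock(g_stateMutex);
    assert(g_initialized && "call EnsureInit() first");
    return g_squares[i];
}

} // namespace mylib
```

A client calls mylib::EnsureInit() once before using the library and mylib::Terminate() when finished. Because no entry point runs under the loader lock in this arrangement, the initialization work is free of the DllMain restrictions, at the cost of making the caller responsible for the ordering.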
The Visual C++ and common language runtime teams made engineering choices for mixed (managed and native) DLL loading and initialization that they have since decided to revisit. These choices have negatively impacted the reliability of common mixed DLLs through the mixed DLL loading problem. This document discusses some of the core technical issues that contribute to the mixed DLL loading problem. A reasonable workaround for the loading problem has been identified and documented for the Visual C++ .NET 2003 product. This workaround will make mixed DLLs much safer on versions 1.0 and 1.1 of the common language runtime. Furthermore, this workaround should make mixed DLLs completely safe (that is, no risk of deadlock) on the next version of the common language runtime. Failure to implement the proposed workaround with Visual C++ .NET 2002 and Visual C++ .NET 2003 toolsets may cause application failure in the next version of the common language runtime. The next release of Visual C++ .NET and the common language runtime is expected to contain a complete fix for this problem that will generally not require any user code changes.