TypeRefViewer Utility Shows TypeRefs and MemberRefs in One Convenient GUI
Matt Pietrek
I love writing software tools. Whenever I'm learning about a new operating system or technology, I dig through its specifications to see the sort of details I can get. I then write tools to process, view, dissect, or whatever. In doing so, I learn how the system works. One of the most interesting types of tools is one that lets you see what external items a component relies on.
This month, I'll describe just such a tool for Microsoft .NET. No, I'm not talking about the Microsoft Intermediate Language Disassembler (ILDasm). Instead, I've written a tool that shows you the moral equivalent of API imports for .NET binaries. As part of this, I'll describe one of the major functional differences between .NET's managed and unmanaged metadata APIs. Finally, I'll show how Visual C++ with managed extensions makes it relatively painless to mix hip, modern C# code with old school C++ code that calls COM methods.
All About .NET Imports
In plain Win32®, an EXE or DLL that calls APIs from another DLL is said to import the other DLL. In .NET, EXEs still contain regular Win32-style imports, since they're basically a superset of Win32 Portable Executable (PE) files. However, it's rare for a .NET executable to import anything more than a single API from MSCOREE.DLL.
The real action is in a .NET executable's metadata, where references to other assemblies are stored. A .NET assembly is composed of one or more modules. In Win32 terminology, a .NET module is a DLL. It's very common for a single .NET DLL to be both a module and the assembly itself; however, multi-module assemblies are certainly possible. The key point is that normal Win32 executables import DLLs directly, while .NET executables import assemblies, which are containers for DLLs.
In Win32, all of an executable's imports are stored in a well-defined location in the executable. The PE file header contains an offset to the imports section. The imports section begins with an array of structures, one structure for each imported DLL. For each imported DLL, there are data structures that tell exactly which APIs are imported from that DLL. (For you PE experts, these are the Import Address Table and Import Names Table.) You can see evidence of this information by running the executable through Depends (from the Platform SDK), or DUMPBIN /imports.
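To make the "array of structures" concrete, here is a minimal sketch in standard C++. The `ImportDescriptor` struct mirrors the five 32-bit fields of the Win32 `IMAGE_IMPORT_DESCRIPTOR`; the struct name and the `readImportDescriptors` helper are hypothetical names chosen for this illustration. It assumes the caller has already located the import directory bytes, and it does not translate RVAs into file offsets.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Same layout as the Win32 IMAGE_IMPORT_DESCRIPTOR: five 32-bit fields.
struct ImportDescriptor {
    std::uint32_t originalFirstThunk; // RVA of the Import Names Table
    std::uint32_t timeDateStamp;
    std::uint32_t forwarderChain;
    std::uint32_t name;               // RVA of the imported DLL's name string
    std::uint32_t firstThunk;         // RVA of the Import Address Table
};
static_assert(sizeof(ImportDescriptor) == 20, "descriptor must be 20 bytes");

// Hypothetical helper: copies descriptors out of the raw import directory
// until the all-zero terminator entry (or the end of the buffer) is reached.
// Assumes a little-endian host, which matches the PE on-disk format.
std::vector<ImportDescriptor> readImportDescriptors(const std::uint8_t* data,
                                                    std::size_t size) {
    assert(data != nullptr || size == 0);
    std::vector<ImportDescriptor> result;
    for (std::size_t off = 0; off + sizeof(ImportDescriptor) <= size;
         off += sizeof(ImportDescriptor)) {
        ImportDescriptor d;
        std::memcpy(&d, data + off, sizeof d);
        const bool terminator = d.originalFirstThunk == 0 && d.timeDateStamp == 0 &&
                                d.forwarderChain == 0 && d.name == 0 &&
                                d.firstThunk == 0;
        if (terminator) break;
        result.push_back(d); // one entry per imported DLL
    }
    return result;
}
```

Each returned entry corresponds to one imported DLL; a real dumper would follow `name` and the two thunk RVAs to print the DLL name and the APIs imported from it.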
Fast forward to .NET and its vast library of classes that do practically everything but butter your toast. How would you see what classes and methods a given .NET executable is using? .NET has the System.Reflection classes which not only let you examine the extremely detailed metadata of an executable, but also allow you to create new methods dynamically in memory. For our purposes, we'll stick to examining metadata. So what can you get from the reflection classes?
Burrowing into the reflection class hierarchy, it doesn't take too long to find the Assembly.GetReferencedAssemblies method. This method returns an array of AssemblyName instances. Unfortunately, the AssemblyName class doesn't contain any information about which types were actually used from the referenced assembly. Even worse, there's no information about which methods and properties of the classes were used.
Surely this information must exist somewhere. Without it, using code from another module would be horribly expensive. Search all you want in the .NET classes but you won't find this information. Luckily, there's an escape hatch: the unmanaged metadata APIs. I discussed these APIs, as well as the System.Reflection classes, in my October 2000 article, "Avoiding DLL Hell: Introducing Application Metadata in the Microsoft .NET Framework." In a nutshell, the unmanaged metadata APIs are regular old COM interfaces. They're more work to use than the reflection classes, but as you'll see, they provide more information.
Seeing .NET Imports with Existing Tools
ILDasm uses the unmanaged metadata APIs. Unfortunately, it doesn't show anything about which classes and methods are imported. There is another Microsoft tool that uses the unmanaged APIs, and which does tell you about imported classes and methods. Alas, Microsoft has done a good job of hiding this tool in the current Beta 2 release.
The tool is called MetaInfo, and the source code can be found in the Program Files\Microsoft.NET\FrameworkSDK\Tool Developers Guide\Samples\metainfo directory. MetaInfo is the tour de force of the unmanaged metadata APIs. Currently there's no binary supplied, so you have to compile it. Hey, it's almost like going the open source route!
Let's say you've compiled MetaInfo and run it. How would the imported classes and methods appear? Figure 1 shows a snippet of MetaInfo output showing two imported types with five imported methods. It's important to note that for each referenced class, there's something called a TypeRef. For each imported method from that class, there's a MemberRef. TypeRefs and MemberRefs are key components of the unmanaged metadata APIs, and I'll show later how I exposed them to my C# code.
Figure 2 TypeRefViewer
While MetaInfo is functional, it writes to stdout or a file. But people want GUIs! ILDasm is a GUI app, but doesn't show the TypeRefs and MemberRefs. MetaInfo does, but it's a text mode program. Sensing an opportunity, I wrote the TypeRefViewer program shown in Figure 2.
The TypeRefViewer Program
When writing TypeRefViewer, I wanted to use C#, WinForms, and the reflection classes as much as possible. On the other hand, I needed to be able to call the unmanaged metadata APIs. In ordinary circumstances, this wouldn't be a big deal, since .NET makes it relatively easy to call existing COM methods. However, if you dig a little deeper, you'll find that the simplest form of the .NET Interop code assumes that you're calling a COM component with a type library. The TLBIMP program takes the COM type library and translates it into an assembly with equivalent metadata.
In my case, I was out of luck. The unmanaged metadata APIs don't come with a type library. While I could muck around with IDL files to synthesize a type library, it's a pain, and oh so Twentieth Century. Or, if I was into pain, and had lots of time to kill, I could create custom wrappers for the dozens of methods in the unmanaged API. Not something I felt like doing.
So I chose a third route. I had existing unmanaged C++ code (the Meta programs from the aforementioned article) that encapsulates the messy details of using the unmanaged APIs. This code readily compiles as managed .NET code. Just add the /CLR switch to Visual C++ and stir! It's then trivial to write a managed class that wraps the C++ code. Compile this into a .NET assembly, and the code is easily imported into a C# program.
The only real work was to stuff the data from the unmanaged APIs into managed types accessible to C# code. The beauty of metadata is that I simply declared the managed data structures in my C++ code, and the C# code implicitly knew about the types from the metadata. No need to declare the same structure in both the C++ and C# code!
The basic design is as follows: I wrote the main program (TypeRefViewer) in C#, and included a WinForms-based GUI. When the user selects an assembly, the main program calls down to MetaDataHelper.DLL. This DLL is a fully fledged .NET assembly, written in Visual C++ with managed extensions. The MetaDataHelper code calls the unmanaged APIs, and stores the results in managed data structures that are returned to the calling C# code. As a bonus, the .NET garbage collection means that I don't have to worry about freeing the data, since it lives in the managed heap.
Looking at Figure 2, you'll see that the TreeView control is the focal point. In the treeview, there are three levels. The topmost level nodes are the namespaces imported by the executable under examination. The children of the top-level nodes are the classes imported from the namespace. Finally, beneath the class nodes are the class's methods that are imported.
When you first start TypeRefViewer, you'll want to select an executable to view. This is done with the Browse button, which brings up the familiar Open dialog. You can also specify an executable name on the command line.
At the bottom of the TypeRefViewer window is the Export button. Clicking it causes TypeRefViewer to write a text mode version of the information to the file you specify. Also at the bottom is an assembly name which changes to reflect whichever assembly the selected treeview node comes from.
Let's take a look at how TypeRefViewer is implemented. First, let's examine MetaDataHelper.DLL, which calls the unmanaged metadata APIs. The primary source file for this DLL is MetaDataHelper.CPP (see Figure 3). Within this file, I put all the code and data structures in the Wheaty.UnmanagedMetaDataHelper namespace.
Right inside the namespace declaration are two class definitions: MemberRefInfo and TypeRefInfo. These are managed types (note the __gc modifier) that contain the information returned to the calling code. What's returned is an array of TypeRefInfo class instances, one for each imported type. A TypeRefInfo contains the name of the imported type and the assembly it comes from.
In addition, each TypeRefInfo contains an array of MemberRefInfo classes. Each MemberRefInfo represents one imported method. Along with the name of the imported method, the MemberRefInfo also has the method's signature blob which is a very compact binary encoding of the method's calling convention, return type, and parameters. This encoding only includes the parameter types, not their names.
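The shape of these two types can be sketched in plain standard C++. This is only an illustration of the data described above, not the managed `__gc` classes from Figure 3: the field names, the `addMember` helper, and the `countImportedMembers` helper are all assumptions made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One imported method: its name plus the raw signature blob
// (calling convention, return type, and parameter types; no parameter names).
struct MemberRefInfo {
    std::string name;
    std::vector<std::uint8_t> signatureBlob;
};

// One imported type: its name, the assembly it comes from,
// and the members imported from it.
struct TypeRefInfo {
    std::string typeName;
    std::string assemblyName;
    std::vector<MemberRefInfo> members;
};

// Hypothetical helper: records one imported member under a type.
void addMember(TypeRefInfo& type, const std::string& name,
               const std::vector<std::uint8_t>& blob) {
    assert(!name.empty());
    type.members.push_back(MemberRefInfo{name, blob});
}

// Hypothetical helper: total number of imported members across all types.
std::size_t countImportedMembers(const std::vector<TypeRefInfo>& types) {
    std::size_t total = 0;
    for (const TypeRefInfo& t : types) total += t.members.size();
    return total;
}
```

A type with an empty `members` vector is still meaningful: it represents a class that is referenced without any of its methods being imported.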
It's worth noting that the signature blob is only exposed via the unmanaged metadata APIs. You won't see any mention of signature blobs in the reflection APIs, even though they use the blobs internally. I included the signature blobs in the MemberRefInfo as an aid when working with the reflection APIs in the C# portion of the code. The gist is that because of method overloading, a method name by itself isn't unique. The signature blob can help narrow down the exact MethodInfo instance when using the reflection APIs. More on this later.
After the first two class definitions is the TypeRefInfoHelper class which contains a single method: GetTypeRefInfo. This method takes a filename as a parameter, and attempts to read the metadata for that file. The unmanaged metadata interfaces used are IMetaDataImport and IMetaDataAssemblyImport, defined in cor.h. Getting hold of these interface pointers is boilerplate code which I encapsulated into yet another class, MetaDataImportWrapper, defined in MetaDataImportWrapper.cpp (see Figure 4).
After creating an instance of the CMetaDataImportHelper class, the code has IMetaDataImport and IMetaDataAssemblyImport interface instances. Using these, a variety of methods are called (including EnumTypeRefs and EnumMemberRefs) to pull out the desired information. All relevant information is placed into managed MemberRefInfo and TypeRefInfo class instances.
If you examine the MetaDataHelper code, you'll see that it looks like fairly standard C++ code that uses COM. The only unusual spots are where I needed to declare the managed data types, and the new syntax needed to allocate instances of these types. There are also a few lines of code that use System::Runtime::InteropServices::Marshal methods to convert the file name (passed as a .NET String) into a classic char * string.
With MetaDataHelper.DLL covered, let's turn to the main program, TypeRefViewer, shown in Figure 5. Most of the code at the top of the file is standard Windows Forms code that sets up the main window and its controls. After the user selects a file with the Browse button, the selected file name is passed to MetaDataHelper.DLL, and an array of TypeRefInfos is obtained. The primary task of the main program is to process each TypeRefInfo, and store the results in the treeview control.
Most of the logic for processing the TypeRefInfos is in the DisplayTypeRefsFromFile method. This routine attempts to obtain a Type class for each of the TypeRefInfos. The Type class is the primary starting point for working with metadata via the System.Reflection classes. I used the reflection classes since they're easy and intuitive to work with. For example, it's almost trivial to create a formatted string representing a method's parameters.
To put what I was trying to do another way, I used the unmanaged APIs first to get metadata information not obtainable via the reflection classes. Then I switched to the reflection classes to finish the job. It's in this transition where things can get messy.
In earlier versions of .NET, you could pass just about any name to Type.GetType, and the runtime would do a decent job of locating which assembly the Type came from. By default, starting with Beta 2 of .NET, the runtime only searches in the calling assembly and the System assembly (mscorlib.dll).
Thus, if you call Type.GetType with just the string "System.Windows.Forms.TreeView", the runtime won't find the type, even though System.Windows.Forms.dll is in the path. To make this work properly, you need to qualify the type with the assembly name. You can take formal type qualification to elaborate lengths, as described in the .NET documentation. However, for our purposes, it's usually sufficient just to postfix the type name with the name of the enclosing assembly, preceded by a comma.
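The qualified string itself is easy to build. The sketch below, in standard C++, only shows the string format (type name, comma, assembly name); `qualifyTypeName` is a hypothetical helper name, and the resulting string is what would be handed to Type.GetType on the managed side.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: appends ", <assembly>" to a full type name.
// If no assembly name is known, the type name is returned unchanged,
// which leaves the lookup to the runtime's default search.
std::string qualifyTypeName(const std::string& typeName,
                            const std::string& assemblyName) {
    assert(!typeName.empty());
    if (assemblyName.empty()) return typeName;
    return typeName + ", " + assemblyName;
}
```

For example, `qualifyTypeName("System.Windows.Forms.TreeView", "System.Windows.Forms")` yields `System.Windows.Forms.TreeView, System.Windows.Forms`. The longer forms mentioned in the documentation add further comma-separated fields such as the assembly version after the assembly name.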
Where did I get the assembly name from? Back when MetaDataHelper.dll was building the TypeRefInfos for each type, it also queried for and stored away the assembly name. When running TypeRefViewer, you may occasionally run into a situation where Type.GetType can't find the assembly. When this happens, the treeview entry says "Error with .XXX", where "XXX" is the type name.
Once the Type instance is obtained, the next job is to figure out what namespace the type comes from. The unmanaged metadata APIs don't make it easy to figure this out, nor do they keep the imported types sorted in any sort of namespace order. Thus, the TypeRefViewer code uses a dictionary class instance (called imported_namespaces) that maps a namespace to the treeview node representing that namespace. As each TypeRefInfo is processed, the Type information is either stored under an existing namespace node, or a new node is created as needed. The dictionary is just a convenience to prevent searching through all the treeview namespace nodes whenever a new Type is added.
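The dictionary idea can be sketched in standard C++ with a `std::map`. This is a simplified stand-in, not the code from Figure 5: it maps each namespace to a list of type names rather than to treeview nodes, and it derives the namespace by splitting at the last dot instead of asking a Type instance, which is an assumption that does not hold for nested types. The names `splitTypeName`, `NamespaceMap`, and `groupByNamespace` are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Splits "System.IO.File" into {"System.IO", "File"}.
// A name without a dot is treated as living in the empty (global) namespace.
std::pair<std::string, std::string> splitTypeName(const std::string& fullName) {
    const std::string::size_type dot = fullName.rfind('.');
    if (dot == std::string::npos) return {"", fullName};
    return {fullName.substr(0, dot), fullName.substr(dot + 1)};
}

// Namespace name -> types imported from that namespace, in arrival order.
using NamespaceMap = std::map<std::string, std::vector<std::string>>;

// Types arrive in no particular order; each one is filed under an existing
// namespace entry, or a new entry is created on first use.
NamespaceMap groupByNamespace(const std::vector<std::string>& fullNames) {
    NamespaceMap byNamespace;
    for (const std::string& fullName : fullNames) {
        assert(!fullName.empty());
        const std::pair<std::string, std::string> parts = splitTypeName(fullName);
        byNamespace[parts.first].push_back(parts.second); // creates entry if absent
    }
    return byNamespace;
}
```

The lookup by key is what saves a linear search through the existing top-level nodes each time a type is added.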
Once the appropriate namespace node has been identified, and an appropriate class node created under it, the next major task is to add all the imported members from the class. The TypeRefInfo contains an array of MemberRefInfos, each of which contains the name of the imported method. As a side note, you may sometimes see class nodes with no method nodes underneath them. This is to be expected. When this occurs, the unmanaged APIs are indicating that the class is imported, but that no methods from that class are actually used.
In the first versions of TypeRefViewer, each method name was simply inserted under the appropriate class node. It didn't take long before I became dissatisfied with this approach. The .NET classes make heavy use of overloading, so it's important to know exactly which method is being imported. The most direct way of showing the actual method is to append the method's parameters and return value to the method name. All that's needed is to get the appropriate System.Reflection.MethodInfo or System.Reflection.ConstructorInfo instance. From that, you can get a ParameterInfo array, and format each element. The FormatParameterString method in TypeRefViewer.CS does that.
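As a rough illustration of that formatting step, here is a standard C++ sketch. It is not the FormatParameterString method from TypeRefViewer.CS: the function name `formatMethodLabel`, its inputs (plain strings rather than ParameterInfo objects), and the exact output layout are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: builds a display label such as
// "WriteLine(String, Object) : Void" from a method name, the names of its
// parameter types, and the name of its return type.
std::string formatMethodLabel(const std::string& methodName,
                              const std::vector<std::string>& parameterTypes,
                              const std::string& returnType) {
    assert(!methodName.empty());
    std::string label = methodName + "(";
    for (std::size_t i = 0; i < parameterTypes.size(); ++i) {
        if (i != 0) label += ", ";   // separator between parameters only
        label += parameterTypes[i];
    }
    label += ") : " + returnType;
    return label;
}
```

With labels like this, two overloads of the same method show up as visibly different nodes under the class.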
The tough part is getting hold of the correct MethodInfo or ConstructorInfo class. (From here on, I'll use "methods" to mean both methods and constructors.) The normal way to get a MethodInfo is via Type.GetMethod, which itself is overloaded. The simplest form of Type.GetMethod takes a method name, but can return multiple MethodInfos. Which one do we want? Other forms of Type.GetMethod let you be very specific about which overloaded method you want. However, calling it requires that you know in advance what all the parameters are. Catch 22!
Sensing that there had to be a better way, I thought back to how the unmanaged metadata APIs provide a signature blob for each imported method. With lots of work, you could take a signature blob, create the values necessary to call the specific form of Type.GetMethod, and get back the exact matching method. However, with just a little work, you can read the signature blob to figure out how many parameters the method takes. With this additional knowledge, you can call the simple version of Type.GetMethods, and filter out the ones that don't have the desired number of parameters. In lots of cases, this is sufficient to single out the exact MethodInfo we're looking for. If this hack doesn't succeed, TypeRefViewer punts and displays the method name with "(???)Overloaded" after it.
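The parameter-count trick can be sketched in standard C++. This is an illustration under stated assumptions, not the code from Figure 5. It relies on the method signature layout in the ECMA-335 metadata specification: a calling-convention byte, then the parameter count as a compressed unsigned integer (preceded by a generic-parameter count when the 0x10 flag is set, a case that postdates Beta 2). The names `readCompressedUInt`, `parameterCount`, `Overload`, and `pickOverload` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Decodes one compressed unsigned integer starting at 'pos'.
// On success, stores it in 'value', advances 'pos', and returns true.
bool readCompressedUInt(const std::vector<std::uint8_t>& blob, std::size_t& pos,
                        std::uint32_t& value) {
    if (pos >= blob.size()) return false;
    const std::uint8_t b0 = blob[pos];
    if ((b0 & 0x80) == 0) {                       // 0xxxxxxx: 1 byte
        value = b0;
        pos += 1;
        return true;
    }
    if ((b0 & 0xC0) == 0x80) {                    // 10xxxxxx: 2 bytes
        if (pos + 1 >= blob.size()) return false;
        value = ((b0 & 0x3Fu) << 8) | blob[pos + 1];
        pos += 2;
        return true;
    }
    if ((b0 & 0xE0) == 0xC0) {                    // 110xxxxx: 4 bytes
        if (pos + 3 >= blob.size()) return false;
        value = ((b0 & 0x1Fu) << 24) | (blob[pos + 1] << 16) |
                (blob[pos + 2] << 8) | blob[pos + 3];
        pos += 4;
        return true;
    }
    return false;
}

// Returns the parameter count from a method signature blob,
// or -1 for a field signature or a malformed blob.
int parameterCount(const std::vector<std::uint8_t>& blob) {
    if (blob.empty()) return -1;
    const std::uint8_t callConv = blob[0];
    if ((callConv & 0x0F) == 0x06) return -1;     // FIELD, not a method
    std::size_t pos = 1;
    std::uint32_t n = 0;
    if ((callConv & 0x10) != 0 && !readCompressedUInt(blob, pos, n)) return -1;
    if (!readCompressedUInt(blob, pos, n)) return -1;
    return static_cast<int>(n);
}

// A candidate overload, reduced to what the filter needs.
struct Overload {
    std::string display;
    int paramCount;
};

// Returns the index of the single candidate whose parameter count matches
// the blob, or -1 when none or several match (the "punt" case).
int pickOverload(const std::vector<Overload>& candidates,
                 const std::vector<std::uint8_t>& blob) {
    const int wanted = parameterCount(blob);
    int found = -1;
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        assert(candidates[i].paramCount >= 0);
        if (candidates[i].paramCount != wanted) continue;
        if (found != -1) return -1;               // ambiguous: two overloads match
        found = static_cast<int>(i);
    }
    return found;
}
```

A return value of -1 from `pickOverload` corresponds to the case where two overloads take the same number of parameters, so the count alone cannot tell them apart.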
The last loose end of TypeRefViewer is the TypeRefTreeNode node class in TypeRefTreeNode.CS (see Figure 6). This is a simple class derived from the .NET TreeNode class. It exists solely to associate the node with an assembly. If you select a node in the treeview, the bottom of the form updates to show the assembly name.
TypeRefViewer is a utility I cooked up for my own use. It comes in handy when trying to quickly get an idea of how some .NET component does its magic. Sure, I could run the component through ILDasm and sort through thousands of instructions to see what imported methods are called. TypeRefViewer is a much simpler way to see what's being imported.
In writing TypeRefViewer, I've shown that the managed and unmanaged metadata APIs can be mixed in a reasonably simple manner. I've also demonstrated mixing C# code with a managed C++ DLL, and passing significant amounts of data between the two. I was pleased once again to see how simple it is to use multiple languages in the same project.
TypeRefViewer packs a lot of functionality into a small amount of code. However, there are some easily added features that I'll leave to the ambitious reader. One such feature would be to add a search capability that would find and highlight the tree nodes matching a search string. Another would be to use the signature blob information to find the matching MethodInfo more aggressively. If you do add something significant to the code, I'd love to hear about it!
Send questions and comments for Matt to email@example.com.
Matt Pietrek is an independent writer, consultant, and trainer. He was the lead architect for Compuware/NuMega's BoundsChecker product line for eight years and has authored three books on Windows system programming. His Web site, at http://www.wheaty.net, has a FAQ page and information on previous columns and articles.