C++ At Work

Enum Declarations, Template Function Specialization

Paul DiLascia

Code download available at: CAtWork0507.exe (204 KB)

Q I have just read your article, "Wrappers: Use Our ManWrap Library to Get the Best of .NET in Native C++ Code," in the April 2005 issue, but the utility DumpEnum was not provided. Where may it be obtained?

Jarrad Waterloo

Q Given the name of a Microsoft® .NET Framework class like MenuItem or Form, is there some way to programmatically find which assembly contains that class?

Several Readers

A I'm happy to explain how I wrote DumpEnum and I'll even give you the code. In the process I'll answer the second question, too. But first let me explain for other readers just what DumpEnum does. One of the things I had to do in my April 2005 article was write a C++ enum to exactly match the .NET Framework type RegexOptions. RegexOptions is an enumerated (enum) type you can use with methods like Regex::Match and Replace to control matching. For example, you can call Regex::Match with RegexOptions::IgnoreCase to ignore case or RegexOptions::Singleline in order to treat the input string as one line. Figure 1 shows the values in RegexOptions.

Figure 1 RegexOptions

Member Name                Value
Compiled                   8
CultureInvariant           512
ECMAScript                 256
ExplicitCapture            4
IgnoreCase                 1
IgnorePatternWhitespace    32
Multiline                  2
None                       0
RightToLeft                64
Singleline                 16

To pass RegexOptions from native C++ code, you need a C-style enum with the same values. The quickest way to write such code if you only need to do it once is to copy from the documentation into your C++ file, then edit it to follow C syntax. But what if you make a typo, or the options ever change, or you want to wrap several enum types? Better and more reliable is to write a tool that generates the C++ code automatically—especially since reflection makes it easy: the .NET Framework provides all the information needed to describe itself. So I wrote a little program called DumpEnum that you can run from a command prompt, like so:

DumpEnum RegexOptions

DumpEnum writes the name/value pairs as C/C++ code to the standard output. You can redirect the output to a file by typing

DumpEnum RegexOptions > regopt.h

then inserting regopt.h into your header file. That's what I did for RegexWrap.h in my article. Figure 2 shows the actual file DumpEnum generates for RegexOptions.

Figure 2 RegexOptions in C++

    //////////////////
    // Enumeration for System.Text.RegularExpressions.RegexOptions
    // in System.dll. Underlying type is System.Int32.
    // Automatically generated by DumpEnum.
    //
    enum RegexOptions {
       None = 0,
       IgnoreCase = 1,
       Multiline = 2,
       ExplicitCapture = 4,
       Compiled = 8,
       Singleline = 16,
       IgnorePatternWhitespace = 32,
       RightToLeft = 64,
       ECMAScript = 256,
       CultureInvariant = 512,
    };
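Here is a minimal sketch of how native code might use the generated values. The enum is copied from Figure 2; CombineOptions and HasOption are hypothetical helpers I made up for illustration and are not part of RegexWrap. Because each value is a distinct power of two, options combine with bitwise OR and can cross the wrapper boundary as a plain int.

```cpp
// Sketch only: values copied from the DumpEnum output in Figure 2.
enum RegexOptions {
    None = 0,
    IgnoreCase = 1,
    Multiline = 2,
    ExplicitCapture = 4,
    Compiled = 8,
    Singleline = 16,
    IgnorePatternWhitespace = 32,
    RightToLeft = 64,
    ECMAScript = 256,
    CultureInvariant = 512
};

// Hypothetical helper: combine two options into the int a wrapper
// could hand to managed code.
inline int CombineOptions(RegexOptions a, RegexOptions b) {
    return static_cast<int>(a) | static_cast<int>(b);
}

// Hypothetical helper: test whether one option is set in a combined value.
inline bool HasOption(int flags, RegexOptions opt) {
    return (flags & static_cast<int>(opt)) != 0;
}
```

Calling CombineOptions(IgnoreCase, Singleline) yields 17, which is 1 | 16 from the table in Figure 1.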

So now that you know what DumpEnum does, you should see how I implemented it.

Every Framework class is described by a System.Type class. It has properties like Name to get the name of the type and IsEnum to tell if the type is an enumeration. The first thing DumpEnum has to do is get hold of the Type for whatever type name was passed on the command line—for example, RegexOptions. It turns out this isn't as easy as you might hope. The two most common ways to get the Type are to invoke obj->GetType if you have an object instance, or __typeof if you don't. For example:

    #using <System.dll>
    using namespace System::Text::RegularExpressions;
    ...
    Type *t = __typeof(RegexOptions);

In C++ you have to use __typeof for managed types because typeof is already defined (in the new C++/CLI you would write RegexOptions::typeid instead). But __typeof only works with a symbol name, not a string, which presupposes you know the name at compile time, as well as which assembly and namespace it inhabits. DumpEnum doesn't have this information at compile time; it gets the type name as a command-line parameter. DumpEnum doesn't have an object instance either, so it can't call Object::GetType. What to do?

There is another way to get the Type. If you know which assembly the type lives in, you can load it and call Assembly::GetType, which takes a string. But Assembly::GetType requires the full type name, and you have to know which assembly to load. My first stab at DumpEnum required me to type this information on the command line, as shown in the following:

DumpEnum System.dll System.Text.RegularExpressions.RegexOptions

You know RegexOptions lives in System.dll and you know the namespace is System.Text.RegularExpressions because the documentation says so. But looking in the documentation is such a bore, and typing the full namespace is enough to give you carpal tunnel syndrome. I'd rather spend three hours writing a program that lets me type 12 characters instead of 54, than spend 30 seconds looking up the DLL/namespace and typing the full type name. This may seem like a laziness paradox, but it's what the good-programmer ethos demands because you end up with a cool tool you can use forever—and can share with your friends!

But before I can write DumpEnum the way I want, I have to solve the more general problem, the one the second question poses: given a (possibly unqualified) type name like RegexOptions, how do I find which Framework DLL contains that type? There's no central database or registry entry that holds this information (and a good thing, too), or method to call, so it appears hopeless. But wait a minute—think again! All the Framework DLLs live in one folder, and there were only 70 or so of them the last time I counted. Why not use brute force? Just load every assembly in the known Framework universe, looking for ones that have a type whose name matches the one desired. This is straightforward and only takes a few seconds on my lowly 1 GHz P3. I wrote a program called FindType that does the job:

FindType MenuItem

Figure 3 shows the results. FindType lists all the Framework DLLs that export a type with MenuItem in the name. FindType looks for types that contain the text appearing as a word. In other words, if you run "FindType Control", FindType reports System.Windows.Forms.Control and System.Web.UI.Control, but not System.Web.UI.WebControls.WebControl. The idea is you're supposed to enter the short name, the one you'd use if you were #using the right namespace. FindType accomplishes this by building a regular expression of the form "\bControl\b". The special anchor character (atomic zero-width assertion) '\b' matches a word break without including the break character in the match.
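To see the word-break behavior in isolation, here is a small sketch in standard C++. It uses std::regex as a stand-in for the .NET Regex class that FindType actually calls, and MatchesAsWord is a hypothetical name of my own. It assumes the short name contains no regex metacharacters.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if shortName appears in fullName as a whole
// word, using the same "\bname\b" pattern described above.
// Assumes shortName contains no regex metacharacters.
bool MatchesAsWord(const std::string& fullName, const std::string& shortName) {
    const std::regex re("\\b" + shortName + "\\b");
    return std::regex_search(fullName, re);
}
```

With this helper, "Control" matches System.Windows.Forms.Control and System.Web.UI.Control because a dot precedes the name and the string ends after it. It does not match System.Web.UI.WebControls.WebControl, where the letter b precedes every occurrence of Control, so there is no word break.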

Figure 3 FindType in Action

How does FindType know which DLLs to search? All the Framework DLLs live in a single folder with a name like C:\WINDOWS\Microsoft.NET\Framework\v1.1.4322. You can get the actual path for your machine by concatenating the environment variables %FrameworkDir% (which resolves to "C:\WINDOWS\Microsoft.NET\Framework") and %FrameworkVersion% ("v1.1.4322"). To make life easy, I encapsulated the process in a class CFindType. To use it, you have to derive your own specialization and implement the virtual function OnMatch:

    class CMyFindType : public CFindType {
    protected:
       virtual BOOL OnMatch(LPCTSTR typName, LPCTSTR asmPath)
       {
          // print it
       }
    };

Then you can instantiate and call the method FindExact:

    CMyFindType ft;
    int nFound = ft.FindExact("MenuItem");

CFindType grovels through the assemblies, looking for ones with types whose names match the one passed. Whenever it finds a type that matches, it calls your virtual OnMatch handler with the type and assembly names. If you want to use a different regular expression, you can call CFindType::Find instead of FindExact. CFindType, the class, makes implementing FindType, the program, easy: Just write an OnMatch handler that displays the information on stdout. For details, download the code available on the MSDN® Magazine Web site.

CFindType is itself derived from an even more general class, CEnumTypes, which enumerates all the types in the Framework, calling a virtual OnType function for each one. CFindType::OnType uses Regex to compare the type name with the one requested, and calls OnMatch if it matches. CEnumTypes uses _tgetenv to get the environment variables to build a filespec of the form "FrameworkDir\FrameworkVersion\*.dll", and it uses MFC's CFileFind class to enumerate the DLLs. CEnumTypes attempts to load each DLL as an assembly. If the load fails (perhaps the DLL is not a managed assembly), CEnumTypes ignores it and keeps searching. If the load succeeds, it calls Assembly::GetExportedTypes to get an array of Types the assembly exports, and calls OnType for each one. Figure 4 shows the code.

Figure 4 FindType.h

    ////////////////////////////////////////////////////////////////
    // MSDN Magazine - July 2005
    // If this code works, it was written by Paul DiLascia.
    // If not, I don't know who wrote it.
    // Compiles with Visual Studio .NET 2003 (V7.1) on Windows XP. Tab size=3.
    //
    #pragma once

    using namespace System::Reflection;
    using namespace System::Text::RegularExpressions;

    //////////////////
    // Class to enumerate all the types in all the assemblies in
    // the Framework. You must derive from this and override OnType
    // to do something with each type found.
    //
    class CEnumTypes {
    protected:
       BOOL m_bAbort; // derived classes can set this to stop enumerating

       // pure virtual function: you must override
       virtual BOOL OnType(LPCTSTR typName, LPCTSTR asmPath) = 0;

    public:
       CEnumTypes() : m_bAbort(FALSE) { }
       virtual ~CEnumTypes() { }

       // enumerate all types, calling virtual OnType for each one.
       int Enumerate()
       {
          CFileFind finddlls;
          int nfound=0;

          // Create filespec: look for *.dll in Framework dir
          CString dllspec;
          dllspec.Format(_T("%s\\%s\\%s"),
             _tgetenv(_T("FrameworkDir")),
             _tgetenv(_T("FrameworkVersion")),
             _T("*.dll"));

          BOOL bMore = finddlls.FindFile(dllspec);
          while (bMore && !m_bAbort) {
             // for each DLL:
             bMore = finddlls.FindNextFile();
             CString asmPath = finddlls.GetFilePath();
             try {
                // load assembly, enumerate all types
                Assembly* a = Assembly::LoadFile(asmPath);
                Type* types[] = a->GetExportedTypes();
                int n = types->Length;
                for (int i=0; i<n; i++) {
                   if (OnType(CString(types[i]->FullName), asmPath))
                      nfound++;
                }
             } catch (Exception* /*e*/) {
                // couldn't load: skip (do nothing)
             }
          }
          return nfound;
       }
    };

    //////////////////
    // Derived specialization to find all the types/assemblies that contain
    // a type whose name matches a given string or regex. You must derive
    // from this and override OnMatch to do something with each type found.
    //
    class CFindType : public CEnumTypes {
    protected:
       CString m_regex; // regular expression to match type name

       // pure virtual function: you must override
       virtual BOOL OnMatch(LPCTSTR typName, LPCTSTR asmPath) = 0;

       // override CEnumTypes to count only types whose name matches regex.
       virtual BOOL OnType(LPCTSTR typName, LPCTSTR asmPath)
       {
          return Regex::Match(typName, m_regex)!=Match::Empty &&
             OnMatch(typName, asmPath);
       }

    public:
       // Search for the type appearing as a word by itself: use the regex
       // "\btypename\b" (typename surrounded by word breaks). So if you
       // search for "Menu" CFindType will find System.Foo.Menu but not
       // "System.Foo.MenuItem."
       //
       int FindExact(LPCTSTR typname)
       {
          m_regex.Format(_T("\\b%s\\b"), typname);
          return Enumerate();
       }

       // Search using arbitrary regex
       int Find(LPCTSTR regex)
       {
          m_regex = regex;
          return Enumerate();
       }
    };
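If you want to experiment with the two-level design without MFC or the Framework, here is a rough sketch of the same shape in standard C++. It is not the code from the download: EnumTypesSketch, FindTypeSketch, and CollectMatches are hypothetical names, the "assemblies" are just a list of strings supplied by the caller, and std::regex stands in for the .NET Regex class.

```cpp
#include <regex>
#include <string>
#include <vector>

// Stand-in for CEnumTypes: walks a fixed list of type names instead of
// loading assemblies, calling the virtual OnType hook for each one.
class EnumTypesSketch {
public:
    explicit EnumTypesSketch(const std::vector<std::string>& names)
        : names_(names) {}
    virtual ~EnumTypesSketch() {}

    // Returns how many names the derived class accepted.
    int Enumerate() {
        int found = 0;
        for (const std::string& name : names_)
            if (OnType(name)) ++found;
        return found;
    }

protected:
    virtual bool OnType(const std::string& typeName) = 0;

private:
    std::vector<std::string> names_;
};

// Stand-in for CFindType: filters by regex, then forwards to OnMatch.
class FindTypeSketch : public EnumTypesSketch {
public:
    explicit FindTypeSketch(const std::vector<std::string>& names)
        : EnumTypesSketch(names) {}

    // Assumes shortName contains no regex metacharacters.
    int FindExact(const std::string& shortName) {
        regex_ = std::regex("\\b" + shortName + "\\b");
        return Enumerate();
    }

protected:
    virtual bool OnMatch(const std::string& typeName) = 0;

    bool OnType(const std::string& typeName) override {
        return std::regex_search(typeName, regex_) && OnMatch(typeName);
    }

private:
    std::regex regex_;
};

// Example client: records every match instead of printing it.
class CollectMatches : public FindTypeSketch {
public:
    explicit CollectMatches(const std::vector<std::string>& names)
        : FindTypeSketch(names) {}
    std::vector<std::string> hits;

protected:
    bool OnMatch(const std::string& typeName) override {
        hits.push_back(typeName);
        return true;
    }
};
```

The point of the split is that the base class owns the loop, the middle class owns the filter, and the client only decides what to do with a hit. That is why the same base can serve both FindType and DumpEnum.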

Finally, now that I have CFindType, I can use it to solve my original problem: fixing DumpEnum so I don't have to tell it the assembly and full type names. DumpEnum uses CFindType with a different OnMatch handler. The handler in DumpEnum checks that the type is in fact an enumerated type (Type::IsEnum returns True) and, if so, dumps the enum's name/value pairs as C++ code, as in Figure 2. The DumpIt function that actually does the work is shown in Figure 5. DumpIt uses Enum::GetUnderlyingType to get the underlying type of the enumeration (usually System.Int32), Enum::GetValues to get the enumeration values, and Convert::ChangeType to convert the enumeration values to their underlying type. Here's the code that prints the name/value pairs:

    // entype is the managed enum type (a Type*)
    Type* untype = Enum::GetUnderlyingType(entype);
    Array* values = Enum::GetValues(entype);
    for (int i=0; i<values->Length; i++) {
       Object* enval = values->GetValue(i);
       Object* unval = Convert::ChangeType(enval, untype);
       _tprintf(_T("\t%s = %s,\n"),
          CString(enval->ToString()),
          CString(unval->ToString()));
    }

Figure 5 DumpIt.cpp

    //////////////////
    // Dump enumerated Framework type as C/C++ code.
    // By this point, entype is guaranteed enum type.
    //
    void DumpIt(Type* entype, LPCTSTR asmPath)
    {
       // Dump name/value pairs as C code
       Type* untype = Enum::GetUnderlyingType(entype);
       _tprintf(_T("//////////////////\n"));
       _tprintf(_T("// Enumeration for %s in %s.\n"),
          CString(entype->FullName), ShortName(asmPath));
       _tprintf(_T("// Underlying type is %s. Automatically "
          "generated by DumpEnum.\n"), CString(untype->ToString()));
       _tprintf(_T("//\n"));
       _tprintf(_T("enum %s {\n"), CString(entype->Name));
       Array* values = Enum::GetValues(entype);
       for (int i=0; i<values->Length; i++) {
          Object* enval = values->GetValue(i);
          Object* unval = Convert::ChangeType(enval, untype);
          _tprintf(_T("\t%s = %s,\n"),
             CString(enval->ToString()),
             CString(unval->ToString()));
       }
       _tprintf(_T("};\n"));
    }
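The formatting half of DumpIt can be separated from the reflection half, which makes it easy to test. The sketch below is an illustration in standard C++, not code from the download. FormatEnum is a hypothetical name, and the name/value pairs are passed in by the caller rather than read from a managed Type.

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: render name/value pairs as a C-style enum, using
// the same one-member-per-line layout DumpIt prints.
std::string FormatEnum(
    const std::string& enumName,
    const std::vector<std::pair<std::string, long> >& members)
{
    std::ostringstream out;
    out << "enum " << enumName << " {\n";
    for (std::size_t i = 0; i < members.size(); ++i)
        out << "\t" << members[i].first << " = " << members[i].second << ",\n";
    out << "};\n";
    return out.str();
}
```

Returning a string instead of printing leaves the caller free to write the result to stdout, a file, or a unit test.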

So now you have two programs for the price of one: FindType lets you find which DLL/assembly contains a specific type; DumpEnum generates C/C++ code to wrap Framework enums.

Before I move on, there's a footnote to the story. After I wrote FindType, I discovered to my dismay there's already a FindType program that comes with Visual Studio® .NET—it even has the same name! The Microsoft version does the same thing as mine, only better. (Ohmygosh, the Redmondtonians are a step ahead of me; I must be slipping!)

Microsoft FindType has options to specify exact or partial match, which directories and namespaces to search, whether to show methods, properties, events, and so on. Figure 6 shows the help screen listing all the options. FindType is a great program to study if you want to learn more about reflection, assemblies, and types, or if you have nothing better to do on a Friday night. You can find FindType in VS.NET\SDK\v1.1\Samples\Applications\TypeFinder, with projects for C# and Visual Basic. I built the C# version and copied the EXE to my bin directory; I use FindType all the time now to look up types.

Figure 6 FindType from Visual Studio

Does the Redmondtonian FindType render my own version totally useless? Of course not. For one thing, CFindType and CEnumTypes are classes, not programs, which means you can use them to write your own tools that need to look up type information, programs like DumpEnum. And my classes come in the best programming language of all: C++. The Microsoft version only comes in Visual Basic® and C#. If you decide to use CFindType within a larger application of your own, you should be aware that CFindType makes no attempt to unload the assemblies as they're enumerated. Since each assembly can consume a fair amount of memory, you could consider changing the implementation to load each assembly into its own application domain, and then unload the application domain when you're done searching the assembly.

Q I'm using the source code from a template-based library. This library includes some specializations of a template function for a specific type. The class template, function template, and template function specialization are all in header files. I #included the headers into my .cpp file and my project compiled and linked. But to use the library in my whole project I #included the headers in stdafx.h. Now I get multiply defined symbol errors for the specialized template functions. How should I organize the header files to avoid multiply defined symbol errors? At the moment, I'm using /FORCE:MULTIPLE, but I would like a more elegant solution.

Lee Kyung Jun

A Indeed, there is a more elegant solution. I'll explain in a moment, but first let me review how template function specializations work. Suppose you have a template function that compares two objects based on operator> and operator==:

    template <typename T>
    int compare(T t1, T t2)
    {
       return t1==t2 ? 0 : t1 > t2 ? 1 : -1;
    }

The function this template generates returns zero or +/-1 depending on whether the first argument is equal to, greater than, or less than the second argument. It's the typical sort of function you need for sorting collections. It assumes the type T has operator== and operator> and works fine for ints, floats, doubles, or DWORDs. But it doesn't work as expected if you use it to compare character strings (char* pointers) because the function generated compares pointers, not the strings themselves:

    LPCTSTR s1,s2;
    ...
    int cmp = compare(s1,s2); // s1<s2? Oops!
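To make the problem concrete, here is a small sketch in standard C++. It uses const char* where the column uses LPCTSTR so that it builds without the Windows headers. The names kBuffer, First, and Second are my own. Both strings are placed in one array so that comparing their addresses is well defined.

```cpp
#include <cstring>

template <typename T>
int compare(T t1, T t2) {
    return t1 == t2 ? 0 : t1 > t2 ? 1 : -1;
}

// Two strings in one buffer: "b" at the lower address, "a" two bytes later.
static const char kBuffer[] = "b\0a";

inline const char* First()  { return kBuffer; }      // points at "b"
inline const char* Second() { return kBuffer + 2; }  // points at "a"
```

Here compare(First(), Second()) returns -1 because the address of "b" is lower, while std::strcmp on the same two pointers returns a positive value because "b" sorts after "a". The generic template answered a question about memory layout, not about the strings.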

To make compare work for strings, you need a template specialization that uses strcmp or its TCHAR equivalent, _tcscmp:

    // specialization for strings
    template<>
    int compare<LPCTSTR>(LPCTSTR s1, LPCTSTR s2)
    {
       return _tcscmp(s1, s2);
    }

So far, so good. Now the question is: where should you put this specialization? The obvious place is in the header file with the template. But this can lead to the multiply defined symbol errors that Lee encountered. The reason is clear if you remember that a template specialization is a function, not a template. It's the same as if you'd written:

    int compare(LPCTSTR s1, LPCTSTR s2)
    {
       return _tcscmp(s1, s2);
    }

There's no reason you can't define a function in a header file—but if you do, you can't then #include the header in multiple files. At least, not without the linker complaining. So what should you do?

If you hold tight to the concept that a template function specialization is a function, not a template, you'll realize there are three options, exactly the same as for ordinary functions: you can make the specialization inline, extern, or static. For example, take a look at the following:

    template<>
    inline int compare<LPCTSTR>(LPCTSTR s1, LPCTSTR s2)
    {
       return _tcscmp(s1, s2);
    }

This is the easiest and most common solution for most template libraries. Inline functions are exempt from the rule that a function may be defined only once in a program, so it's okay to #include them in multiple modules as long as every module sees the same definition. The compiler either expands the call in place or emits a copy that the linker is allowed to merge with the others, so the linker won't complain about multiply defined symbols. For small functions like compare, inline is what you want anyway (it's usually faster).
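Here is what a header using the inline approach might look like in standard C++, again with const char* standing in for LPCTSTR. One detail in this sketch is my own addition: strcmp is only guaranteed to return a negative, zero, or positive value, so the sketch normalizes the result to keep the zero or +/-1 contract of the generic template.

```cpp
#include <cstring>

// Generic version: relies on operator== and operator>.
template <typename T>
int compare(T t1, T t2) {
    return t1 == t2 ? 0 : t1 > t2 ? 1 : -1;
}

// Specialization for C strings. Marking it inline lets this definition
// appear in every translation unit that includes the header.
template <>
inline int compare<const char*>(const char* s1, const char* s2) {
    const int r = std::strcmp(s1, s2);
    return r == 0 ? 0 : (r > 0 ? 1 : -1);  // normalize to -1, 0, or +1
}
```

A call such as compare(s1, s2) with two const char* arguments now picks the specialization and compares the characters rather than the addresses.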

But what if your specialization is long, or you don't want to make it inline for some reason? Then you can make it extern. The syntax is the same as for normal functions:

    // in .h header file
    template<>
    extern int compare<LPCTSTR>(LPCTSTR s1, LPCTSTR s2);

Of course, now you have to implement compare somewhere. Figure 7 shows part of a program I wrote to illustrate the details. I implemented the specialization in a separate module Templ.cpp that's linked with the main project. Templ.h is #included in stdafx.h, which is included in both Templ.cpp and the main module—but the project builds with no link errors. Go ahead, download the source and try it yourself.

Figure 7 Extern Template Specialization

templ.h

    #pragma once

    //////////////////
    // Basic function template to compare two types based on operators.
    // Returns zero or +/-1 depending on whether t1 is equal to, greater
    // than, or less than t2.
    //
    template <typename T>
    int compare(T t1, T t2)
    {
       return t1 == t2 ? 0 : t1 > t2 ? 1 : -1;
    }

    //////////////////
    // Specialization for LPCTSTR: note only DECLARED here.
    // Implementation is in Templ.cpp
    //
    template<>
    extern int compare<LPCTSTR>(LPCTSTR s1, LPCTSTR s2);

templ.cpp

    #include "stdafx.h"
    #include "Templ.h"

    //////////////////
    // Implementation for LPCTSTR template specialization.
    //
    template<>
    int compare<LPCTSTR>(LPCTSTR s1, LPCTSTR s2)
    {
       return _tcscmp(s1, s2);
    }
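One portability caveat if you build with a compiler other than Visual C++ 7.1: standard C++ does not permit a storage-class specifier such as extern or static on an explicit specialization, and some compilers reject it. A declaration with no specifier does the same job. The sketch below shows that form in a single file, with comments marking what would go in the header and what would go in the one .cpp file. It uses const char* in place of LPCTSTR, and CompareNames is a hypothetical caller of my own.

```cpp
#include <cstring>

// ---- would live in the header ----
template <typename T>
int compare(T t1, T t2) {
    return t1 == t2 ? 0 : t1 > t2 ? 1 : -1;
}

// Declaration only, no storage-class specifier. It tells every includer
// that a specialization exists, so the generic body is not instantiated
// for const char*.
template <>
int compare<const char*>(const char* s1, const char* s2);

// A caller that sees only the declaration above.
inline int CompareNames(const char* a, const char* b) {
    return compare(a, b);
}

// ---- would live in exactly one .cpp file ----
template <>
int compare<const char*>(const char* s1, const char* s2) {
    return std::strcmp(s1, s2);
}
```

The declaration must be visible before the first use of compare with const char* arguments, which is why it belongs in the header alongside the template.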

If you're writing a template library for other developers, the extern approach can be a pain because now you have to create a link library (.lib) with the object module that contains the specialization. If you already have a .lib, it's no big deal; if you don't, you might wish for a way to avoid introducing one. It's more elegant (and less trouble) to implement your templates using only header files. The easiest way is to use inline, but you could also put your specializations in a separate header file from their declarations and instruct developers to #include the specializations in only one module. Alternatively, you can keep everything in one file and use a preprocessor symbol to control instantiation:

    #ifdef MYLIB_IMPLEMENT_FUNCS
    template<>
    int compare<LPCTSTR>(LPCTSTR s1, LPCTSTR s2)
    {
       return _tcscmp(s1, s2);
    }
    #endif

With this approach, all modules include the header file, but only one #defines MYLIB_IMPLEMENT_FUNCS before including it. This approach won't work with precompiled headers because the compiler loads the precompiled version with whatever value MYLIB_IMPLEMENT_FUNCS had in stdafx.h.

The last and definitely least way to avoid multiply defined symbol errors is to make your specialization static:

    template<>
    static int compare<LPCTSTR>(LPCTSTR s1, LPCTSTR s2)
    {
       return _tcscmp(s1, s2);
    }

This keeps the linker quiet because static functions aren't exported outside their modules, and it lets you keep everything in one header file without introducing preprocessor symbols. But it's inefficient because now every module gets a copy of the function. If the function is small, it doesn't matter—but then why not use inlining?

So the short answer is: make your specialization inline or extern. Usually inline is the way to go. Either way, you have to edit the header file. If this isn't possible because you're using a third-party library and you can't or don't want to edit the source, you have no choice but to use the linker option /FORCE:MULTIPLE. And while you're waiting for your project to build, you can send the folks who wrote the library a nastygram explaining why it's not cool to define function template specializations that aren't inline or extern. Tell them I sent you.

Send your questions and comments for Paul to cppqa@microsoft.com.

Paul DiLascia is a freelance software consultant and Web/UI designer-at-large. He is the author of Windows++: Writing Reusable Windows Code in C++ (Addison-Wesley, 1992). In his spare time, Paul develops PixieLib, an MFC class library available at www.dilascia.com.