Wrappers

Use Our ManWrap Library to Get the Best of .NET in Native C++ Code

Paul DiLascia

This article discusses:

  • Using managed classes from native C++ without /clr
  • GCHandle, gcroot, and building mixed-mode DLLs
  • Regular expressions in the .NET Framework
This article uses the following technologies:
C++ and the .NET Framework

Code download available at: C.exe (248 KB)

Contents

A Simple Question
ManWrap
Hiding the Managed Stuff
RegexWrap
Coping with Collections
Dealing with Delegates
Handling Exceptions
Wrapping Arrays
Encapsulating Enums
Building Mixed-Mode DLLs
Fun with RegexWrap
Conclusion

The Managed Extensions for C++ make it possible to mix native and managed code freely, even in the same module. Wow, life is good! Compiling with /clr, however, has consequences you may not want. It forces multithreading and dispenses with some useful runtime checks. It interferes with MFC's DEBUG_NEW, and some .NET Framework classes may conflict with your namespace. And what if you have a legacy application that uses an older version of the compiler that doesn't support /clr? Isn't there some way to reach into the Framework without the Managed Extensions? Yes!

In this article, I'll show you how to wrap Framework classes in a native way so you can use them in any C++/MFC app without /clr. As my test case, I'll wrap the regex classes from the .NET Framework in a DLL and implement three MFC programs using it. You can use RegexWrap.dll to add regular expressions to your own C++/MFC applications, or use my ManWrap tools to wrap your own favorite Framework classes.

A Simple Question

It all started when reader Anirban Gupata sent me a simple question: is there some regular expression library he could use in his C++ app? My response was "Sure, there are several, but .NET already has a Regex class. Why not use that?" Regular expressions are so fabulously useful (see the sidebar "Regular Expressions") they're enough to make even the most diehard C++ fan yearn for the .NET Framework. So I wrote a little program called RegexTest to see what Regex can do. Figure 1 shows it running. You enter a regular expression and input string, press a button, and RegexTest displays the Matches, Groups, and Captures. It all happens in a single function called FormatResults (see Figure 2), which formats a big CString when the user presses OK. FormatResults is the only function in RegexTest that calls the Framework, so it's easy to put in its own module compiled with /clr.

Figure 2 FormatResults.cpp

////////////////
// Format match results into MFC CString.
// From original version of RegexTest not using wrappers.
// This shows the vanilla way to call managed objects using Managed
// Extensions.
//
CString FormatResults(LPCTSTR lpszRegex, LPCTSTR lpszInput)
{
   CString result;
   CString temp;
   Regex* r = new Regex(lpszRegex);
   MatchCollection* mc = r->Matches(lpszInput);
   int n = mc->Count;
   temp.Format(_T("Number of Matches: %d\n"), n);
   result += temp;
   for (int i=0; i<n; i++) { // for each match:
      Match *m = mc->Item[i];
      temp.Format(_T("Match %d at %d: %s\n"), i, m->Index,
         CString(m->Value));
      result += temp;
      // Show groups
      GroupCollection *gc = m->Groups;
      for (int j=0; j<gc->Count; j++) {
         Group *g = gc->Item[j];
         if (g->Success) {
            temp.Format(_T(" Group %d match at %d: %s\n"), j,
               g->Index, CString(g->Value));
         } else {
            temp.Format(_T(" Group %d failure\n"), j);
         }
         result += temp;
      }
      // Show captures
      CaptureCollection *cc = m->Captures;
      for (j=0; j<cc->Count; j++) {
         Capture *c = cc->Item[j];
         temp.Format(_T(" Capture at %d: %s\n"), c->Index,
            CString(c->Value));
         result += temp;
      }
      result += _T("\n");
   }
   // Use Regex to convert all newlines to \r\n for edit control
   r = new Regex("\n");
   result = r->Replace(result,"\r\n");
   return result;
}

Figure 1 RegexTest

If my only goal was to write RegexTest, I'd be finished. But writing RegexTest got me thinking. Real apps need to hold their objects longer than the duration of a function call. Suppose I want to store my regular expression in my window class? Good idea, but unfortunately you can't store __gc pointers in unmanaged memory:

class CMyWnd ... {
protected:
   Regex* m_regex; // NOT!
};

Instead, you need GCHandle or its templatized cousin, gcroot:

class CMyWnd ... {
protected:
   gcroot<Regex*> m_regex; // swell!
};

GCHandle and gcroot are well described in the docs and elsewhere (see "Tips and Tricks to Bolster Your Managed C++ Code" by Tomas Restrepo in the February 2002 MSDN® Magazine), so all I'll say here is that gcroot employs templates and C++ operator overloading, which makes handles look and act like pointers. You can copy, assign, and convert; and gcroot's dereference operator-> lets you use pointer syntax to invoke your managed object:

m_regex = new Regex("a+"); Match* m = m_regex->Match("S[aeiou]x");

Managed objects, C++, and you—all together in one happy family. What's not to love?

The only possible complaint in this situation is that to use gcroot, you need /clr. How else can the compiler even know what gcroot/GCHandle are? This doesn't mean your code must be managed; you can use #pragma unmanaged to generate native code. But as I mentioned earlier, using /clr has repercussions. It forces multithreading (which breaks some shell functions like ShellExecute), and it's incompatible with options like /RTC1, which generates useful runtime checks for stack errors and buffer under- and overruns (see John Robbins' August 2001 Bugslayer column). If you use MFC, you've probably already had problems with /clr and DEBUG_NEW. You can also have namespace collisions with functions like MessageBox, which exists in the .NET Framework, MFC, and the Windows® API.

In my January column, I showed how to create a project where only one module uses /clr. That works fine when your Framework calls are localized to a few functions like FormatResults that you can put in a separate file, but it fails if you have broadly used classes with gcroot members, because many more modules #include your class. So you flip the /clr switch and—whoosh!—your whole app is beamed to Managedland. It's not that /clr is so terrible, it's that there are times when you may prefer to stay native. Can't you have your Framework classes and native compilation too? Yes, but you need a wrapper.

ManWrap

ManWrap is a set of tools I built to wrap managed objects in native C++ classes. The idea is to create classes that use the Managed Extensions internally to call the Framework, but expose a purely native interface to the outside world. Figure 3 illustrates this approach. You need the Managed Extensions to build the wrapper itself, and apps using the wrappers need the Framework to run, but the apps themselves compile in native mode without /clr. How does ManWrap accomplish this feat? The best way I can explain is to simply begin blithely wrapping and see what happens. Since every .NET object derives from Object, I'll start there:

class CMObject { protected: gcroot<Object*> m_handle; };

Figure 3 ManWrap

CMObject is a native C++ class that holds a handle to a managed object. To do anything useful, however, I need some standard constructors and operators, and while I'm at it, I'll wrap Object::ToString, which will come in handy. Figure 4 shows my first stab. CMObject has three constructors: a default constructor, a copy constructor, and a constructor from Object*. There's also an assignment operator from CMObject and a ThisObject method that returns the underlying Object, which the dereference operator-> uses. These are the minimal methods every wrapper class needs. The wrapper methods themselves (in this case, ToString) are trivial:

CString CMObject::ToString() const { return (*this)->ToString(); }

Figure 4 CMObject—First Pass

//////////////////
// First-attempt implementation for CMObject, base for all wrappers.
// Class declaration: .h file
//
class CMObject {
protected:
   gcroot<Object*> m_handle; // handle to managed object
public:
   // fns whose signatures use managed types
   CMObject(Object* o) : m_handle(o) { }
   Object* ThisObject() const { return (Object*)m_handle; }
   Object* operator->() const { return ThisObject(); }
   // fns whose signatures use native types
   CMObject() { }
   CMObject(const CMObject& o) : m_handle(o.m_handle) { }
   CMObject& operator=(const CMObject& r) {
      m_handle = r.m_handle; // copies underlying GCHandle.Target
      return *this;
   }
   // wrapped methods/properties
   CString ToString() const { return (*this)->ToString(); }
};

It's only one line of code, but there's more happening than meets the eye: (*this)-> invokes CMObject's dereference operator->, which calls ThisObject to cast the gcroot's underlying GCHandle.Target to an Object*, whose ToString method returns a managed String. The Managed Extensions and IJW (It Just Works) interop magically convert the string to LPCTSTR, which the compiler uses to construct a CString on the stack automatically, because CString has a constructor-from-LPCTSTR. Isn't C++ a really amazing language?

So far, CMObject is pretty useless because all I can do is create null objects and copy them. How much good is that? But CMObject isn't designed for the single life; it's designed to serve as the base for more wrappers. Let's try another class. The Framework's Capture class is a very simple class that represents a single subexpression match in a regular expression. It has three properties: Index, Value, and Length. In order to wrap it, I'll do the obvious thing: Capture derives from Object, so I'll derive CMCapture from CMObject, as shown here:

class CMCapture : public CMObject {
   // now what?
};

CMCapture inherits m_handle from CMObject, but m_handle is gcroot<Object*>, not gcroot<Capture*>. So, do I need a new handle? No. Capture derives from Object, so a gcroot<Object*> handle can hold a Capture object too.

class CMCapture : public CMObject {
public:
   // call base ctor to initialize m_handle
   CMCapture(Capture* c) : CMObject(c) { }
};

CMCapture needs all the same constructors and operators as CMObject, and I have to override ThisObject and operator-> to return the new type.

Capture* ThisObject() const { return static_cast<Capture*>((Object*)m_handle); }

The static_cast is safe because my interface guarantees that the underlying object can only be a Capture object. Wrapping the new properties is again trivial. For example:

int CMCapture::Index() const { return (*this)->Index; }

Hiding the Managed Stuff

So far, so good. I have what seems like a brainless way to wrap managed objects in C++. But my C++ classes still require /clr to compile. My whole goal is to build native wrappers so apps using them don't need /clr. To get rid of the /clr requirement I have to hide all the managed stuff from native clients. For example, I have to hide the gcroot handle itself because native code doesn't know what a GCHandle is. What to do?

I once had a math professor who said every proof is either a bad joke or a cheap trick. What I'm about to describe is definitely in the cheap trick category. The key to ManWrap is the special predefined compiler symbol _MANAGED, which is 1 when compiling with /clr and otherwise undefined. _MANAGED makes hiding the handle trivial, as shown here:

#ifdef _MANAGED
# define GCHANDLE(T) gcroot<T>
#else
# define GCHANDLE(T) intptr_t
#endif

Now I can revise CMObject like so:

class CMObject { protected: GCHANDLE(Object*) m_handle; ••• };

Modules compiling with /clr (namely, the wrapper itself) see a gcroot<T> handle. C++ apps not using /clr see only a (possibly 64-bit) raw integer. Pretty clever, eh? I told you it was a cheap trick! If you're wondering why intptr_t, it's because gcroot's one-and-only data member, its GCHandle, is specifically designed to operate as an integer, with op_Explicit conversions to and from IntPtr. intptr_t is merely the C++ equivalent of IntPtr, so whichever way you compile CMObject (either native or managed), it has the same size in memory.
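To make the native view concrete, here is a minimal sketch that builds without /clr, so the #else branch is the one that gets compiled. NativeViewOfWrapper is a hypothetical name used only for illustration, and the forward-declared Object is a placeholder for System::Object that native code never dereferences.

```cpp
#include <cstdint>

// Same switch as above: gcroot<T> under /clr, a raw integer otherwise.
#ifdef _MANAGED
#  include <vcclr.h>              // declares gcroot (Managed Extensions only)
#  define GCHANDLE(T) gcroot<T>
#else
#  define GCHANDLE(T) std::intptr_t
class Object;                     // placeholder; the macro discards it anyway
#endif

// Hypothetical stand-in for CMObject's data layout.
class NativeViewOfWrapper {
protected:
    GCHANDLE(Object*) m_handle;
};

#ifndef _MANAGED
// In a native build the wrapper is exactly one pointer-sized integer.
static_assert(sizeof(NativeViewOfWrapper) == sizeof(std::intptr_t),
              "native view of the wrapper must be pointer-sized");
#endif
```

The static_assert only documents the native layout. Whether the managed layout really matches is the claim made in the text, which this sketch does not verify.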

Size is one thing, but there's more to going native than just size. What about the other managed stuff, like the methods labeled "signatures use managed types" in Figure 4? I can use _MANAGED to hide them too:

#ifdef _MANAGED
// managed-type methods here
#endif

By "managed-type methods" I mean methods whose signatures use managed types. Enclosing them in #ifdefs makes them invisible to native clients. In Nativeland, these functions don't exist. These are things like a constructor that takes an argument of type X, where X is managed, and operator-> that native code has no use for, nor ability to comprehend or compile. I need these methods only inside the wrapper itself—which compiles with /clr.

So I've hidden the handle and all the "managed-type" functions. Is there anything else? What about the copy constructor and operator=? Their signatures use native types, but their implementations access m_handle:

class CMObject { public: CMObject(const CMObject& o) : m_handle(o.m_handle) { } };

Suppose I have a CMObject obj1 and I write CMObject obj2=obj1. The compiler invokes my copy constructor. This works fine in managed code where m_handle is gcroot<Object*>, but in native code m_handle is intptr_t, so the compiler copies the raw integer. Oops! You can't copy a GCHandle as if it were an integer. You have to go through proper channels by reassigning the GCHandle's Target or letting gcroot do it for you. The problem is my copy constructor is defined inline. All I have to do is make it a true function and move the implementation to the .cpp file:

// in ManWrap.h
class CMObject {
public:
   CMObject(const CMObject& o);
};
// in ManWrap.cpp
CMObject::CMObject(const CMObject& o) : m_handle(o.m_handle) { }

Now when the compiler invokes the copy constructor, it calls into ManWrap.cpp, where everything executes in managed mode and treats m_handle like the gcroot<Object*> it truly is, not the lowly intptr_t native clients see; gcroot sets the GCHandle's Target. The same applies to the operator= and the wrapper functions themselves, like CMObject::ToString or CMCapture::Index. Any member function that accesses m_handle must be a true function, not inline. The function call is the price you pay for going fully native. (Life is tough, I know.) But you can't have everything and the overhead is insignificant in all but the most performance-critical situations. If you need to manipulate 17 trillion objects in real time, by all means, don't use wrappers! But if all you want is to access a few .NET Framework classes without /clr, the function call is relatively insignificant.
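The same rule can be tried out in plain native C++. The sketch below is an analogy, not ManWrap code: FakeGC is a made-up handle table standing in for the CLR, and Wrapper is a hypothetical class whose clients see only an integer. Every member that touches m_handle is defined out of line, so the code that knows what the integer means lives in one place.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Stand-in for the CLR handle table (illustration only, not a real API).
namespace FakeGC {
    inline std::map<std::intptr_t, std::string>& Slots() {
        static std::map<std::intptr_t, std::string> slots;
        return slots;
    }
    inline std::intptr_t Alloc(const std::string& target) {
        static std::intptr_t next = 0;
        Slots()[++next] = target;           // new slot for this handle
        return next;
    }
    inline void Free(std::intptr_t h) { Slots().erase(h); }
    inline const std::string& Target(std::intptr_t h) { return Slots().at(h); }
    inline std::size_t LiveCount() { return Slots().size(); }
}

// "Header" view: clients see an integer and function declarations only.
class Wrapper {
public:
    explicit Wrapper(const std::string& target);
    Wrapper(const Wrapper& o);
    Wrapper& operator=(const Wrapper& r);
    ~Wrapper();
    std::string Target() const;
private:
    std::intptr_t m_handle;
};

// "Implementation file" view: copying allocates a fresh handle to the
// same target instead of duplicating the raw integer.
Wrapper::Wrapper(const std::string& target)
    : m_handle(FakeGC::Alloc(target)) {}
Wrapper::Wrapper(const Wrapper& o)
    : m_handle(FakeGC::Alloc(FakeGC::Target(o.m_handle))) {}
Wrapper& Wrapper::operator=(const Wrapper& r) {
    if (this != &r) {
        const std::intptr_t fresh = FakeGC::Alloc(FakeGC::Target(r.m_handle));
        FakeGC::Free(m_handle);
        m_handle = fresh;
    }
    return *this;
}
Wrapper::~Wrapper() { FakeGC::Free(m_handle); }
std::string Wrapper::Target() const { return FakeGC::Target(m_handle); }
```

After Wrapper b = a there are two live handles to the same target, and each wrapper frees only its own. An inline, compiler-generated copy would have left two wrappers sharing one integer.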

Figure 5 shows the final CMObject from ManWrap. Once you understand how CMObject works, creating new wrappers is merely a matter of cloning the process: derive from CMObject, add the standard constructors and operators, use _MANAGED to hide the ones that use managed types, and implement the rest as true functions. The only difference with derived objects is you can leave the copy constructor and operator= inline because they can invoke their base class without accessing m_handle directly:

class CMCapture : public CMObject { public: CMCapture(const CMCapture& o) : CMObject(o) { } };

Figure 5 CMObject

CMObject.h

//////////////////
// Wrapper for Object, base of .NET hierarchy.
// CMObject doesn't use DECLARE_WRAPPER because it has no base class.
//
class CMObject {
protected:
   GCHANDLE(Object*) m_handle;
public:
#ifdef _MANAGED
   // visible to managed clients only: anything that deals with __gc
   // objects
   CMObject(Object* o) : m_handle(o) { }
   Object* ThisObject() const { return (Object*)m_handle; }
   Object* operator->() const { return ThisObject(); }
#endif
   // visible to all clients
   CMObject();
   CMObject(const CMObject& o);
   BOOL operator==(const CMObject& r) const;
   BOOL operator!=(const CMObject& r) const { return ! operator==(r); }
   CMObject& operator=(const CMObject& r);
   CString ToString() const;
   CString TypeName() const;
};

CMObject.cpp

////////////////
// CMObject implementation, from ManWrap.cpp
// If this code works, it was written by Paul DiLascia.
// If not, I don't know who wrote it.
// Compiles with Visual Studio .NET 2003 on Windows XP. Tab size=3.
//
CMObject::CMObject()
{
   // default ctor must be here and not inline, so m_handle is
   // appropriately initialized to a NULL GCHandle. Otherwise it will
   // have unpredictable values since native code sees it as intptr_t
}
CMObject::CMObject(const CMObject& o) : m_handle(o.m_handle)
{
   // See remarks above
}
CMObject& CMObject::operator=(const CMObject& r)
{
   m_handle = r.m_handle; // copies underlying GCHandle.Target
   return *this;
}
BOOL CMObject::operator==(const CMObject& r) const
{
   return (*this)->Equals(r.ThisObject());
}
CString CMObject::ToString() const
{
   return (*this)->ToString();
}
CString CMObject::TypeName() const
{
   return (*this)->GetType()->Name;
}

CMCapture's copy constructor can be inline because it simply passes its native argument to CMObject. You have to pay the function toll when you construct an object, but at least you don't have to pay it twice.

Following the rules I've sketched, writing wrappers is so easy you can do it in your sleep. Or better yet, write some macros to streamline the process as I did for ManWrap. Here's the final CMCapture, from RegexWrap.h:

class CMCapture : public CMObject {
   DECLARE_WRAPPER(Capture, Object);
public:
   // wrapped properties/methods
   int Index() const;
   int Length() const;
   CString Value() const;
};

It uses the macro DECLARE_WRAPPER defined in ManWrap.h, to save typing. There's another macro IMPLEMENT_WRAPPER that makes the implementation equally trivial (see the code download). The macros declare and implement all the basic constructors and operators I've described. In case you hadn't noticed, the names are designed to feel familiar to MFC programmers. DECLARE/IMPLEMENT_WRAPPER assumes you follow my naming convention: CMFoo is the native wrapper for managed Foo objects. (I'd have used CFoo, but that collides with the MFC CObject for Object, so I added the M in CM for Managed.) Figure 6 shows DECLARE_WRAPPER. IMPLEMENT_WRAPPER is similar. You can download and ponder it for homework.

Figure 6 declare_wrapper.h

// DECLARE_WRAPPER macro from ManWrap.h
#ifdef _MANAGED
// This declares the managed part of DECLARE_WRAPPER. For internal use
// only. Use DECLARE_WRAPPER.
//
#define DECLARE_WRAPPER_M(MT,BT) \
public: \
   CM##MT(MT* o) : CM##BT(o) { } \
   MT* ThisObject() const \
   { \
      return static_cast<MT*>((Object*)m_handle); \
   } \
   MT* operator->() const \
   { \
      return ThisObject(); \
   } \
   CM##MT& operator=(MT* o) \
   { \
      m_handle = o; \
      return *this; \
   } \

#else // NOT _MANAGED
// native code doesn't see the managed stuff: macro does nothing
#define DECLARE_WRAPPER_M(MT,BT)
#endif // _MANAGED
//////////////////
// Use this to declare a wrapper class. Declares the basic ctors and
// operators needed for type-safeness usage. MT=managed type, BT=base
// (managed) type. You must follow the naming convention CMFoo = wrapper
// for Foo.
//
#define DECLARE_WRAPPER(MT,BT) \
public: \
   CM##MT() { } \
   CM##MT(const CM##MT& o) : CM##BT(o) { } \
   CM##MT& operator=(const CM##MT &r) \
   { \
      CM##BT::operator=(r); \
      return *this; \
   } \
   DECLARE_WRAPPER_M(MT,BT) \
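If the token pasting in the macro looks opaque, a stripped-down native sketch may help. DECLARE_WRAPPER_SKETCH is a hypothetical simplification, the CMObject here is a toy base whose m_name member stands in for the hidden handle, and WrappedTypeName is an addition of mine that is not part of ManWrap. The point is only to show how CM##MT and CM##BT turn the pair (Group, Capture) into CMGroup and CMCapture.

```cpp
#include <string>

// Toy base class with the shape the macro expects ("CM" + type name).
class CMObject {
public:
    CMObject() {}
    CMObject(const CMObject& o) : m_name(o.m_name) {}
    CMObject& operator=(const CMObject& r) { m_name = r.m_name; return *this; }
    std::string m_name;   // stands in for the hidden handle
};

// Simplified DECLARE_WRAPPER: MT = wrapped type, BT = its base type.
// CM##MT pastes tokens, so MT=Capture yields the identifier CMCapture.
#define DECLARE_WRAPPER_SKETCH(MT, BT)                       \
public:                                                      \
    CM##MT() { }                                             \
    CM##MT(const CM##MT& o) : CM##BT(o) { }                  \
    CM##MT& operator=(const CM##MT& r)                       \
    {                                                        \
        CM##BT::operator=(r);                                \
        return *this;                                        \
    }                                                        \
    static const char* WrappedTypeName() { return #MT; }

class CMCapture : public CMObject {
    DECLARE_WRAPPER_SKETCH(Capture, Object)
};

class CMGroup : public CMCapture {
    DECLARE_WRAPPER_SKETCH(Group, Capture)
};
```

Each derived class forwards copying to its base, so only the base ever touches the handle. That is the property that lets the real derived wrappers keep these members inline.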

Astute readers may have noticed something odd. The only constructors I've written to this point are default constructors, copy constructors, and constructors from pointer-to-managed-type. The last is hidden from native code, so it seems all native clients can do is create Null objects and copy them. What good is that? The lack of constructors is simply an unfortunate result of my choice of classes. You never create an Object by itself, and Capture objects only come from other objects like Match or Group. But Regex has a real constructor that takes a String, so CMRegex wraps it like so:

// in RegexWrap.h
class CMRegex : public CMObject {
   DECLARE_WRAPPER(Regex,Object);
public:
   CMRegex(LPCTSTR s);
};
// in RegexWrap.cpp
CMRegex::CMRegex(LPCTSTR s) : CMObject(new Regex(s)) { }

Here again the constructor must be a true function because it calls "new Regex," which requires the Managed Extensions and /clr. In general, DECLARE/IMPLEMENT_WRAPPER only declare and implement the canonical constructors and operators you need to manipulate wrapper objects in type-safe fashion. If the class you're wrapping has "real" constructors, you have to wrap them yourself. DECLARE_WRAPPER is cool, but it's not clairvoyant.

If you're wrapping a method that returns a managed object of some other type, then you have to wrap that type too, because you obviously can't return the managed object directly to native code. For example, Regex::Match returns a Match*, so wrapping Regex::Match requires wrapping Match:

CMMatch CMRegex::Match(LPCTSTR input) { return CMMatch((*this)->Match(input)); }

This is where the constructor from pointer-to-managed-type comes in. Just as the compiler automatically converts the String from Object::ToString into a CString, here it automatically converts the Match* from Regex::Match into a CMMatch object, because CMMatch has the proper constructor (automatically defined by DECLARE/IMPLEMENT_WRAPPER). So while native code can't see the construct-from-pointer-to-managed-type ctors, they're indispensable for writing wrappers.

Regular Expressions

Some newbies out there may not be fully familiar with regular expressions, so a brief review seems in order. If you're a regular expressions savant, feel free to skim.

A regular expression is a string that describes a family of strings. For example, the regular expression "Mic.*" describes all strings that contain "Mic" followed by zero or more characters. Mickey, Microsoft, Michelangelo, or just Mic are all examples. Dot (a period) matches any character and * means zero or more repetitions of whatever precedes it; + is like * but requires at least one repetition, so "Mic.+" matches all the previous examples except "Mic". [a-z] matches a range, so [a-zA-Z_0-9] matches a letter, digit, or underbar. Regex calls these word characters, which you can write as \w. So "\w+" matches sequences of one or more word characters, in other words, C identifiers. Well, almost. C identifiers can't start with a digit, so here's a regular expression that works: "^[a-zA-Z_]\w*$". The special character ^ means "begins with" (unless it comes at the start of a range, in which case it means NOT) and $ means "ends with", so "^[a-zA-Z_]\w*$" means: a string of letters, digits, or underbars that begins with a letter or underbar.

Regexes are extremely useful for input validation. \d matches digits and {n} matches n repetitions of something, so ^5\d{15}$ matches 16-digit numbers that begin with 5, in other words, MasterCard numbers. And, ^[45]\d{15}$ adds Visa cards too, which begin with 4. You can use parens to group expressions, so here's a test. What does the following regular expression describe?

^\d{5}(-\d{4}){0,1}$

Hint: {0,1} matches 0-1 repetitions (which you can abbreviate as ?). Have you figured it out yet? The expression reads: five digits followed by zero or one repetitions of: (a dash followed by four digits). This matches 02142 and 98007-4235, but not 3245 or 2345-98761. In other words, US ZIP codes. The parens group the ZIP+4 part so the {0,1} modifier applies to the whole group.
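If you want to try these validation patterns without the Framework, here is a rough sketch using the C++ standard library's std::regex in place of .NET Regex. That is a swapped-in engine (ECMAScript grammar), so treat it as a way to experiment with the patterns rather than a statement about how Regex behaves. The function names are made up for this example.

```cpp
#include <regex>
#include <string>

// True when the whole string has the shape of a ZIP or ZIP+4 code.
bool IsUsZipCode(const std::string& s) {
    static const std::regex zip(R"(^\d{5}(-\d{4}){0,1}$)");
    return std::regex_match(s, zip);
}

// True for 16 digits beginning with 4 or 5. This checks format only;
// it says nothing about whether the number is a valid card.
bool LooksLikeCardNumber(const std::string& s) {
    static const std::regex card(R"(^[45]\d{15}$)");
    return std::regex_match(s, card);
}
```

With these definitions, IsUsZipCode accepts "02142" and "98007-4235" and rejects "3245" and "2345-98761", the same four cases worked through above.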

I've only scratched the surface of what regular expressions can do. I haven't mentioned replace, and I don't dare describe what happens in Unicode because I have no clue. But you can get a sense of how powerful regular expressions are. They've been a mainstay of UNIX for ages, and made a big comeback with Web programming and languages like Perl, where serving HTML is almost entirely an exercise in text manipulation. Not until the .NET Framework have regular expressions become a full-fledged Windows citizen. Better late than never!

Framework Regex

The .NET Framework implements regular expressions with a Regex class and three supporting classes: Match, Group, and Capture (see Figure A). Typically, you create a Regex and call Regex::Match with your input string to get the first Match, or Regex::Matches to get them all together:

Regex *r = new Regex("\\b\\w+\\b");
MatchCollection* mc = r->Matches("abc ,_foo ,<& mumble7");
for (int i=0; i<mc->Count; i++) {
   Match *m = mc->Item[i];
   Console::WriteLine(m->Value);
}

This should display "abc", "_foo", and "mumble7", each on a line. This example introduces another special character, \b, called an anchor or atomic zero-width assertion, just like ^ (begin) and $ (end). \b designates a word boundary, so "\b\w+\b" means one or more word characters separated by word breaks.
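For readers without the Framework handy, the same loop can be approximated in native C++. This is a sketch that substitutes std::regex and std::sregex_iterator for Regex and MatchCollection; FindWords is a hypothetical helper name.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collects every match of \b\w+\b, much like walking a MatchCollection.
std::vector<std::string> FindWords(const std::string& input) {
    static const std::regex word(R"(\b\w+\b)");
    std::vector<std::string> words;
    for (std::sregex_iterator it(input.begin(), input.end(), word), end;
         it != end; ++it) {
        words.push_back(it->str());   // text of the current match
    }
    return words;
}
```

Calling FindWords("abc ,_foo ,<& mumble7") yields the three words "abc", "_foo", and "mumble7". The raw string literal R"(...)" avoids having to double every backslash, which is easy to get wrong in ordinary C++ string literals.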

Figure A Regex Classes

Each parenthesized expression in a regular expression constitutes a Group. Match::Groups returns all the Groups as a collection, which is never empty because the entire regular expression itself is a group. Groups are important because they let you do logical OR matches like "(ying|yang)", they let you apply quantifiers to subexpressions, and they let you extract individual parts of matches. Figure 1 in the main article shows my RegexTest program running with the zip example to reveal the groups.
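The idea of groups carries over to other regex engines. As a rough native sketch, here is the ZIP pattern again with std::regex, where group 0 is the whole match and group 1 is the optional ZIP+4 part. ZipGroups and SplitZip are names invented for this example, and sub_match::matched plays a role loosely similar to Group.Success.

```cpp
#include <regex>
#include <string>

// Result of matching the ZIP pattern, one field per group.
struct ZipGroups {
    bool matched;          // did the whole pattern match?
    std::string whole;     // group 0: the entire match
    bool hasPlusFour;      // did group 1 take part in the match?
    std::string plusFour;  // group 1 text, empty if it did not
};

ZipGroups SplitZip(const std::string& s) {
    static const std::regex zip(R"(^\d{5}(-\d{4}){0,1}$)");
    ZipGroups g = { false, "", false, "" };
    std::smatch m;
    if (std::regex_match(s, m, zip)) {
        g.matched = true;
        g.whole = m[0].str();
        g.hasPlusFour = m[1].matched;
        g.plusFour = m[1].str();   // empty string for an unmatched group
    }
    return g;
}
```

For "98007-4235" group 1 holds "-4235". For "02142" the overall match succeeds but group 1 does not participate, which is the situation RegexTest reports as a group failure.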

The most powerful function of all is Regex::Replace, which makes regular expressions amazingly powerful. Like many developers, I've had to manually convert "\n" to "\r\n" before passing strings to a multiline edit control too many times, but with Regex::Replace, it's trivial:

s = Regex::Replace(s,"\n","\r\n");
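A native approximation of the same idea, using std::regex_replace rather than Regex::Replace, might look like the sketch below. ToCrLf is a hypothetical name. Note one deliberate difference from the one-liner above: the pattern \r?\n also consumes an existing carriage return, so text that already contains "\r\n" is not turned into "\r\r\n".

```cpp
#include <regex>
#include <string>

// Normalizes line endings to "\r\n" for a multiline edit control.
// An existing "\r\n" is matched as a unit and left effectively unchanged.
std::string ToCrLf(const std::string& text) {
    static const std::regex newline(R"(\r?\n)");
    return std::regex_replace(text, newline, "\r\n");
}
```

Because the function is idempotent, it is safe to call on strings of unknown origin before handing them to the control.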

Regex::Match and Replace have static overloads so you can do quick regular expressions without creating objects. One of my favorite Regex::Replace overloads is one that takes a delegate that lets you compute replacement text on the fly using procedural code—see the main article for a fun example of this.

Some words of caution: every regular expression implementation is slightly different. For example, the Perl I'm familiar with allows {,1} as shorthand for {0,1}, but the Redmontonians don't do it that way. Just beware of minor differences. The definitive .NET Regex reference is "Regular Expressions as a Language" in the MSDN Library.

RegexWrap

In honor of MSDN Magazine's 20th anniversary, now that I've explained ManWrap, it's time to do some wrapping! I used ManWrap to wrap the .NET Regex class in a DLL called RegexWrap.dll. Figure 7 shows the abridged header file. I won't explain all the details since they're mostly trivial, but here's a typical wrapper:

CString CMRegex::Replace(LPCTSTR input, LPCTSTR replace) { return (*this)->Replace(input, replace); }

In practically every case, the implementation is one line: just call the underlying managed method and let the compiler convert the parameters. Isn't interop lots of fun? This works even when the return value is another wrapped class, as I have already explained for CMRegex::Match.

Figure 7 RegexWrap.h

////////////////////////////////////////////////////////////////
// MSDN Magazine - February 2004
// If this code works, it was written by Paul DiLascia.
// If not, I don't know who wrote it.
// Compiles with Visual Studio .NET 2003 on Windows XP. Tab size=3.
//
// This file declares all the wrappers for RegexWrap. The implementation
// is in RegexWrap.cpp
//
#pragma once
#include "ManWrap.h"
#ifdef _MANAGED
using namespace System::Runtime::InteropServices;
using namespace System::Text::RegularExpressions;
#endif
//////////////////
// You MUST instantiate one of these somewhere in your app
// BEFORE you call any RegexWrap DLL functions, to initialize
// RegexWrap.dll. DLLs that use Managed Extensions and MFC/ATL require
// special initialization because they don't use the standard startup
// code that calls DllMain. The best thing to do is create a static
// instance somewhere (outside function scope), or in your CWinApp-
// derived application class (MFC).
//
class WREXPORT CRegexWrapInit {
public:
   CRegexWrapInit();
   ~CRegexWrapInit();
};
//////////////////
// Wrapper for .NET Capture class
//
class WREXPORT CMCapture : public CMObject {
   DECLARE_WRAPPER(Capture, Object);
public:
   // wrapped properties/methods
   int Index() const;
   int Length() const;
   CString Value() const;
};
//////////////////
// Wrapper for Group.
//
class WREXPORT CMGroup : public CMCapture {
   DECLARE_WRAPPER(Group, Capture);
public:
   // wrapped properties/methods
   bool Success() const;
   CMCaptureCollection Captures() const;
};
//////////////////
// Wrapper for Match.
//
class WREXPORT CMMatch : public CMGroup {
   DECLARE_WRAPPER(Match, Group);
public:
   // wrapped properties/methods
   CMMatch NextMatch() const;
   CString Result(CString sReplace) const;
   CMGroupCollection Groups() const;
   static const CMMatch Empty; // constant empty match
   typedef CString (CALLBACK* evaluator)(const CMMatch&, void* param);
};
//////////////////
// Wrapper for Regex. This one has all the good stuff.
//
class WREXPORT CMRegex : public CMObject {
   DECLARE_WRAPPER(Regex,Object);
public:
   enum Options {
      None = 0,
      .
      .
   };
   CMRegex(LPCTSTR s);
   CMRegex(LPCTSTR s, Options opt);
   CMMatch Match(LPCTSTR input);
   static CMMatch Match(LPCTSTR input, LPCTSTR pattern);
   .
   . // lots more
   .
};

Of course, nothing is ever totally trivial. I did encounter some bumps and snags along the way to RegexWrap: collections, delegates, exceptions, arrays, and enums. I'll describe how I dealt with each one in turn.

Coping with Collections

Collections are ubiquitous throughout the Framework. For example, Regex::Matches returns all the matches as a MatchCollection and Match::Groups returns all the Groups as a GroupCollection. My first thought on how to deal with collections was to convert them to STL containers of wrapped objects. On second thought, I realized this was a bad idea. Why create a new set of handles that point to objects that are already in the collection? While .NET Collections resemble STL containers in some respects, they're not exactly the same. For example, you can access a GroupCollection by integer index or by string name.

Rather than using an STL vector or map, a simpler approach is to use the system I've already built, namely, ManWrap. Figure 8 shows how I wrapped GroupCollection. It's what you'd expect, only now there's a new macro, DECLARE_COLLECTION, that does the same thing as DECLARE_WRAPPER, but adds three methods that all collections have: Count, IsReadOnly, and IsSynchronized. Naturally, there's IMPLEMENT_COLLECTION to implement these. Since GroupCollection lets you index by integer or string, the wrapper has two operator[] overloads.

Figure 8 GroupCollection.cpp

//////////////////
// ManWrap wrapper for GroupCollection.
//
// In header, RegexWrap.h
class WREXPORT CMGroupCollection : public CMObject {
   DECLARE_COLLECTION(GroupCollection, Object);
public:
   CMGroup operator[](int i);
   CMGroup operator[](LPCTSTR name);
};
// In module, RegexWrap.cpp
IMPLEMENT_COLLECTION(GroupCollection, Object)
CMGroup CMGroupCollection::operator[](int i)
{
   return (*this)->Item[i];
}
CMGroup CMGroupCollection::operator[](LPCTSTR name)
{
   return (*this)->Item[name];
}
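The two operator[] overloads are ordinary C++ and can be tried in isolation. The following native sketch uses a hypothetical NamedList class backed by a vector and a map. It is not how CMGroupCollection stores anything (the real wrapper just forwards to the managed collection), and throwing on an unknown name is a choice made for this example only.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical collection that can be indexed by position or by name.
class NamedList {
public:
    void Add(const std::string& name, const std::string& value) {
        m_index[name] = m_values.size();   // remember position under name
        m_values.push_back(value);
    }
    int Count() const { return static_cast<int>(m_values.size()); }

    // Lookup by integer position; at() throws if out of range.
    const std::string& operator[](int i) const { return m_values.at(i); }

    // Lookup by name.
    const std::string& operator[](const char* name) const {
        std::map<std::string, std::size_t>::const_iterator it =
            m_index.find(name);
        if (it == m_index.end())
            throw std::out_of_range("no such name");
        return m_values[it->second];
    }
private:
    std::vector<std::string> m_values;
    std::map<std::string, std::size_t> m_index;
};
```

Overload resolution picks the int version for list[1] and the const char* version for list["zip"], so callers get the same dual syntax that GroupCollection offers.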

Once I wrap Match, Group, and CaptureCollections, I can wrap the methods that use them. Regex::Matches returns a MatchCollection, so here's the wrapper:

CMMatchCollection CMRegex::Matches(LPCTSTR input) { return (*this)->Matches(input); }

CMMatch::Groups and CMGroup::Captures are entirely similar. Once again, the compiler silently performs all the type conversions. I love C++ and interop!

Dealing with Delegates

One of the most significant innovations in programming history is the concept of callbacks. They let you call a function that isn't known until runtime. Callbacks provide the basis for virtual functions and all forms of event programming. But in the managed world, one doesn't say "callback," one says "delegate." For example, one form of Regex::Replace lets you pass a MatchEvaluator:

MatchEvaluator* delg = // create one
String *s = Regex::Replace("Modify me.", "\\b\\w+\\b", delg);

Regex::Replace calls your MatchEvaluator delegate with each successive Match. Your delegate returns the replacement text. Later, I'll show you a cute example using MatchEvaluator. Right now, I'm focused on wrapping it. The Framework speaks delegates whereas C++ speaks callbacks. To make them talk, I first need a typedef:

class CMMatch ... {
public:
    typedef CString (CALLBACK* evaluator)(const CMMatch&, void* param);
};

CMMatch::evaluator is a pointer-to-function that takes a CMMatch and void* param, and returns CString. Putting the typedef inside CMMatch is a matter of style more than anything else, but it does avoid cluttering the global namespace. The void* param provides a way for native callers to pass their state. Delegates always have an associated object (which can be null if the method is static), but in C/C++ all you get is a function pointer, so callback interfaces usually add a void* that gives you a way to pass state information. Such is the lowly life of C. With my new typedef and these remarks in mind, I can declare CMRegex::Replace, like so:

class CMRegex ... {
public:
    static CString Replace(LPCTSTR input, LPCTSTR pattern,
        CMMatch::evaluator me, void* param);
};

My wrapper resembles the real Replace method (both are static), with an extra void* param. But how do I implement it?

CString CMRegex::Replace(...)
{
    MatchEvaluator* delg = // how to create?
    return Regex::Replace(..., delg);
}

To create the MatchEvaluator, I need a __gc class with a method that calls the caller's native callback with the caller's void*. I wrote a little managed class called WrapMatchEvaluator that does just that (you can download the code for details). To save typing, WrapMatchEvaluator has a static Create function that returns a new MatchEvaluator, so CMRegex::Replace is still just one line:

CString CMRegex::Replace(LPCTSTR input, LPCTSTR pattern,
    CMMatch::evaluator me, void* lp)
{
    return Regex::Replace(input, pattern,
        WrapMatchEvaluator::Create(me, lp));
}

Well, it's one line in the source. Here I had to split it to fit in the printed magazine. Since native code has no use for WrapMatchEvaluator (it's a __gc class), the implementation is inside RegexWrap.cpp, not the header.
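The callback-plus-void* pattern described above can be tried in portable C++ with std::regex standing in for the Framework's Regex. This is a sketch under that assumption, not the code from the download: Evaluator, EvaluatorThunk, ReplaceWithCallback, and UpperAndCount are hypothetical names. EvaluatorThunk plays roughly the role described for WrapMatchEvaluator: an object that remembers the function pointer and the caller's void* so that a method call can forward to the native callback.

```cpp
#include <cctype>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical callback type modeled on CMMatch::evaluator: the callee
// gets the match plus an opaque void* carrying the caller's state.
typedef std::string (*Evaluator)(const std::smatch& m, void* param);

// Hypothetical thunk: pairs the function pointer with the void* so that
// "object + method" can stand in for a delegate.
class EvaluatorThunk {
public:
    EvaluatorThunk(Evaluator fn, void* param) : m_fn(fn), m_param(param) {}
    std::string Invoke(const std::smatch& m) const { return m_fn(m, m_param); }
private:
    Evaluator m_fn;
    void* m_param;
};

// Replace every match of pattern in input with the callback's return value.
std::string ReplaceWithCallback(const std::string& input,
                                const std::string& pattern,
                                Evaluator fn, void* param)
{
    const EvaluatorThunk thunk(fn, param);
    const std::regex re(pattern);
    std::string out;
    std::size_t last = 0;  // offset just past the previous match
    const std::sregex_iterator end;
    for (std::sregex_iterator it(input.begin(), input.end(), re); it != end; ++it) {
        const std::smatch& m = *it;
        const std::size_t pos = static_cast<std::size_t>(m.position());
        out.append(input, last, pos - last);  // text between matches
        out += thunk.Invoke(m);               // replacement from the callback
        last = pos + static_cast<std::size_t>(m.length());
    }
    out.append(input, last, std::string::npos);  // trailing text
    return out;
}

// Example callback: upper-cases the match and counts calls through param.
std::string UpperAndCount(const std::smatch& m, void* param)
{
    std::string s = m.str();
    for (std::size_t i = 0; i < s.size(); ++i) {
        s[i] = static_cast<char>(std::toupper(static_cast<unsigned char>(s[i])));
    }
    if (param) {
        ++*static_cast<int*>(param);  // caller passed the address of an int
    }
    return s;
}
```

Calling ReplaceWithCallback("Modify me.", "\\w+", &UpperAndCount, &calls) with an int named calls invokes the callback once per word. The void* is the only channel for caller state, which is why the callback has to cast it back to the type the caller agreed on.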

Handling Exceptions

Sooner or later, the .NET Framework is going to complain about something you did. Rude, I know, but if you pass Regex a bad expression, what do you expect? Native code can't handle managed exceptions, so I've got to do something. Dumping the user into the CLR debugger certainly won't win me any brilliance points so I'll just wrap the Exceptions too. I'll catch them at the border and make them don their wrappers before escaping into Nativeland. Trap-and-wrap is tedious but necessary. Regex's constructor can throw an exception, so I need to revise my wrapper, as shown here:

Regex* NewRegex(LPCTSTR s)
{
    try {
        return new Regex(s);
    } catch (ArgumentException* e) {
        throw CMArgumentException(e);
    } catch (Exception* e) {
        throw CMException(e);
    }
}

CMRegex::CMRegex(LPCTSTR s) : CMObject(NewRegex(s))
{
}

The basic formula involves trapping the exception inside the wrapper and re-throwing it in wrapped form. The reason for introducing NewRegex is so I can use initializer syntax instead of assigning m_handle inside the constructor (which would be less efficient because it sets m_handle twice). Once I trap-and-wrap Exceptions, native code can handle them the native way. Here's how RegexTest handles it when the user types a bad expression:

// in FormatResults
try {
    // create CMRegex, get matches, build string
} catch (const CMException& e) {
    result.Format(_T("OOPS! %s\n"), e.ToString());
    MessageBeep(0);
    return result;
}

One thing to consider when wrapping exceptions is whether you need to wrap every kind of exception thrown. For Regex, there's only ArgumentException, but .NET has dozens of exception types. Which ones you wrap and how many catch blocks you add depends on how much information your app needs. Whatever you do, be sure to always trap the base exception class in the last catch block, so nothing slips out to crash your app.
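The same trap-and-wrap shape can be sketched in portable C++, with std::regex_error standing in for the managed ArgumentException. Everything here is an assumption for illustration: WrappedError, WrappedArgumentError, and PatternHolder are hypothetical names, not the CMException classes from ManWrap. The points carried over from the text are the helper that lets the constructor use initializer syntax, and the catch order: most specific type first, base class last.

```cpp
#include <exception>
#include <regex>
#include <string>

// Hypothetical wrapper exceptions, loosely analogous to CMException
// and CMArgumentException.
class WrappedError {
public:
    explicit WrappedError(const std::string& msg) : m_msg(msg) {}
    virtual ~WrappedError() {}
    const std::string& ToString() const { return m_msg; }
private:
    std::string m_msg;
};

class WrappedArgumentError : public WrappedError {
public:
    explicit WrappedArgumentError(const std::string& msg) : WrappedError(msg) {}
};

// Helper so the wrapper's constructor can build its member in the
// initializer list. The base-class catch comes last so that nothing
// crosses the boundary unwrapped.
std::regex NewRegex(const std::string& pattern)
{
    try {
        return std::regex(pattern);
    } catch (const std::regex_error& e) {
        throw WrappedArgumentError(e.what());
    } catch (const std::exception& e) {
        throw WrappedError(e.what());
    }
}

class PatternHolder {
public:
    explicit PatternHolder(const std::string& pattern) : m_re(NewRegex(pattern)) {}
    bool IsMatch(const std::string& input) const
    {
        return std::regex_search(input, m_re);
    }
private:
    std::regex m_re;
};
```

A caller that only cares whether construction failed can catch const WrappedError& alone; a caller that wants to distinguish a bad pattern adds a catch for WrappedArgumentError ahead of it.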

Wrapping Arrays

I've dealt with collections, delegates, and exceptions. Now it's time for arrays. Regex::GetGroupNumbers returns an array of ints whereas Regex::GetGroupNames returns an array of Strings. I have to convert managed arrays to native types before passing them to Nativeland. C-style arrays are one option, but there's just no reason to use C-style arrays when there's STL. ManWrap has a template function that converts a managed array of Foo objects into an STL vector of CMFoo. CMRegex::GetGroupNames uses it, as you can see here:

vector<CString> CMRegex::GetGroupNames()
{
    return wrap_array<CString,String>((*this)->GetGroupNames());
}

Again, the wrapper is still one line. There's another wrap_array to convert int arrays, because the compiler needs a __gc declarator to tell the difference between native and managed int arrays. I leave you to download and ponder the source for homework.
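For a feel of what such a conversion template does, here is a minimal portable sketch. It is not the wrap_array from the download: to_vector is a hypothetical name, and it reads from a plain C array with an explicit length instead of a managed array. The idea it shares with the text is a template parameterized on destination and source element types that converts each element as it copies.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical analogue of an array-wrapping template: copy n elements of
// type Src into a std::vector<Dst>, converting each element on the way.
// Dst must be constructible from Src.
template <typename Dst, typename Src>
std::vector<Dst> to_vector(const Src* src, std::size_t n)
{
    std::vector<Dst> result;
    result.reserve(n);  // one allocation up front
    for (std::size_t i = 0; i < n; ++i) {
        result.push_back(Dst(src[i]));
    }
    return result;
}
```

With const char* names[] = { "0", "word" }, the call to_vector<std::string, const char*>(names, 2) yields a vector of two std::string objects, so the caller never touches the source array's element type again.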

Encapsulating Enums

Finally, we come to enums, the last problem child in the RegexWrap family. Not a problem really, just a typing headache. Some Regex methods allow RegexOptions to control what you want to do. For example, if you want to ignore case, you can call Regex::Match with RegexOptions::IgnoreCase. To let native apps access these options, I defined my own native enum with the same names and values as RegexOptions, as you can see in Figure 7. In order to save typing and eliminate errors, I wrote a little utility called DumpEnum, which spits out C code for any .NET Framework enum class.
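A mirrored enum of this kind might look like the sketch below. The type name NativeRegexOptions, the Opt prefix, and the HasOption helper are hypothetical, not the declarations from RegexWrap.h. The numeric values follow the documented RegexOptions members, and only a subset is shown.

```cpp
// Hypothetical native mirror of a flags enum. Values follow the documented
// System.Text.RegularExpressions.RegexOptions members (subset only).
enum NativeRegexOptions {
    OptNone = 0,
    OptIgnoreCase = 1,
    OptMultiline = 2,
    OptExplicitCapture = 4,
    OptCompiled = 8,
    OptSingleline = 16,
    OptIgnorePatternWhitespace = 32
};

// Unscoped enums convert to int implicitly but not back, so combining
// flags needs an operator| that casts the result to the enum type.
inline NativeRegexOptions operator|(NativeRegexOptions a, NativeRegexOptions b)
{
    return static_cast<NativeRegexOptions>(static_cast<int>(a) | static_cast<int>(b));
}

// True if any bit of flag is set in set.
inline bool HasOption(NativeRegexOptions set, NativeRegexOptions flag)
{
    return (static_cast<int>(set) & static_cast<int>(flag)) != 0;
}
```

Because the values match, a wrapper can pass the native value across the boundary with a single cast to the managed enum type, and no lookup table is needed.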

Building Mixed-Mode DLLs

With all the programming problems solved, the last step is to package RegexWrap as a DLL. You have to do the usual __declspec(dllexport) or __declspec(dllimport) on all your classes (which I simplify with macros), and you have to play some tricks to build a managed DLL. Managed DLLs require special initialization: they can't use the normal DllMain startup code, so they need /NOENTRY and manual initialization. See my February 2005 C++ At Work column for details. The bottom line is that to use RegexWrap.dll, you must instantiate a special DLL-initialization class somewhere at global scope, as shown in the following line of code:

// somewhere in your app
CRegexWrapInit libinit;

I also encountered a minor problem debugging. To debug wrapper DLLs from your native app, you need to set "Debugger Type" to "Mixed" in your project's Debug settings. The default (Auto) determines which debugger to load by looking at the EXE. With ManWrap apps, the EXE is native so the IDE uses the native debugger and you can't step into managed code. If you select Debugger Type=Mixed, the IDE loads both debuggers.

Figure 9 File and Module Relationships


Once you overcome these little nuisances, RegexWrap works like any other C++ DLL. Clients #include the header file and link the import library. Naturally, you need RegexWrap.dll in your PATH and the .NET Framework installed to run. Figure 9 shows the relationships among the files and modules in a typical client application like RegexTest.

Fun with RegexWrap

With Regex finally wrapped, it's time for some fun! The whole reason I wrote RegexWrap was to bring the power of regular expressions to native MFC apps.

The first thing I did was port my original mixed-mode RegexTest program with its managed FormatResults function to a purely native version using RegexWrap. Every pointer to Regex, Match, Group, and Capture is now a CMRegex, CMMatch, CMGroup, or CMCapture object. The same applies to collections (you'll need to download the code for the details). The important thing is that RegexTest is now fully native: there's not a /clr to be found in any project or make file. If you're new to regular expressions, RegexTest is a great way to explore them.

My next sample is a fun program that turns English into semi-readable gibberish. Linguists have long observed the following curious phenomenon: if you scramble the letters in the middle of each word in a sentence, but preserve the letters at the beginning and end, the result is more readable than you might expect. Apparently our brains read by scanning the beginning and end of words and fill in the rest. I used RegexWrap to implement a WordMess program that demonstrates this phenomenon. Type a sentence and WordMess scrambles it as described. Figure 10 shows it running. Here's what WordMess did to the first sentence in this paragraph: my nxet sapmle is a fun prgaorm taht tnurs Ensiglh itno smei-reabldae gibbiserh.

Figure 10 WordMess


WordMess uses the MatchEvaluator delegate form of Regex::Replace (through its wrapper, of course):

// in CMainDlg::OnOK
static CMRegex MyRegex(_T("\\b[a-zA-Z]+\\b"));
CString report;
m_sResult = MyRegex.Replace(m_sInput, &Scrambler, &report);

MyRegex is a static CMRegex object that matches words, that is, sequences of one or more letters (no digits) surrounded by word breaks. (The hardest part of writing regular expressions in C++ is remembering to type two backslashes every time you want one.) So CMRegex::Replace calls my Scrambler function once for each word in the input. Figure 11 shows the Scrambler function. Look how easy it is with STL strings and the swap and random_shuffle algorithms. This would have been tons of code without STL. Scrambler uses its void* param as a CString to which it appends each substitution it makes. WordMess adds the report to its results, as Figure 10 shows. Amzanig!

Figure 11 WordMess Scrambler Function

//////////////////
// Match evaluator callback: This function receives a match and returns
// the text to replace it with. The algorithm is: scramble the middle of
// the word.
//
// - Words with fewer than four letters are unchanged.
// - Four letter words have their middle two letters swapped.
// - Five letter words preserve their first and last letters.
// - Longer words preserve the second letter as well.
//
// Look how easy this is with STL algorithms!! (swap and random_shuffle)
// Scrambler reports its activity in a CString passed as the void* param.
//
CString CALLBACK Scrambler(const CMMatch& m, void* param)
{
    tstring word = m.Value(); // STL string easier to manipulate than CString
    size_t len = word.size();
    if (len>=4) {
        if (len==4) {
            swap(word[1],word[2]); // use STL swap algorithm!
        } else {
            // STL shuffle algorithm!
            random_shuffle(word.begin()+(len <=5 ? 1 : 2), word.end()-1);
        }
    }
    if (param) {
        CString temp;
        temp.Format(_T("Scramble: '%s' -> '%s'\n"), m.Value(), word.c_str());
        (*(CString*)param) += temp;
    }
    return word.c_str();
}
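Readers on a current compiler should note that std::random_shuffle was deprecated in C++14 and removed in C++17. The sketch below applies the same scrambling rule as Figure 11 in portable C++ using std::shuffle with an explicit random engine. ScrambleWord is a hypothetical name, and it works on std::string only; it leaves out the CMMatch argument and the CString report.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <string>
#include <utility>

// Hypothetical portable version of the scrambling rule in Figure 11.
// std::shuffle with a caller-supplied engine replaces std::random_shuffle.
std::string ScrambleWord(std::string word, std::mt19937& rng)
{
    const std::size_t len = word.size();
    if (len == 4) {
        std::swap(word[1], word[2]);  // swap the middle two letters
    } else if (len > 4) {
        // Keep the first letter (and the second, for words longer than
        // five letters) and the last letter; shuffle what lies between.
        const std::ptrdiff_t keep = (len <= 5) ? 1 : 2;
        std::shuffle(word.begin() + keep, word.end() - 1, rng);
    }
    return word;  // words shorter than four letters come back unchanged
}
```

Passing the engine by reference lets the caller decide on seeding: a fixed seed such as std::mt19937 rng(42) gives repeatable output for testing, while seeding from std::random_device gives a different scramble on each run.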

For my final application, I chose something more serious and practical, a program called RegexForm that validates various kinds of input: ZIP code, Social Security number, telephone number, and C tokens. See this month's C++ at Work column for a discussion of RegexForm.

Conclusion

Well, that's a wrap! I hope you've enjoyed wrapping Regex with me, and I hope you'll find ways to use ManWrap to wrap other Framework classes you want to call from native code. ManWrap isn't for everyone: you only need it if you want to call the .NET Framework while keeping your code native. Otherwise, just use /clr and call the Framework directly.

Paul DiLascia is a freelance writer, consultant, and Web/UI designer-at-large. He is the author of Windows++: Writing Reusable Windows Code in C++ (Addison-Wesley, 1992). Paul can be reached at www.dilascia.com.