Netting C++
The .NET Wrap
Stanley B. Lippman
Code download available at:
NettingC2006_06.exe
(165 KB)
This month, we are changing the column name from Pure C++ to Netting C++ to better reflect our focus on Microsoft® .NET programming using C++/CLI, the .NET extensions to Visual C++® that are supported in Visual Studio® 2005. In this first Netting C++ column, and in a number of upcoming ones, I'll look at moving a working native C++ application onto .NET through a series of incremental transformations using the C++/CLI language extensions of Visual C++ 2005. The application is a Text Query Language (TQL) for processing natural language texts, first developed back in 1996 for the third edition of my C++ Primer. It serves as a good illustration of object-oriented design (OOD), use of the Standard Template Library (STL), and string manipulation in general. It should make a good vehicle for drilling down into .NET-based programming.
Here's the scenario: my company has a working native C++ application, TQL, that is the foundation of the business. We at the company realize it needs to be extended to make it Web-sensitive and to provide XML support. Moreover, it's currently a console application (it began life as a UNIX command-line application), and we want to front it with a cutting-edge GUI, as well as add more functionality that users have requested.
However, before doing all this neat stuff, the application must be brought over to .NET. Because the application works, and by some accounts works well, I don't necessarily want to change it, at least not without some profit from the change. So, the first step in netting TQL is to get it to compile as a .NET executable. That's the focus of the first section, "Compiling Native Code to .NET."
Though the process I describe enables TQL to execute under .NET, it doesn't use any of the .NET runtime services or provide interoperability with any other .NET-compliant languages. I'll get to this in the second section, "Wrapping Native Code to Publish within .NET," where wrapping TQL with a C++/CLI proxy class will provide the necessary elbow room for adding to and replacing pieces of the existing TQL application.
Subsequent columns will focus on mapping the native types to types supported by the .NET Common Type System (CTS) and examining the performance characteristics of the application as it transitions. We'll also look at the type information available to the runtime, using the ildasm command to explore the Common Intermediate Language (CIL) into which all .NET-based languages are compiled.
When that's done, we'll explore multithreading, Web services, cross-language interoperability with a C# ASP.NET front, XML support, and integrating with Windows Vista™. So, we have our work cut out for us. (Note that all the development work for this column is done through the Visual C++ IDE. I won't directly address compiler switches, therefore, but speak in terms of Projects.)
Compiling Native Code to .NET
The ultimate goal is to get the native code compiled as a .NET executable. To start out, I selected a Visual C++ | CLR | CLR Console Application from the New Project wizard, which generated the following code shell:
// Project wizard generated code
#include "stdafx.h"
using namespace System;
int main()
{
    Console::WriteLine(L"Hello World");
    return 0;
}
(I simplified it slightly by removing the array access to the command-line arguments as I'm not yet ready to use this.)
The TQL application has a tripartite design:
- A Query class hierarchy to support user-directed Boolean queries, such as AND (&&), OR (||), and NOT (!), against user-specified text. (You'll see an example of a user query session in Figure 1.)
- A UserQuery class that handles all aspects of the interactive user text query session.
- A TextQuery class that handles all processing of the actual text: reading in the file(s), building up an internal normalized structure against which to search, and displaying the results.

Figure 1 TQL Query
please enter file name: alice_emma
Enter a query -- please separate each item by a space.
Terminate query (or session) with a dot( . ).
==> fiery .
fiery ( 1 ) lines match.
Location(s): [ { 3, 3 } { 3, 9 } ]
Requested query: fiery
( 3 ) like a fiery bird in flight. A beautiful fiery bird, he tells her,
Enter a query -- please separate each item by a space.
Terminate query (or session) with a dot( . ).
==> beautiful && fiery .
beautiful ( 1 ) lines match.
Location(s): [ { 3, 8 } ]
fiery ( 1 ) lines match.
Location(s): [ { 3, 3 } { 3, 9 } ]
beautiful && fiery ( 1 ) lines match.
Location(s): [ { 3, 8 } { 3, 9 } ]
Requested query: beautiful && fiery
( 3 ) like a fiery bird in flight. A beautiful fiery bird, he tells her,
Enter a query -- please separate each item by a space.
Terminate query (or session) with a dot( . ).
==> fiery || magical .
fiery ( 1 ) lines match.
Location(s): [ { 3, 3 } { 3, 9 } ]
magical ( 1 ) lines match.
Location(s): [ { 4, 1 } ]
fiery || magical ( 2 ) lines match.
Location(s): [ { 3, 3 } { 3, 9 } { 4, 1 } ]
Requested query: fiery || magical
( 3 ) like a fiery bird in flight. A beautiful fiery bird, he tells her,
( 4 ) magical but untamed. "Daddy, shush, there is no such creature,"
Each part corresponds to a pair of files: Query.h and Query.cpp; UserQuery.h and UserQuery.cpp; and TextQuery.h and TextQuery.cpp. These are the six native C++ files I'll import into my Project and ideally compile without having to make significant changes to the code. That, at least, is one of the primary ideas behind the C++/CLI extensions.
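To make the Location(s) output in Figure 1 concrete, here is a minimal sketch in standard C++ of the kind of word-to-location index such output implies. It is not the TextQuery implementation from the column. The names Location, WordIndex, normalize, and build_index are hypothetical, and the normalization rule (lowercase, keep only letters and digits) is an assumption. Each location is a { line, word position } pair, both counted from 1.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical names throughout; this is not the TextQuery class itself.
using Location  = std::pair<int, int>;   // { line number, word position }, both 1-based
using WordIndex = std::map<std::string, std::vector<Location>>;

// Assumed normalization: lowercase the token and drop anything that is
// not a letter or a digit (so "flight." becomes "flight").
std::string normalize(const std::string& token)
{
    std::string word;
    for (unsigned char ch : token)
        if (std::isalnum(ch))
            word += static_cast<char>(std::tolower(ch));
    return word;
}

// Record every location at which each normalized word occurs.
WordIndex build_index(const std::vector<std::string>& lines)
{
    WordIndex index;
    for (std::size_t i = 0; i < lines.size(); ++i) {
        std::istringstream words(lines[i]);
        std::string token;
        int position = 0;
        while (words >> token) {
            ++position;   // count every whitespace-separated token
            const std::string word = normalize(token);
            if (!word.empty())
                index[word].push_back({ static_cast<int>(i) + 1, position });
        }
    }
    return index;
}
```

With the third line of the sample text from Figure 1 in place, looking up fiery in such an index yields the two locations on line 3 at word positions 3 and 9, matching the figure.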
One strategy, of course, is to simply import all the files at once, hit build, and see what happens. Though this may work with a small number of files, generally it doesn't scale because warnings and errors tend to cascade. I prefer to add files incrementally, one at a time. And I'd rather compile the header files first, since usually you have to work through inclusion issues. Besides, they're generally easier to compile than program files.
So that's the process I used, and it was almost totally painless except for a pair of compiler warnings that result from comparing a signed int with an unsigned int:
// add existing header files from TQL one at a time
// 1. no compilation errors ...
#include "Query.h"
// 2. two warnings: signed/unsigned comparisons
#include "TextQuery.h"
// 3. again, two warnings on signed/unsigned comparisons
#include "UserQuery.h"
The warnings stem from this line:
if ( _paren < _current_op.size() )
where _current_op is an STL stack class, size() returns an unsigned int, and _paren is an int counter of parentheses in a user-query such as "alice | ( hair & ( magical | fiery) )". The compiler correctly advises:
warning C4018: '<' : signed/unsigned mismatch
In the application itself, however, _paren can't be either negative or so large as to invalidate the comparison, so there's no semantic danger in the comparison, only the possible performance cost of promoting the int to unsigned. When we look at performance issues, we can decide if this actually matters. For now, just live with the warnings. The idea is to get the code transitioned.
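If the warning ever needs to go away rather than be lived with, one option is to rule out a negative count and then convert explicitly. The following is a sketch in standard C++, not the UserQuery source. The function name and its parameters are hypothetical stand-ins for _paren and _current_op, and the stack's element type is an assumption.

```cpp
#include <stack>
#include <string>

// Hypothetical stand-in for the comparison discussed above.
bool fewer_parens_than_ops(int paren, const std::stack<std::string>& current_op)
{
    // A negative int converted to an unsigned type becomes a very large
    // value, which is the hazard C4018 warns about. A negative count is
    // smaller than any size, so answer directly.
    if (paren < 0)
        return true;

    // Both operands are now unsigned, so no implicit conversion takes place.
    using size_type = std::stack<std::string>::size_type;
    return static_cast<size_type>(paren) < current_op.size();
}
```

Declaring the counter with the container's size_type in the first place would avoid the conversion entirely, at the cost of touching more of the original code.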
Once all the header files compile, I add the program files, again one at a time. The only error, it turns out, was my failure to add precompiled header support. For example:
// 4. add Query.cpp
// compiler message:
// fatal error C1010: unexpected end of file while looking for
// precompiled header. Did you forget to add '#include "stdafx.h"'
// to your source?
In fact, that's exactly what I did, not just with Query.cpp, but with TextQuery.cpp as well:
// 5. add TextQuery.cpp
// a. same fatal error C1010 ... and same solution ...
// b. compiled fine ...
Happily, by UserQuery.cpp, I had been trained:
// 6. add UserQuery.cpp and add #include "stdafx.h"
If only porting were so simple! One design decision the Visual C++ team made to minimize the potential impact of introducing new keywords into the language was to make them contextual rather than reserved. A contextual keyword has a special meaning only within specific program contexts. Within the general program, for example, sealed is treated as an ordinary identifier. However, when sealed occurs within the declaration portion of a class or virtual function, it is treated as a keyword only within the context of that declaration.
A spaced keyword is a special case of a contextual keyword. It literally pairs an existing keyword with a contextual modifier separated by a space. The pair, such as "ref class", is treated as a single unit rather than as two separate keywords. For example:
ref class Foo // spaced contextual keyword
{ ... } ^ref; // non-contextual use as identifier
In this case, the first use of ref indicates that Foo is a .NET reference class type. The second use of ref is an ordinary identifier: it names a tracking handle to a Foo object. The hat (^) symbol marks the declaration as a tracking handle to a .NET reference type.
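For readers without a C++/CLI compiler at hand, standard C++11 offers a close analog of the contextual-keyword idea: final and override are identifiers that take on a special meaning only in particular declaration contexts. The sketch below is standard C++, not C++/CLI, and the type and function names are hypothetical.

```cpp
// Standard C++11 analog, not C++/CLI.
struct Shape {
    virtual ~Shape() = default;
    virtual int sides() const { return 0; }
};

// In these declaration contexts, final and override act as specifiers.
struct Triangle final : Shape {
    int sides() const override { return 3; }
};

// Elsewhere, the same words are ordinary identifiers.
int sum_of_plain_identifiers()
{
    int final = 1;
    int override = 2;
    return final + override;
}
```

Existing code that already uses such a word as a variable name keeps compiling, which is the same compatibility goal the contextual keywords of C++/CLI serve.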
Now that the source code compiles, the next step is to replace the canned Hello, World greeting with an invocation of the application, as shown in the following:
#include "stdafx.h"
#include "Query.h"
#include "TextQuery.h"
#include "UserQuery.h"
using namespace System;
int main()
{
    TextQuery tq;
    tq.build_up_text();
    tq.query_text();
    return 0;
}
When compiled and executed, the application just works, which is pretty cool. Figure 1 shows a slightly cleaned up session; the user's input follows each prompt.
This ability of Visual C++ to turn native code into .NET without significant changes to the source code was one of the primary requirements of both the original Managed Extensions for C++, released with Visual Studio .NET back in 2001, and the C++/CLI extensions that replace the Managed Extensions in Visual Studio 2005.
Wrapping Native Code to Publish within .NET
Now the application is up and running under .NET, but its type hierarchy is opaque both to the runtime and to other languages because the original source code, not surprisingly, is not built within the CTS. So, the next step is to integrate some managed code into the application.
What I want to do is open up TQL to .NET, but without investing heavily in either programmer resources or schedule. My preferred strategy is to wrap the native application in a CTS reference class type—in TQL's case, this means writing a reference class that holds a TextQuery object.
So, the first question is, how does a reference class type declare a native class object as a member? The jokey answer is, very carefully! An actual object of a native class cannot be stored directly within a reference class type; rather, you can only store a pointer to the native class object. So, the initial class will look something like the following:
#include "TextQuery.h"
ref class TQL {
private:
    // our native object through which to invoke our application
    TextQuery *pTQuery;
};
The TextQuery pointer member, as in native C++, is best initialized within each associated constructor; similarly, I can free it within the destructor, as follows:
ref class TQL {
public:
    // constructor creates a TextQuery object on native heap
    // destructor frees it
    TQL() { pTQuery = new TextQuery; }
    ~TQL() { delete pTQuery; }
};
This sort of maneuver—acquiring resources through initialization and freeing them within the destructor—should look familiar to the ISO-C++ programmer. Things are a bit more subtle, though, because of how the .NET managed heap handles garbage collection. But I'll save that discussion until my next column.
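For comparison, here is the same ownership pattern written as a sketch in standard C++, with no managed heap involved. NativeEngine is a hypothetical stand-in for a native class such as TextQuery, and its live_count member exists only so that the lifetime can be observed.

```cpp
// Hypothetical stand-in for a native class such as TextQuery.
class NativeEngine {
public:
    NativeEngine()  { ++live_count; }
    ~NativeEngine() { --live_count; }
    static int live_count;   // number of NativeEngine objects currently alive
};
int NativeEngine::live_count = 0;

// Owns one NativeEngine on the native heap for its whole lifetime.
class EngineOwner {
public:
    EngineOwner() : engine_(new NativeEngine) {}   // acquire in the constructor
    ~EngineOwner() { delete engine_; }             // release in the destructor

    // Copying would make two owners delete the same object, so it is disabled.
    EngineOwner(const EngineOwner&) = delete;
    EngineOwner& operator=(const EngineOwner&) = delete;

private:
    NativeEngine* engine_;
};
```

In native code the destructor runs deterministically when an EngineOwner leaves scope, which is the guarantee that becomes subtler once the owner lives on the managed heap.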
So far, the class has been concerned only with maintaining the lifetime of the native class object within the managed reference class. What the native application needs now is some form of publication for its interface within .NET. Initially, I'll simply provide a one-to-one mapping of the public operations available to TQL users. To simplify things, I've included just the two main functions:
ref class TQL {
public:
    // the operations we wish to publish within .NET
    // these will generally be inlined by the compiler
    void build_up_text() { pTQuery->build_up_text(); }
    void query_text()    { pTQuery->query_text(); }
};
In practice, a one-to-one mapping is likely to be far too expensive since each invocation represents a boundary crossing between .NET and native code. But again, the goal is just to get up and running.
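One way to picture the trade-off is to count crossings. The sketch below is standard C++ with a counter standing in for the managed-to-native boundary; it measures nothing real. NativeSession, SessionProxy, run_session, and both counters are hypothetical names introduced only for illustration.

```cpp
// Hypothetical stand-in for the native side of the application.
class NativeSession {
public:
    void build_up_text() { ++steps_run; }
    void query_text()    { ++steps_run; }
    int steps_run = 0;   // how many native operations have executed
};

class SessionProxy {
public:
    // One-to-one forwarding: every published call is one crossing.
    void build_up_text() { ++crossings; native_.build_up_text(); }
    void query_text()    { ++crossings; native_.query_text(); }

    // Coarser alternative: a single crossing runs the whole session.
    void run_session()
    {
        ++crossings;
        native_.build_up_text();
        native_.query_text();
    }

    int steps_run() const { return native_.steps_run; }
    int crossings = 0;   // models calls made across the boundary

private:
    NativeSession native_;
};
```

Calling build_up_text and then query_text through the proxy costs two crossings for two native steps, while run_session does the same work in one. Whether that difference matters for TQL is a question for the performance discussion, not something this sketch can answer.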
The TQL class definition is placed in a header file called TQL.h, which will replace TextQuery.h in the include file list. Now, I'll revise the main function to use the TQL class rather than the native TextQuery class, like so:
// Our Managed TQL Wrapper Put to Work ...
int main()
{
    TQL ^tq = gcnew TQL;
    tq->build_up_text();
    tq->query_text();
    return 0;
}
Notice the gcnew keyword. You use this within C++/CLI when you want to allocate an object from the .NET managed heap (but continue to use new to allocate a non-CTS type within the native heap). The declaration
TQL ^tq = gcnew TQL;
declares tq to be a tracking handle to a TQL reference class object. The "tracking" adjective underscores the idea that a reference type sits on the managed heap and can therefore transparently move locations during garbage collection heap compaction. A tracking handle is automatically updated by the runtime during the execution of the program.
Note that tracking handles use the arrow member selection operator (->) rather than the dot member selection operator (.) to invoke member functions or access class data members. When compiled and executed, the application works exactly as in the first section.
So that's a wrap, folks! I've taken a 10-year-old native application and stubbed it into .NET with nary a whimper. That's the good news. The not-so-good news is that this implementation has a number of problems reflecting subtleties of programming in .NET. I'll address these in detail in the next column.
Send your questions and comments for Stanley to purecpp@microsoft.com.
Stanley B. Lippman began working on C++ with its inventor, Bjarne Stroustrup, in 1984 at Bell Laboratories. Later, Stan worked in feature animation both at Disney and DreamWorks and served as a Software Technical Director on Fantasia 2000. He has since served as Distinguished Consultant with JPL, and an Architect with the Visual C++ team at Microsoft.