Netting C++

Mapping Native C++ to the Common Type System

Stanley B. Lippman

Contents

Keyword Surprises
Inheritance
Copy Semantics
New Language Lessons
Next Time

In this column, I want to pick up where I left off in the December 2006 installment and begin translating the Text Query Language (TQL) Query class hierarchy from C++ to the .NET Common Type System (CTS). In this instance, I'll use C++/CLI, but you could use C# too. The primary benefit of C++/CLI is that the app can be incrementally translated, maintaining a source-level mix of native and managed C++. To get started, let's look at the native Query base class implementation as presented in Figure 1.

Figure 1 Native Abstract Query Base Class

#include <vector>
#include <iostream>
#include <set>
#include <string>
#include <utility>

using namespace std;

typedef pair< short, short > location;

class Query {
public:
   virtual ~Query(){ delete _solution; }
   Query& operator=( const Query& );

   // the virtual set of operations ...
   virtual void eval() = 0;
   virtual Query *clone() const = 0;
   virtual ostream& print( ostream &os ) const = 0;
   virtual bool add_op( Query* ) { return true; }
   virtual void display_solution( ostream &os = cout, int tabcnt = 0 );

   // factored query-wide operations
   void  handle_tab( ostream &os, int tabcnt ) const;
   void display_partial_solution();

   const set< short > *solution();
   const vector< location > *locations() const { return &_loc; }

   // support for recognizing embedded parentheses within the query
   // as in alice && ( fiery || magical )
   void lparen( short lp ) { _lparen = lp; }
   void rparen( short rp ) { _rparen = rp; }

   short lparen() { return _lparen; }
   short rparen() { return _rparen; }

   void print_lparen( short cnt, ostream& os ) const;
   void print_rparen( short cnt, ostream& os ) const;

protected:
    
   Query() : _lparen( 0 ), _rparen( 0 ), _solution( 0 ) {}
   Query( const Query &rhs )
      : _loc( rhs._loc ), _solution( 0 ),
        _lparen( rhs._lparen ), _rparen( rhs._rparen )
      {}

   Query( const vector<location> &loc )
      : _loc( loc ), _solution( 0 ), 
        _lparen( 0 ), _rparen( 0 ) 
      {}

    set<short>* _vec2set( const vector<location>* );
    short  _lparen;
    short  _rparen;

    vector<location> _loc; 
    set<short> *_solution;
};

The Query class implementation represents a hybrid abstract class design: abstract because it provides a pure virtual interface that each derived class must override, and hybrid because it also provides a concrete implementation of the infrastructure shared among all the derived types. The benefit of a hybrid design is twofold. First, it simplifies the delivery of specialized derived class types: their designers can focus on the specialized algorithms where their expertise lies, without repeatedly implementing non-domain infrastructure in which they may have little interest or understanding. Second, it allows non-virtual and therefore potentially inline implementations of the shared functionality. While this efficiency gain can prove significant in native mode, the gain in .NET is less certain because of platform security issues.

The CTS can elegantly capture the hybrid nature of the design in a way that native C++ cannot: the set of pure virtual functions is captured in an IQuery interface class, and the concrete shared infrastructure is factored into an abstract Query reference base class, the outline of which is presented in Figure 2.

Figure 2 Query Reference Base Class

// pure virtual interface goes into IQuery
interface class IQuery { 
    void eval();
    void display_solution();
    // ... 
};
 
// shared implementation goes into Query
public ref class Query abstract : IQuery { 
    
    short lparen;
    short rparen;
 
    list<location>^ _loc; 
    set<short>^ _solution; 
 
public: 
    // ...
};
 
// AndQuery inherits Query and implements IQuery
public ref class AndQuery : Query { ... };

Recall you must specify the reference classes as public if you want them to be accessible outside the assembly. Interfaces are public by default, and so the public keyword, while permitted, is redundant. (C++/CLI has a lot of optional redundancy like this.)

Keyword Surprises

Two surprises for the native C++ programmer in Figure 2 are the keyword abstract in our declaration of the Query class, which is not present in ISO-C++, and the absence of the public keyword before the Query base class specification in the declaration of AndQuery. Let's look briefly at each.

Bjarne Stroustrup introduced support for a language-level notion of an abstract class in the late beta phase of what was then called cfront Release 2.0 back around 1988. This product straddled multiple project groups within Bell Laboratories with different management priorities. The instantiation of an abstract class generates a compile-time error, preventing a potentially disastrous run-time failure. While everyone agreed this was a good thing, not everyone agreed that it was a good thing to add a new keyword this late in the release cycle. As I recall, Bjarne was judged to have the freedom to add support for an abstract base class, being the language inventor, but he did not enjoy the freedom to add a product keyword when doing so was judged potentially invasive to the successful release of that product. To move forward, Bjarne invented the rule that a class with an associated pure virtual function is to be treated as an abstract class. Interestingly enough, this design aspect of C++ has a remarkably long and unbroken history of being criticized.
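The rule is easy to check in ISO C++. The following is a minimal sketch, assuming a C++11 compiler for std::is_abstract; the class names AbstractQuery and ConcreteQuery are hypothetical stand-ins, not part of the TQL code.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for Query: a single pure virtual function
// is enough to make the whole class abstract.
class AbstractQuery {
public:
    virtual ~AbstractQuery() {}
    virtual void eval() = 0;   // pure virtual; no keyword on the class itself
};

// A derived class is concrete once it overrides every pure virtual function.
class ConcreteQuery : public AbstractQuery {
public:
    ConcreteQuery() : evaluated( false ) {}
    virtual void eval() { evaluated = true; }
    bool evaluated;
};

// AbstractQuery q;   // compile-time error: cannot instantiate an abstract class

// Confirms how the compiler classifies the two types.
void check_abstractness() {
    assert( std::is_abstract<AbstractQuery>::value );
    assert( !std::is_abstract<ConcreteQuery>::value );
}
```

The commented-out declaration is the compile-time error described above; it is left as a comment so that the sketch still compiles.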

For C++/CLI, it was decided that we'd add the keyword abstract, and it was, in my opinion, an elegant design decision to place it after the class name, as it appears in the Query base class in Figure 2. The abstract keyword was also extended to be used as an alternative syntax for declaring a pure virtual function. (Of course, this is analogous to the use of abstract in C#.) Both forms of pure virtual function declaration are considered equivalent within C++/CLI, and over time I suspect the form you choose to code in will give away your programming platform origins, be it native or managed. An additional keyword that can be used to explicitly constrain either inheritance of a class or the ability to override a virtual function is the sealed keyword. It would be very cool if in the next iteration of ISO-C++, the committee backpatched the language with these extensions.
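For comparison, ISO C++11 later added the contextual keywords final and override, which play roughly the role that sealed plays in C++/CLI; it added no abstract keyword. The sketch below is standard C++, not C++/CLI, and the names QueryBase, NotQuery, NameQuery, and eval_through_base are hypothetical.

```cpp
#include <cassert>

class QueryBase {
public:
    virtual ~QueryBase() {}
    virtual int eval() = 0;                 // native spelling of a pure virtual
    virtual int priority() { return 0; }
};

class NotQuery : public QueryBase {
public:
    int eval() override { return 2; }
    // 'final' on a function forbids further overriding
    // (compare: a sealed virtual function).
    int priority() final { return 5; }
};

// 'final' on a class forbids further derivation (compare: a sealed class).
class NameQuery final : public QueryBase {
public:
    int eval() override { return 1; }
};

// class Narrower : public NameQuery {};   // error: NameQuery is final

// Calls both virtual functions through the base class interface.
int eval_through_base( QueryBase &q ) { return q.eval() * 10 + q.priority(); }

void demo_final() {
    NotQuery nq;
    NameQuery name;
    assert( eval_through_base( nq ) == 25 );
    assert( eval_through_base( name ) == 10 );
}
```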

Inheritance

In the CTS, only public inheritance is supported. This makes sense given its restriction to single inheritance only. The primary design use of private inheritance is the mix-in model of multiple inheritance. In this design model, an implementation is privately inherited while the public inheritance represents the interface of the abstraction. For example, a fixed-length queue in the visionary Booch Components back in the 1990s was, if I recall correctly, implemented as follows:

class FixedLengthQueue : 
      public Queue, private Vector { ... };

The dynamic-length queue, on the other hand, maintained the same public interface, but had a private backing store of a list rather than a vector, with the gain in flexibility at the expense of space and time. This kind of design lost favor in both the inheritance community and within C++ with the success of the Standard Template Library (STL).
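To make the mix-in model concrete, here is a minimal sketch in native C++. It is not the Booch Components code: the members of Queue and Vector, the int element type, and the capacity of four are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Public interface of the abstraction (hypothetical).
class Queue {
public:
    virtual ~Queue() {}
    virtual bool push( int value ) = 0;
    virtual int pop() = 0;
    virtual std::size_t size() const = 0;
};

// Implementation to be mixed in: fixed-capacity storage (hypothetical).
class Vector {
public:
    Vector() : _count( 0 ) {}
protected:
    static const std::size_t capacity = 4;
    int _items[ capacity ];
    std::size_t _count;
};

// Interface inherited publicly, implementation inherited privately.
class FixedLengthQueue : public Queue, private Vector {
public:
    virtual bool push( int value ) {
        if ( _count == capacity ) return false;   // queue is full
        _items[ _count++ ] = value;
        return true;
    }
    virtual int pop() {
        assert( _count > 0 );
        int front = _items[ 0 ];
        for ( std::size_t i = 1; i < _count; ++i )
            _items[ i - 1 ] = _items[ i ];        // shift remaining items forward
        --_count;
        return front;
    }
    virtual std::size_t size() const { return _count; }
};
```

Client code can treat a FixedLengthQueue as a Queue, but a conversion to Vector is rejected outside the class, which is what keeps the backing store an implementation detail.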

In the CTS, interfaces remove the need for multiple inheritance and, with that, the need for private inheritance. In the earlier Managed Extensions to C++ that were released with Visual Studio® .NET back in 2002 (Visual C++® 7.0), the public keyword was required even though public inheritance is the only possible relationship between the base and derived class. If you left it off, it was a compiler error. It was felt that this was necessary back then because there was concern that native C++ programmers moving to .NET would otherwise become confused. (The absence of the public keyword in native C++, of course, indicates a private inheritance relationship.) In C++/CLI, the public keyword is optional. The idea now basically is that there are plenty of more substantial issues that the native C++ programmer may find confusing, so let's choose our battles carefully.

In fact, this is a sort of set-up to my next topic, which is the difference in copy semantics between native C++ and .NET, and how that impacts your designs. In particular, if you look again at Figure 1, you'll see that the issue of copy construction and the use of the virtual clone pattern within the native Query implementation needs to be addressed.

Copy Semantics

Before considering how to address copy construction and the virtual clone pattern, let's make sure you know why it is necessary to do so within a native implementation.

In native C++, a frequent design pitfall manifests in the following way. You first declare a pointer member within the class that addresses some memory allocated on the heap, either within the class constructor or through a factory pattern. You then define a destructor that deletes the memory associated with that pointer member, but you forget or fail to realize you need to provide a copy constructor, and so the combination of a pointer member and destructor makes the program technically undefined in its behavior. Statistically, this is not a good thing.

The job of the copy constructor (and the copy assignment operator, of course) is to effect a deep copy of the object addressed by the pointer member: to create a new object on the native heap and to initialize that object with a value-by-value copy of the original. (This can, of course, scale to any number and any depth of pointer members.) This is necessary, in lieu of a more complex reference-counting scheme, because the default behavior of native pointers within native C++ is a shallow copy; that is, the target pointer is assigned to address the same object in memory as addressed by the source pointer. Without the deep copy (or some form of reference counting), the aliasing of a single object across multiple class objects is disastrous, because each class object's destructor deletes the same memory.
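A minimal sketch of such a deep copy follows. The Solution class is hypothetical; it only borrows the idea of a heap-allocated set<short> from Query::_solution and is not how Figure 1 copies that member.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Hypothetical class that owns a heap-allocated set.
class Solution {
public:
    Solution() : _lines( new std::set<short> ) {}
    ~Solution() { delete _lines; }

    // Copy constructor: allocate a new set and copy the values into it.
    Solution( const Solution &rhs )
        : _lines( new std::set<short>( *rhs._lines ) ) {}

    // Copy assignment: build the copy first, then release the old set.
    Solution& operator=( const Solution &rhs ) {
        if ( this != &rhs ) {
            std::set<short> *copy = new std::set<short>( *rhs._lines );
            delete _lines;
            _lines = copy;
        }
        return *this;
    }

    void add( short line ) { _lines->insert( line ); }
    std::size_t count() const { return _lines->size(); }
    const std::set<short> *lines() const { return _lines; }

private:
    std::set<short> *_lines;
};

// The copy gets its own set, so changing it leaves the original alone.
void demo_deep_copy() {
    Solution a;
    a.add( 3 );
    Solution b( a );
    b.add( 7 );
    assert( a.count() == 1 );
    assert( b.count() == 2 );
    assert( a.lines() != b.lines() );
}
```

Without the two copy operations, a and b would address the same set and both destructors would delete it.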

This problem is compounded when the pointer addresses the base class of an inheritance hierarchy, such as Query. In this case, the copy constructor cannot be correctly invoked because the pointer is polymorphic and requires a virtual method that is resolved at run time. A constructor, however, cannot be virtual because it is invoked on raw memory prior to the creation of the type you want to select on.

These are the kinds of problems that pioneers in a language either celebrate through feats of cleverness, or find most foul. At this point within the history of C++, the solution is widely familiar. The accepted design pattern is to provide a virtual clone function, as in Query, that is overridden within each derived class. The implementation of the clone method is trivial; it's the pattern that is powerful:

//native code clone pattern ... 
class AndQuery : public Query {
public:
    virtual Query *clone() const
          { return new AndQuery( *this ); }

    // ...
};
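To see the pattern at work, the following sketch uses cut-down stand-ins for Query and AndQuery. The name() member, the OrQuery class, and the duplicate() helper are assumptions added for illustration; they are not taken from the TQL code.

```cpp
#include <cassert>
#include <string>

// Cut-down stand-in for the Query base class.
class Query {
public:
    virtual ~Query() {}
    virtual Query *clone() const = 0;
    virtual std::string name() const = 0;   // hypothetical, for checking the type
};

class AndQuery : public Query {
public:
    virtual Query *clone() const { return new AndQuery( *this ); }
    virtual std::string name() const { return "AndQuery"; }
};

class OrQuery : public Query {
public:
    virtual Query *clone() const { return new OrQuery( *this ); }
    virtual std::string name() const { return "OrQuery"; }
};

// Copies whatever the pointer really addresses, without knowing its type.
Query *duplicate( const Query *q ) { return q->clone(); }

void demo_clone() {
    AndQuery aq;
    Query *copy = duplicate( &aq );
    assert( copy != &aq );                  // a distinct object ...
    assert( copy->name() == "AndQuery" );   // ... of the same dynamic type
    delete copy;
}
```

The caller of duplicate() owns the returned object and must delete it, which is exactly the bookkeeping the native heap demands.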

New Language Lessons

As a first attempt to translate the solution into a new language, I am often tempted to mechanically map the original implementation to the new language, and if the obvious mapping is unavailable, my instinct is often to criticize the new language. In this case, I might be tempted to map this deep copy mechanism to .NET. After all, C++/CLI offers a copy constructor (and copy assignment operator) and the .NET Framework offers the ICloneable interface. But doing so would be wrong.

The first problem is that the C++/CLI copy constructor, like the C++/CLI destructor, may or may not get invoked. It depends largely on how the reference class object is declared, and the class designer cannot impose that on the programmer. For example, if I provide a copy constructor for the AndQuery class, and someone writes this

AndQuery ^aq1 = gcnew AndQuery;
AndQuery ^aq2 = aq1;

the copy constructor for AndQuery is not invoked by the second declaration statement; rather, aq2 and aq1 both reference the same AndQuery object allocated in the first declaration statement. That's nonintuitive to a lot of programmers. When we couldn't guarantee the invocation of the default constructor in a value class, we decided it was safer to remove the default constructor for a value class supported within the Managed Extensions. (I don't personally see how this case differs in terms of potential trouble.)
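The same effect can be observed in native C++ by letting std::shared_ptr stand in for the tracking handle. This is only an analogy, assuming C++11; it is not C++/CLI, and the CountedQuery class and its copies counter are hypothetical.

```cpp
#include <cassert>
#include <memory>

// Hypothetical class that counts how often its copy constructor runs.
struct CountedQuery {
    static int copies;
    CountedQuery() {}
    CountedQuery( const CountedQuery & ) { ++copies; }
};
int CountedQuery::copies = 0;

// Returns the number of copy-constructor calls made by copying the handle.
int copy_handle_only() {
    std::shared_ptr<CountedQuery> q1 = std::make_shared<CountedQuery>();
    std::shared_ptr<CountedQuery> q2 = q1;   // copies the handle, not the object
    assert( q1.get() == q2.get() );          // both address one object
    return CountedQuery::copies;
}
```

The class designer wrote a copy constructor, yet the user's choice of declaration decides whether it ever runs.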

Microsoft itself fell into the same sort of "well, it was a good idea but it just didn't pan out" challenge with the ICloneable interface. As of the .NET Framework 2.0, the use of ICloneable is discouraged. If you look at the generic collection types, you'll see that they do not provide an ICloneable implementation, although the non-generic collection types they mirror do. The problem in this case is an ambiguity as to just how deep a copy ICloneable guarantees.

In both cases, if you can't depend on consistent semantics, it seems prudent to avoid the use of the mechanism altogether. So, where does that leave things? Actually, just fine, if you step back and think about it. That's one of the rewarding side-effects of learning another language; you realize how often you have stopped thinking in a habitually familiar language.

I needed to introduce clone because a copy constructor cannot be declared virtual. I needed to introduce a copy constructor because the class destructor deallocated the heap object, and so I had to avoid inadvertent aliasing of that object across class instances. I needed to introduce a destructor because the native heap requires that I explicitly manage heap memory.

Hopefully, a lightbulb should go off right now, you know, like in the cartoons. In .NET, the managed heap is garbage-collected. That means you don't need the destructor. But then that also means you don't need to manage a deep copy. And it means you don't need the clone pattern either. The whole complex machinery, in fact, can be discarded. It proves to be an artifact of the programming language, not of the program.
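A rough native illustration of the same reasoning: once something else manages the lifetime of the heap object, the class needs no destructor, no copy constructor, and no clone. In the sketch below std::shared_ptr stands in for the garbage collector, which is an assumption made only so the example compiles as ISO C++11; it is not how the CLR works, and SharedQuery is a hypothetical class.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <set>

// Hypothetical query whose solution set has a managed lifetime.
// No destructor, copy constructor, assignment operator, or clone()
// is written by hand.
class SharedQuery {
public:
    SharedQuery() : _solution( std::make_shared< std::set<short> >() ) {}
    void add( short line ) { _solution->insert( line ); }
    std::size_t count() const { return _solution->size(); }
private:
    std::shared_ptr< std::set<short> > _solution;
};

void demo_shared_solution() {
    SharedQuery a;
    a.add( 1 );
    SharedQuery b = a;   // compiler-generated copy: both share one set
    b.add( 2 );
    assert( a.count() == 2 );
    assert( b.count() == 2 );
}   // the set is released once, when the last owner goes away
```

Aliasing the set is no longer disastrous here, because no class object deletes it directly.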

Next Time

The next snarly concerns for reengineering from native C++ to C++/CLI are the general issue of templates and the specific one of the STL. Look again at Figure 1; you'll see that the Query class implementation makes considerable use of the STL as well as the pair template, neither of which is available in a managed form (yet). I'll explore how to cope in their absence in the next column.

Send your questions and comments for Stanley to purecpp@microsoft.com.

Stanley B. Lippman currently works for Perpetual Entertainment, a company specializing in the development of massively multiplayer online games, and the infrastructural technology to develop and support them. Stan is best known for working on the original C++ cfront compiler with its creator, Bjarne Stroustrup. His favorite all-time job was Software Technical Director on the Firebird segment of Fantasia 2000. He worked with the Visual C++ team at Microsoft in the invention of C++/CLI.