New information has been added to this article since publication.
Refer to the Editor's Update below.

Netting C++

Mapping Templates to Generics

Stanley Lippman

Code download available at:  Netting C++ 2007_04.exe(165 KB)

In this month's column, I'll conclude the port of my ISO-C++ Text Query Language (TQL) application to the Microsoft® .NET Framework and C++/CLI. In particular, I'll drill down into mapping templates and the Standard Template Library (STL) in to the .NET generic facility. Although I worked on the original template implementation for the 3.0 cfront release from Bell Laboratories back in 1991 and have been a strong advocate of their use since, I can't recommend the use of templates within C++/CLI.

Under .NET, the static nature of C++ templates instantiation makes them less than first-class citizens of C++/CLI. In language design, we speak of the primary language entities as exhibiting first-class status. Think of an int: the movement in the 1970s to model abstract data types was an attempt to provide a mechanism so that user-defined types could approach the first-class status of primitive language types such as integer. So it is always legitimate to compare the first- and second class-citizens of a language. And if we do that with a generic and a template object under the .NET Framework, we could say that generics fly first class, and templates unfortunately are not only back in coach, but they're in the last row so they can't even recline.

The problem is that a template is local to the assembly within which it is built. This is because an assembly type is identified not simply by its name, but by its location. Thus, two objects of myTemplate<int> passed between assemblies in a method would be flagged as a type error. So templates cannot be part of the public interface of a type. In addition, they are not language-interoperable.

Generics are tightly integrated into .NET because the underlying intermediate language was extended to explicitly support them. In addition, the runtime automatically manages the instantiation of generics on demand as they are needed. That problem was never solved in C++ due to the nature of the native compilation model in which files are compiled separately and linked together only at run time.

More generally, .NET solves three programming-in-the-large problems that proved intractable in C++ due to the native compilation model: the initialization order of complex global objects (that is, the assurance that an object is constructed before its first use when the use and initialization are in separate files), the management of dynamic heap-allocated memory, and the on-demand instantiation of a parameterized type. These problems, each of which represents a language layer of complexity independent of the actual domain solution, disappear under .NET.

The issue of templates versus generics moves front and center in this column because TQL relies heavily on templates-in particular, on the STL. There are three primary problems to address, and they are all aggravating since this is work we would all prefer not to have to do.

First, my implementation uses the STL vector class. While the STL is not currently available under C++/CLI, the .NET Framework does provide an equivalent generic collection type: List<T>. Declaration-wise, then, I simply have to swap it in. However, I'll have to manually change the usage of each object since the two sets of operations are different. (Alternatively, I could think about writing my own one-to-one mapping of the vector class. I'll consider that later for the STL set associative container.) For example, here are a number of vector declarations:

typedef pair<short,short> location;
typedef vector<location>  loc;
typedef vector<string>    text;
typedef pair<text*,loc*>  text_loc;

To replace vector by List, simply open up the namespace, change the type name, and add the tracking handle (^), since List is a reference class:

using namespace System::Collections::Generic;

typedef pair<short,short> location;
typedef List<location>^   loc;
typedef List<String^>^    text;
typedef pair<text,loc>    text_loc;

Second, as you can see from the previous code sample, this implementation also uses the C++ Standard Library pair<> utility. It is not available with C++/CLI, so I'll have to do something there. It's a pretty simple implementation, so I'll just see if I can clone it. The thing is, nothing maps quite one-to-one between the native and managed platform. Again, that's not surprising, since you need to think of the managed platform as the best thinking on how to make the native platform more successful. (I think the only thing missing right now is equivalent performance.)

Finally, my implementation also uses the STL set class. Unfortunately, there is no equivalent generic collection type, so I'll have to do something. My first thought is to implement a one-to-one mapping of its interface-except I cannot hijack heap allocation under the .NET Framework. But, as you'll see, that turns out to be a less than good idea. Here's an example of the TQL use of set:

class Query {
public:
    // ...

private:

    set<short>*       vec2set( const vector<location>* );
    vector<location>  loc; 
    set<short>        *solution;
};

So let's see how to solve the issue with the C++ Standard Library pair<> utility. The pair<> utility is exceedingly trivial. Its value is not so much in the elegance of its design as the fact that it is standardized as part of the C++ Standard Library. However, it does exhibit some template magic that only partially migrates to the .NET Framework and generics. Figure 1 depicts what the ISO-C++ Standard shows us of pair<>'s implementation.

Figure 1 Standard pair<> Implementation

// from Section 20.2.2 Pairs
template <class T1, class T2>
struct pair {
    typedef T1 first_type;
    typedef T2 second_type;

    T1 first;
    T2 second;

    pair();
    pair(const T1& x, const T2& y);
    template<class U, class V> pair(const pair<U, V> &p);
};

template <class T1, class T2>
bool operator==(const pair<T1, T2>& x, const pair<T1, T2>& y);

template <class T1, class T2>
bool operator<(const pair<T1, T2>& x, const pair<T1, T2>& y);

template <class T1, class T2>
pair<T1, T2> make_pair(T1 x, T2 y);

I'll maintain the unencapsulated public nature of pair<> by keeping it a struct. In addition, I'll define it as a value class since I want to minimize its overhead. And, as I indicated at the start of the column, I'll implement it as a generic. The skeleton declaration looks as follows:

generic <class T1, class T2>
value struct pair { ... };

This supports the previous use of pair<> , such as:

typedef List<String^>^    text;
typedef pair<text,loc>    text_loc;

It also supports the sorts of uses illustrated in Figure 2.

Figure 2 Uses of pair<>

int main(array<System::String ^> ^args)
{
    /* generates:
        first: 10, second: 20
        first: hello, second: world
        first: 1024, second: eek!
    */

    pair< int, int > pi( 10, 20 );
    pair< int, int >::first_type t1 = pi.first;
    Console::WriteLine( “first: {0}, second: {1}”, 
                        t1, pi.second );

    pair< String^, String^ > ps( “hello”, “world” );
    pair< String^, String^ >::second_type t2 = ps.second;
    Console::WriteLine( “first: {0}, second: {1}”, 
                        ps.first, t2 );

    pair< int, String^ > psi = 
        pair< int,String^ >::make_pair( 1024, “eek!” );

    pair< int, String^ >::first_type ft = psi.first;
    pair< int, String^ >::second_type st = psi.second;
    Console::WriteLine( “first: {0}, second: {1}”, ft, st );

    return 0;
}

So, let's look at the implementation, the easiest part of which is the data:

generic <class T1, class T2>
value struct pair 
{
    typedef T1 first_type;
    typedef T2 second_type;

    T1 first;
    T2 second;

I was holding my breath to see whether the trait pattern carries over into generics. The typedef-ing of the type parameter to a type-transparent name is one of the more elegant template patterns. (I would suspect that you cannot do this in C#. If you can't, that is one neat advantage C++/CLI has over C#.) In this case, first_type and second_type can be used transparently, as I did in Figure 2:

pair< int, String^ >::first_type ft  = psi.first;
pair< int, String^ >::second_type st = psi.second;

The next set of pair elements we need to deal with are the three constructors, and generics provides mixed results in this case.

According to the standard, the default constructor for pair zeroes out the pair of objects, T1 and T2. Value types cannot support a default constructor because the compiler cannot guarantee that it can always intervene to generate the appropriate invocation. In this case, however, that is more than fine since each value type is guaranteed to be zeroed out by the runtime prior to being handed over to you. So, for this one, the .NET Framework is the winner, so to speak. Removing code, in general, is always an improvement.

The two-argument constructor takes a T1 and a T2. This is largely the same within C++/CLI, except for two issues in the following declaration:

pair(const T1& t1, const T2& t2 );

The first issue is const, which the .NET Framework does not recognize. On the C++/CLI design team, one school of thought is if the framework doesn't recognize const and a function with a const argument is passed to the framework, then we can't guarantee const-ness and therefore mustn't support it. The counter argument, which initially prevailed, was that C++ programmers use const as an interface design toggle and we can guarantee this surface-that is, compile-time-behavior. And, it was claimed, this is very important to them.

The problem with the initial design choice is that it was justified by a supposition. It claimed to know what is most important to the hearts and minds of C++ programmers. That's a difficult conversation to navigate. After all, most of us can make plausible assertions about what C++ programmers want-naturally we won't agree but we all believe that we know what is right. Unfortunately, beliefs are what we get passionate about, and they typically gain importance over technical issues. In a sense, our implementation of templates was more a supposition than a technical decision.

The immutability of a value (that is, its const-ness) is not something we tend to argue about, however passionate our nature. As much as C++ programmers want their const, they want their const to remain const even more. That was just something the compiler cannot guarantee given the absence of const within the System namespace-except perhaps by disallowing all calls into that namespace that want to pass a const parameter. This was felt to be unsupportable. So we reversed ourselves and removed support for const.

The second issue in the two-constructor signature is the declaration of the parameterized type name as a reference:

const T2& t2

You have to do this in native programming because of the performance penalty of passing a large user-defined type by value to the function being invoked. Under the .NET Framework, this is not an issue. A value type should not be more than 16 bytes; reference types are all passed by reference. So the revised C++/CLI two-constructor signature looks like this:

 pair( T1 t1, T2 t2 ): first( t1 ), second( t2 ){}. 

I would characterize this two-argument constructor as a wash when viewed as moving from one platform to the other.

The third constructor is a template copy constructor (at least in the case where T1 equals U and T2 equals V):

template<class U, class V> pair(const pair<U, V> &p);

This is not supported under C++/CLI generics. Member function generics, in general, are supported, such as this:

generic <class T1, class T2>
value struct pair {
    generic <class U, class V> void foo( U, V ); // ok
    // ...
};

But my attempts to have a generic constructor member function result in a rather inscrutable error message, which left me unsure whether the message was a compiler error or an aspect of the common language runtime (CLR) specification:

error C2061: syntax error : identifier 'pair<T1,T2>'
error C2238: unexpected token(s) preceding ';'

The following shows the two additional syntaxes I tried:

generic <class T1, class T2>
value struct pair 
{
    // neither compile ...
    generic <class U, class V> pair<U,V>( int );
    generic <class U, class V> pair( pair<U, V>% );

    // ok: but not quite equivalent
    // note: not considered a copy constructor
    pair( pair<Object^, Object^>% );

    // ...
};

As it turns out, the CLR implementation does not support a generic constructor. (Unfortunately, a clearer error message probably won't make it into the next Visual Studio® release.) Therefore, in scoring this, I would give the point to templates. The entire pair implementation is shown in Figure 3.

Figure 3 The Pair Implementation

generic <class T1, class T2>
public value struct pair {
    typedef T1 first_type;
    typedef T2 second_type;

    T1 first;
    T2 second;

    pair( T1 t1, T2 t2 ): first( t1 ), second( t2 ) {}  

    static bool operator ==( pair<T1,T2> x, pair<T1,T2> y ) 
    {
        return x.first == y.first && x.second == y.second; 
    }

    static bool operator<( pair<T1,T2> x, pair<T1,T2> y )
    { 
        return x.first < y.first || 
           (!( y.first < x.first) && x.second < y.second );
    }

    static pair<T1, T2> make_pair( T1 x, T2 y )
         { return pair( x, y ); }
};

[ Editor's Update - 3/14/2007: The code in Figure 3 will compile under Visual C++ 2005. However, due to the nature of generics, it shouldn't have, and a fix to the compiler was implemented as part of Visual C++ 2005 SP1 that will prevent this from compiling (SP1 is available for download). For a pair struct like the one implemented in Figure 3, there are several solutions. One is to use templates instead of generics (simply changing "generic" to "template" ). Another is to constrain the generic parameters to implement IComparable<T> methods on IComparable<T> can then be used within the struct definition to compare instances of the generic parameters.]

Finally, I have to deal with the set class abstraction provided by the STL. In the .NET Framework 2.0, there is not a set representation within the Generic collection namespace. I don't see any alternative except to cook up something of my own-which is unfortunate since presumably we'll all end up cooking our own slightly different implementation, and that makes for an awfully big set. Figure 4 shows the ISO-C++ specification of the public interface for set. On one level, I imagine people expect I would implement the full STL specification. That was what I first assumed until I looked at it closely.

Figure 4 ISO/C++ Set Public Specification

template<class Key, class Compare = less<Key>,
         class Allocator = allocator<Key> >
class set {
public:
    // types:
    typedef Key key_type;
    typedef Key value_type;
    typedef Compare key_compare;
    typedef Compare value_compare;
    typedef Allocator allocator_type;
    typedef typename Allocator::reference reference;
    typedef typename Allocator::const_reference const_reference;
    typedef implementation defined iterator; // See 23.1
    typedef implementation defined const_iterator; // See 23.1
    typedef implementation defined size_type; // See 23.1
    typedef implementation defined difference_type;// See 23.1
    typedef typename Allocator::pointer pointer;
    typedef typename Allocator::const_pointer const_pointer;
    typedef std::reverse_iterator<iterator> reverse_iterator;
    typedef std::reverse_iterator<const_iterator> const_reverse_iterator;

    // 23.3.3.1 construct/copy/destroy:
    explicit set(const Compare& comp = Compare(),
                 const Allocator& = Allocator());

    template <class InputIterator>
    set(InputIterator first, InputIterator last,
        const Compare& comp = Compare(), const Allocator& = Allocator());
    set(const set<Key,Compare,Allocator>& x);
    ˜set();
    set<Key,Compare,Allocator>& operator=
        (const set<Key,Compare,Allocator>& x);
    allocator_type get_allocator() const;

    // iterators:
    iterator begin();
    const_iterator begin() const;
    iterator end();
    const_iterator end() const;
    reverse_iterator rbegin();
    const_reverse_iterator rbegin() const;
    reverse_iterator rend();
    const_reverse_iterator rend() const;

    // capacity:
    bool empty() const;
    size_type size() const;
    size_type max_size() const;

    // modifiers:
    pair<iterator,bool> insert(const value_type& x);
    iterator insert(iterator position, const value_type& x);
    template <class InputIterator>
    void insert(InputIterator first, InputIterator last);
    void erase(iterator position);
    size_type erase(const key_type& x);
    void erase(iterator first, iterator last);
    void swap(set<Key,Compare,Allocator>&);
    void clear();

    // observers:
    key_compare key_comp() const;
    value_compare value_comp() const;

    // set operations:
    iterator find(const key_type& x) const;
    size_type count(const key_type& x) const;
    iterator lower_bound(const key_type& x) const;
    iterator upper_bound(const key_type& x) const;
    pair<iterator,iterator> equal_range(const key_type& x) const;
};

Why, for example, would anyone want a reverse iterator for a set, let alone both a const and non-const instance of that? The functional abstraction of a set has little connection with, I would guess, 80 percent of its STL interface. That is, about 80 percent of the members of set are a requirement of the STL, not of the set abstraction. So that's pure STL overhead; there is nothing I really need in that. So I'm going to (virtually) toss it out. My set looks like this:

generic <typename element>
public ref class set : IEnumerable {
public:
    set();

    void add( element );
    bool find( element );
    void remove( element );

    int  count();
    bool empty();
    void clear();

    virtual IEnumerator^ GetEnumerator()
};

And then, of course, I need to swap this set interface in place of the STL set of calls. This is the same problem as moving from vector to List. Space doesn't permit me to walk through any of the implementation details, but you can view that and the entire TQL .NET port in the download from the MSDN®Magazine Web site.

Send your questions and comments for Stanley to  purecpp@microsoft.com.

Stanley Lippman began working on C++ with its inventor, Bjarne Stroustrup, in 1984 at Bell Laboratories. Later, Stan worked in feature animation both at Disney and DreamWorks and served as a Software Technical Director on Fantasia 2000. He has since served as Distinguished Consultant with JPL, and an Architect with the Visual C++ team at Microsoft. He is currently Senior Software Engineer at Emergent Game Technologies.