Netting C++

The Design Space of the Common Type System

Stanley B. Lippman

Contents

Native Enum vs. Managed Enum
Single Class Solution
Five-Class Solution
Six-Class Solution

Back in the June 2006 installment of Netting C++, I began a series of columns that delve into moving a working native C++ application onto the Microsoft® .NET Framework using the C++/CLI language extensions in Visual C++® 2005. The sample application I am moving is a Text Query Language (TQL), which was developed back in 1996 for the third edition of my book, C++ Primer.

In June, I started the series by showing you how to wrap the native TQL application, and then in August 2006 I went through the code and cleaned it up. Last month I gave an overview of using regular expressions in the .NET Framework. You can find these previous columns at Netting C++ (June 2006) and Netting C++ (August 2006).

TQL is designed to support two operations:

  • Normalization of an arbitrary text file specified by the user into a map of line number occurrences for each unique word.
  • Handling of user queries against that map to display matching lines of text.

The purpose for designing the text normalization is to illustrate nontrivial uses of collection classes. In the native implementation, it uses the Standard Template Library (STL). In a subsequent column, I will look at this design to explore the generic collections namespace of the .NET Base Class Library (BCL). The purpose for designing the text query support is to illustrate a nontrivial but basic-level class design. And it is this aspect of C++/CLI programming design that I look at in this column.

TQL allows the user to query a text file using a set of logical tokens that represent relationships: && for And, || for Or, and ! for Not. So, to find all occurrences of Holmes, you would simply type "Holmes" (not including the quotes). If you want to find all the text lines in which Holmes does not occur, you would simply prefix Holmes with the Not operator (!Holmes). If your goal is to find all occurrences of either Sherlock or Holmes, you would connect the two words with the Or operator (Sherlock || Holmes). Finally, to find any occurrences in which Sherlock immediately precedes Holmes, you would connect the two words with the And operator (Sherlock && Holmes).

Admittedly, in today's world of search engines, this seems quite paltry. But for examining various class design strategies, it ably fits the bill. And remember, the native implementation is over a decade old; what I'm doing here is adapting it to the .NET Framework using C++/CLI.

The candidate abstractions of the design are the four types of queries a user can make. Let's represent them initially as a set of enumerations, as follows:

enum EQueryType { // native enum
    qWord = 1, qNot, qOr, qAnd
}; 

I've chosen to do this for two reasons. First, you'll need this enum for one of my design solutions, and second, I wanted to provide a brief tutorial on enum support with C++/CLI.

Native Enum vs. Managed Enum

The first question programmers new to C++/CLI ask is what are the differences between the native enum I just showed you and the equivalent managed enum, which looks like this:

enum class EQueryType : Byte { // managed enum
    qWord = 1, qNot, qOr, qAnd
};

There are three primary differences.

The first difference is the granularity of size management. In this example, I specify that a byte of storage is sufficient to represent each of the four enumerator values. This feature has been extended to native support of C++ as well, but only in Visual C++ (it is non-standard).

A second difference is the scope of the enumerators. In a managed environment, an enum maintains its own scope, thereby encapsulating the visibility of its enumerators in the same way a class encapsulates the visibility of its members. This is, of course, not true of the native enum, in which the enumerators all spill into the enclosing scope. This can become a problem when combining modules from multiple sites. To prevent enumerators from polluting the global namespace, a design idiom of native C++ is to nest enums within a class, like so:

class ios {
public:
  enum iostates { read, write, append }; 
  // ...
};

To access any of these enumerators, you prepend the enumerator with the class scope operator, such as ios::read. This is not a good design idiom for managed enums since it will result in two layers of encapsulation, such as ios::iostates::read. Therefore, a managed enum is typically defined within the shared scope of the class it is designed to support rather than within that class, as with native enums.
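The native nesting idiom can be sketched in standard C++ as follows. The class names ios_demo and file_demo and the helper functions are hypothetical, chosen only for illustration; the point is simply that two classes can reuse the same enumerator names because each set lives in its own class scope.

```cpp
#include <cassert>

// Hypothetical stand-in for the ios class in the text (renamed so it
// cannot be confused with std::ios).
class ios_demo {
public:
    enum iostates { read, write, append };
};

// A second, unrelated hypothetical class reuses the same enumerator names.
// There is no clash, because each enum is nested in its own class scope.
class file_demo {
public:
    enum modes { read = 10, write, append };
};

// Each enumerator is reached through the class scope operator.
int sumOfReadValues() {
    ios_demo::iostates a = ios_demo::read;   // value 0
    file_demo::modes   b = file_demo::read;  // value 10
    return static_cast<int>(a) + static_cast<int>(b);
}

// Small self-check; call it from main() to exercise the sketch.
void checkNestedEnums() {
    assert(sumOfReadValues() == 10);
    assert(ios_demo::append == 2);
    assert(file_demo::write == 11);
}
```

Had both enums been declared at namespace scope instead, the second definition of read, write, and append would not compile.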

The third difference is that a managed enum is more like an Object than an integer. This permits the correct resolution of overloaded functions, as in the following:

void f( int );
void f( Object^ );

EQueryType et;

f( et ); // which one?

If managed enums were treated as an integral type, as native enums are, this invocation would have to match f(int). However, given that the backing Enum class within the Common Type System (CTS) is a Value type derived from Object, its type must clearly bind to Object, not int. So it was decided that a managed enum is a kind of Object and not a kind of integer.

I hope this is clear, as I don't have the space to argue this decision more forcefully. While this is a reasonable decision, there is an unfortunate side effect: there is no implicit conversion of a managed enum or one of its enumerators to int, and so each use as an arithmetic value requires an explicit cast, which is a real bear.
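The cost of losing the implicit conversion can be approximated in standard C++. This is only an analogy, under the assumption that a C++11 scoped enum (enum class with an underlying type, a later addition to the ISO standard) is an acceptable stand-in for the managed enum: it is not an Object, but like the managed enum it keeps its enumerators in its own scope and does not convert implicitly to int. The names NativeQueryType, ScopedQueryType, nextNative, and nextScoped are hypothetical.

```cpp
#include <cassert>
#include <string>

// Unscoped (native) enum: enumerators convert implicitly to int.
enum NativeQueryType { nWord = 1, nNot, nOr, nAnd };

// Scoped enum with a one-byte underlying type, standing in for the managed enum.
enum class ScopedQueryType : unsigned char { qWord = 1, qNot, qOr, qAnd };

static_assert(sizeof(ScopedQueryType) == 1, "one byte of storage is enough");

std::string f(int) { return "f(int)"; }

// A native enum binds to f(int) through the implicit integral conversion.
std::string callWithNative() { return f(nOr); }

// Arithmetic on a native enum needs no cast.
int nextNative(NativeQueryType t) { return t + 1; }

// Arithmetic on a scoped enum needs an explicit cast on every use;
// writing "t + 1" here would not compile.
int nextScoped(ScopedQueryType t) { return static_cast<int>(t) + 1; }

// Small self-check; call it from main() to exercise the sketch.
void checkEnumConversions() {
    assert(callWithNative() == "f(int)");
    assert(nextNative(nNot) == 3);
    assert(nextScoped(ScopedQueryType::qNot) == 3);
}
```

In this sketch a call such as f(ScopedQueryType::qOr) is rejected by the compiler, since there is no implicit path from the scoped enum to int.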

So the TQL application has four concrete types that it needs to manipulate: Word, And, Or, and Not. But how many classes do you have? My design solutions yield 1, 5, or 6 classes. Let's step through the design math and see where that leads you.

Single Class Solution

When I start a project, my first design effort is always minimalist. I try to reduce everything to a single class abstraction. That inevitably requires a space trade-off. For example, a query is either unary or binary. A single representation, then, would need to support a binary query and for each unary query you swallow the additional unnecessary operand. Similarly, to squish four abstractions into one general representation, you need a discriminant by which to identify it. For example:

public ref class Query sealed {
 EQueryType discriminant;
 String^  op1; // always used
 String^  op2; // maybe used
 // ...

If you're like me, you'll likely squint at the explicit use of string instances as operands within the Query, but that's an implementation issue rather than a design one and I'll look at that in a future column. I'm using real code just so you can all become familiar with the C++/CLI syntax.

This design is adequate for representing simple queries, such as rain || snow. But it does not work for compound queries, where either operand might be a subquery rather than a single word, as in wind && ( rain || snow ).

The operand fields are the sticking point. The operands need to be able to support substitutability. A query can be viewed as having a tree structure: each leaf node is a word, while each interior node represents a compound operation. The representational problem is finding a type for your operands that can reference both a leaf and an interior node type. That common type provides the requisite substitutability and is referred to as polymorphism.

In my one class solution, I again trade space for flexibility, as shown here:

public ref class Query sealed {
 EQueryType isA; 
 String^    word; 
 Query^     op1; 
 Query^     op2; 
 // ...

While this is not necessarily the preferable design, it is a legitimate solution, which requires one class to represent the four candidate abstractions.
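To make the trade-off concrete, here is a minimal native C++ sketch of the same single-class shape, assuming std::shared_ptr in place of the tracking handle (^) and std::string in place of String. The helpers makeWord, makeUnary, makeBinary, and toString are hypothetical and are not part of the TQL code; they exist only to show how every operation must branch on the discriminant.

```cpp
#include <cassert>
#include <memory>
#include <string>

enum EQueryType { qWord = 1, qNot, qOr, qAnd };

// One class for all four abstractions: unused fields are simply left empty.
class Query {
public:
    EQueryType isA;
    std::string word;            // used only when isA == qWord
    std::shared_ptr<Query> op1;  // used by qNot, qOr, and qAnd
    std::shared_ptr<Query> op2;  // used by qOr and qAnd
};

using QueryPtr = std::shared_ptr<Query>;

QueryPtr makeWord(const std::string& w) {
    auto q = std::make_shared<Query>();
    q->isA = qWord;
    q->word = w;
    return q;
}

QueryPtr makeUnary(EQueryType kind, QueryPtr operand) {
    auto q = std::make_shared<Query>();
    q->isA = kind;
    q->op1 = operand;
    return q;
}

QueryPtr makeBinary(EQueryType kind, QueryPtr lhs, QueryPtr rhs) {
    auto q = makeUnary(kind, lhs);
    q->op2 = rhs;
    return q;
}

// Every operation has to switch on the discriminant.
std::string toString(const Query& q) {
    switch (q.isA) {
    case qWord: return q.word;
    case qNot:  return "!" + toString(*q.op1);
    case qOr:   return "(" + toString(*q.op1) + " || " + toString(*q.op2) + ")";
    case qAnd:  return "(" + toString(*q.op1) + " && " + toString(*q.op2) + ")";
    }
    return "";
}

// Small self-check: builds wind && ( rain || snow ).
void checkSingleClassQuery() {
    QueryPtr q = makeBinary(qAnd, makeWord("wind"),
                            makeBinary(qOr, makeWord("rain"), makeWord("snow")));
    assert(toString(*q) == "(wind && (rain || snow))");
}
```

Each Word node in this sketch still carries two empty operand pointers, and each compound node carries an empty string, which is the space being traded for flexibility.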

Because I am programming under .NET, there is a system-defined solution to the problem of substitutability: Object. In .NET, Object represents a common base type between any two .NET types. That's a fundamental aspect of a unified type system, such as the CTS. Therefore, I could revise my native-minded design as follows:

public ref class Query sealed {
 EQueryType isA; 
 Object^    op1; 
 Object^    op2; 
 // ...

In this manner, op1 and op2 can be either a String or Query, and so I no longer need to throw in the entire kitchen sink.

Five-Class Solution

A four-class design solution can also be devised through the use of Object for each requisite operand, as follows:

public ref class Word sealed { String^ word; ... 
public ref class And sealed  { Object^ op1; Object^ op2; ... 
public ref class Or sealed   { Object^ op1; Object^ op2; ... 
public ref class Not sealed  { Object^ op; ... 

void resolveQuery( Object^ query ); 

However, by substituting a domain-specific fifth class in place of Object, I gain significantly both in elegance and performance. This five-class solution is the canonical object-oriented design.

The fifth class is a common Query abstraction from which each of the actual Query types are derived. Simply substitute this abstract Query class in place of the universal abstract Object, as follows:

public ref class Query abstract { ...

public ref class Word : Query { String^ word; ... 
public ref class And : Query { Query^ op1; Query^ op2; ... 
public ref class Or : Query { Query^ op1; Query^ op2; ... 
public ref class Not : Query { Query^ op; ... 

void resolveQuery( Query^ query ); 

Again, this is the canonical object-oriented solution in which you define a shared interface within the abstract Query class and override type-specific operations in each derived class. Query is preferable to Object because it retains and restricts each referenced object to be a kind of Query. Thus you can directly manipulate the object rather than having to downcast it to its explicit type.

The Query class itself can either be a pure abstract class in which all implementation is pushed down into the derived classes or it can be a hybrid class in which shared implementation is managed within the Query class itself. But again, this is an implementation issue rather than a design issue.
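The pure abstract variant of this hierarchy might look roughly like the following in native C++. This is a sketch under stated assumptions: std::shared_ptr stands in for the tracking handle, and the virtual toString operation is a hypothetical placeholder for whatever shared interface the real Query class ends up publishing.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Abstract base: the shared interface every query type must implement.
class Query {
public:
    virtual ~Query() = default;
    virtual std::string toString() const = 0;  // hypothetical shared operation
};

using QueryPtr = std::shared_ptr<const Query>;

class Word : public Query {
    std::string word;
public:
    explicit Word(std::string w) : word(std::move(w)) {}
    std::string toString() const override { return word; }
};

class Not : public Query {
    QueryPtr op;
public:
    explicit Not(QueryPtr o) : op(std::move(o)) {}
    std::string toString() const override { return "!" + op->toString(); }
};

class Or : public Query {
    QueryPtr op1, op2;
public:
    Or(QueryPtr l, QueryPtr r) : op1(std::move(l)), op2(std::move(r)) {}
    std::string toString() const override {
        return "(" + op1->toString() + " || " + op2->toString() + ")";
    }
};

class And : public Query {
    QueryPtr op1, op2;
public:
    And(QueryPtr l, QueryPtr r) : op1(std::move(l)), op2(std::move(r)) {}
    std::string toString() const override {
        return "(" + op1->toString() + " && " + op2->toString() + ")";
    }
};

// No downcast and no discriminant: the virtual call selects the right type.
std::string resolveQuery(const Query& query) { return query.toString(); }

// Small self-check: builds wind && ( rain || snow ).
void checkQueryHierarchy() {
    And q(std::make_shared<Word>("wind"),
          std::make_shared<Or>(std::make_shared<Word>("rain"),
                               std::make_shared<Word>("snow")));
    assert(resolveQuery(q) == "(wind && (rain || snow))");
}
```

Compared with a single class and a discriminant, each derived class here stores only the operands it actually uses.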

Six-Class Solution

The six-class solution makes use of the generic facility of the common language runtime (CLR) 2.0. In this scenario, the mapping of a class hierarchy to a parameterized design is a three-step process.

First, the abstract Query class introduced into your object-oriented design is turned into a parameterized class (that is one of the six classes), like so:

generic <typename queryType>
ref class Query { ...

This class provides an equivalent shared interface to that of the abstract Query base class it replaces.

In order to allow that interface to be recognized by the CLR, I need to publish and constrain an interface. This is the second of the six classes. The interface class represents the set of operations each parameterized type bound to Query must implement and therefore can safely be invoked from within the generic class. Not only do I need to define the interface class, but I must also modify the generic declaration to specify it within the constraint clause introduced by the where keyword, as follows:

interface class IQuery { ...

generic <typename queryType>
 where queryType : IQuery
ref class Query { ...

(For a full discussion of the constraint or where clause, see my Pure C++ column from June 2005.)

Finally, I need to define the four explicit query class types. Each of these can be independent and sealed, but must implement the IQuery interface in order to be bound legally to the generic Query class. This looks something like the following:

ref class Word sealed : IQuery { String^ word; ... 
ref class And sealed  : IQuery { IQuery^ op1; IQuery^ op2; ... 
ref class Or sealed   : IQuery { IQuery^ op1; IQuery^ op2; ... 
ref class Not sealed  : IQuery { IQuery^ op; ... 

Note that, to save space, I have left off the public keyword.
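For readers more at home in native C++, the overall shape can be loosely approximated with a class template. This is only an analogy and rests on several assumptions: templates are instantiated at compile time, whereas CLR generics are instantiated by the runtime, and the static_assert below is a rough substitute for the where clause rather than an equivalent. The toString and describe members are hypothetical, and only two of the four query types are shown to keep the sketch short.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>

// Stand-in for the IQuery interface class.
struct IQuery {
    virtual ~IQuery() = default;
    virtual std::string toString() const = 0;  // hypothetical operation
};

// Stand-in for the constrained generic Query class.
template <typename QueryType>
class Query {
    // Rough analogue of "where queryType : IQuery".
    static_assert(std::is_base_of<IQuery, QueryType>::value,
                  "QueryType must implement IQuery");
    QueryType query;
public:
    explicit Query(QueryType q) : query(std::move(q)) {}
    // Safe to call because the constraint guarantees toString exists.
    std::string describe() const { return query.toString(); }
};

class Word final : public IQuery {
    std::string word;
public:
    explicit Word(std::string w) : word(std::move(w)) {}
    std::string toString() const override { return word; }
};

class Not final : public IQuery {
    std::shared_ptr<const IQuery> op;
public:
    explicit Not(std::shared_ptr<const IQuery> o) : op(std::move(o)) {}
    std::string toString() const override { return "!" + op->toString(); }
};

// Small self-check; call it from main() to exercise the sketch.
void checkGenericQuery() {
    Query<Word> holmes{Word("Holmes")};
    Query<Not> notHolmes{Not(std::make_shared<Word>("Holmes"))};
    assert(holmes.describe() == "Holmes");
    assert(notHolmes.describe() == "!Holmes");
}
```

Attempting Query<int> in this sketch fails at compile time with the static_assert message, which is the closest native counterpart to violating the constraint.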

There are no doubt several alternative designs you might hit upon. But this column pretty much traverses the abstraction facilities of the .NET Common Type System using C++/CLI. In my next column, I'll choose one of these designs and walk you through the implementation. Until then, may all your exceptions be handled gracefully.

Send your questions and comments for Stanley to purecpp@microsoft.com.

Stanley B. Lippman began working on C++ with its inventor, Bjarne Stroustrup, in 1984 at Bell Laboratories. Later, Stan worked in feature animation at both Disney and DreamWorks and served as a Software Technical Director on Fantasia 2000. He has since served as Distinguished Consultant with JPL, and an Architect with the Visual C++ team at Microsoft.