.NET Column: Introducing Generics in the CLR

10/07/2019

By Jason Clark

SUMMARY

Generics are an extension of the CLR type system that allow developers to define types for which certain details are left unspecified. These details are specified when the code is referenced by consumer code, making for enhanced flexibility. Jason Clark explains how.

Contents

A First Look at Generics
A Simple Example
C# Generic Syntax
Indirection
How Does the Compiler Handle Generic Types?
Rules and Constraints
Generic Interfaces and Delegates
Generics in the Class Library
Conclusion

Generics are a shipping feature of the Microsoft® .NET Framework 2.0, and managed code is faster, more maintainable, and more robust because of them.

Generics are an extension to the CLR's type system that allow developers to define types for which certain details are left unspecified. Instead, these details are specified when the code is referenced by consumer code. The code that references the generic type fills in the missing details, tailoring the type to its particular needs. The name generics reflects the goal of the feature: to enable the writing of code while not specifying details that might limit the scope of its usefulness. The code itself is generic. I'll get more specific in just a moment.

A First Look at Generics

As with any new technology, it is helpful to ask just why it's useful. Those of you who are familiar with templates in C++ will find that generics serve a similar purpose in managed code. However, I hesitate to draw too much of a comparison between CLR generics and C++ templates because generics have some additional benefits, as well as some limitations.

Some of the strengths of CLR generics are compile-time type safety, binary code reuse, performance, and clarity. I'll briefly describe these benefits, and as you read the rest of this article, you'll understand them in more detail. As an example, let's take two hypothetical collection classes: SortedList, a collection of Object references, and GenericSortedList<T>, a collection of any type T.

Type Safety When a user adds a String to a collection of type SortedList, there is an implicit cast to Object. Similarly, if a String object is retrieved from the list, it must be explicitly cast at run time from an Object reference to a String reference. This lack of type safety at compile time is both tedious for the developer and prone to error. In contrast, a use of GenericSortedList<String> where T is typed to String causes all add and lookup methods to work with String references. This allows for specifying and checking the type of the elements at compile time rather than at run time.
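The contrast can be sketched with two real class library collections standing in for the hypothetical SortedList and GenericSortedList<T>: System.Collections.ArrayList, which stores Object references, and System.Collections.Generic.List<T>. The class and method names (TypeSafetyDemo and its members) are assumptions made for illustration.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class TypeSafetyDemo
{
    // Object-based collection: any type is accepted, so a mismatch
    // is discovered only when the cast executes at run time.
    public static bool ObjectListFailsAtRunTime()
    {
        ArrayList list = new ArrayList();
        list.Add("first");
        list.Add(42);                      // compiles: the Int32 is boxed and stored as Object

        Int32 totalLength = 0;
        try
        {
            foreach (Object item in list)
            {
                String s = (String)item;   // explicit cast; fails on the boxed Int32
                totalLength += s.Length;
            }
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    // Generic collection: the element type is fixed to String at compile time.
    public static String GenericListNeedsNoCast()
    {
        List<String> list = new List<String>();
        list.Add("first");
        // list.Add(42);                   // would not compile: no conversion from int to string
        return list[0];                    // already typed as String; no cast needed
    }

    public static void Main()
    {
        Console.WriteLine(ObjectListFailsAtRunTime());   // True
        Console.WriteLine(GenericListNeedsNoCast());     // first
    }
}
```

The commented-out Add call is the point of the sketch: with the generic collection the mistake is rejected by the compiler instead of surfacing as an exception in a running program.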

Binary Code Reuse For maintenance purposes, a developer may elect to achieve compile-time type safety using SortedList by deriving a SortedListOfStrings from it (or by encapsulating SortedList using a type-safe wrapper). The problem with this approach is that new code has to be written for every type for which you want a type-safe list, which can quickly become laborious. With GenericSortedList<T>, all you need to do is instantiate the type with the desired element type as T. As an added value, generics code is generated at run time, so two expansions over unrelated element types such as GenericSortedList<String> and GenericSortedList<FileStream> are able to reuse most of the same just-in-time (JIT)-compiled code. This is true even if the expansions are performed by different referencing assemblies. By performing the expansion of generic code at JIT-compile time, the CLR reduces bloat on disk and in memory and it maintains type equivalency across different assemblies naturally.

Performance The bottom line is this: if type checking is done at compile time rather than at run time, performance improves. In managed code, casts between references and values incur both boxings and unboxings, and avoiding such casts can have an equally positive impact on performance. Current benchmarks of a quick-sort of an array of one million integers show that the generic method is three times faster than the non-generic equivalent. This is because boxing of the values is avoided completely. The same sort over an array of string references resulted in a 20 percent improvement in performance with the generic method because no type checking is needed at run time.

Clarity Clarity with generics comes in a number of forms. Constraints are a feature of generics that makes incompatible expansions of generic code impossible; with generics, you're never faced with the cryptic compiler errors that plague C++ template users. In the GenericSortedList<T> example, the collection class would have a constraint that makes it work only with types for T that can be compared and therefore sorted. Also, generic methods can often be called without employing any special syntax by using a feature called type inference. And, of course, type safety at compile time increases clarity in application code. I will look at constraints, type inference, and type safety in detail throughout this article.

A Simple Example

The .NET Framework 2.0 provides these benefits out-of-the-box with a suite of generic collection classes in the class library. But applications can further benefit from generics by defining their own generic code. To examine how this is done, I'll start by modifying a simple linked-list node class to make it a generic class type.

The Node class in Figure 1 includes little more than the basics. It has a field, m_data, which refers to the data for the node and a second field, m_next, which refers to the next item in the linked list. Both of the fields are set by the constructor. There are really only two additional bells and whistles, the first of which is that the m_data and m_next fields are accessed using read-only properties named Data and Next. The second is the override of System.Object's ToString virtual method.

Figure 1 Simple Linked List

    using System;

    // Definition of a node type for creating a linked list
    class Node {
        Object m_data;
        Node m_next;

        public Node(Object data, Node next) {
            m_data = data;
            m_next = next;
        }

        // Access the data for the node
        public Object Data {
            get { return m_data; }
        }

        // Access the next node
        public Node Next {
            get { return m_next; }
        }

        // Get a string representation of the node
        public override String ToString() {
            return m_data.ToString();
        }
    }

    // Code that uses the node type
    class App {
        public static void Main() {
            // Create a linked list of integers
            Node head = new Node(5, null);
            head = new Node(10, head);
            head = new Node(15, head);

            // Sum-up integers by traversing linked list
            Int32 sum = 0;
            for (Node current = head; current != null;
                current = current.Next) {
                sum += (Int32) current.Data;
            }

            // Output sum
            Console.WriteLine("Sum of nodes = {0}", sum);
        }
    }

Figure 1 also shows code that uses the Node class. This referencing code suffers from some limitations. The problem is that for Node to be usable in many contexts, its data must be of the most basic type, System.Object. This means that when you use Node, you lose any form of compile-time type safety. Using Object to mean "any type" in an algorithm or data structure forces the consuming code to cast between Object references and the actual data type. Any type-mismatch bugs in the application are not caught until run time, when they take the form of an InvalidCastException at the point where the cast is attempted.

Additionally, any assignment of a value type such as an Int32 to an Object reference requires a boxing of the instance. Boxing involves a memory allocation and a memory copy, as well as an eventual garbage collection of the boxed value. Finally, casts from Object references to value types such as Int32, as you can see in Figure 1, incur unboxing which also includes a type check. Since boxing and unboxing hurt the overall performance of the algorithm, you can see why using Object to mean "any type" has disadvantages beyond the absence of compile-time type safety.
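The boxing and unboxing steps described above can be observed directly. The following is a minimal sketch; BoxingDemo and its method names are hypothetical and do not appear in the figures.

```csharp
using System;

public static class BoxingDemo
{
    // Boxing copies the value into a newly allocated heap object.
    public static bool BoxedCopyIsIndependent()
    {
        Int32 value = 5;
        Object boxed = value;       // boxing: allocation plus copy
        value = 10;                 // changes the local variable only
        return (Int32)boxed == 5;   // unboxing: type check, then copy out
    }

    // Unboxing checks for the exact value type; no numeric conversion is applied.
    public static bool UnboxToWrongTypeThrows()
    {
        Object boxed = 5;           // a boxed Int32
        try
        {
            Int64 wide = (Int64)boxed;   // the box does not hold an Int64
            Console.WriteLine(wide);     // never reached
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(BoxedCopyIsIndependent());   // True
        Console.WriteLine(UnboxToWrongTypeThrows());   // True
    }
}
```

The second method shows the type check that accompanies every unboxing: the cast succeeds only when the box holds exactly the requested value type.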

A rewrite of Node using generics is an elegant way to address these problems. Take a look at the code in Figure 2 and you will see the Node type rewritten as the Node<T> type. A type with generic behaviors such as Node<T> is parameterized and can be called parameterized node, node of T, or generic Node. I will address the new C# syntax in a moment; first let's delve into how Node<T> differs from Node. The Node<T> type is functionally and structurally similar to the Node type. Both support the building of linked lists of any given type of data. However, where Node uses System.Object to represent "any type," Node<T> leaves the type unspecified. Instead, Node<T> uses a type parameter named T that is a placeholder for a type. The type parameter is eventually specified by an argument to Node<T> when consuming code makes use of Node<T>.

Figure 2 Defining Code: Generic Linked-list Node

    using System;

    class Node<T> {
        T m_data;
        Node<T> m_next;

        public Node(T data, Node<T> next) {
            m_data = data;
            m_next = next;
        }

        // Access the data for the node
        public T Data {
            get { return m_data; }
        }

        // Access the next node
        public Node<T> Next {
            get { return m_next; }
        }

        // Get a string representation of the node
        public override String ToString() {
            return m_data.ToString();
        }
    }

The code in Figure 3 makes use of Node<T> with 32-bit signed integers by constructing a type name like so: Node<Int32>. In this example Int32 is the type argument to the type parameter T. (Incidentally, C# would also accept Node<int> to indicate T as Int32.) If the code required a linked list of some other type, such as String references, it could do this by specifying it as the type argument to T, like this: Node<String>.

Figure 3 Referencing Code: Generic Linked-list Node

    // Compile in the same file as the Node<T> definition from Figure 2,
    // below a "using System;" directive.
    class App {
        public static void Main() {
            // Create a linked list of integers
            Node<Int32> head = new Node<Int32>(5, null);
            head = new Node<Int32>(10, head);
            head = new Node<Int32>(15, head);

            // Sum up integers by traversing linked list
            Int32 sum = 0;
            for (Node<Int32> current = head; current != null;
                current = current.Next) {
                sum += current.Data;
            }

            // Output sum
            Console.WriteLine("Sum of nodes = {0}", sum);
        }
    }

The power behind Node<T> is that its algorithmic behaviors are well defined while the type of data that it operates on remains unspecified. In this way, the Node<T> type is specific in terms of how it works but generic in terms of what it works with. But unlike with the use of Object, the unspecified portions of generic code become specific when they are used. Details such as what type of data a linked list should hold are really best left to be specified by the code that uses Node<T> anyway.

When discussing generics it can be helpful to be explicit about two roles: defining code and referencing code. Defining code includes code that both declares the existence of the generic code and defines the type's members such as methods and fields. The code shown in Figure 2 is the defining code for the type Node<T>. Referencing code is consumer code that makes use of predefined generic code that may also be built into another assembly. Figure 3 is a referencing code example for Node<T>.

The reason that it is helpful to consider these roles is that both defining code and referencing code play a part in the actual construction of useable generic code. The referencing code in Figure 3 uses Node<T> to construct a new type named Node<Int32>. Node<Int32> is a distinct type that was built with two key ingredients: Node<T> created by defining code, and the type argument Int32 to parameter T specified by referencing code. Only with these two ingredients does generic code become complete.

Note that generic types, such as Node<T>, and types constructed from the generic type, such as Node<Int32> or Node<String>, are not related types in terms of object-oriented derivation. The types Node<Int32>, Node<String>, and Node<T> are all siblings derived directly from System.Object.
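Reflection makes this sibling relationship visible. The sketch below uses a minimal stand-in for Node<T> so that it is self-contained; SiblingTypesDemo and its method names are assumptions for illustration.

```csharp
using System;

// Minimal stand-in for the Node<T> type of Figure 2
public class Node<T>
{
    public T Data;
    public Node<T> Next;
}

public static class SiblingTypesDemo
{
    // Each constructed type, and the generic type itself, derives directly from Object.
    public static bool AllDeriveFromObject()
    {
        return typeof(Node<Int32>).BaseType == typeof(Object)
            && typeof(Node<String>).BaseType == typeof(Object)
            && typeof(Node<>).BaseType == typeof(Object);
    }

    // The constructed types are distinct and are not subclasses of the generic type.
    public static bool AreUnrelatedByDerivation()
    {
        return typeof(Node<Int32>) != typeof(Node<String>)
            && !typeof(Node<Int32>).IsSubclassOf(typeof(Node<>))
            && !typeof(Node<Int32>).IsSubclassOf(typeof(Node<String>));
    }

    public static void Main()
    {
        Console.WriteLine(AllDeriveFromObject());        // True
        Console.WriteLine(AreUnrelatedByDerivation());   // True
    }
}
```

One practical consequence is that a Node<String> reference cannot be assigned to a variable of type Node<Object>; the two are separate types that merely share a generic definition.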

C# Generic Syntax

The CLR supports multiple programming languages, and there are a variety of syntaxes for CLR generics. However, regardless of syntax, generic code written in one CLR-targeted language will be useable by programs written in other languages. I am going to cover C# syntax here, though the concepts apply equally to Visual Basic®, C++/CLI, and any other .NET language that targets the CLR 2.0.

Figure 4 shows basic C# syntax for defining generic code and referencing generic code. The syntax differences between the two reflect the different responsibilities of the two parties that are involved with generic code.

Figure 4 C# Essential Generic Syntax

Defining Code

    class Node<T> {
        T        m_data;
        Node<T>  m_next;
    }

    struct Pair<T,U> {
        public T m_element1;
        public U m_element2;
    }

    interface IComparable<T> {
        Int32 CompareTo(T other);
    }

    void Swap<T>(ref T item1, ref T item2) {
        T temp = item1;
        item1 = item2;
        item2 = temp;
    }

    delegate void EnumerateItem<T>(T item);

Referencing Code

    class Node8Bit : Node<Byte> {
        ...
    }

    Pair<Byte,String> pair =
        new Pair<Byte, String>();
    pair.m_element1 = 255;
    pair.m_element2 = "Hi";

    class MyType : IComparable<MyType> {
        public Int32 CompareTo(MyType other) { ... }
    }

    Decimal d1 = 0, d2 = 2;
    Swap<Decimal>(ref d1, ref d2);

        ...
        EnumerateItem<Int32> callback =
            new EnumerateItem<Int32>(CallMe);
    }
    void CallMe(Int32 num) { ... }

The CLR, and therefore C#, supports generic classes, structures, methods, interfaces, and delegates. The Defining Code section of Figure 4 shows an example of C# syntax for each of these cases. Note that angle brackets denote a type parameter list and appear immediately after the name of the generic type or member. The type parameter list contains one or more type parameters. Parameters also appear throughout the definition of the generic code in place of a specific CLR type or as an argument to a type constructor. The Referencing Code section of Figure 4 shows examples of C# syntax for the matching referencing code cases. Note here that angle brackets enclose type arguments; a generic identifier plus brackets constructs a distinct new identifier. Also note that type arguments specify which types to use for constructing a type or method from a generic.

Let's spend a moment on defining code syntax. The compiler recognizes that you are defining a generic type or method when it encounters a type-parameter list demarcated by angle brackets. The angle brackets for a generic definition immediately follow the name of the type or method being defined.

A type-parameter list indicates one or more types that you want to leave unspecified in the definition of the generic code. Type parameter names can be any valid identifier in C# and are separated by commas. Here are some things to note regarding type parameters from the Defining Code section of Figure 4.

In each code example you can see that the type parameters T or U are used throughout the definitions in places where a type name would normally appear.
In the IComparable<T> interface example, you can see the use of both a type parameter T and a regular type Int32. In the definitions of generic code you can mix both unspecified types through type parameters and specified types using a CLR type name.
You can see in the Node<T> example that type parameter T can be used alone, as in the definition of m_data, and can also be used as part of another type construction, as in the case of m_next. A type constructed with a type parameter as its type argument, such as Node<T>, is called an open generic type. A type constructed with a concrete type as its type argument, such as Node<System.Byte>, is called a closed generic type.
Like any generic method, the example generic method Swap<T> as seen in Figure 4 can be part of a generic or non-generic type and can be an instance, virtual, or static method.
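The distinction between open and closed generic types can be checked with reflection. This is a minimal sketch using a stand-in Node<T> with public fields; OpenClosedDemo and its method names are hypothetical.

```csharp
using System;
using System.Reflection;

// Minimal stand-in for Node<T>, with public fields so reflection can find them easily
public class Node<T>
{
    public T Data;
    public Node<T> Next;
}

public static class OpenClosedDemo
{
    // Inside the generic definition, the fields are typed in terms of T.
    public static bool DefinitionFieldsAreOpen()
    {
        Type definition = typeof(Node<>);
        FieldInfo data = definition.GetField("Data");
        FieldInfo next = definition.GetField("Next");
        return definition.IsGenericTypeDefinition
            && data.FieldType.IsGenericParameter            // the type parameter T itself
            && next.FieldType.ContainsGenericParameters;    // Node<T>, an open type
    }

    // In a closed type, every occurrence of T has been replaced by Byte.
    public static bool ClosedFieldsAreConcrete()
    {
        Type closed = typeof(Node<Byte>);
        return !closed.ContainsGenericParameters
            && closed.GetField("Data").FieldType == typeof(Byte)
            && closed.GetField("Next").FieldType == typeof(Node<Byte>);
    }

    public static void Main()
    {
        Console.WriteLine(DefinitionFieldsAreOpen());   // True
        Console.WriteLine(ClosedFieldsAreConcrete());   // True
    }
}
```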

In this article I am using single character names such as T and U for type parameters, primarily to keep things simple. However, you'll see that it is possible to use descriptive names as well. For example, the Node<T> type could be equivalently defined as Node<ItemType> or Node<dataType> in production code.

Generic referencing code is where the unspecified becomes specified. This is necessary if referencing code will actually use generic code. If you look at the examples in the Referencing Code section of Figure 4, you will see that in all cases, a new type or method is constructed from a generic by specifying CLR types as type arguments to the generic. In generic syntax, expressions such as Node<Byte> and Pair<Byte,String> are the type names of new types constructed from a generic type definition.

I have one more syntax detail to cover before I dig in deeper on the technology itself. When code calls a generic method such as the Swap<T> method in Figure 4, the fully qualified call syntax includes any type arguments. However, it is sometimes possible to exclude the type arguments from the call syntax, as shown in these two lines of code:

    Decimal d1 = 0, d2 = 2;
    Swap(ref d1, ref d2);

This simplified call syntax relies on a feature of the C# compiler called type inference, where the compiler uses types of the parameters passed to the method to infer the type arguments. In this case the compiler infers from the data types of d1 and d2 that the type argument to the type parameter T should be System.Decimal. If ambiguities exist, type inference won't work for the caller and the C# compiler will produce an error suggesting that you use the full call syntax, complete with angle brackets and type arguments.
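A short sketch of both call forms follows, together with one case in which inference has nothing to work from. Swap<T> is taken from Figure 4; MakeDefault<T> and the InferenceDemo class are hypothetical additions for illustration.

```csharp
using System;

public static class InferenceDemo
{
    // Generic method from Figure 4
    public static void Swap<T>(ref T item1, ref T item2)
    {
        T temp = item1;
        item1 = item2;
        item2 = temp;
    }

    // Hypothetical method: T appears in no parameter, so it can never be inferred
    public static T MakeDefault<T>()
    {
        return default(T);
    }

    public static void Main()
    {
        Decimal d1 = 0, d2 = 2;

        Swap<Decimal>(ref d1, ref d2);   // full call syntax
        Swap(ref d1, ref d2);            // T inferred as Decimal from the arguments

        // Int32 zero = MakeDefault();   // would not compile: no argument to infer T from
        Int32 zero = MakeDefault<Int32>();

        // Two swaps restore the original order
        Console.WriteLine("{0} {1} {2}", d1, d2, zero);   // 0 2 0
    }
}
```

Passing a Decimal and an Int32 variable to Swap by reference would fail in a similar way, because no single type T matches both arguments.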

Indirection

Many elegant programming solutions revolve around adding an additional level of indirection. Pointers and references allow a single function to affect more than one instance of a data structure. Virtual functions allow a single call-site to route calls to a set of similar methods—some of which may be defined at a later time. These two examples of indirection are so familiar that the indirection itself often goes unnoticed by the programmer.

A primary purpose of indirection is to increase the flexibility of code. Generics are a form of indirection in which the definition does not result in code that is directly usable. Instead, defining generic code creates a "code factory." The factory code can then be used by referencing code to construct code that can be used directly.

Let's look at this idea first with a generic method. The code in Figure 5 defines and references a generic method named CompareHashCodes<T>. The defining code creates a generic method named CompareHashCodes<T>, but none of the code shown in Figure 5 ever calls CompareHashCodes<T> directly. Instead, CompareHashCodes<T> is used by the referencing code in Main to construct two different methods: CompareHashCodes<Int32> and CompareHashCodes<String>. These constructed methods are instances of CompareHashCodes<T>, and they are called by referencing code.

Figure 5 Indirection as a Generic Method

    using System;

    class App {
        public static void Main() {
            Boolean result;

            // Constructs an instance of CompareHashCodes<T> for Int32
            result = CompareHashCodes<Int32>(5, 10);

            // Constructs an instance of CompareHashCodes<T> for String
            result = CompareHashCodes<String>("Hi there", "Hi there");
        }

        // Generic method, a "method factory" of sorts
        static Boolean CompareHashCodes<T>(T item1, T item2) {
            return item1.GetHashCode() == item2.GetHashCode();
        }
    }

Normally, the definition of a method directly defines what the method does. A generic method definition, on the other hand, defines what its constructed method instances will do. The generic method itself does nothing but act as a model for how specific instances are constructed. CompareHashCodes<T> is a generic method from which method instances that compare hash codes are constructed. A constructed instance, such as CompareHashCodes<Int32>, does the actual work; it compares hash codes of integers. In contrast, CompareHashCodes<T> is one level of indirection removed from being callable.

Generic types are similarly one level of indirection removed from their simple counterparts. A simple type definition such as a class or structure is used by the system to create objects in memory. For example, the System.Collections.Stack type is used to create stack objects in memory. In a way, you can think of the new keyword in C# or the newobj instruction in intermediate language code as an object factory that creates object instances using a managed type as the blueprint for each object.

Generic types, on the other hand, are used to instantiate closed types rather than object instances. A type, constructed from a generic type, can then be used to create objects. Let's revisit the Node<T> type defined in Figure 2 along with its referencing code shown in Figure 3.

A managed application could never instantiate an object of type Node<T>, even though it's a managed type. This is because Node<T> lacks sufficient definition to be instantiated as an object in memory. However, Node<T> can be used to instantiate another type during the execution of an application.

Node<T> is an open generic type and is only used for creating other constructed types. If a constructed type created using Node<T> is closed, such as Node<Int32>, then it can be used to instantiate objects. The referencing code in Figure 3 uses Node<Int32> much like a simple type. It creates objects of type Node<Int32>, calls methods on the objects, and so on.
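Both halves of this statement can be demonstrated through reflection: the open type refuses to produce an object, while a closed type constructed from it at run time behaves like any simple type. The sketch uses a stand-in Node<T> with a default constructor; RuntimeConstructionDemo and its method names are assumptions.

```csharp
using System;

// Minimal stand-in for Node<T>, with an implicit parameterless constructor
public class Node<T>
{
    public T Data;
    public Node<T> Next;
}

public static class RuntimeConstructionDemo
{
    // An open generic type cannot be instantiated as an object.
    public static bool OpenTypeCannotBeInstantiated()
    {
        try
        {
            Activator.CreateInstance(typeof(Node<>));
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }

    // The open type acts as a factory for closed types, which can be instantiated.
    public static Object CreateClosedNode(Type typeArgument)
    {
        Type closed = typeof(Node<>).MakeGenericType(typeArgument);
        return Activator.CreateInstance(closed);
    }

    public static void Main()
    {
        Console.WriteLine(OpenTypeCannotBeInstantiated());          // True

        Object node = CreateClosedNode(typeof(Int32));
        Console.WriteLine(node.GetType() == typeof(Node<Int32>));   // True
    }
}
```

The final comparison also shows that a type constructed at run time with MakeGenericType is the same CLR type as the one named Node<Int32> in source code.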

The extra layer of indirection that generic types provide is a very powerful feature. Referencing code that makes use of generic types results in tailor-made managed types. Framing generic code in your mind as a level of indirection removed from its simple counterpart will help you to intuit many of the behaviors, rules, and uses of generics in the CLR. Let's dig deeper into some details.

How Does the Compiler Handle Generic Types?

Both C++ templates and the generics equivalent in the Java language are features of their respective compilers. These compilers construct code from references to the generic or template type at compile time. In C++ this can cause code bloat as well as reduced type equivalence from one construction to another, even when the type arguments are the same. Java, on the other hand, essentially treats generic code as a sort-of "auto-cast" mechanism, which can cause unexpected type equivalence between generic instantiations. CLR generics do not work like either C++ templates or Java generics.

Generics in the CLR are a first-class feature of the platform itself. Implementing them this way required changes throughout the CLR, including new and modified intermediate language instructions and changes to the metadata, type-loader, JIT compiler, language compilers, and more. There are two significant benefits to run-time expansion in the CLR.

Even though each construction of a generic type, such as Node<Form> and Node<String>, has its own distinct type identity, the CLR is able to reuse much of the actual JIT-compiled code between the type instantiations. This drastically reduces code bloat and is possible because the various instantiations of a generic type are expanded at run time. All that exists of a constructed type at compile time is a type reference. When assemblies A and B both reference a generic type defined in a third assembly, their constructed types are expanded at run time. This means that, in addition to sharing CLR type-identities (when appropriate), type instantiations from assemblies A and B also share run-time resources such as native code and expanded metadata.

Type equivalency is the second benefit of run-time expansion of constructed types. Here is an example: referencing code in AssemblyA.dll that constructs Node<Int32> and referencing code in AssemblyB.dll that constructs Node<Int32> will both create objects of the same CLR type at run time. This way, if both assemblies are used by the same application, their constructions of the Node<T> type resolve to the same type and their objects can be exchanged freely. Compile-time expansion would make this logically simple equivalency problematic to implement.

There are a few other benefits to implementing generics at the runtime level, rather than at the compiler level. One such benefit is that generic type information is preserved between compilation and execution and is therefore accessible at all points in the code lifecycle. For example, reflection provides full access to generic metadata. Another benefit is rich IntelliSense® support in Visual Studio® as well as a clean debugging experience with generic code. In contrast, Java generics and C++ templates lose their generic identity by run time.
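As a small illustration of generic metadata surviving until run time, the following sketch recovers the type arguments of an object whose static type is only Object. GenericMetadataDemo and Describe are hypothetical names; the reflection members used are part of System.Type.

```csharp
using System;
using System.Collections.Generic;

public static class GenericMetadataDemo
{
    // Builds a description of an object's type from run-time metadata alone.
    public static String Describe(Object instance)
    {
        Type type = instance.GetType();
        if (!type.IsGenericType)
            return type.Name;

        Type[] arguments = type.GetGenericArguments();
        String[] names = new String[arguments.Length];
        for (Int32 i = 0; i < arguments.Length; i++)
            names[i] = arguments[i].Name;

        // The definition's name carries its arity, for example List`1
        return type.GetGenericTypeDefinition().Name
            + "<" + String.Join(", ", names) + ">";
    }

    public static void Main()
    {
        Console.WriteLine(Describe(new List<Int32>()));                 // List`1<Int32>
        Console.WriteLine(Describe(new Dictionary<String, Double>()));  // Dictionary`2<String, Double>
        Console.WriteLine(Describe("plain"));                           // String
    }
}
```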

Another benefit, and mainstay of the CLR, is cross-language use—a generic defined using one managed language can be referenced by code written in another managed language. Meanwhile, the likelihood that a language vendor will put generics support in their compilers is increased by the fact that much of the hard work is done in the platform.

Of all the fringe benefits of run-time type expansion, my favorite is a somewhat subtle one. Generic code is limited to operations that are certain to work for any constructed instantiation of the type. The side effect of this restriction is that CLR generics are more understandable and usable (though admittedly less powerful) than their C++ template counterparts. Let's look at the constraints around generics in the CLR.

Rules and Constraints

One problem that plagues programmers who use C++ templates is the many ways in which a particular attempt at type construction might fail, including the failure of a type argument of a type parameter to implement a method called by the templated code. Meanwhile, the compiler errors in these cases can be confusing and may seem unrelated to the root problem. With run-time expansion of constructed types, similar errors would become JIT-compiler or type-load errors rather than compile-time errors. The CLR architects decided that this would not be an acceptable implementation for generics.

Instead, the architects decided that a generic type, such as Node<T>, must be verifiable at compile time as a valid type for any possible type instantiation of the generic type, even though type instantiation actually occurs at run time. As such, expansion errors around problematic type constructions become impossible. In order to achieve this goal, the architects constrained the features of generics with a set of rules and restrictions that make it possible to ensure the validity of a generic type before attempting an expansion of one of its instantiations.

There are some rules that constrain the types of code that you can write generically. The essence of these rules can be summed up in a single statement: generic code is only valid if it will work for every possible constructed instance of the generic. Otherwise, the code is invalid and will not compile correctly (or if it does, it won't pass verification at run time).

At first this can seem like a limiting rule. Here is an example:

    public class GenericMath {
        public T Min<T>(T item1, T item2) {
            if (item1 < item2)
                return item1;
            return item2;
        }
    }

This code is not valid in CLR generics. The error that you get from the C# compiler reads something like this:

invalid.cs(4,11): error CS0019: Operator '<' cannot be applied to operands of type 'T' and 'T'

Meanwhile, syntax differences aside, this very same code is allowed in C++ templates. Why the limitation with generics? The reason is that the less-than operator in C# only works with certain types. However, the type parameter T in the previous code snippet could expand to any CLR type at run time. Rather than risk having invalid code at run time, the preceding example is found invalid at compile time.

Operators aside, though, the same limitation applies to more manageable uses of types, such as method calls. This modification to the Min<T> method is also invalid generic code:

    class GenericMath {
        public T Min<T>(T item1, T item2) {
            if (item1.CompareTo(item2) < 0)
                return item1;
            return item2;
        }
    }

This code is invalid for the same reason as the previous example. The CompareTo method, although implemented by many types in the base class library and easily implemented by your custom types, is not guaranteed to exist on every possible type that can be used as an argument to T.

But you've already seen that method calls are not completely disallowed in generic code. In Figure 5 the GetHashCode method is called on the two parameterized arguments, and the Node<T> type in Figure 2 calls the ToString method on its parameterized m_data field. Why are GetHashCode and ToString allowed, when CompareTo is not? The reason is that GetHashCode and ToString are both defined on the System.Object type from which every possible CLR type is derived. This means that every possible expansion of type T has an implementation of the ToString and GetHashCode member functions.

If generics are to be useful for anything beyond collection classes, then generic code needs to be able to call methods other than those defined by System.Object. But remember, generic code is only valid if it will work for every possible constructed instance of the generic. The resolution to these two seemingly incompatible requirements is a feature of CLR generics called constraints.

You should be aware that constraints are an optional component of a generic type or method definition. A generic type may define any number of constraints, and each constraint applies an arbitrary restriction on the types that can be used as an argument to one of the type parameters on the generic code. By restricting the types that can be used in the construction of a generic type, the limitations on the code that uses the constrained type parameter are therefore relaxed.

In Figure 6, it is valid for Min<T> and Max<T> to call CompareTo on their items because a constraint is applied to the type parameter T. In the third line of code you can see the introduction of a where clause that looks like this:

where T : IComparable

This constraint indicates that any construction of the method Min<T> must supply a type argument for parameter T of a type that implements the IComparable interface. This constraint limits the variety of possible instantiations of Min<T> while increasing the flexibility of the code in the method such that it can now call CompareTo on variables of type T.

Figure 6 Defining Code: Generic Constraint, Min and Max

    public class GenericMath {
        public static T Min<T>(T item1, T item2)
            where T : IComparable {
            if (item1.CompareTo(item2) < 0)
                return item1;
            return item2;
        }

        public static T Max<T>(T item1, T item2)
            where T : IComparable {
            if (item1.CompareTo(item2) < 0)
                return item2;
            return item1;
        }
    }

    // System.IComparable definition:
    // public interface IComparable {
    //     Int32 CompareTo(Object obj);
    // }

Constraints make generic algorithms possible by allowing generic code to call arbitrary methods on the expanded types. Although constraints require additional syntax for generic defining code, referencing code syntax is unchanged by constraints. The only difference for referencing code is that the type arguments must respect the constraints on the generic type. For example, the following referencing code to Min<T> is valid:

GenericMath.Min(5, 10);

This is because 5 and 10 are integers and the Int32 type implements the IComparable interface. However, the following attempted construction of Min<T> produces a compiler error:

GenericMath.Min(new Object(), new Object());

Here is the error generated by the compiler:

MinMax.cs(32,7): error CS0309: The type 'object' must be convertible to 'System.IComparable' in order to use it as parameter 'T' in the generic type or method 'GenericMath.Min<T>(T, T)'

System.Object is an invalid type argument for T because it does not implement the IComparable interface required by the constraint on T. Constraints make it possible for the compiler to provide descriptive errors like the one shown in the preceding example, when the type argument is not compatible with the type parameter on the generic code.
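To illustrate how a user-defined type can satisfy the constraint, here is a small sketch. The Temperature class and the ConstraintDemo wrapper are hypothetical names introduced only for this example, and Min<T> is repeated from Figure 6 so that the code stands alone.

```csharp
using System;

// Hypothetical type used only for illustration.
public class Temperature : IComparable
{
    private readonly Double m_degrees;

    public Temperature(Double degrees) { m_degrees = degrees; }

    public Double Degrees { get { return m_degrees; } }

    // Implementing IComparable is what makes Temperature a legal
    // type argument for a parameter constrained by IComparable.
    public Int32 CompareTo(Object obj)
    {
        if (obj == null) return 1;   // by convention, any instance sorts after null
        Temperature other = obj as Temperature;
        if (other == null)
            throw new ArgumentException("obj must be a Temperature");
        return m_degrees.CompareTo(other.m_degrees);
    }
}

public static class ConstraintDemo
{
    // Same shape as Min<T> in Figure 6.
    public static T Min<T>(T item1, T item2) where T : IComparable
    {
        if (item1.CompareTo(item2) < 0) return item1;
        return item2;
    }

    public static Double Coldest()
    {
        Temperature a = new Temperature(21.5);
        Temperature b = new Temperature(-3.0);
        return Min(a, b).Degrees;   // T is inferred as Temperature
    }
}
```

The referencing code in Coldest needs no constraint syntax of its own. It compiles because Temperature implements IComparable.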

Generics support three kinds of constraints: the interface constraint, the base class constraint, and the constructor constraint. An interface constraint specifies an interface with which all type arguments to the parameter must be compatible. Any number of interface constraints can apply to a given type parameter.

A base class constraint is similar to an interface constraint, but each type parameter may include only a single base class constraint. If no constraint is specified for a type parameter, then the implicit base class constraint of System.Object is applied. There are two specializations of the base class constraint called the reference type constraint and the value type constraint. The reference type constraint indicates that the type argument may be any reference type, and the value type constraint indicates that the type argument is limited to value types.
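In C#, the reference type and value type constraints are written with the class and struct keywords. The following sketch shows what each one permits. The KindConstraintDemo class and its method names are hypothetical.

```csharp
using System;

public static class KindConstraintDemo
{
    // Reference type constraint: T is known to be a reference type,
    // so returning null is legal.
    public static T NullOf<T>() where T : class
    {
        return null;
    }

    // Value type constraint: T is known to be a non-nullable value type,
    // so it can be used as the argument to Nullable<T>.
    public static Nullable<T> Wrap<T>(T value) where T : struct
    {
        return new Nullable<T>(value);
    }
}
```

With these definitions, NullOf<Int32>() and Wrap("text") are both rejected by the compiler, because the type arguments do not satisfy the constraints.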

The constructor constraint makes it possible for generic code to create instances of the type specified by a type parameter by constraining type arguments to types that implement a public default constructor. At this time, only default (parameterless) constructors are supported with the constructor constraint.
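A short sketch of the constructor constraint follows. The Factory and Widget types are hypothetical and exist only to show new T() compiling under a new() constraint.

```csharp
using System;

public static class Factory
{
    // new() requires every type argument to expose a public
    // parameterless constructor, which makes "new T()" legal below.
    public static T[] CreateMany<T>(Int32 count) where T : new()
    {
        T[] items = new T[count];
        for (Int32 i = 0; i < count; i++)
            items[i] = new T();
        return items;
    }
}

// Hypothetical type used only for illustration.
public class Widget
{
    public String Name = "unnamed";
}
```

Factory.CreateMany<Widget>(3) returns three distinct Widget instances. A type with only parameterized constructors would be rejected as a type argument at compile time.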

A where clause is used to define a constraint or list of constraints for a given type parameter. Each where clause applies to only a single type parameter. A generic type or method definition may have between zero where clauses and as many where clauses as it has type parameters. A given where clause may include a single constraint or list of constraints separated by commas. The code in Figure 7 shows the various syntaxes for applying constraints to generic code in C#.

Figure 7 Defining Code: C# Generic Constraint Syntax

// Class
class SortedCollection<T> where T : IComparable {...}

// Interface
interface IBlueAlgorithm<T> where T : IRedAlgorithm {...}

// Factory Class
class MyFactory<T,P> where T : new() where P : IConstructionPolicy {...}

// Method
public static void EvaluateItems<T>(T[] items)
   where T : IRedAlgorithm, IBlueAlgorithm<T> {...}

// Delegate
delegate void GenericEventHandler<S,A>(S sender, A args) where A : EventArgs;

The fundamental rule of generics—that either all possible instantiations of the generic are valid or the generic type itself is invalid—has a few more interesting side effects. First, there's casting. In generic code a variable of a type-parameter type may only be cast to and from its base class constraint type or bases thereof. This means that if a type parameter T has no constraints, it can only be cast to and from Object references. However, if T is constrained to a more derived type such as FileStream, then the generic definition can include casts to and from T and FileStream, as well as all bases of FileStream, down to Object. The code in Figure 8 shows this cast rule in action.

Figure 8 Casting Rule in Generics

static void Foo<T>(T arg) {
   Object o = arg;
   T temp = (T) o;
   Int32 num;
   num = (Int32) temp;          // Compiler error
   num = (Int32)(Object) temp;  // Compiles, but throws InvalidCastException
                                // at run time if T != Int32
   temp = (T) 5;                // Compiler error
   temp = (T)(Object) 5;        // Compiles, but throws InvalidCastException at
                                // run time if a boxed Int32 cannot be cast
                                // to T (for example, when T is Int64)
}

In this example, T has no constraint and is considered to be unbounded. It has the implicit base class constraint of Object. Casts of T to and from the Object class, as well as casts to and from interface types, are valid at compile time. Casts to other types, such as Int32, for unbounded T are disallowed by the compiler. The casting rule for generics also applies to conversions and therefore disallows the use of casting syntax to make type conversions such as the conversion from an Int32 to an Int64; conversions like this are not supported with generics.
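For contrast with the unbounded case, the sketch below applies a base class constraint and shows which casts then become legal. It constrains T to Stream rather than FileStream so that it can be exercised with a MemoryStream. The CastDemo class and method name are hypothetical.

```csharp
using System;
using System.IO;

public static class CastDemo
{
    public static Int64 LengthViaCasts<T>(T arg) where T : Stream
    {
        Stream s = arg;       // T converts implicitly to its constraint type
        Object o = s;         // ...and to the bases of that type
        T back = (T) o;       // a cast from Object to T is allowed
        back = (T) s;         // a cast from the constraint type to T is allowed
        // Int32 n = (Int32) arg;   // compiler error: Int32 is not a base of Stream
        return back.Length;   // members of Stream are callable on T
    }
}
```

Passing a MemoryStream that wraps a three-byte array returns 3. The casts succeed at run time because the object really is of type T.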

Here's another interesting tidbit. Consider the following code:

void Foo<T>() {
   T x = null; // compiler error when T is unbounded
   •••
}

While a null assignment like this one has obvious uses, there is one possible problem. What if T is expanded to a value type? A null assignment to a value variable has no meaning. Fortunately, the C# compiler provides special syntax that provides correct results regardless of the runtime type of T:

void Foo<T>() {
   T x = default(T); // OK for any T
}

The expression on the right side of the equals sign is called the default value expression, but you can think of it as an operator, available only to generic code. The operator takes a type parameter and results in the appropriate default expression. If T is expanded to a reference type, then default(T) resolves to null. If T is expanded to a value type, then default(T) is the all-bits-zero value for the variable.
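One common use of default(T) is as a fallback return value. The sketch below uses a hypothetical FirstOrDefault helper (the DefaultDemo class is not from the article) to show the same expression yielding null or zero depending on the type argument.

```csharp
using System;

public static class DefaultDemo
{
    // Returns the first element, or default(T) when there is none.
    public static T FirstOrDefault<T>(T[] items)
    {
        if (items == null || items.Length == 0)
            return default(T);   // null for reference types, all-bits-zero for value types
        return items[0];
    }
}
```

For an empty Int32 array the method returns 0, and for an empty String array it returns null.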

Null assignment to a parameterized variable T is disallowed if T is unbounded, so one might guess that the following would also be invalid, but it is not:

void Foo<T>(T x) {
   if (x == null) { // Ok
      •••
   }
}

If null is allowed here, what happens if T is expanded to a value type? Where T is a non-nullable value type, the Boolean expression in the preceding example is hardwired to false. Unlike in the null assignment case, null comparison makes sense for any expansion of T.
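A small sketch makes the behavior concrete. The NullCheckDemo class and IsNull<T> method are hypothetical names used only here.

```csharp
using System;

public static class NullCheckDemo
{
    // Comparing an unbounded T against null compiles for every T.
    public static Boolean IsNull<T>(T x)
    {
        return x == null;
    }
}
```

IsNull(5) returns false because an Int32 can never be null, while IsNull<String>(null) returns true.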

The premise that generic types are verifiably valid at compile time for any possible instantiation does have a limiting effect on your generic code. However, I have found that the additional structure around generics in the CLR, by comparison to templates in C++, has a clarifying effect. Constraints and their surrounding infrastructure are one of my favorite aspects of CLR generics.

Generic Interfaces and Delegates

Generic classes, structures, and methods are the primary features of CLR generics. Generic interfaces and delegates are really supporting features. A generic interface alone has limited usefulness. But when used with a generic class, struct, or method, generic interfaces (and delegates) have significant impact.

The GenericMath.Min<T> and GenericMath.Max<T> methods in Figure 6 both constrain T to be compatible with the IComparable interface. This allows the methods to call CompareTo on the parameterized arguments to the methods. However, neither of these methods as implemented in Figure 6 enjoys the full benefits of generics. The reason is that a call through a non-generic interface on a value type incurs boxing if the interface method takes one or more Object parameters, such as the obj parameter of CompareTo.

Each call to GenericMath.Min<T> in Figure 6 boxes the argument passed to CompareTo if the instantiation of the method expands T to a value type rather than to a reference type. This is where generic interfaces can help.

The code in Figure 9 refactors the GenericMath methods to constrain T over a generic interface IComparable<T>. Now if an instantiation of Min<T> or Max<T> uses a value type as an argument to T, the interface call to CompareTo is part of the interface construction, its parameter is the value type, and no boxing occurs.

Figure 9 Defining Code: Generic Interface, Min and Max

public class GenericMath {
   public static T Min<T>(T item1, T item2)
      where T : IComparable<T> {
      if (item1.CompareTo(item2) < 0)
         return item1;
      return item2;
   }

   public static T Max<T>(T item1, T item2)
      where T : IComparable<T> {
      if (item1.CompareTo(item2) < 0)
         return item2;
      return item1;
   }
}

// public interface IComparable<T> {
//    Int32 CompareTo(T obj);
// }
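To show the generic interface from the implementing side, here is a sketch of a value type that implements IComparable<T>. The Money struct and the GenericInterfaceDemo wrapper are hypothetical, and Max<T> is repeated from Figure 9 so that the code stands alone.

```csharp
using System;

// Hypothetical value type used only for illustration.
public struct Money : IComparable<Money>
{
    public readonly Decimal Amount;

    public Money(Decimal amount) { Amount = amount; }

    // The parameter is Money itself rather than Object, so the
    // argument does not have to be boxed to make this call.
    public Int32 CompareTo(Money other)
    {
        return Amount.CompareTo(other.Amount);
    }
}

public static class GenericInterfaceDemo
{
    // Same shape as Max<T> in Figure 9.
    public static T Max<T>(T item1, T item2) where T : IComparable<T>
    {
        if (item1.CompareTo(item2) < 0) return item2;
        return item1;
    }
}
```

Max(new Money(5m), new Money(12.5m)) yields the Money holding 12.5. Int32 also implements IComparable<Int32>, so Max(3, 9) works with the same definition.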

Generic delegates have similar benefits to generic interfaces, but are oriented to individual methods rather than whole types.
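As a rough illustration, the sketch below defines a generic delegate and uses it from a generic method. Filter<T>, GenericDelegateDemo, and the method names are all hypothetical.

```csharp
using System;

// Hypothetical delegate: one definition serves every element type.
public delegate Boolean Filter<T>(T item);

public static class GenericDelegateDemo
{
    public static Int32 CountMatches<T>(T[] items, Filter<T> filter)
    {
        Int32 count = 0;
        foreach (T item in items)
        {
            if (filter(item)) count++;
        }
        return count;
    }

    private static Boolean IsEven(Int32 n) { return n % 2 == 0; }

    public static Int32 CountEvens(Int32[] numbers)
    {
        // The delegate is constructed as Filter<Int32>, so IsEven receives
        // an Int32 directly, with no cast and no boxing.
        return CountMatches(numbers, new Filter<Int32>(IsEven));
    }
}
```

A non-generic equivalent would have to take an Object parameter, which would force a cast inside every callback and boxing for every value type element.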

Generics in the Class Library

In addition to the generics implementation in the CLR, the class library includes generic library classes. Generic collection classes for implementing lists, dictionaries, stacks, queues, and linked lists are available as part of the class library. You can find them in the System.Collections.Generic namespace. Additionally, the class library includes supporting interface types such as IList<T>, ICollection<T>, and IComparable<T>, which are generic equivalents of the simple interfaces that shipped in the Microsoft .NET Framework 1.0 and 1.1 class libraries.
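The following sketch uses two of these collection classes, List<T> and Dictionary<TKey,TValue>. The CollectionsDemo class and its sample data are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class CollectionsDemo
{
    public static Int32 SumOfLengths()
    {
        List<String> names = new List<String>();
        names.Add("Ada");
        names.Add("Grace");
        // names.Add(42);   // compiler error: 42 is not a String

        Dictionary<String, Int32> lengths = new Dictionary<String, Int32>();
        foreach (String name in names)
            lengths[name] = name.Length;   // no cast when reading or writing

        Int32 total = 0;
        foreach (KeyValuePair<String, Int32> pair in lengths)
            total += pair.Value;           // Value is an Int32, so nothing is unboxed
        return total;
    }
}
```

Here SumOfLengths returns 8, the combined length of the two names.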

Finally, you will notice that types throughout the class library are augmented with generic versions of new and existing functionality. For example, the System.Array class includes generic versions of its BinarySearch and Sort methods that take advantage of the generic interfaces IComparer<T> and IComparable<T>.
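For example, the generic Array.Sort<T> and Array.BinarySearch<T> methods can be called as in the sketch below. The ArrayDemo wrapper is a hypothetical helper.

```csharp
using System;

public static class ArrayDemo
{
    // Sorts the array in place, then searches the sorted result.
    public static Int32 SortAndFind(Int32[] values, Int32 target)
    {
        Array.Sort<Int32>(values);
        // Returns the index of target, or a negative number if it is absent.
        return Array.BinarySearch<Int32>(values, target);
    }
}
```

Given the array { 9, 2, 7, 4 } and a target of 7, the array is reordered to { 2, 4, 7, 9 } and the method returns index 2.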

In addition to the built-in generic support in the .NET Framework 2.0 base class library, you should check out the Power Collections for .NET project, which is built on CLR generics. Power Collections includes the complete source and binaries for dozens of generic collection classes and algorithms.

Conclusion

CLR generics are a powerful application and library development feature. Whether you choose to apply generics by using the collection classes or by architecting your application generically throughout, generics can make your code more type-safe, maintainable, and efficient. It's exciting to see the CLR progress with the addition of significant features like generics. There are so many interesting possibilities available to you.
