[ Editor's Update - 1/20/2006: This article refers to a beta version of Visual Studio 2005. An updated version of the article, reflecting features found in the final release of Visual Studio 2005, can be found at .NET: Introducing Generics in the CLR.]
.NET
Introducing Generics in the CLR
Jason Clark

In the last installment of this column, I covered Interop with unmanaged code via P/Invoke. In some ways this topic revisited the past by showing how your managed code can access legacy Win32® code. In contrast, this month I'm going to peek into the future by looking at a cool new feature, generics, which will be coming soon to the common language runtime (CLR). I'll introduce generics and discuss the benefits that they bring to your code, and in a future column I'll dig into more details of how the compiler and runtime implement generics.
Generics are an extension to the CLR's type system that allows developers to define types for which certain details are left unspecified. Instead, these details are specified when the code is referenced by consumer code. The code that references the generic type fills in the missing details, tailoring the type to its particular needs. The name generics reflects the goal of the feature: to enable the writing of code while not specifying details that might limit the scope of its usefulness. The code itself is generic. I'll get more specific in just a moment.
Just how far in the future are generics? Microsoft plans to ship generics in the release of the CLR code-named "Whidbey," and there should be a beta release of the Whidbey CLR sometime after this column goes to press. Meanwhile, in the same release of the CLR, there is a planned update to the language and compiler that takes full advantage of generics. Finally, the research group at Microsoft has modified a version of the shared source common language implementation (CLI), code-named "Rotor," to include generics support. This modified runtime, code-named "Gyro," is available at http://research.microsoft.com/projects/clrgen.

A First Look at Generics
As with any new technology, it is helpful to ask just why it's useful. Those of you who are familiar with templates in C++ will find that generics serve a similar purpose in managed code. However, I hesitate to draw too much of a comparison between CLR generics and C++ templates because generics have some additional benefits, including the absence of two common problems: code bloat and developer confusion.
Some of the strengths of CLR generics are compile-time type safety, binary code reuse, performance, and clarity. I'll briefly describe these benefits, and as you read the rest of this column, you'll understand them in more detail. As an example let's take two hypothetical collection classes: SortedList, a collection of Object references, and GenericSortedList<T>, a collection of any type.
Type safety When a user adds a String to a collection of type SortedList, there is an implicit cast to Object. Similarly, if a String object is retrieved from the list, it must be cast at run time from an Object reference to a String reference. This lack of type safety at compile time is both tedious for the developer and prone to error. In contrast, a use of GenericSortedList<String> where T is typed to String causes all add and lookup methods to work with String references. This allows for specifying and checking the type of the elements at compile time rather than at run time.
Binary code reuse For maintenance purposes, a developer may elect to achieve compile-time type safety using SortedList by deriving a SortedListOfStrings from it. The problem with this approach is that new code has to be written for every type for which you want a type-safe list, which can quickly become laborious. With GenericSortedList<T>, all you need to do is instantiate the type with the desired element type as T. As an added value, generics code is generated at run time, so two expansions over unrelated element types such as GenericSortedList<String> and GenericSortedList<FileStream> are able to reuse most of the same just-in-time (JIT)-compiled code. The CLR simply works out the details—no more code bloat!
Performance The bottom line is this: if type checking is done at compile time rather than at run time, performance improves. In managed code, casts between references and values incur boxing and unboxing, and avoiding such casts can have an equally positive impact on performance. Current benchmarks of a quick-sort of an array of one million integers show that the generic method is three times faster than the non-generic equivalent. This is because boxing of the values is avoided completely. The same sort over an array of string references resulted in a 20 percent improvement in performance with the generic method because no type checking is needed at run time.
Clarity Clarity with generics comes in a number of forms. Constraints are a feature of generics that makes incompatible expansions of generic code impossible; with generics, you're never faced with the cryptic compiler errors that plague C++ template users. In the GenericSortedList<T> example, the collection class would have a constraint that makes it work only with types for T that can be compared and therefore sorted. Also, generic methods can often be called without employing any special syntax by using a feature called type inference. And, of course, type safety at compile time increases clarity in application code. I will look at constraints, type inference, and type safety in detail throughout this column.
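To make the type safety and constraint points concrete, here is a minimal sketch of what the hypothetical GenericSortedList<T> might look like. The member names (Add, Count, the indexer) and the use of IComparable<T> as the constraint are assumptions chosen for illustration, not the design of any shipping class.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the GenericSortedList<T> discussed above.
// The constraint limits T to types that can be compared with themselves.
public class GenericSortedList<T> where T : IComparable<T>
{
    private readonly List<T> m_items = new List<T>();

    public Int32 Count { get { return m_items.Count; } }

    public T this[Int32 index] { get { return m_items[index]; } }

    // Inserts the item at the position that keeps the list sorted.
    public void Add(T item)
    {
        Int32 index = 0;
        while (index < m_items.Count && m_items[index].CompareTo(item) <= 0)
        {
            index++;
        }
        m_items.Insert(index, item);
    }
}

public static class SortedListDemo
{
    public static void Main()
    {
        GenericSortedList<String> names = new GenericSortedList<String>();
        names.Add("pear");
        names.Add("apple");
        names.Add("mango");

        String first = names[0];     // no cast needed
        Console.WriteLine(first);    // prints "apple"

        // names.Add(42);                    // compile-time error: 42 is not a String
        // new GenericSortedList<Object>();  // compile-time error: Object fails the constraint
    }
}
```

Both commented-out lines are rejected by the compiler, so the mistakes never reach run time.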

A Simple Example
The Whidbey CLR release will provide these benefits out-of-the-box with a suite of generic collection classes in the class library. But applications can further benefit from generics by defining their own generic code. To examine how this is done, I'll start by modifying a simple linked-list node class to make it a generic class type.
The Node class in Figure 1 includes little more than the basics. It has a field, m_data, which refers to the data for the node and a second field, m_next, which refers to the next item in the linked list. Both of the fields are set by the constructor method. There are really only two additional bells and whistles, the first of which is that the m_data and m_next fields are accessed using read-only properties named Data and Next. The second is the override of System.Object's ToString virtual method.
Figure 1 also shows code that uses the Node class. This referencing code suffers from some limitations. The problem is that in order to be usable in many contexts, its data must be of the most basic type, System.Object. This means that when you use Node, you lose any form of compile-time type safety. Using Object to mean "any type" in an algorithm or data structure forces the consuming code to cast between Object references and the actual data type. Any type-mismatch bugs in the application are not caught until run time, when they take the form of an InvalidCastException at the point where the cast is attempted.
Additionally, any assignment of a primitive value such as an Int32 to an Object reference requires a boxing of the instance. Boxing involves a memory allocation and a memory copy, as well as an eventual garbage collection of the boxed value. Finally, casts from Object references to value types such as Int32, as you can see in Figure 1, incur unboxing which also includes a type check. Since boxing and unboxing hurt the overall performance of the algorithm, you can see why using Object to mean "any type" has its disadvantages.
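The listing referred to as Figure 1 is not reproduced here. As a rough stand-in, the following sketch shows an Object-based node along the lines described above, together with referencing code that boxes, unboxes, and fails at run time. The Sum helper and the ObjectNodeDemo class are assumptions added for illustration.

```csharp
using System;

// An Object-based node in the spirit of the description above (not the original listing).
public class Node
{
    private readonly Object m_data;
    private readonly Node m_next;

    public Node(Object data, Node next)
    {
        m_data = data;
        m_next = next;
    }

    public Object Data { get { return m_data; } }
    public Node Next { get { return m_next; } }

    public override String ToString()
    {
        return m_data.ToString() + (m_next == null ? "" : " -> " + m_next.ToString());
    }
}

public static class ObjectNodeDemo
{
    // Hypothetical helper: sums a list that is assumed to hold only Int32 values.
    public static Int32 Sum(Node head)
    {
        Int32 total = 0;
        for (Node n = head; n != null; n = n.Next)
        {
            total += (Int32)n.Data;   // unboxing, with a run-time type check
        }
        return total;
    }

    public static void Main()
    {
        Node head = new Node(5, null);   // the Int32 value 5 is boxed here
        head = new Node(10, head);
        head = new Node(15, head);
        Console.WriteLine(Sum(head));    // prints 30

        Node mixed = new Node("oops", head);   // compiles without complaint
        try
        {
            Sum(mixed);
        }
        catch (InvalidCastException)
        {
            Console.WriteLine("type mismatch found only at run time");
        }
    }
}
```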
A rewrite of Node using generics is an elegant way to address these problems. Take a look at the code in Figure 2 and you will see the Node type rewritten as the Node<T> type. A type with generic behaviors such as Node<T> is parameterized and can be called Parameterized Node, Node of T, or Generic Node. I will address the new C# syntax in a moment; first let's delve into how Node<T> differs from Node.
The Node<T> type is functionally and structurally similar to the Node type. Both support the building of linked lists of any given type of data. However, where Node uses System.Object to represent "any type," Node<T> leaves the type unspecified. Instead, Node<T> uses a type parameter named T that is a placeholder for a type. The type parameter is eventually specified by an argument to Node<T> when consuming code makes use of Node<T>.
The code in Figure 3 makes use of Node<T> with 32-bit signed integers by constructing a type name like so: Node<Int32>. In this example Int32 is the type argument to the type parameter T. (Incidentally, C# would also accept Node<int> to indicate T as Int32.) If the code required a linked list of some other type, such as String references, it could do this by specifying it as the type argument to T, like this: Node<String>.
The power behind Node<T> is that its algorithmic behaviors are well defined while the type of data that it operates on remains unspecified. In this way, the Node<T> type is specific in terms of how it works but generic in terms of what it works with. Details such as what type of data a linked list should hold are best left to the code that uses Node<T> anyway.
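The listings referred to as Figures 2 and 3 are not reproduced here. The sketch below is one plausible shape for Node<T> and its referencing code, based only on the description above; the Sum helper and the GenericNodeDemo class are assumptions added for illustration.

```csharp
using System;

// Defining code: a generic node in the spirit of the description above.
public class Node<T>
{
    private readonly T m_data;
    private readonly Node<T> m_next;

    public Node(T data, Node<T> next)
    {
        m_data = data;
        m_next = next;
    }

    public T Data { get { return m_data; } }
    public Node<T> Next { get { return m_next; } }

    public override String ToString()
    {
        return m_data.ToString() + (m_next == null ? "" : " -> " + m_next.ToString());
    }
}

// Referencing code: supplies Int32 and String as type arguments for T.
public static class GenericNodeDemo
{
    // Hypothetical helper: no cast and no unboxing, because Data is already an Int32.
    public static Int32 Sum(Node<Int32> head)
    {
        Int32 total = 0;
        for (Node<Int32> n = head; n != null; n = n.Next)
        {
            total += n.Data;
        }
        return total;
    }

    public static void Main()
    {
        Node<Int32> head = new Node<Int32>(5, null);
        head = new Node<Int32>(10, head);
        head = new Node<Int32>(15, head);
        Console.WriteLine(Sum(head));   // prints 30

        Node<String> words = new Node<String>("generic", new Node<String>("hello", null));
        Console.WriteLine(words);       // prints "generic -> hello"

        // new Node<Int32>("oops", head);   // compile-time error: String is not Int32
    }
}
```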
When discussing generics it can be helpful to be explicit about two roles: defining code and referencing code. Defining code includes code that both declares the existence of the generic code and defines the type's members such as methods and fields. The code shown in Figure 2 is the defining code for the type Node<T>. Referencing code is consumer code that makes use of predefined generic code that may also be built into another assembly. Figure 3 is a referencing code example for Node<T>.
The reason that it is helpful to consider these roles is that both defining code and referencing code play a part in the actual construction of useable generic code. The referencing code in Figure 3 uses Node<T> to construct a new type named Node<Int32>. Node<Int32> is a distinct type that was built with two key ingredients: Node<T> created by defining code, and the type argument Int32 to parameter T specified by referencing code. Only with these two ingredients does generic code become complete.
Note that generic types, such as Node<T>, and types constructed from the generic type, such as Node<Int32> or Node<String>, are not related types in terms of object-oriented derivation. The types Node<Int32>, Node<String>, and Node<T> are all siblings derived directly from System.Object.
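The sibling relationship can be checked with reflection. This is a small sketch that assumes a stripped-down Node<T>; the SiblingTypesDemo class is hypothetical, while typeof, BaseType, and IsAssignableFrom are standard reflection members.

```csharp
using System;

// Stripped-down generic node, just enough to inspect its constructed types.
public class Node<T>
{
    public T Data;
    public Node<T> Next;
}

public static class SiblingTypesDemo
{
    public static void Main()
    {
        Type ofInt32 = typeof(Node<Int32>);
        Type ofString = typeof(Node<String>);

        // Two constructed types from the same generic are distinct types.
        Console.WriteLine(ofInt32 == ofString);                   // prints False

        // Each derives directly from System.Object, as does the generic itself.
        Console.WriteLine(ofInt32.BaseType == typeof(Object));    // prints True
        Console.WriteLine(ofString.BaseType == typeof(Object));   // prints True
        Console.WriteLine(typeof(Node<>).BaseType == typeof(Object));   // prints True

        // Neither constructed type can stand in for the other.
        Console.WriteLine(ofInt32.IsAssignableFrom(ofString));    // prints False
    }
}
```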

C# Generic Syntax
The CLR supports multiple programming languages, so there will be a variety of syntaxes for CLR generics. However, regardless of syntax, generic code written in one CLR-targeted language will be useable by programs written in others. I am going to cover C# syntax here because at the time of this writing, the C# syntax for generics is fairly stable among the big three managed languages. However, it is worth noting that both Visual Basic® .NET and Managed C++ should also support generics in their Whidbey releases.
Figure 4 shows basic C# syntax for generic defining code and generic referencing code. The syntax differences between the two reflect the different responsibilities of the two parties that are involved with generic code.
The current plan is that the CLR, and therefore C#, will support generic classes, structures, methods, interfaces, and delegates. The left side of Figure 4 shows an example of C# syntax for each of these cases of defining code. Note that angle brackets denote a type parameter list. Angle brackets appear immediately after the name of the generic type or member. Also there are one or more type parameters in the type parameter list. Parameters also appear throughout the definition of the generic code in place of a specific CLR type or as an argument to a type constructor. The right side of Figure 4 shows examples of C# syntax for the matching referencing code cases. Note here that angle brackets enclose type arguments; a generic identifier plus brackets constructs a distinct new identifier. Also note that type arguments specify which types to use for constructing a type or method from a generic.
Let's spend a moment on defining code syntax. The compiler recognizes that you are defining a generic type or method when it encounters a type-parameter list demarcated by angle brackets. The angle brackets for a generic definition immediately follow the name of the type or method being defined.
A type-parameter list indicates one or more types that you want to leave unspecified in the definition of the generic code. Type parameter names can be any valid identifier in C# and are separated by commas. Here are some things to note regarding type parameters from the Defining Code section of Figure 4:
  • In each code example you can see that the type parameters T or U are used throughout the definitions in places where a type name would normally appear.
  • In the IComparable<T> interface example, you can see the use of both a type parameter T and a regular type Int32. In the definitions of generic code you can mix both unspecified types through type parameters and specified types using a CLR type name.
  • You can see in the Node<T> example that type parameter T can be used alone, as in the definition of m_data, and can also be used as part of another type construction, as in the case of m_next. A type constructed with a type parameter as its argument, such as Node<T>, is called an open generic type. A type constructed with a concrete type as its argument, such as Node<System.Byte>, is called a closed generic type.
  • Like any generic method, the example generic method Swap<T> as seen in Figure 4 can be part of a generic or non-generic type and can be an instance, virtual, or static method.
In this column I am using single character names such as T and U for type parameters, primarily to keep things simple. However, you'll see that it is possible to use descriptive names as well. For example, the Node<T> type could be equivalently defined as Node<ItemType> or Node<dataType> in production code.
At the time of this writing, Microsoft has standardized on single-character type parameter names in library code to help distinguish the names from those used for ordinary types. Personally, for production code I like camelCasing type parameters because it distinguishes them from simple type names in code while remaining descriptive.
Generic referencing code is where the unspecified becomes specified. This is necessary if referencing code will actually use generic code. If you look at the examples in the Referencing Code section of Figure 4, you will see that in all cases, a new type or method is constructed from a generic by specifying CLR types as type arguments to the generic. In generic syntax, names like Node<Byte> and Pair<Byte,String> denote new types constructed from a generic type definition.
I have one more syntax detail to cover before I dig in deeper on the technology itself. When code calls a generic method such as the Swap<T> method in Figure 4, the fully qualified call syntax includes any type arguments. However, it is sometimes possible to omit the type arguments from the call syntax, as shown in these two lines of code:
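As a rough stand-in for the kind of syntax described above (this is not the original figure), the sketch below pairs defining code for each of the five generic constructs with matching referencing code. Pair<T, U> and Swap<T> are named in the text; Point<T>, IRanked<T>, Transform<T, U>, Score, and SyntaxDemo are hypothetical names introduced only for this example.

```csharp
using System;

// Defining code: a type-parameter list in angle brackets follows each name.
public class Pair<T, U>
{
    public T First;
    public U Second;
    public Pair(T first, U second) { First = first; Second = second; }
}

public struct Point<T>
{
    public T X;
    public T Y;
    public Point(T x, T y) { X = x; Y = y; }
}

public interface IRanked<T>
{
    Int32 RankAgainst(T other);   // mixes type parameter T with the specified type Int32
}

public delegate U Transform<T, U>(T input);

// Referencing code: type arguments replace the type parameters.
public class Score : IRanked<Score>
{
    public Int32 Value;
    public Score(Int32 value) { Value = value; }
    public Int32 RankAgainst(Score other) { return Value.CompareTo(other.Value); }
}

public static class SyntaxDemo
{
    // A generic method defined inside a non-generic type.
    public static void Swap<T>(ref T a, ref T b)
    {
        T temp = a;
        a = b;
        b = temp;
    }

    public static void Main()
    {
        Pair<Byte, String> pair = new Pair<Byte, String>(7, "seven");
        Point<Double> point = new Point<Double>(1.5, 2.5);
        IRanked<Score> high = new Score(90);
        Transform<Int32, String> describe = delegate(Int32 n) { return "#" + n; };

        Int32 a = 1, b = 2;
        Swap<Int32>(ref a, ref b);   // full call syntax with a type argument

        Console.WriteLine(pair.First + " " + pair.Second);        // prints "7 seven"
        Console.WriteLine(point.X + point.Y);                     // prints 4
        Console.WriteLine(high.RankAgainst(new Score(75)) > 0);   // prints True
        Console.WriteLine(describe(a));                           // prints "#2"
    }
}
```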
Decimal d1 = 0, d2 = 2;
Swap(ref d1, ref d2);
This simplified call syntax relies on a feature of the C# compiler called type inference, where the compiler uses types of the parameters passed to the method to infer the type arguments. In this case the compiler infers from the data types of d1 and d2 that the type argument to the type parameter T should be System.Decimal. If ambiguities exist, type inference won't work for the caller and the C# compiler will produce an error suggesting that you use the full call syntax, complete with angle brackets and type arguments.
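For a complete picture of both call styles, here is a small self-contained sketch. The body of Swap<T> is an assumption (the original listing is not reproduced here), and InferenceDemo is a hypothetical class name.

```csharp
using System;

public static class InferenceDemo
{
    // Assumed implementation of the Swap<T> method named in the text.
    public static void Swap<T>(ref T a, ref T b)
    {
        T temp = a;
        a = b;
        b = temp;
    }

    public static void Main()
    {
        Decimal d1 = 0, d2 = 2;

        Swap<Decimal>(ref d1, ref d2);   // full call syntax
        Swap(ref d1, ref d2);            // T inferred as System.Decimal

        // Two swaps restore the original order.
        Console.WriteLine(d1 + " " + d2);   // prints "0 2"

        Int32 small = 1;
        Int64 large = 2;
        // Swap(ref small, ref large);   // compile-time error: no single T fits both arguments
        Console.WriteLine(small + large);   // prints 3
    }
}
```

In the commented-out call the two arguments have different types, so the compiler cannot settle on one type argument and asks for the explicit form instead.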

Indirection
A good friend of mine is fond of pointing out that most elegant programming solutions revolve around adding an additional level of indirection. Pointers and references allow a single function to affect more than one instance of a data structure. Virtual functions allow a single call-site to route calls to a set of similar methods—some of which may be defined at a later time. These two examples of indirection are so common that the indirection itself often goes unnoticed by the programmer.
A primary purpose of indirection is to increase the flexibility of code. Generics are a form of indirection in which the definition does not result in code that is directly usable. Instead, in defining generic code a "code factory" is created. The factory code can then be used by referencing code in order to construct code that can be used directly.
Let's look at this idea first with a generic method. The code in Figure 5 defines and references a generic method named CompareHashCodes<T>. The defining code creates a generic method named CompareHashCodes<T>, but none of the code shown in Figure 5 ever calls CompareHashCodes<T> directly. Instead, CompareHashCodes<T> is used by the referencing code in Main to construct two different methods: CompareHashCodes<Int32> and CompareHashCodes<String>. These constructed methods are instances of CompareHashCodes<T> and they are called by referencing code.
Normally, the definition of a method directly defines what the method does. A generic method definition, on the other hand, defines what its constructed method instances will do. The generic method itself does nothing but act as a model for how specific instances are constructed. CompareHashCodes<T> is a generic method from which method instances that compare hash codes are constructed. A constructed instance, such as CompareHashCodes<Int32>, does the actual work; it compares hash codes of integers. In contrast, CompareHashCodes<T> is one level of indirection removed from being callable.
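The listing referred to as Figure 5 is not reproduced here. The following sketch shows one way such a method could look; the method body, the parameter names, and the HashDemo class are assumptions made for illustration.

```csharp
using System;

public static class HashDemo
{
    // Defining code: describes what every constructed instance will do.
    // The body is a guess at the behavior the name suggests.
    public static Boolean CompareHashCodes<T>(T item1, T item2)
    {
        return item1.GetHashCode() == item2.GetHashCode();
    }

    public static void Main()
    {
        // Referencing code: each call names a constructed method instance.
        Boolean sameInts = CompareHashCodes<Int32>(5, 5);
        Boolean differentInts = CompareHashCodes<Int32>(5, 6);
        Boolean sameStrings = CompareHashCodes<String>("abc", "abc");

        Console.WriteLine(sameInts);        // prints True
        Console.WriteLine(differentInts);   // prints False
        Console.WriteLine(sameStrings);     // prints True
    }
}
```

The code never calls CompareHashCodes<T> itself; it calls CompareHashCodes<Int32> and CompareHashCodes<String>, the two methods constructed from it.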
Generic types are similarly one level of indirection removed from their simple counterparts. A simple type definition such as a class or structure is used by the system to create objects in memory. For example, the System.Collections.Stack type in the class library is used to create stack objects in memory. In a way, you can think of the new keyword in C# or the newobj instruction in intermediate language code as an object factory that creates object instances using a managed type as the blueprint for each object.
Generic types, on the other hand, are used to instantiate closed types rather than object instances. A type, constructed from a generic type, can then be used to create objects. Let's revisit the Node<T> type defined in Figure 2 along with its referencing code shown in Figure 3.
A managed application could never create an object of type Node<T>, even though it's a managed type. This is because Node<T> lacks sufficient definition to be instantiated as an object in memory. However, Node<T> can be used to instantiate another type during the execution of an application.
Node<T> is an open generic type and is only used for creating other constructed types. If a constructed type created using Node<T> is closed, such as Node<Int32>, then it can be used to create objects. The referencing code in Figure 3 uses Node<Int32> much like a simple type. It creates objects of type Node<Int32>, calls methods on the objects, and so on.
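The two-step construction (generic type to closed type, closed type to object) can be observed with reflection. This sketch relies on reflection calls available in released versions of the .NET Framework (Type.MakeGenericType, Type.ContainsGenericParameters, Activator.CreateInstance), which may differ from what a prerelease runtime offers; the simplified Node<T> and the ConstructionDemo class are assumptions.

```csharp
using System;

// Simplified generic node used only to demonstrate type construction.
public class Node<T>
{
    private readonly T m_data;
    private readonly Node<T> m_next;

    public Node(T data, Node<T> next)
    {
        m_data = data;
        m_next = next;
    }

    public T Data { get { return m_data; } }
    public Node<T> Next { get { return m_next; } }
}

public static class ConstructionDemo
{
    public static void Main()
    {
        Type open = typeof(Node<>);
        Console.WriteLine(open.ContainsGenericParameters);   // prints True

        // An open generic type cannot be instantiated as an object.
        try
        {
            Activator.CreateInstance(open);
        }
        catch (ArgumentException)
        {
            Console.WriteLine("an open generic type cannot be instantiated");
        }

        // Step one: the generic type is used to construct a closed type.
        Type closed = open.MakeGenericType(typeof(Int32));
        Console.WriteLine(closed == typeof(Node<Int32>));    // prints True

        // Step two: the closed type is used to create an object.
        Object node = Activator.CreateInstance(closed, new Object[] { 42, null });
        Console.WriteLine(((Node<Int32>)node).Data);         // prints 42
    }
}
```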
The extra layer of indirection that generic types provide is a very powerful feature. Referencing code that makes use of generic types results in tailor-made managed types. Framing generic code in your mind as a level of indirection removed from its simple counterpart will help you to intuit many of the behaviors, rules, and uses of generics in the CLR.

Conclusion
This time around you saw the benefits of generic types—how their use can improve type safety, code reuse, and performance. You also got a taste of the syntax in C# and saw how generics lead to another level of indirection, resulting in greater flexibility. Stay tuned next time when I'll dig deeper into generics internals.

Send your questions and comments for Jason to dot-net@microsoft.com.


Jason Clark provides training and consulting for Microsoft and Wintellect (http://www.wintellect.com) and is a former developer on the Windows NT and Windows 2000 Server team. He is the coauthor of Programming Server-side Applications for Microsoft Windows 2000 (Microsoft Press, 2000). You can get in touch with Jason at JClark@Wintellect.com.

© 2008 Microsoft Corporation and CMP Media, LLC. All rights reserved; reproduction in part or in whole without permission is prohibited.