June 2012

Volume 27 Number 06

CLR - What’s New in the .NET 4.5 Base Class Library

By Immo Landwerth | June 2012

The Microsoft .NET Framework base class library (BCL) is all about fundamentals. Although some of the basic constructs are stable and don’t change very much (for example, System.Int32 and System.String), Microsoft is still investing a great deal in this space. This article covers the big (and some of the small) improvements made to the BCL in the .NET Framework 4.5.

When reading this article, please keep in mind that it’s based on the .NET Framework 4.5 beta, not on the final product, so the features and APIs described here are subject to change.

If you’d like an overview of other areas in the .NET Framework, such as Windows Communication Foundation (WCF) or Windows Presentation Foundation (WPF), take a look at the MSDN Library page, “What’s New in the .NET Framework 4.5 Beta” (bit.ly/p6We9u).

Simplified Asynchronous Programming

Using asynchronous I/O has several advantages. It helps to avoid blocking the UI and it can reduce the number of threads the OS has to use. And yet, chances are you’re not taking advantage of it because asynchronous programming used to be quite complex. The biggest problem was that the previous Asynchronous Programming Model (APM) was designed around Begin/End method pairs. To illustrate how this pattern works, consider the following straightforward synchronous method that copies a stream:

public void CopyTo(Stream source, Stream destination)
{
  byte[] buffer = new byte[0x1000];
  int numRead;
  while ((numRead = source.Read(buffer, 0, buffer.Length)) != 0)
  {
    destination.Write(buffer, 0, numRead);
  }
}

In order to make this method asynchronous using the earlier APM, you’d have to write the code shown in Figure 1.

Figure 1 Copying a Stream the Old Asynchronous Way

public void CopyToAsyncTheHardWay(Stream source, Stream destination)
{
  byte[] buffer = new byte[0x1000];
  Action<IAsyncResult> readWriteLoop = null;
  readWriteLoop = iar =>
  {
    for (bool isRead = (iar == null); ; isRead = !isRead)
    {
      switch (isRead)
      {
        case true:
          iar = source.BeginRead(buffer, 0, buffer.Length, 
            readResult =>
          {
            if (readResult.CompletedSynchronously) return;
            readWriteLoop(readResult);
          }, null);
          if (!iar.CompletedSynchronously) return;
          break;
        case false:
          int numRead = source.EndRead(iar);
          if (numRead == 0)
          {
            return;
          }
          iar = destination.BeginWrite(buffer, 0, numRead, 
            writeResult =>
          {
            if (writeResult.CompletedSynchronously) return;
            destination.EndWrite(writeResult);
            readWriteLoop(null);
          }, null);
          if (!iar.CompletedSynchronously) return;
          destination.EndWrite(iar);
          break;
      }
    }
  };
  readWriteLoop(null);
}

Clearly, the asynchronous version is not nearly as simple to understand as the synchronous counterpart. The intention is hard to distill from the boilerplate code that’s required to make basic programming language constructs, such as loops, work when delegates are involved. If you aren’t scared yet, try to add exception handling and cancellation.

Fortunately, this release of the BCL brings a new asynchronous programming model that’s based around Task and Task&lt;T&gt;. C# and Visual Basic added first-class language support by adding the async and await keywords (by the way, F# already had language support for it via asynchronous workflows and, in fact, served as an inspiration for this feature). As a result, the compilers will now take care of most, if not all, of the boilerplate code you used to have to write. The new language support, as well as certain API additions to the .NET Framework, work together to make writing asynchronous methods virtually as simple as writing synchronous code. See for yourself—to make the CopyTo method asynchronous now requires only a few changes: the async modifier, the Task return type, the Async method names and the await keyword:

public async Task CopyToAsync(Stream source, Stream destination)
{
  byte[] buffer = new byte[0x1000];
  int numRead;
  while ((numRead = await 
     source.ReadAsync(buffer, 0, buffer.Length)) != 0)
  {
    await destination.WriteAsync(buffer, 0, numRead);
  }
}
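
The exception handling and cancellation concerns raised earlier also largely disappear with the new model. As a sketch, the same method can support cooperative cancellation simply by threading a CancellationToken through to the .NET 4.5 overloads of ReadAsync and WriteAsync that accept one:

```csharp
public async Task CopyToAsync(Stream source, Stream destination,
  CancellationToken cancellationToken)
{
  byte[] buffer = new byte[0x1000];
  int numRead;
  // Each overload below observes the token, so a pending read or
  // write can be canceled cooperatively; cancellation surfaces as
  // an OperationCanceledException at the await site.
  while ((numRead = await source.ReadAsync(buffer, 0, buffer.Length,
    cancellationToken)) != 0)
  {
    await destination.WriteAsync(buffer, 0, numRead, cancellationToken);
  }
}
```

Exceptions, likewise, flow out of the awaits and can be caught with an ordinary try/catch around the loop.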

There are lots of interesting things to say about asynchronous programming using the new language features, but I’d like to focus on the impact on the BCL and you, the BCL consumer. If you’re super curious, though, go on and read the MSDN Library page, “Asynchronous Programming with Async and Await (C# and Visual Basic)” (bit.ly/nXerAc).

To create your own asynchronous operations, you need low-level building blocks. Based on those, you can build more complex methods like the CopyToAsync method shown earlier. That method requires only the ReadAsync and WriteAsync methods of the Stream class as building blocks. Figure 2 lists some of the most important asynchronous APIs added to the BCL.

Figure 2 Asynchronous Methods in the BCL

Type                    Methods
System.IO.Stream        ReadAsync, WriteAsync, FlushAsync, CopyToAsync
System.IO.TextReader    ReadAsync, ReadBlockAsync, ReadLineAsync, ReadToEndAsync
System.IO.TextWriter    WriteAsync, WriteLineAsync, FlushAsync
Please note that we did not add asynchronous versions of APIs that had a very small granularity, such as TextReader.Peek. The reason is that asynchronous APIs also add some overhead and we wanted to prevent developers from accidentally heading in the wrong direction. This also means we specifically decided against providing asynchronous versions for methods on BinaryReader or BinaryWriter. To use those APIs, we recommend using Task.Run to start a new asynchronous operation and then using synchronous APIs from within that operation—but don’t do it per method call. The general guidance is: Try to make your asynchronous operation as chunky as possible. For example, if you want to read 1,000 Int32s from a stream using BinaryReader, it’s better to run and await one Task to synchronously read all 1,000 than to individually run and await 1,000 Tasks that each reads only a single Int32.
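
That guidance might look like the following sketch, where ReadInt32sAsync is a hypothetical helper, not a BCL API:

```csharp
// Coarse-grained: one Task.Run that synchronously reads all the
// values, rather than 1,000 individually awaited operations.
Task<int[]> ReadInt32sAsync(Stream stream, int count)
{
  return Task.Run(() =>
  {
    var values = new int[count];
    using (var reader = new BinaryReader(stream))
    {
      for (int i = 0; i < count; i++)
        values[i] = reader.ReadInt32();
    }
    return values;
  });
}
```

A caller then awaits the single task: int[] values = await ReadInt32sAsync(stream, 1000);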

If you want to learn more about writing asynchronous code, I recommend reading the Parallel Programming with .NET blog (blogs.msdn.com/b/pfxteam).

Read-Only Collection Interfaces

One of the long-standing feature requests for the BCL was read-only collection interfaces (not to be confused with immutable collections; see the sidebar “Different Notions of Read-Only Collections” for details). Our position has been that this particular aspect (being read-only) is best modeled using the optional feature pattern. In this pattern, the type provides an API that allows consumers to test whether a given feature is supported; invoking an unsupported operation throws a NotSupportedException.

The benefit of this pattern is that it requires fewer types, because you don’t have to model feature combinations. For example, the Stream class provides several features, and all of them are expressed via Boolean get accessors (CanSeek, CanRead, CanWrite and CanTimeout). This allows the BCL to have a single type for streams and yet support every combination of the streaming features.
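
A minimal sketch of the optional feature pattern in action (Rewind is a hypothetical caller, not a BCL method):

```csharp
// Probe the capability first; calling Seek on a non-seekable
// stream would throw NotSupportedException.
void Rewind(Stream stream)
{
  if (stream.CanSeek)
  {
    stream.Seek(0, SeekOrigin.Begin);
  }
}
```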

Over the years, we came to the conclusion that adding a read-only collection interface is worth the added complexity. First, I’d like to show you the interfaces and then discuss the features they offer. Figure 3 shows a Visual Studio class diagram of the existing (mutable) collection interfaces; Figure 4 shows the corresponding read-only interfaces.

Mutable Collection Interfaces
Figure 3 Mutable Collection Interfaces

Read-Only Collection Interfaces
Figure 4 Read-Only Collection Interfaces

Note that IReadOnlyCollection&lt;T&gt; did not make it into the .NET Framework 4.5 beta, so don’t be surprised if you can’t find it. In the beta, IReadOnlyList&lt;T&gt; and IReadOnlyDictionary&lt;TKey,TValue&gt; derive directly from IEnumerable&lt;T&gt; and each interface defines its own Count property.

IEnumerable&lt;T&gt; is covariant. That means if a method accepts an IEnumerable&lt;Shape&gt; you can call it with an IEnumerable&lt;Circle&gt; (assuming Circle is derived from Shape). This is useful in scenarios with type hierarchies and algorithms that can operate on specific types, such as an application that can draw various shapes. IEnumerable&lt;T&gt; is sufficient for most scenarios that deal with collections of types, but sometimes you need more power than it provides:

  1. Materialization. IEnumerable&lt;T&gt; does not allow you to express whether the collection is already available (“materialized”) or whether it’s computed every time you iterate over it (for example, if it represents a LINQ query). When an algorithm requires multiple iterations over the collection, this can result in performance degradation if computing the sequence is expensive; it can also cause subtle bugs because of identity mismatches when objects are being generated again on subsequent passes.
  2. Count. IEnumerable&lt;T&gt; does not provide a count. In fact, it might not even have one, as it could be an infinite sequence. In many cases, the static extension method Enumerable.Count is good enough, though. First, it special-cases known collection types such as ICollection&lt;T&gt; to avoid iterating the entire sequence; second, in many cases, computing the result is not expensive. However, depending on the size of the collection, this can make a difference.
  3. Indexing. IEnumerable&lt;T&gt; doesn’t allow random access to the items. Some algorithms, such as quick sort, depend on being able to get to an item by its index. Again, there’s a static extension method (Enumerable.ElementAt) that uses an optimized code path if the given enumerable is backed by an IList&lt;T&gt;. However, if indexing is used in a loop, a linear scan can have disastrous performance implications, because it will turn your nice O(n) algorithm into an O(n²) algorithm. So if you need random access, well, you really need random access.

So why not simply use ICollection&lt;T&gt;/IList&lt;T&gt; instead of IEnumerable&lt;T&gt;? Because you lose covariance and because you can no longer differentiate between methods that only read the collections and methods that also modify them—something that becomes increasingly important when you use asynchronous programming or multithreading in general. In other words, you want to have your cake and eat it, too.

Enter IReadOnlyCollection&lt;T&gt; and IReadOnlyList&lt;T&gt;. IReadOnlyCollection&lt;T&gt; is basically the same as IEnumerable&lt;T&gt;, but it adds a Count property. This allows creating algorithms that can express the need for materialized collections, or collections with a known, finite size. IReadOnlyList&lt;T&gt; expands on that by adding an indexer. Both of these interfaces are covariant, which means if a method accepts an IReadOnlyList&lt;Shape&gt; you can call it with a List&lt;Circle&gt;:

class Shape { /*...*/ }
class Circle : Shape { /*...*/ }
void LayoutShapes(IReadOnlyList<Shape> shapes) { /*...*/ }
void LayoutCircles()
{
  List<Circle> circles = GetCircles();
  LayoutShapes(circles);
}

Unfortunately, the type system doesn’t allow making an interface covariant in T if any of its members take T as an input. Therefore, we can’t add an IndexOf method to IReadOnlyList&lt;T&gt;. We believe this is a small sacrifice compared to not having support for covariance.
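
If you do need IndexOf over an IReadOnlyList&lt;T&gt;, a hand-rolled extension method fills the gap. This is a hypothetical helper, not a BCL API:

```csharp
static class ReadOnlyListExtensions
{
  // Linear search over the read-only view; returns -1 if not found.
  public static int IndexOf<T>(this IReadOnlyList<T> list, T item)
  {
    EqualityComparer<T> comparer = EqualityComparer<T>.Default;
    for (int i = 0; i < list.Count; i++)
    {
      if (comparer.Equals(list[i], item))
        return i;
    }
    return -1;
  }
}
```

Because the method lives outside the interface, the interface itself stays covariant.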

All our built-in collection implementations, such as arrays, List<T>, Collection<T> and ReadOnlyCollection<T>, also implement the read-only collection interfaces. Because any collection can now be treated as a read-only collection, algorithms are able to declare their intent more precisely without limiting their level of reuse—they can be used across all collection types. In the preceding example, the consumer of LayoutShapes could pass in a List<Circle>, but LayoutShapes would also accept an array of Circle or a Collection<Circle>.

Another benefit of these collection types is the great experience they provide for working with the Windows Runtime (WinRT). WinRT supplies its own collection types, such as Windows.Foundation.Collections.IIterable&lt;T&gt; and Windows.Foundation.Collections.IVector&lt;T&gt;. The CLR metadata layer projects these directly to the corresponding BCL data types. For example, when consuming WinRT from the .NET Framework, IIterable&lt;T&gt; becomes IEnumerable&lt;T&gt; and IVector&lt;T&gt; becomes IList&lt;T&gt;. In fact, a developer using Visual Studio and IntelliSense couldn’t even tell that WinRT has different collection types. Because WinRT also provides read-only versions (such as IVectorView&lt;T&gt;), the new read-only interfaces complete the picture so all collection types can easily be shared between the .NET Framework and WinRT.

Different Notions of Read-Only Collections

  • Mutable (or Not Read-Only) The most common collection type in the .NET world. These are collections such as List<T> that allow reading, as well as adding, removing and changing items.
  • Read-Only These are collections that can’t be modified from the outside. However, this notion of collections doesn’t guarantee that its contents will never change. For example, a dictionary’s keys and values collections can’t be updated directly, but adding to the dictionary indirectly updates the keys and values collections.
  • Immutable These are collections that, once created, are guaranteed to never be changed. This is a nice property for multithreading. If a complicated data structure is fully immutable, it can always be safely passed to a background worker. You don’t need to worry about someone modifying it at the same time. This collection type is not provided by the Microsoft .NET Framework base class library (BCL) today.
  • Freezable These collections behave like mutable collections until they’re frozen. Then they behave like immutable collections. Although the BCL doesn’t define these collections, you can find them in Windows Presentation Foundation.

Andrew Arnott has written an excellent blog post that describes the different notions of collections in more detail (bit.ly/pDNNdM).

Support for .zip Archives

Another common request over the years was support for reading and writing regular .zip archives. The .NET Framework 3.0 and later supports reading and writing archives in the Open Packaging Convention (bit.ly/ddsfZ7). However, System.IO.Packaging was tailored to support that particular specification and generally can’t be used to process ordinary .zip archives.

This release adds first-class zipping support via System.IO.Compression.ZipArchive. In addition, we’ve fixed some long-standing issues with performance and compression quality in our DeflateStream implementation. Starting with the .NET Framework 4.5, the DeflateStream class uses the popular zlib library. As a result, it provides a better implementation of the deflate algorithm and, in most cases, a smaller compressed file than in earlier versions of the .NET Framework.

Extracting an entire archive on disk takes only a single line:

ZipFile.ExtractToDirectory(@"P:\files.zip", @"P:\files");

We’ve also made sure that typical operations don’t require reading the entire archive into memory. For example, extracting a single file from a large archive can be done like this:

using (ZipArchive zipArchive = 
  ZipFile.Open(@"P:\files.zip", ZipArchiveMode.Read))
{
  foreach (ZipArchiveEntry entry in zipArchive.Entries)
  {
    if (entry.Name == "file.txt")
    {
      using (Stream stream = entry.Open())
      {
        ProcessFile(stream);
      }
    }
  }     
}

In this case, the only part loaded into memory is the .zip archive’s table of contents. The file being extracted is fully streamed; that is, it doesn’t need to fit into memory. This enables processing arbitrarily large .zip archives, even when memory is limited.

Creating .zip archives works similarly. In order to create a .zip archive from a directory, you need only write a single line:

ZipFile.CreateFromDirectory(@"P:\files", @"P:\files.zip");

Of course, you can also manually construct a .zip archive, which gives you full control over the internal structure. The following code shows how to create a .zip archive where only C# source code files are added (in addition, the .zip archive now also has a subdirectory called SourceCode):

IEnumerable<string> files = 
  Directory.EnumerateFiles(@"P:\files", "*.cs");
using (ZipArchive zipArchive = 
  ZipFile.Open(@"P:\files.zip", ZipArchiveMode.Create))
{
  foreach (string file in files)
  {
    var entryName = Path.Combine("SourceCode", Path.GetFileName(file));
    zipArchive.CreateEntryFromFile(file, entryName);
  }
}

Using streams you can also construct .zip archives where the inputs are not backed by an actual file. The same is true for the .zip archive itself. For example, instead of the ZipFile.Open, you can use the ZipArchive constructor that takes a stream. You could use this, for example, to build a Web server that creates .zip archives on the fly from content stored in a database and that also writes the results directly to the response stream, instead of to the disk.
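
A minimal sketch of the stream-based approach, building an archive entirely in memory (the entry name and content here are just examples):

```csharp
var memoryStream = new MemoryStream();
using (var zipArchive = new ZipArchive(memoryStream,
  ZipArchiveMode.Create, leaveOpen: true))
{
  // Create an entry and write its contents through a stream;
  // no file paths are involved at any point.
  ZipArchiveEntry entry = zipArchive.CreateEntry("report.txt");
  using (var writer = new StreamWriter(entry.Open()))
  {
    writer.Write("Generated on the fly");
  }
}
// Disposing the ZipArchive finalized the archive; leaveOpen kept
// memoryStream usable, so it could now be copied to, say, an HTTP
// response stream.
```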

One detail about the API design is worth explaining a bit more. You’ve probably noticed that the static convenience methods are defined on the class ZipFile, whereas the actual instance has the type ZipArchive. Why is that?

Starting with the .NET Framework 4.5, we designed the APIs with portability in mind. Some .NET platforms don’t support file paths, such as .NET for Metro-style apps on Windows 8. On this platform, the file system access is brokered to enable a capability-based model. In order to read or write files, you can no longer use regular Win32 APIs; instead, you must use WinRT APIs.

As I showed you, .zip functionality itself doesn’t require file paths at all. Therefore, we split the functionality into two pieces:

  1. System.IO.Compression.dll. This assembly contains general-purpose .zip functionality. It doesn’t support file paths. The main class in this assembly is ZipArchive.
  2. System.IO.Compression.FileSystem.dll. This assembly provides a static class ZipFile that defines extension methods and static helpers. Those APIs augment the .zip functionality with convenience methods on .NET platforms that support file system paths.

To learn more about writing portable .NET code, have a look at the Portable Class Libraries page in the MSDN Library at bit.ly/z2r3eM.

Miscellaneous Improvements

Of course, there’s always more to say. After working on a release for so long, naming the most important features sometimes feels like being asked to name your favorite child. Here are a few areas that are worth mentioning but didn’t quite get the space they deserve in this article:

AssemblyMetadataAttribute One thing we learned in the .NET Framework about attributes is that no matter how many you have, you still need more (by the way, the .NET Framework 4 already had more than 300 assembly-level attributes). AssemblyMetadataAttribute is a general-purpose assembly attribute that allows associating a string-based key-value pair with an assembly. This can be used to point to the homepage of the product or to the version control label that corresponds to the source the assembly was built from.
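
Usage is straightforward; the key names here are just examples, as the attribute imposes no schema on them:

```csharp
using System.Reflection;

// Arbitrary string key-value pairs baked into the assembly metadata.
[assembly: AssemblyMetadata("ProjectUrl", "http://example.com/myproduct")]
[assembly: AssemblyMetadata("ChangeList", "12345")]
```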

WeakReference<T> The existing non-generic WeakReference type has two issues: First, it forces consumers to cast whenever the target needs to be accessed. More importantly, it has a design flaw that makes using it prone to race conditions: it exposes one API to check whether the object is alive (IsAlive) and a separate API to get to the actual object (Target). WeakReference<T> fixes this issue by providing the single API TryGetTarget, which performs both operations in an atomic fashion.
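
A sketch of the new pattern; Report, LoadReport and Render are hypothetical placeholders for your own types:

```csharp
var cache = new WeakReference<Report>(LoadReport());

// One atomic call replaces the racy IsAlive check followed by a
// separate Target access.
Report report;
if (cache.TryGetTarget(out report))
{
  Render(report);   // Target was still alive; use it.
}
else
{
  // Target was collected; recreate it and re-prime the cache.
  report = LoadReport();
  cache.SetTarget(report);
}
```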

Comparer<T>.Create(Comparison<T>) The BCL provides two ways to implement comparers of collections. One is via the interface IComparer<T>; the other is the delegate Comparison<T>. Converting an IComparer<T> to Comparison<T> is simple. Most languages provide an implicit conversion from a method to a delegate type, so you can easily construct the Comparison<T> from the IComparer<T> Compare method. However, the other direction required you to implement an IComparer<T> yourself. In the .NET Framework 4.5 we added a static Create method on Comparer<T> that, given a Comparison<T>, gives you an implementation for IComparer<T>.
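
For example, building an IComparer&lt;string&gt; from a lambda, which previously required a hand-written class implementing IComparer&lt;string&gt;:

```csharp
// Compare strings by length instead of lexicographically.
IComparer<string> byLength =
  Comparer<string>.Create((x, y) => x.Length.CompareTo(y.Length));

var words = new List<string> { "pear", "fig", "banana" };
words.Sort(byLength); // "fig", "pear", "banana"
```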

ArraySegment<T> In the .NET Framework 2.0 we added the struct ArraySegment<T>, which lets you represent a subset of a given array without copying. Unfortunately, ArraySegment<T> did not implement any of the collection interfaces, which prevented you from passing it to methods that operate over the collection interfaces. In this release, we fixed that and ArraySegment<T> now implements IEnumerable, IEnumerable<T>, ICollection<T> and IList<T>, as well as IReadOnlyCollection<T> and IReadOnlyList<T>.
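
In practice that means a segment can now flow into any API that consumes the collection interfaces; a quick sketch:

```csharp
int[] numbers = { 1, 2, 3, 4, 5, 6 };
var middle = new ArraySegment<int>(numbers, 2, 3); // wraps 3, 4, 5

// The segment is usable anywhere a collection interface is expected,
// without copying the underlying array:
IReadOnlyList<int> view = middle;
Console.WriteLine(view.Count); // 3
Console.WriteLine(view[0]);    // 3 (index 0 of the segment)
```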

SemaphoreSlim.WaitAsync This is the only synchronization primitive that supports awaiting the lock to be taken. If you want to learn more about the rationale, read “What’s New for Parallelism in .NET 4.5 Beta” (bit.ly/AmAUIF).
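
A SemaphoreSlim initialized to a count of one works as an awaitable mutual-exclusion gate; WriteStateAsync below is a hypothetical guarded operation:

```csharp
private readonly SemaphoreSlim _gate = new SemaphoreSlim(1, 1);

public async Task UpdateAsync()
{
  // Asynchronously wait for the lock; no thread is blocked while
  // the wait is pending.
  await _gate.WaitAsync();
  try
  {
    await WriteStateAsync();
  }
  finally
  {
    _gate.Release();
  }
}
```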

ReadOnlyDictionary<TKey,TValue> Since the .NET Framework 2.0, the BCL has provided ReadOnlyCollection<T>, which serves as a read-only wrapper around a given collection instance. This enables implementers of object models to expose collections that the consumer can’t change. In this release, we added the same concept for dictionaries.
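
A typical object-model shape might look like this sketch (the field and property names are illustrative):

```csharp
using System.Collections.ObjectModel;

private readonly Dictionary<string, Uri> _endpoints =
  new Dictionary<string, Uri>();

// Callers can look up entries but can't add, remove or replace them;
// the wrapper reflects later changes made through _endpoints.
public ReadOnlyDictionary<string, Uri> Endpoints
{
  get { return new ReadOnlyDictionary<string, Uri>(_endpoints); }
}
```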

BinaryReader, BinaryWriter, StreamReader, StreamWriter: Option for not disposing the underlying stream The higher-level reader and writer classes all accept a stream instance in their constructors. In previous releases, this meant that the ownership of that stream passed on to the reader/writer instance, which implied that disposing the reader/writer also disposed the underlying stream. This is all nice and handy when you have only a single reader/writer, but it makes it harder in cases where you need to compose several different APIs that all operate on streams but need to use readers/writers as part of their implementation. The best you could do in previous releases was not dispose the reader/writer and leave a comment in the source that explained the issue (this approach also forced you to manually flush the writer to avoid data loss). The .NET Framework 4.5 allows you to express this contract by using a reader/writer constructor that takes the parameter leaveOpen where you can explicitly specify that the reader/writer must not dispose the underlying stream.
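
As a sketch, using the new StreamWriter constructor overload that exposes leaveOpen:

```csharp
using (var writer = new StreamWriter(stream, Encoding.UTF8,
  bufferSize: 1024, leaveOpen: true))
{
  writer.WriteLine("header");
} // The writer is flushed and disposed, but 'stream' stays open.

// The caller retains ownership of 'stream' and can keep using it here.
```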

Console.IsInputRedirected, Console.IsOutputRedirected and Console.IsErrorRedirected Command-line programs support redirecting input and output. To most applications, this is transparent. However, sometimes you want a different behavior when redirection is active. For example, colored console output is not useful and setting the cursor position will not succeed. These properties allow querying whether any of the standard streams were redirected.

Console.OutputEncoding and Console.InputEncoding can now be set to Encoding.Unicode Setting Console.OutputEncoding to Encoding.Unicode allows the program to write characters that could not be represented in the OEM code page associated with the console. This feature also makes it easier to display text in multiple scripts on the console. More details will be available in the upcoming revised documentation in the MSDN Library for the Console class.

ExceptionDispatchInfo Error handling is an important aspect in building framework components. Sometimes re-throwing exceptions (in C# via “throw;”) is not sufficient, because it can only happen within an exception handler. Some framework components, such as the Task infrastructure, have to re-throw the exception at a later point (for example, after marshaling back to the originating thread). In the past, that meant the original stack trace and Windows Error Reporting (WER) classification (also known as Watson buckets) was lost, as throwing the same exception object again would simply overwrite this information. ExceptionDispatchInfo allows capturing an existing exception object and throwing it again without losing any of the valuable information recorded in the exception object.
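
The capture-and-rethrow pattern looks like this sketch, where ParseConfiguration stands in for any operation that may fail:

```csharp
using System.Runtime.ExceptionServices;

ExceptionDispatchInfo capturedError = null;
try
{
  ParseConfiguration();
}
catch (Exception ex)
{
  // Snapshot the exception, including its stack trace and
  // Watson bucket information.
  capturedError = ExceptionDispatchInfo.Capture(ex);
}

// Later, possibly on a different thread:
if (capturedError != null)
  capturedError.Throw(); // Rethrows with the original state intact.
```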

Regex.Timeout Regular expressions are a great way to validate input. However, it’s not widely known that certain regular expressions can be incredibly expensive to compute when applied to specific text inputs; that is, they have exponential time complexity. This is especially problematic in server environments in which the actual regular expressions are subject to configuration. Because it’s hard, if not impossible, to predict the runtime behavior of a given regular expression, the most conservative approach for those cases is to apply a constraint on how long the Regex engine will try to match a given input. For this reason, Regex now has several APIs that accept a timeout: Regex.IsMatch, Regex.Match, Regex.Matches, Regex.Split and Regex.Replace.
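
For example, using the static IsMatch overload that accepts a match timeout (the pattern shown is a classic pathological case against inputs like a long run of digits followed by a letter):

```csharp
try
{
  bool isValid = Regex.IsMatch(input, @"^(\d+)+$",
    RegexOptions.None, TimeSpan.FromMilliseconds(250));
}
catch (RegexMatchTimeoutException)
{
  // The engine exceeded the budget; treat the input as invalid
  // rather than letting the match run away.
}
```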

Get Involved

The features discussed in this article are available in the Visual Studio 11 beta. It meets our high standards for pre-release software, so we support it in production environments. You can download the beta from bit.ly/9JWDT9. Because this is prerelease software, we’re interested in getting your feedback, whether you’ve hit any issues (connect.microsoft.com/visualstudio) or you’ve got ideas for new features (visualstudio.uservoice.com). I’d also suggest subscribing to my team’s blog at blogs.msdn.com/b/bclteam so you’re informed about any upcoming changes or announcements.


Immo Landwerth is a program manager on the CLR team at Microsoft, where he works on the Microsoft .NET Framework base class library (BCL), API design and portable class libraries. You can reach him via the BCL team blog at blogs.msdn.com/b/bclteam.

Thanks to the following technical experts for reviewing this article: Nicholas Blumhardt, Greg Paperin, Daniel Plaisted, Evgeny Roubinchtein, Alok Shriram, Chris Szurgot, Stephen Toub and Mircea Trofin