February 2013

Volume 28 Number 02

The Working Programmer - .NET Collections, Part 2: Working with C5

By Ted Neward | February 2013

Ted NewardWelcome back.

In the first part of this series, I looked briefly at the Copenhagen Comprehensive Collection Classes for C# (C5) library, a set of classes designed to supplement (if not replace) the System.Collections classes that ship with the Microsoft .NET Framework runtime library. There’s a fair amount of overlap between the two libraries, partly because C5 wants to follow many of the same idioms that the .NET Framework Class Library (FCL) uses, and partly because there are only so many ways one can reasonably represent a particular collection type. (It’s hard to imagine an indexed collection—such as a Dictionary or a List—not supporting the language syntax for indexed properties: the “[]” operator in C# and the “()” operator in Visual Basic.) Where the FCL collections are utilitarian, however, the C5 collections go a step or two beyond that, and that’s where we want to spend our time.

(Note that there’s also very likely some performance differences between the two libraries that proponents or critics of each will be quick to point out—the C5 collection manual discusses some of the performance implications, for example. That said, I eschew most performance benchmarks on the grounds that, generally, all a benchmark proves is that for one particular case or set of cases, somebody got one of the two to run faster than the other, which doesn’t really say whether that will hold true for all cases between the two. This doesn’t mean all benchmarks are useless, just that the context matters to the benchmark. Readers are strongly encouraged to take their own particular scenarios, turn them into a benchmark and have a shootout between the two, just to see if there’s a marked difference in those particular cases.)

Implementations

First of all, let’s take a quick look at the different collection implementations that C5 provides. Again, as we discussed last time, developers using C5 shouldn’t generally worry about the implementation in use except when deciding which implementation to create—the rest of the time, the collection should be referenced by interface type. (For a description of the interface types, see the previous column in the series at msdn.microsoft.com/magazine/jj883961, or the C5 documentation at bit.ly/UcOcZH.) Here are the implementations:

  • CircularQueue<T> implements both IQueue<T> and IStack<T> to provide either the first-in-first-out semantics of IQueue<T> (via Enqueue and Dequeue) or the last-in-first-out semantics of IStack<T> (via Push and Pop), backed by a linked list. It grows in capacity as needed.
  • ArrayList<T> implements IList<T>, IStack<T> and IQueue<T>, backed by an array.
  • LinkedList<T> implements IList<T>, IStack<T> and IQueue<T>, using a doubly linked list of nodes.
  • HashedArrayList<T> implements IList<T>, backed by an array, but also maintains a hash table internally to efficiently find the position of an item within the list. Also, it doesn’t allow duplicates in the list (because duplicates would screw up the hash table lookup).
  • HashedLinkedList<T> implements IList<T>, backed by a linked list, and like its array-backed cousin, it uses an internal hash table to optimize lookups.
  • WrappedArray<T> implements IList<T>, wrapping around a single-dimensional array. The advantage of this class is that it simply “decorates” the array, making it far faster to obtain C5 functionality, as opposed to copying the elements out of the array and into an ArrayList<T>.
  • SortedArray<T> implements IIndexedSorted<T>, which means the collection can be indexed as well as sorted—we’ll get to this in a second. It keeps its items sorted and doesn’t allow duplicates.
  • TreeSet<T> implements IIndexedSorted<T> and IPersistedSorted<T> and is backed by a balanced red-black tree, which is great for insertion, removal and sorting. Like all sets, it doesn’t allow duplicates.
  • TreeBag<T> implements IIndexedSorted<T> and IPersistedSorted<T>, is backed by a balanced red-black tree, but is essentially a “bag” (sometimes called a “multiset”), meaning it allows duplicates.
  • HashSet<T> implements IExtensible<T>, and backs the set (meaning no duplicates) by a hash table with linear chaining. This means lookups will be fast, modifications less so.
  • HashBag<T> implements IExtensible<T>, and backs the bag (meaning duplicates are allowed) by a hash table with linear chaining, again making lookups fast.
  • IntervalHeap<T> implements IPriorityQueue<T>, using an interval heap stored as an array of pairs, making it efficient to pull from either the “max” or “min” end of the priority queue.

There are a few more implementations, and the C5 manual and docs have more details if you’re interested. However, aside from the performance implications, the critical thing to know is which implementations implement which interfaces, so that you can have a good idea of each when it’s time to choose one to instantiate. (You can always switch it around to a different implementation later, assuming you follow the C5 design guideline of always referencing the collections by the interfaces rather than their implementation types.)

Functionality

If C5 were just a larger collection of collection implementations, it would be interesting, but probably not enough to warrant significant interest or discussion. Fortunately, it offers a few new features to developers that deserve discussion.

Views One of the interesting little tidbits of the C5 library is the notion of “views”: subcollections of elements from the source collection that are, in fact, not copies but backed by the original collection. This was actually what the code from the previous column did, in the exploration test. See Figure 1 for how to create views on a collection.

Figure 1 Creating Views on a Collection

[TestMethod]
public void GettingStarted()
{
  IList<String> names = new ArrayList<String>();
  names.AddAll(new String[]
    { "Hoover", "Roosevelt", "Truman", "Eisenhower", "Kennedy" });
  // Print item 1 ("Roosevelt") in the list
  Assert.AreEqual("Roosevelt", names[1]);
  Console.WriteLine(names[1]);
  // Create a list view comprising post-WW2 presidents
  IList<String> postWWII = names.View(2, 3);
  // Print item 2 ("Kennedy") in the view
  Assert.AreEqual("Kennedy", postWWII[2]);
}

The view is backed by the original list, which means that if the original list changes for whatever reason, the view on it is also affected. See Figure 2 to see how views are potentially mutable.

Figure 2 Views Are Potentially Mutable

[TestMethod]
public void ViewExploration()
{
  IList<String> names = new ArrayList<String>();
  names.AddAll(new String[]
    { "Washington", "Adams", "Jefferson",
      "Hoover", "Roosevelt", "Truman",
      "Eisenhower", "Kennedy" });
  IList<String> postWWII = names.View(4, names.Count - 4);
  Assert.AreEqual(postWWII.Count, 4);
  IList<String> preWWII = names.View(0, 5);
  Assert.AreEqual(preWWII.Count, 5);
  Assert.AreEqual("Washington", preWWII[0]);
  names.Insert(3, "Jackson");
  Assert.AreEqual("Jackson", names[3]);
  Assert.AreEqual("Jackson", preWWII[3]);
}

As this test shows, changing the underlying list (“names”) means that the views defined on it (in this case, the “preWWII” view) also find their contents changing, so that now the first element in the view is “Washington,” instead of “Hoover.”

However, when possible, C5 will preserve the sanctity of the view; so, for example, if the insertion occurs at the front of the collection (where C5 can insert it without changing the contents of the “preWWII” view), then the view’s contents remain unchanged:

[TestMethod]
public void ViewUnchangingExploration()
{
  IList<String> names = new ArrayList<String>();
  names.AddAll(new String[]
    { "Hoover", "Roosevelt", "Truman", "Eisenhower", "Kennedy" });
  IList<String> preWWII = names.View(0, 2);
  Assert.AreEqual(preWWII.Count, 2);
  names.InsertFirst("Jackson");
  Assert.AreEqual("Jackson", names[0]);
  Assert.AreEqual("Hoover", preWWII[0]);
}

Immutable (Guarded) Collections With the rise of functional concepts and programming styles, a lot of emphasis has swung to immutable data and immutable objects, largely because immutable objects offer a lot of benefits vis-à-vis concurrency and parallel programming, but also because many developers find immutable objects easier to understand and reason about. Corollary to that concept, then, follows the concept of immutable collections—the idea that regardless of whether the objects inside the collection are immutable, the collection itself is fixed and unable to change (add or remove) the elements in the collection. (Note: You can see a preview of immutable collections released on NuGet in the MSDN Base Class Library (BCL) blog at bit.ly/12AXD78.)

Within C5, immutable collections are handled by instantiating “wrapper” collections around the collection containing the data of interest; these collections are “Guarded” collections and are used in classic Decorator pattern style:

public void ViewImmutableExploration()
{
  IList<String> names = new ArrayList<String>();
  names.AddAll(new String[]
    { "Hoover", "Roosevelt", "Truman", "Eisenhower", "Kennedy" });
  names = new GuardedList<String>(names);
  IList<String> preWWII = names.View(0, 2);
  Assert.AreEqual("Hoover", preWWII[0]);
  names.InsertFirst("Washington");
  Assert.AreEqual("Washington", names[0]);
}

If anyone tries to write code that adds or removes elements from the list, C5 quickly disabuses said developer of the idea: An exception is thrown as soon as any of the “modifying” methods (Add, Insert, InsertFirst and so on) are called.

This offers a pretty powerful opportunity, by the way. In the previous column, I mentioned that one of the key design points that went into C5 is the idea that collections should only be used through interfaces. Assuming developers using C5 carry that design idea forward, it now becomes really simple to ensure that a collection is never modified by a method to which it is passed (see Figure 3).

Figure 3 Guarded (Immutable) Collections

public void IWannaBePresidentToo(IList<String> presidents)
{
  presidents.Add("Neward");
}
[TestMethod]
public void NeverModifiedCollection()
{
  IList<String> names = new ArrayList<String>();
  names.AddAll(new String[]
    { "Hoover", "Roosevelt", "Truman","Eisenhower", "Kennedy" });
  try
  {
    IWannaBePresidentToo(new GuardedList<String>(names));
  }
  catch (Exception x)
  {
    // This is expected! Should be a ReadOnlyException
  }
  Assert.IsFalse(names.Contains("Neward"));
 }

Again, when the IWannaBePresidentToo method tries to modify the collection passed in (which, arguably, is bad design on the part of the programmer who wrote it, but unfortunately there’s a lot of that kind of code out there), an exception is thrown.

By the way, should you prefer that the collection not throw an exception and instead silently fail the modification (which I think is too subtle, but some developers may need that functionality), it’s relatively easy to put together your own version of Guarded­Array<T> that doesn’t throw.

Events Sometimes, modifications to collections are, in fact, what you want to allow—only you want to know when a collection is modified. Granted, you could spin up a thread and have it spin indefinitely over the collection, comparing the contents to the previous iteration’s contents, but not only is this a horrible waste of CPU resources, but it’s a pain to write and maintain, making it probably the worst possible design solution—and certainly a sight poorer than simply using a collection that supports events natively. All collections in C5 offer the ability to hang delegates off the collection, to be invoked when certain operations take place against the collection (see Figure 4).

Figure 4 “When I Become President ...”

[TestMethod]
public void InaugurationDay()
{
  IList<String> names = new ArrayList<String>();
  names.AddAll(new String[]
    { "Hoover", "Roosevelt", "Truman", "Eisenhower", "Kennedy" });
  names.ItemsAdded +=
    delegate (Object c, ItemCountEventArgs<string> args)
  {
    testContextInstance.WriteLine(
      "Happy Inauguration Day, {0}!", args.Item);
  };
  names.Add("Neward");
  Assert.IsTrue(names.Contains("Neward"));
}

Of course, the event handler can be written as a lambda; it’s just a little more descriptive to show you the actual argument types. The first argument is—as is canon—the collection itself.

Just a NuGet Away

No part of C5 couldn’t be built around the .NET FCL (aside from the emphasis on interfaces, which the FCL supports, but doesn’t really endorse that strongly, it seems), but the nice thing about C5 is that it’s done, it’s tested and it’s just a NuGet “Install-Package” away.

Happy coding!


Ted Neward is a principal with Neward & Associates LLC. He has written more than 100 articles and authored and co-authored a dozen books, including “Professional F# 2.0” (Wrox, 2010). He is an F# MVP and noted Java expert, and speaks at both Java and .NET conferences around the world. He consults and mentors regularly—reach him at ted@tedneward.com or Ted.Neward@neudesic.com if you’re interested in having him come work with your team. He blogs at blogs.tedneward.com and can be followed on Twitter at twitter.com/tedneward.

Thanks to the following technical expert for reviewing this article: Immo Landwerth