Click to Rate and Give Feedback
Related Articles
Here the author introduces SQL Server Data Services, which exposes its functionality over standard Web service interfaces.

By David Robinson (July 2008)
Here the author answers questions regarding the Entity Framework and provides an understanding of how and why it was developed.

By Elisa Flasko (July 2008)
Here we present techniques for programmatic and declarative data binding and display with Windows Presentation Foundation.

By Josh Smith (July 2008)
Systems that handle failure without losing data are elusive. Learn how to achieve systems that are both scalable and robust.

By Udi Dahan (July 2008)
More ...
Articles by this Author
In this month’s installment of .NET Matters, columnist Stephen Toub answers reader questions concerning asynchronous I/O .

By Stephen Toub (July 2008)
This month Stephen Toub discusses asynchronous stream processing.

By Stephen Toub (March 2008)
This month Stephen Toub explains how to make the most of dual processors when running encryption and compression tasks.

By Stephen Toub (February 2008)
The author creates a managed wrapper to use the new IFileOperations interface in Windows Vista from managed code.

By Stephen Toub (December 2007)
Find out how to use finalizers as a way to warn developers who use your custom types when they are garbage collected without having been disposed of correctly.

By Stephen Toub (November 2007)
This month Stephen Toub discusses deadlocks that can occur when synchronizing threads.

By Stephen Toub (October 2007)
Stephen Toub and Shawn Farkas discuss creating an adapter that takes the functionality of RNGCryptoServiceProvider and adapts it to the interface of Random.

By Stephen Toub and Shawn Farkas (September 2007)
Stephen Toub gets nostalgic as he prepares to leave MSDN Magazine.

By Stephen Toub (August 2007)
More ...
Popular Articles
Mike Volodarsky demonstrates the IIS 7.0 extensibility model by extending the Response Modification into a configurable Web server module and a custom management page for IIS Manager.

By Mike Volodarsky (Launch 2008)
Microsoft Robotics Studio is not just for playing with robots. It also allows you to build service-based applications for a wide range of hardware devices.

By Sara Morgan (June 2008)
Here we present techniques for programmatic and declarative data binding and display with Windows Presentation Foundation.

By Josh Smith (July 2008)
Learn how to automate custom SharePoint application deployments, use the SharePoint API, and avoid the hassle of custom site definitions.

By E. Wilansky, P. Olszewski, and R. Sneddon (May 2008)
More ...
Read the Blog
In the November issue of MSDN Magazine, Jeffrey Richter demonstrates some recent additions to the C# programming language that make working with the APM significantly easier. In the June ...
Read more!
The July 2008 issue of MSDN Magazine is now available online. Here's what's in the issue: Data Services: Develop ...
Read more!
The June 2008 issue features the first installment of a new MSDN Magazine column on software design fundamentals. We’ll discuss design patterns and principles in a manner that isn't bound to a specific tool or lifecycle methodology. In this issue, Jeremy Miller starts the Patterns in Practice column ...
Read more!
In the April 2008 issue of MSDN Magazine, Kenny Kerr introduced the Windows Imaging Component (WIC), showing you how you can use it to encode and decode different image ...
Read more!
A combination of the retained-mode graphics system and notification mechanisms such as dependency properties unleash the flexibility and power of Windows Presentation Foundation (WPF, allowing these objects to be targets of data bindings and animations. In the June 2008 issue of MSDN Magazine, Charles ...
Read more!
One problem with GUI programming in C++ is that most libraries are too low level, putting much of the burden on the programmer. In the June 2008 issue of MSDN Magazine, John Torjo introduces you eGUI++, a C++ library that gives you a ...
Read more!
More ...
.NET Matters
Deserialization Progress, and More
Stephen Toub

Code download available at: NETMatters2006_12.exe (161 KB)
Browse the Code Online

Q In my Windows® Forms application, I use a BinaryFormatter to save a large amount of application state, sometimes serializing to disk very large objects. At a later point, this state file needs to be loaded and deserialized by the application, and this deserialization process can take a measurable amount of time. Currently for both serialization and deserialization, I show a marquee progress bar, but that only provides an indication to the user that something is happening. For deserialization, I'd really like to show a standard progress bar that indicates how much of the deserialization has completed and how much still remains. BinaryFormatter doesn't provide any such feature, and I'm at a bit of a loss for how to extend the class to provide the behavior I need. Do you have any suggestions?
Q In my Windows® Forms application, I use a BinaryFormatter to save a large amount of application state, sometimes serializing to disk very large objects. At a later point, this state file needs to be loaded and deserialized by the application, and this deserialization process can take a measurable amount of time. Currently for both serialization and deserialization, I show a marquee progress bar, but that only provides an indication to the user that something is happening. For deserialization, I'd really like to show a standard progress bar that indicates how much of the deserialization has completed and how much still remains. BinaryFormatter doesn't provide any such feature, and I'm at a bit of a loss for how to extend the class to provide the behavior I need. Do you have any suggestions?
A BinaryFormatter is a difficult beast to extend, as it has a very simple interface. It's sealed with no virtual members, so you can't derive from it to override some key member like you might to add functionality to another class. And it doesn't expose any events you can hook to listen for progress notifications and the like. Instead, you'll need to get a bit creative.
A BinaryFormatter is a difficult beast to extend, as it has a very simple interface. It's sealed with no virtual members, so you can't derive from it to override some key member like you might to add functionality to another class. And it doesn't expose any events you can hook to listen for progress notifications and the like. Instead, you'll need to get a bit creative.
Getting quickly to the punch line, BinaryFormatter consumes the stream to be deserialized as is needed to parse it; this means that if we can figure out how much of the stream the BinaryFormatter has read, we can create a good approximation for how much of the deserialization process has been completed.
Figure 1 shows a first attempt at creating a utility method that allows you to track the progress of deserialization. The generic method accepts the stream to be deserialized and a callback delegate that will be invoked by the method to alert the caller of progress updates. Internally, it creates a BinaryFormatter and deserializes the stream, but before doing so, it first creates a Timer set to expire each second. Thus, after every second the Timer will invoke the callback delegate on a background thread, examining the current position of the stream and the length of the stream, and using that information to supply the callback with a number representing the percentage of deserialization completed. Unfortunately, as you can probably glean from the figure title, this method is rife with bugs. Most importantly, instance methods on Stream are not guaranteed to be thread-safe, so we shouldn't be accessing the Position and Length properties from a secondary thread as we're doing here. Additionally, there are several race conditions that will cause exceptions to be thrown, such as if deserialization finishes and the stream is closed just as the Timer's callback is being raised (you can't access the properties of a disposed FileStream). In short, this is a bad solution.
A much better solution involves the creation of a custom decorator stream. I discussed the decorator design pattern and its application to streams in my September 2005 column, but in short, a decorator is an object that has the same interface as another object it contains (in object-oriented terms, it is an object that has both an "is-a" and a "has-a" relationship with a specific type). So, for example, System.IO.Compression.GZipStream is a decorator, as it both derives from Stream and contains a Stream. A user constructs a GZipStream by supplying to its constructor a Stream that should be compressed or decompressed, and calls made to GZipStream's Stream implementation delegate to the underlying Stream's implementation, doing additional work before and after.
Instead of supplying to BinaryFormatter the actual Stream to be deserialized, we can supply to it a surrogate decorator Stream that contains the Stream to be deserialized. This container stream can implement its Read method to retrieve the data from the contained Stream, but also to raise an event that contains information about the current position and length of the stream.
Whenever I set out to implement a new decorator stream, I typically start with a helper base class, like the one shown in Figure 2. This ContainerStream class provides a constructor that accepts and stores the Stream to be contained, a protected property exposing to derived types the underlying Stream, and then implements all of the methods and properties on Stream to delegate to the contained Stream. This is useful since most of the methods and properties on Stream are abstract, and this helper lets you avoid implementing every method in your own derived type if all you're interested in doing is implementing one or two specific methods.
My ReadProgressStream implementation is shown in Figure 3. It derives from ContainerStream and overrides the Read method. The Read override calls to the base class's Read method (which in turn calls the contained Stream's Read), and then if anyone has registered with ReadProgressStream's ProgressChanged event, it raises the event with the current percentage of progress made reading through the stream (to improve performance, it only raises the event if the integral percentage has changed in value since the last call to Read).
With this class in place, tracking the progress of deserialization is straightforward, and can be done simply by passing a ReadProgressStream rather than the actual stream to be deserialized to the BinaryFormatter:
public static T Deserialize<T>(
  Stream stream, ProgressChangedEventHandler callback)
{
  using (ReadProgressStream rps = new ReadProgressStream(stream))
  {
    rps.ProgressChanged += callback;
    BinaryFormatter formatter = new BinaryFormatter();
    return (T)formatter.Deserialize(rps);
  }
}
This works, accomplishing the goal of being able to generally track the progress of deserialization. Unfortunately, as the Deserialize<T> method is currently written, it has a major performance problem. The BinaryFormatter makes many, many requests to the Stream's Read method, frequently for only a few bytes at a time. While we haven't added much code to the Read method, it's enough to create a significant bottleneck.
To improve performance, we need a way to throttle the number of times the ReadProgressStream's Read method is actually called. To solve this, it's important to realize that this problem isn't new. (Very few problems actually are, and life gets a lot easier if one accepts that most problems have already been solved one way or another.)
For example, FileStream is typically used to read and write data from disk, and disk is relatively slow. If every request to a FileStream required disk access, applications that relied on reading and writing files with FileStream would slow to a crawl. Instead, FileStream buffers data from the disk, reading more than it needs to on small read requests such that future requests for that data may be satisfied out of the buffer cache rather than requiring additional accesses to the disk.
That same solution of buffering data can be used to fix our performance problem. We want to limit the number of times ReadProgressStream's Read method is actually invoked, and one way to do that is by forcing the caller to make several large requests rather than many small requests. And doing that requires adding just one more line of code:
public static T Deserialize<T>(
  Stream stream, ProgressChangedEventHandler callback)
{
  using (ReadProgressStream rps = new ReadProgressStream(stream))
  using (BufferedStream bs = new BufferedStream(rps))
  {
    rps.ProgressChanged += callback;
    BinaryFormatter formatter = new BinaryFormatter();
    return (T)formatter.Deserialize(bs);
  }
}
Now instead of the BinaryFormatter directly accessing the Read ProgressStream, it goes through a System.IO.BufferedStream. This dramatically improves the performance of the solution to the point where it's actually usable. However, it introduces a new minor issue, albeit one we can overcome with a few more lines of code. The constructor for BufferedStream used in the previous example defaults the buffer size to 4KB. This means if we're dealing with a relatively small stream, we're not going to make very many calls to the underlying ReadProgressStream's Read method, and we won't get many progress notifications. Given that it's a small stream, and thus one that should deserialize very quickly, this might not be an issue for you. If, however, it is a problem and you need more frequent updates, you can change the size of the buffer using a second constructor on BufferedStream that accepts the buffer size to use in addition to the stream to wrap. An easy way to determine the buffer size is to take the length of the stream and divide it by a hundred (since the ProgressChangedEventArgs uses an integral value for percentage completed, thus resulting in typical values falling between 0 and 100, inclusive), but for extremely large streams, you might end up with huge buffer sizes. To counteract that, I've chosen to go with whatever's smaller, 1/100th of the stream's length, or 4096 bytes, and my solution that includes that calculation is shown in Figure 4. You should of course test this formula in your own scenarios, and make changes if this is an inappropriate choice for your system.

Q I'm using the SortedList<TKey, TValue> collection from the System.Collection.Generics namespace, and the performance is not what I'd expect. What sort of algorithms are used under the covers for this collection?
Q I'm using the SortedList<TKey, TValue> collection from the System.Collection.Generics namespace, and the performance is not what I'd expect. What sort of algorithms are used under the covers for this collection?
A Like List<T>, SortedList<TKey, TValue> wraps an underlying array. (SortedList actually wraps two arrays, one for keys and one for values). When you add an item to SortedList, it uses a binary search to find the correct place to insert an item into array, ensures that the array is large enough to hold the existing items plus the new item, shifts the existing items to make room for the new item, and adds the new item to the array in the correct location. Depending on where you're inserting into the list, that shift operation can be expensive. Take the following two loops as an example. The first inserts the numbers from 1 to 10,000 into the list, whereas the second inserts the same numbers but in the opposite order, starting with 10,000 and going down to 1:
for (int i = 1; i <= 10000; ++i) list.Add(i, i); // loop #1

for (int i = 10000; i >= 1; --i) list.Add(i, i); // loop #2
A Like List<T>, SortedList<TKey, TValue> wraps an underlying array. (SortedList actually wraps two arrays, one for keys and one for values). When you add an item to SortedList, it uses a binary search to find the correct place to insert an item into array, ensures that the array is large enough to hold the existing items plus the new item, shifts the existing items to make room for the new item, and adds the new item to the array in the correct location. Depending on where you're inserting into the list, that shift operation can be expensive. Take the following two loops as an example. The first inserts the numbers from 1 to 10,000 into the list, whereas the second inserts the same numbers but in the opposite order, starting with 10,000 and going down to 1:
for (int i = 1; i <= 10000; ++i) list.Add(i, i); // loop #1

for (int i = 10000; i >= 1; --i) list.Add(i, i); // loop #2
On my machine, the first loop is more than 30 times faster than the second, and that gap will continue to grow as the number of elements increases. What's happening here is that in the first loop I'm always adding to the end of the array, which means no elements need to be shifted on an insertion. In the second loop, I'm always adding to the beginning, which means all elements need to be shifted on every insertion. This degrades the algorithm to an O(n2) algorithm, the worst case complexity for an insertion sort of this nature.
The moral of the story is that you need to pick your data structures carefully based on the scenarios in which you plan to use them. Moreover, once you've chosen a data structure, make sure you're using it in a manner that provides optimal performance. As I've just shown, simply changing the order in which you perform certain operations can have a significant impact.

Q I need to convert a standard file path to an 8.3 file path. Does any support for this exist in the .NET Framework?
Q I need to convert a standard file path to an 8.3 file path. Does any support for this exist in the .NET Framework?
A Sure, but only in the sense that through P/Invoke you have access to the entire suite of Win32® APIs. This conversion can be accomplished easily using the GetShortPathName function exposed from kernel32.dll, as shown in Figure 5. It accepts three parameters: the string containing the path to be converted, an output memory buffer to contain the converted string, and an integer containing the size of that buffer in characters. If GetShortPathName is called with a buffer that's too small to receive the converted string (or, more specifically, if the value in cchBuffer is too small), GetShortPathName will return the size of the buffer required to store the string.
A Sure, but only in the sense that through P/Invoke you have access to the entire suite of Win32® APIs. This conversion can be accomplished easily using the GetShortPathName function exposed from kernel32.dll, as shown in Figure 5. It accepts three parameters: the string containing the path to be converted, an output memory buffer to contain the converted string, and an integer containing the size of that buffer in characters. If GetShortPathName is called with a buffer that's too small to receive the converted string (or, more specifically, if the value in cchBuffer is too small), GetShortPathName will return the size of the buffer required to store the string.
You can take advantage of this by making two calls to GetShortPathName; the first passes in a buffer size of 0 in order to force GetShortPathName to return the needed buffer size. A second call to GetShortPathName is then made with the appropriately sized buffer. When calling through P/Invoke Win32 functions that expect a pointer to a string buffer to contain the output value, you typically use a StringBuffer, preallocated through its Capacity property with a buffer big enough to hold the output string. After the P/Invoke, the result can be retrieved easily using StringBuffer's ToString method.

Send your questions and comments for Stephen to netqa@microsoft.com.


Stephen Toub is the Technical Editor for MSDN Magazine.

Page view tracker