
.NET Matters

Asynchronous I/O with WebClient

Stephen Toub

Q I really like the static helper methods on the System.IO.File class for reading and writing data: ReadAllText, ReadAllBytes, WriteAllText, WriteAllBytes, and so on. However, these methods are synchronous, and I'd like to be able to use them asynchronously such that under the covers they're also using asynchronous I/O. I know the System.IO.FileStream class supports asynchronous I/O; does File?

A File's methods only support synchronous operation. But the functionality to implement asynchronous methods like the ones you describe certainly exists. In this column, I'll walk through two ways to do it, starting with the more complicated, but more efficient, of the two.

I'll begin by defining the APIs I actually want to implement. The original signatures you cite look like this:

public static byte [] ReadAllBytes(string path);
public static string ReadAllText(string path);

public static void WriteAllBytes(string path, byte [] bytes);
public static void WriteAllText(string path, string contents);

I want asynchronous versions of these, and I'll define them as:

public static void ReadAllBytesAsync(
    string path, Action<byte[]> success, Action<Exception> failure);
public static void ReadAllTextAsync(
    string path, Action<string> success, Action<Exception> failure);

public static void WriteAllBytesAsync(
    string path, byte[] bytes, 
    Action success, Action<Exception> failure);
public static void WriteAllTextAsync(
    string path, string contents, 
    Action success, Action<Exception> failure);

These signatures are very similar to the originals. Instead of returning data synchronously, the Read* methods accept two delegates, one for successful execution and one for exceptional execution, where the delegate for the former is passed the read data and where the delegate for the latter is passed any resulting exception. The Write* methods also accept two delegates, but the success delegate is parameterless, since there's no expected output (the original Write* methods return void).

The more complicated approach involves implementing these directly with the Asynchronous Programming Model (APM) pattern exposed by the System.IO.Stream class, from which FileStream derives. In my column in the March 2008 issue of MSDN® Magazine (msdn.microsoft.com/magazine/cc337900), I demonstrated implementing a method CopyStreamToStream, which asynchronously copies from one Stream to another Stream using each Stream's APM implementation; for reference, that implementation is shown in Figure 1, and I'm going to reuse it here. (The implementation is slightly simplified and changed from the last time you saw it. This included removing the usage of System.ComponentModel.AsyncOperationManager and AsyncOperation, preferring to move their usage into the higher level APIs being implemented in this column. I'll discuss why in a moment.)
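For anyone who hasn't used the APM directly, a single Begin/End pair looks roughly like the following sketch. The names ApmReadSketch and ReadFirstChunk are made up for illustration, and the blocking wait is there only to keep the example short; the copy routine itself never blocks like this.

```csharp
using System;
using System.IO;
using System.Threading;

public static class ApmReadSketch
{
    // Issues one BeginRead/EndRead pair against a file opened for
    // asynchronous I/O and returns the number of bytes read.
    public static int ReadFirstChunk(string path, byte[] buffer)
    {
        using (var done = new ManualResetEvent(false))
        using (var fs = new FileStream(path, FileMode.Open,
            FileAccess.Read, FileShare.Read, 0x1000, true))
        {
            int read = 0;
            Exception error = null;

            fs.BeginRead(buffer, 0, buffer.Length, ar =>
            {
                // EndRead completes the operation and surfaces any failure
                try { read = fs.EndRead(ar); }
                catch (Exception exc) { error = exc; }
                finally { done.Set(); }
            }, null);

            // Blocking here is for demonstration purposes only
            done.WaitOne();
            if (error != null) throw error;
            return read;
        }
    }
}
```

CopyStreamToStream chains pairs like this one: each read callback starts a write, and each write callback starts the next read, until a read returns zero bytes.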

Figure 1 Asynchronous Stream Copying

public static void CopyStreamToStream(
    Stream source, Stream destination, Action<Exception> completed)
{
    byte[] buffer = new byte[0x1000];
    if (completed == null) completed = delegate {};

    AsyncCallback rc = null;
    rc = readResult =>
    {
        try
        {
            int read = source.EndRead(readResult);
            if (read > 0)
            {
                destination.BeginWrite(buffer, 0, read, writeResult =>
                {
                    try
                    {
                        destination.EndWrite(writeResult);
                        source.BeginRead(
                            buffer, 0, buffer.Length, rc, null);
                    }
                    catch (Exception exc) { completed(exc); }
                }, null);
            }
            else completed(null);
        }
        catch (Exception exc) { completed(exc); }
    };

    source.BeginRead(buffer, 0, buffer.Length, rc, null);
}

That implementation represents the most difficult aspect of implementing these asynchronous methods. On top of CopyStreamToStream, I'll implement two helper methods, one for reading bytes from a file asynchronously and one for writing bytes to a file asynchronously. They do the bulk of the work (see Figure 2).

Figure 2 Asynchronous Helper Methods

private static void ReadAllBytesAsyncInternal(string path, 
    Action<byte[]> success, Action<Exception> failure)
{
    var input = new FileStream(path, FileMode.Open, 
        FileAccess.Read, FileShare.Read, 0x1000, true);
    var output = new MemoryStream((int)input.Length);
    CopyStreamToStream(input, output, e =>
    {
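        // ToArray returns only the bytes written; GetBuffer would expose
        // the whole internal buffer, including any unused capacity
        byte [] bytes = e == null ? output.ToArray() : null;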
        output.Close();
        input.Close();

        if (e != null) failure(e);
        else success(bytes);
    });
}

private static void WriteAllBytesAsyncInternal(
    string path, byte[] bytes, 
    Action success, Action<Exception> failure)
{
    var input = new MemoryStream(bytes);
    var output = new FileStream(path, FileMode.Create, 
        FileAccess.Write, FileShare.None, 0x1000, true);
    CopyStreamToStream(input, output, e =>
    {
        input.Close();
        output.Close();

        if (e != null) failure(e);
        else success();
    });
}

ReadAllBytesAsyncInternal creates an input FileStream to work with the underlying stream asynchronously and a MemoryStream to store the bytes read in from the file. Then the CopyStreamToStream method is used to copy all of the data asynchronously from the FileStream to the MemoryStream. When the operation is complete, the streams are closed. The failure delegate is called if an exception was thrown; if not, the success delegate is called, provided with the data read into the MemoryStream from the file.

WriteAllBytesAsyncInternal is very similar. Here, an input MemoryStream is created that wraps the provided byte array, and an output FileStream is created, again one that supports asynchronous I/O. As with ReadAllBytesAsyncInternal, upon completion the streams are closed, and the failure delegate is called if an exception occurred.
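One detail matters when handing back the contents of a MemoryStream, as the read helper does: GetBuffer exposes the stream's whole internal buffer, while ToArray copies out only the bytes actually written. The two agree when the stream's capacity happens to equal the amount of data, and differ otherwise. The small sketch below (class and method names are hypothetical) makes the difference visible.

```csharp
using System;
using System.IO;

public static class MemoryStreamContentsSketch
{
    // Writes `count` bytes into a MemoryStream created with the given
    // capacity and returns { GetBuffer().Length, ToArray().Length }.
    public static int[] BufferVersusArrayLengths(int capacity, int count)
    {
        using (var ms = new MemoryStream(capacity))
        {
            ms.Write(new byte[count], 0, count);
            return new[] { ms.GetBuffer().Length, ms.ToArray().Length };
        }
    }
}
```

With a capacity of 16 and 3 bytes written, the first length is 16 and the second is 3, so a caller that receives the raw buffer would also see 13 trailing zero bytes.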

Implementing each of the public signatures shown earlier now requires just a few additional lines on top of the methods I created in Figures 1 and 2. These are shown in Figure 3.

Figure 3 Implementing the Public Methods

public static void ReadAllBytesAsync(
    string path, Action<byte[]> success, Action<Exception> failure)
{
    AsyncOperation asyncOp = 
        AsyncOperationManager.CreateOperation(null);
    ReadAllBytesAsyncInternal(path,
        bytes => asyncOp.Post(delegate { success(bytes); }, null),
        exception => asyncOp.Post(
            delegate { failure(exception); }, null));
}

public static void ReadAllTextAsync(
    string path, Action<string> success, Action<Exception> failure)
{
    AsyncOperation asyncOp = 
        AsyncOperationManager.CreateOperation(null);
    ReadAllBytesAsyncInternal(path,
        bytes => {
            string text;
            using (var ms = new MemoryStream(bytes)) 
                text = new StreamReader(ms).ReadToEnd();
            asyncOp.Post(delegate { success(text); }, null);
        },
        exception => asyncOp.Post(
            delegate { failure(exception); }, null));
}

public static void WriteAllBytesAsync(
    string path, byte[] bytes, 
    Action success, Action<Exception> failure)
{
    AsyncOperation asyncOp = 
        AsyncOperationManager.CreateOperation(null);
    WriteAllBytesAsyncInternal(path, bytes, 
        () => asyncOp.Post(delegate {success(); }, null),
        exception => asyncOp.Post(
            delegate { failure(exception); }, null));
}

public static void WriteAllTextAsync(
    string path, string contents, 
    Action success, Action<Exception> failure)
{
    AsyncOperation asyncOp = 
        AsyncOperationManager.CreateOperation(null);
    ThreadPool.QueueUserWorkItem(delegate
    {
        var bytes = Encoding.UTF8.GetBytes(contents);
        WriteAllBytesAsyncInternal(path, bytes,
            () => asyncOp.Post(delegate {success(); }, null),
            exception => asyncOp.Post(
                delegate { failure(exception); }, null));
    });
}

For the most part, each of these methods is a simple wrapper around the internal implementations shown in Figure 2. However, there are some interesting subtleties as well. First, I mentioned earlier that I was stripping the AsyncOperationManager support out of CopyStreamToStream. AsyncOperation is itself a wrapper around System.Threading.SynchronizationContext, and it uses the underlying SynchronizationContext captured in the call to AsyncOperationManager.CreateOperation to Post delegates back in a manner appropriate to the synchronization context current at the time of creation. For example, on the UI thread in a Windows® Forms application, SynchronizationContext.Current will likely return a WindowsFormsSynchronizationContext that supports marshaling delegate calls back to the UI thread. Thus, if AsyncOperationManager.CreateOperation is called on the UI thread, using the resulting AsyncOperation's Post method will result in the provided delegate being marshaled to and executed in the UI thread.

However, I want to minimize the amount of work that's done on the UI thread. Consider now the ReadAllTextAsync method. When I'm finished loading the data from the file, I want to convert the read bytes into a string (this would be more efficient if I had a writeable StringStream class and could pass an instance of that to CopyStreamToStream rather than passing a MemoryStream). The conversion could incur more of a cost than I'd like on the UI thread. Thus, I want to do the conversion before posting back to the UI. But in the original implementation of CopyStreamToStream, the completion delegate was run under the captured SynchronizationContext, and thus I'd already be on the UI thread by the time I wanted to do the conversion. Instead, I've pulled the AsyncOperation work into each of these outer methods; thus I can delay calling Post until I'm done with the real computational work.
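The capture-then-Post behavior can be seen without a UI at all. The sketch below is an illustration under stated assumptions, not part of the column's implementation: QueueingContext is a hypothetical stand-in for a UI synchronization context that queues posted callbacks until its owning thread drains them, and PostAfterWorkSketch is likewise a made-up name. The conversion runs on a pool thread, and only the final notification is posted back.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Text;
using System.Threading;

// Hypothetical stand-in for a UI context: posted callbacks are queued
// until the owning thread drains them, much as a message loop would.
public sealed class QueueingContext : SynchronizationContext
{
    private readonly Queue<Action> _pending = new Queue<Action>();

    public override void Post(SendOrPostCallback d, object state)
    {
        lock (_pending) _pending.Enqueue(() => d(state));
    }

    // Keep using this instance if the context is ever copied
    public override SynchronizationContext CreateCopy() { return this; }

    // Runs everything queued so far on the calling thread
    public void Drain()
    {
        while (true)
        {
            Action next;
            lock (_pending)
            {
                if (_pending.Count == 0) return;
                next = _pending.Dequeue();
            }
            next();
        }
    }
}

public static class PostAfterWorkSketch
{
    // Returns true if the posted callback ran on the draining thread
    // and received the text decoded on the pool thread.
    public static bool Run()
    {
        SynchronizationContext previous = SynchronizationContext.Current;
        var context = new QueueingContext();
        SynchronizationContext.SetSynchronizationContext(context);
        try
        {
            // Captures the context that is current right now
            AsyncOperation asyncOp =
                AsyncOperationManager.CreateOperation(null);
            int callbackThread = -1;
            string text = null;

            using (var workDone = new ManualResetEvent(false))
            {
                ThreadPool.QueueUserWorkItem(delegate
                {
                    // The conversion stays on the pool thread...
                    string decoded =
                        Encoding.UTF8.GetString(new byte[] { 104, 105 });
                    // ...and only the notification is posted back
                    asyncOp.Post(delegate
                    {
                        text = decoded;
                        callbackThread =
                            Thread.CurrentThread.ManagedThreadId;
                    }, null);
                    workDone.Set();
                });
                workDone.WaitOne();
            }

            context.Drain(); // plays the role of the UI message loop
            return text == "hi" &&
                callbackThread == Thread.CurrentThread.ManagedThreadId;
        }
        finally
        {
            SynchronizationContext.SetSynchronizationContext(previous);
        }
    }
}
```

Calling PostAfterWorkSketch.Run() from any thread exercises the whole round trip; in a real Windows Forms application the framework's own context and message loop take the place of QueueingContext and Drain.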

Another interesting implementation detail exists in WriteAllTextAsync. WriteAllTextAsync doesn't return until the asynchronous operation has been kicked off. But before I can call WriteAllBytesAsyncInternal to do so, I need to convert the provided text string into a byte array. So, rather than block the calling thread, I queue a work item to the ThreadPool. That work item does the conversion from string to bytes, and then starts the internal copy.

In the end, this isn't a horrible amount of code, but it's also not simple. One would hope that these common asynchronous patterns for reading and writing file data would already exist somewhere in the Microsoft® .NET Framework, especially when you consider that there are other things you'd likely want to add on top of this, such as progress notifications as the data is being read or written. In fact, such functionality does already exist, but in an unlikely place: System.Net.

The System.Net.WebClient class is incredibly handy for a variety of purposes. Given its name and placement in the System.Net namespace, it's not surprising that most folks think of it as being only for Web-related activities: downloading files from an HTTP server, uploading files to an FTP site, and the like. But WebClient is nicely abstracted on top of WebRequest and WebResponse, which support a pluggable factory model for creating concrete implementations of each of these types. Calling WebRequest.Create with an HTTP URL will return an instance of an HttpWebRequest, just as calling it with an FTP URL will return an instance of an FtpWebRequest. WebClient uses WebRequest and WebResponse internally to implement a whole slew of useful functionality. Here's a sampling of some of the more relevant methods:

public void DownloadDataAsync(Uri address);
public event DownloadDataCompletedEventHandler DownloadDataCompleted;

public void DownloadStringAsync(Uri address);
public event 
    DownloadStringCompletedEventHandler DownloadStringCompleted;

public void UploadDataAsync(Uri address, byte[] data);
public event UploadDataCompletedEventHandler UploadDataCompleted;

public void UploadStringAsync(Uri address, string data);
public event UploadStringCompletedEventHandler UploadStringCompleted;

So now if you call DownloadDataAsync providing it with the URL of the data to be downloaded, it will do so asynchronously; when it completes it'll raise the DownloadDataCompleted event. Similarly, call UploadStringAsync providing it with a text string and the location to which the string should be uploaded, and it will do so asynchronously; when it completes it will raise the UploadStringCompleted event. Nice and easy.

Now, the cool part is that one of the WebRequest providers built into the .NET Framework is FileWebRequest (and a corresponding FileWebResponse). With this, I can write code like:

WebRequest wr = WebRequest.Create(@"file://C:\test.txt");

This code gives me a WebRequest I can work with like any other, except this WebRequest is targeting a file on disk rather than some file on a Web site somewhere. Combine that with WebClient's usage of WebRequest, and I'm sure you see where I'm headed: I can use WebClient to read and write files asynchronously!
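To make the factory behavior concrete, here is a minimal sketch. The class and method names are invented for illustration, the read is done synchronously purely for brevity, and the pragma is only there because newer .NET versions mark WebRequest as obsolete.

```csharp
using System;
using System.IO;
using System.Net;

#pragma warning disable SYSLIB0014 // WebRequest is marked obsolete on newer .NET versions

public static class FileRequestSketch
{
    // Name of the concrete WebRequest type the factory picks for a URI.
    // Creating the request does not contact the server or touch the file.
    public static string RequestTypeFor(string uri)
    {
        return WebRequest.Create(uri).GetType().Name;
    }

    // Reads a whole file as text through the file: provider
    public static string ReadViaWebRequest(string path)
    {
        // Path.GetFullPath yields an absolute path that Uri treats as a file URI
        WebRequest request =
            WebRequest.Create(new Uri(Path.GetFullPath(path)));
        using (WebResponse response = request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

RequestTypeFor reports HttpWebRequest for an http: address and FileWebRequest for a file: address, while ReadViaWebRequest shows that the resulting request and response behave like any other pair.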

To implement the read and write methods I did earlier requires only a few lines of code per method, as shown in Figure 4 (in fact, at this point, I question whether it's even worth having these wrappers, since most of the code is just for adapting the made-up API signatures to those provided by WebClient for the same tasks). WebClient internally uses AsyncOperationManager to ensure the callbacks happen in the right context, and FileWebRequest/FileWebResponse will use asynchronous I/O when the asynchronous methods on WebClient are being used (WebClient also provides synchronous versions of these methods).

Figure 4 Using WebClient to Read and Write Files

public static void ReadAllBytesAsync(
    string path, Action<byte[]> success, Action<Exception> failure)
{
    var wc = new WebClient();
    wc.DownloadDataCompleted += (sender, e) =>
    {
        if (e.Error != null) failure(e.Error);
        else success(e.Result);
    };
    wc.DownloadDataAsync(new Uri("file://" + path));
}

public static void ReadAllTextAsync(
    string path, Action<string> success, Action<Exception> failure)
{
    var wc = new WebClient();
    wc.DownloadStringCompleted += (sender, e) =>
    {
        if (e.Error != null) failure(e.Error);
        else success(e.Result);
    };
    wc.DownloadStringAsync(new Uri("file://" + path));
}

public static void WriteAllBytesAsync(
    string path, byte [] bytes, 
    Action success, Action<Exception> failure)
{
    var wc = new WebClient();
    wc.UploadDataCompleted += (sender, e) =>
    {
        if (e.Error != null) failure(e.Error);
        else success();
    };
    wc.UploadDataAsync(new Uri("file://" + path), bytes);
}

public static void WriteAllTextAsync(
    string path, string contents, 
    Action success, Action<Exception> failure)
{
    var wc = new WebClient();
    wc.UploadStringCompleted += (sender, e) =>
    {
        if (e.Error != null) failure(e.Error);
        else success();
    };
    wc.UploadStringAsync(new Uri("file://" + path), contents);
}
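One caveat with building the address by string concatenation: it assumes the path is already absolute. A relative name such as data.txt would likely end up being parsed as the host portion of the URI rather than as a file name. A small helper along the following lines, which is an assumption of mine and not something WebClient requires, sidesteps that by resolving the path first.

```csharp
using System;
using System.IO;

public static class FileUriSketch
{
    // Resolves a possibly relative path against the current directory
    // and lets Uri build the file: address from the absolute result.
    public static Uri ToFileUri(string path)
    {
        return new Uri(Path.GetFullPath(path));
    }
}
```

It would be used in place of the concatenation, for example wc.DownloadDataAsync(FileUriSketch.ToFileUri(path)).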

What's even cooler is that WebClient provides useful additional functionality on top of this; specifically, it already has built-in support for progress tracking. It provides two relevant events:

public event 
    DownloadProgressChangedEventHandler DownloadProgressChanged;
public event 
    UploadProgressChangedEventHandler UploadProgressChanged;

These are raised when there are progress updates to provide with regard to an in-progress download or upload, respectively. And the event arguments provided to the handlers for these events provide useful information for progress reporting. For example, here's the DownloadProgressChangedEventArgs type:

public class DownloadProgressChangedEventArgs : 
    ProgressChangedEventArgs
{
    public long BytesReceived { get; }
    public long TotalBytesToReceive { get; }

    /* from the base ProgressChangedEventArgs type
    public int ProgressPercentage { get; }
    public object UserState { get; }
    */
}

All of this lets me integrate asynchronous I/O very nicely into my GUI applications. Consider a Windows Forms application that needs to load a large file from disk into memory before working with it. I can load it asynchronously, with progress notifications that update a progress bar, and with completion notification:

private void button1_Click(object sender, EventArgs e) {
    WebClient wc = new WebClient();
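    wc.DownloadDataCompleted += (s, de) => {
        // Reading de.Result throws if the load failed, so check Error first
        if (de.Error != null) {
            MessageBox.Show("Load failed: " + de.Error.Message);
            return;
        }
        _fileData = de.Result;
        MessageBox.Show("File loaded");
    };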
    wc.DownloadProgressChanged += 
        (s, de) => progressBar1.Value = de.ProgressPercentage;
    wc.DownloadDataAsync(new Uri(@"file://c:\largeFile.dat"));
}
private byte [] _fileData;

Unfortunately, while the APIs are quite slick, there's a fairly significant problem here. While FileWebRequest/FileWebResponse do use asynchronous I/O under the covers, for whatever reason the implementation also relies on the worker ThreadPool for callback notifications; sometimes it blocks those threads from the pool waiting for the asynchronous I/O to complete. (Whether this is a code defect or just a bad performance problem is a matter of debate.) In the end, however, it means that if you're trying to use this WebClient technique to read or write a large number of files asynchronously and concurrently, you'll likely find your performance severely hampered by the number of threads in the ThreadPool.

Blocking threads in the ThreadPool is typically a bad idea, as it will force the ThreadPool to inject new threads, which not only consumes additional resources but also slows down the application as the ThreadPool throttles injection with time delays. All in all, while the WebClient approach for reading and writing files asynchronously may be good for a few files at a time, until FileWebRequest/FileWebResponse are modified to fix this issue, you're probably better off with the hand-coded versions shown earlier in this column. Another solution is to provide your own implementations of WebRequest/WebResponse that do the right thing, and then register those with WebClient.
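As a rough sketch of what that registration could look like, the following uses WebRequest.RegisterPrefix with an IWebRequestCreate implementation. Everything named here is hypothetical: the asyncfile: scheme is made up (the built-in file: prefix is already taken), and AsyncFileWebRequest is only a shell. A usable version would also need to override GetResponse, BeginGetResponse/EndGetResponse, GetRequestStream, and the related members, and supply a matching WebResponse.

```csharp
using System;
using System.Net;

#pragma warning disable SYSLIB0014 // WebRequest is marked obsolete on newer .NET versions

// Hypothetical request type; only the request URI is implemented here
public sealed class AsyncFileWebRequest : WebRequest
{
    private readonly Uri _uri;
    public AsyncFileWebRequest(Uri uri) { _uri = uri; }
    public override Uri RequestUri { get { return _uri; } }
}

// Factory that WebRequest.Create calls for URIs with a matching prefix
public sealed class AsyncFileWebRequestCreator : IWebRequestCreate
{
    public WebRequest Create(Uri uri)
    {
        return new AsyncFileWebRequest(uri);
    }
}

public static class CustomRequestSketch
{
    // Made-up scheme used only for this illustration
    public const string Prefix = "asyncfile:";

    public static WebRequest CreateCustom(string relativePath)
    {
        // Returns false, without throwing, if the prefix is already registered
        WebRequest.RegisterPrefix(Prefix, new AsyncFileWebRequestCreator());
        return WebRequest.Create(Prefix + "//localhost/" + relativePath);
    }
}
```

Once a prefix is registered, any code that goes through WebRequest.Create for a matching address receives the custom type. WebClient also exposes a protected virtual GetWebRequest method, so deriving from WebClient and overriding it is another possible hook.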

Send your questions and comments for Stephen to netqa@microsoft.com.

Stephen Toub is a Senior Program Manager Lead on the Parallel Computing Platform team at Microsoft. He is also a Contributing Editor for MSDN Magazine.