October 2012

Volume 27 Number 10

Windows with C++ - Back to the Future with Resumable Functions

By Kenny Kerr | October 2012

I concluded my last column (msdn.microsoft.com/magazine/jj618294) by highlighting some possible improvements to C++11 futures and promises that would transform them from being largely academic and simplistic to practical and useful for the construction of efficient and composable asynchronous systems. In large part, this was inspired by the work of Niklas Gustafsson and Artur Laksberg from the Visual C++ team.

As a representation of the future of futures, I gave an example along these lines:


int main()
{
  uint8 b[1024];
  auto f = storage_read(b, sizeof(b), 0).then([&]()
  {
    return storage_write(b, sizeof(b), 1024);
  });
  f.wait();
}

Both the storage_read and storage_write functions return a future representing the respective I/O operations that might complete at some point in the future. These functions model some storage subsystem with 1KB pages. The program as a whole reads the first page from storage to the buffer and then copies it back to the second page of storage. The novelty of this example is in the use of the hypothetical “then” method added to the future class, allowing the read and write operations to be composed into a single logical I/O operation, which can then be seamlessly waited upon.

This is a huge improvement over the world of stack ripping that I described in my last column, yet in itself is still not quite the utopian dream of a coroutine-like facility supported by the language that I described in my August 2012 column, “Lightweight Cooperative Multitasking with C++” (msdn.microsoft.com/magazine/jj553509). In that column I successfully demonstrated how such a facility can be achieved with some dramatic macro trickery—but not without significant drawbacks, primarily related to the inability to use local variables. This month I want to share some thoughts on how this might be achieved in the C++ language itself.

I necessarily began this series of articles exploring alternative techniques for achieving concurrency with a practical solution because the reality is that we need solutions that work today. We do, however, need to look to the future and push the C++ community forward by demanding greater support for writing I/O-intensive applications in a more natural and productive manner. Surely writing highly scalable systems shouldn’t be the exclusive purview of JavaScript and C# programmers and the rare C++ programmer with enough willpower. Also, keep in mind that this isn’t just about convenience and elegance in programming syntax and style. The ability to have multiple I/O requests active at any given time has the potential to improve performance dramatically. Storage and network drivers are designed to scale well as more I/O requests are in flight. In the case of storage drivers, the requests can be combined to improve hardware buffering and reduce seek times. In the case of network drivers, more requests mean larger network packets, optimized sliding window operations and more.

I’m going to switch gears slightly to illustrate how quickly complexity rears its ugly head. Rather than simply reading and writing to and from a storage device, how about serving up a file’s contents over a network connection? As before, I’m going to start with a synchronous approach and work from there. Computers might be fundamentally asynchronous, but we mere mortals certainly are not. I don’t know about you, but I’ve never been much of a multitasker. Consider the following classes:

class file { uint32 read(void * b, uint32 s); };
class net { void write(void * b, uint32 s); };

Use your imagination to fill out the rest. I just need a file class that allows a certain number of bytes to be read from some file. I’ll further assume that the file object will keep track of the offset. Similarly, the net class might model a TCP stream where the data offset is handled by TCP via its sliding window implementation that’s necessarily hidden from the caller. For a variety of reasons, perhaps related to caching or contention, the file read method might not always return the number of bytes actually requested. It will, however, only return zero when the end of the file has been reached. The net write method is simpler because the TCP implementation, by design, thankfully does a huge amount of work to keep this simple for the caller. This is a basic imaginary scenario but pretty representative of OS I/O. I can now write the following simple program:

int main()
{
  file f = ...; net n = ...; uint8 b[4096];
  while (auto actual = f.read(b, sizeof(b)))
  {
    n.write(b, actual);
  }
}

Given a 10KB file, you can imagine the following sequence of events before the loop is exhausted:

read 4096 bytes -> write 4096 bytes ->
read 4096 bytes -> write 4096 bytes ->
read 2048 bytes -> write 2048 bytes ->
read 0 bytes

Like the synchronous example in my last column, it’s not hard to figure out what’s going on here, thanks to the sequential nature of C++. Making the switch to asynchronous composition is a bit more difficult. The first step is to transform the file and net classes to return futures:

class file { future<uint32> read(void * b, uint32 s); };
class net { future<void> write(void * b, uint32 s); };

That was the easy part. Rewriting the main function to take advantage of any asynchrony in these methods presents a few challenges. It’s no longer enough to use the future’s hypothetical “then” method, because I’m no longer simply dealing with sequential composition. Yes, it’s true that a write follows a read, but only if the read actually reads something. To complicate matters further, a read also follows a write in all cases. You might be tempted to think in terms of closures, but that concept covers the composition of state and behavior and not the composition of behavior with other behavior.

I could start by creating closures only for the read and write operations:

auto read = [&]() { return f.read(b, sizeof(b)); };
auto write = [&](uint32 actual) { n.write(b, actual); };

Of course, this doesn’t quite work because the future’s then method doesn’t know what to pass to the write function:

read().then(write);

To address this, I need some kind of convention that will allow futures to forward state. An obvious choice (perhaps) is to forward the future itself. The then method would then expect an expression taking a future parameter of the appropriate type, allowing me to write this:

auto read = [&]() { return f.read(b, sizeof(b)); };
auto write = [&](future<uint32> previous) { n.write(b, previous.get()); };
read().then(write);
That handles the sequential composition of the read and write operations, but I still need to express the loop itself. One way is an asynchronous do_while algorithm that chains each iteration to the next until the body signals that it's done:

future<void> do_while(function<future<bool>()> body)
{
  auto done = make_shared<promise<void>>();
  iteration(body, done);
  return done->get_future();  
}

The do_while function first creates a reference-counted promise whose ultimate future signals the termination of the loop. This is passed to the iteration function along with the function representing the body of the loop:

void iteration(function<future<bool>()> body, 
  shared_ptr<promise<void>> done)
{
  body().then([=](future<bool> previous)
  {
    if (previous.get()) { iteration(body, done); }
    else { done->set_value(); }
  });
}

This iteration function is the heart of the do_while algorithm, providing the chaining from one invocation to the next, as well as the ability to break out and signal completion. Although it might look recursive, remember that the whole point is to separate the asynchronous operations from the stack, and thus the loop doesn’t actually grow the stack. Using the do_while algorithm is relatively easy, and I can now write the program shown in Figure 1.

Figure 1 Using a do_while Algorithm

int main()
{
  file f = ...; net n = ...; uint8 b[4096];
  auto loop = do_while([&]()
  {
    return f.read(b, sizeof(b)).then([&](future<uint32> previous)
    {
      return n.write(b, previous.get());
    }).then([&]()
    {
      promise<bool> p;
      p.set_value(!f.eof);
      return p.get_future();
    });
  });
  loop.wait();
}

The do_while function naturally returns a future, and in this case it’s waited upon, but this could just as easily have been avoided by storing the main function’s local variables on the heap with shared_ptrs. Inside the lambda expression passed to the do_while function, the read operation begins, followed by the write operation. To keep this example simple, I assume that write will return immediately if it’s told to write zero bytes. When the write operation completes, I check the file’s end-of-file status and return a future providing the loop condition value. This ensures that the body of the loop will repeat until the file’s content is exhausted.

Although this code isn’t particularly obnoxious—and, indeed, is arguably a lot cleaner than stack ripping—a little support from the language would go a long way. Niklas Gustafsson has already proposed such a design and called it “resumable functions.” Building on the improvements proposed for futures and promises and adding a little syntactic sugar, I could write a resumable function to encapsulate the surprisingly complex asynchronous operation as follows:

future<void> file_to_net(shared_ptr<file> f, 
  shared_ptr<net> n) resumable
{
  uint8 b[4096];
  while (auto actual = await f->read(b, sizeof(b)))
  {
    await n->write(b, actual);
  }
}

The beauty of this design is that the code has a striking resemblance to the original synchronous version, and that’s what I’m looking for, after all. Notice the “resumable” contextual keyword following the function’s parameter list. This is analogous to the hypothetical “async” keyword I described in my August 2012 column. Unlike what I demonstrated in that column, however, this would be implemented by the compiler itself. Thus there would be no complications and limitations such as those I confronted with the macro implementation. You could use switch statements and local variables—and constructors and destructors would work as expected—but your functions would now be able to pause and resume in a manner similar to what I prototyped with macros. Not only that, but you’d be freed from the pitfalls of capturing local variables only to have them go out of scope, a common mistake when using lambda expressions. The compiler would take care of providing storage for local variables inside resumable functions on the heap.

In the earlier example you’ll also notice the “await” keyword preceding the read and write method calls. This keyword defines a resumption point and expects an expression resulting in a future-like object that it can use to determine whether to pause and resume later or whether to simply continue execution if the asynchronous operation happened to complete synchronously. Obviously, to achieve the best performance, I need to handle the all-too-common scenario of asynchronous operations completing synchronously, perhaps due to caching or fast-fail scenarios.

Notice that I said the await keyword expects a future-like object. Strictly speaking, there’s no reason it needs to be an actual future object. It only needs to provide the necessary behavior to support the detection of asynchronous completion and signaling. This is analogous to the way templates work today. This future-like object would need to support the then method I illustrated in my last column as well as the existing get method. To improve performance in cases where the result is immediately available, the proposed try_get and is_done methods would also be useful. Of course, the compiler can optimize based on the availability of such methods.

This isn’t as far-fetched as it might seem. C# already has a nearly identical facility in the form of async methods, the moral equivalent of resumable functions. It even provides an await keyword that works in the same way as I’ve illustrated. My hope is that the C++ community will embrace resumable functions, or something like them, so that we’ll all be able to write efficient and composable asynchronous systems naturally and easily.

For a detailed analysis of resumable functions, including a look at how they might be implemented, please read Niklas Gustafsson’s paper, “Resumable Functions,” at bit.ly/zvPr0a.


Kenny Kerr is a software craftsman with a passion for native Windows development. Reach him at kennykerr.ca.

Thanks to the following technical expert for reviewing this article: Artur Laksberg