June 2018

Volume 33 Number 6

[C++]

Effective Async with Coroutines and C++/WinRT

By Kenny Kerr | June 2018

The Windows Runtime has a relatively simple async model in the sense that, like everything else in the Windows Runtime, it’s focused on allowing components to expose async methods and making it simple for apps to call those async methods. It doesn’t in itself provide a concurrency runtime or even anything in the way of building blocks for producing or consuming async methods. Instead, all of that is left to the individual language projections. This is as it should be and isn’t meant to trivialize the Windows Runtime async pattern. It’s no small feat to implement this pattern correctly. Of course, it also means that a developer’s perception of async in the Windows Runtime is very heavily influenced by the developer’s language of choice. A developer who has only ever used C++/CX might, for example, wrongly but understandably assume that async is a hot mess.

The ideal concurrency framework for the C# developer will be different from the ideal concurrency library for the C++ developer. The role of the language, and libraries in the case of C++, is to take care of the mechanics of the async pattern and provide a natural bridge to a language-specific implementation.

Coroutines are the preferred abstraction for both implementing and calling async methods in C++, but let’s first make sure we understand how the async model works. Consider a class with a single static method that looks something like this:

struct Sample
{
  Sample() = delete;
  static Windows::Foundation::IAsyncAction CopyAsync();
};

Async methods end with “Async” by convention, so you might think of this as the Copy async method. There might be a blocking or synchronous alternative that’s simply called Copy. It’s conceivable that a caller might want a blocking Copy method for use by a background thread and a non-blocking, or asynchronous, method for use by a UI thread that can’t afford to block for fear of appearing unresponsive.
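
To make that distinction concrete, here’s a hypothetical variant of the class (not part of the original example) that pairs a blocking Copy with the non-blocking CopyAsync:

struct Sample
{
  Sample() = delete;
  // Blocking; suitable for a background thread that can afford to wait.
  static void Copy();
  // Non-blocking; suitable for a UI thread that must stay responsive.
  static Windows::Foundation::IAsyncAction CopyAsync();
};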

At first, the CopyAsync method might seem quite simple to call. I could write the following C++ code:

IAsyncAction async = Sample::CopyAsync();

As you might imagine, the resulting IAsyncAction isn’t actually the ultimate result of the async method, even though it’s the result of calling the CopyAsync method in a traditional procedural manner. The IAsyncAction is the object that a caller might use to wait upon the result synchronously or asynchronously, depending on the situation. Along with IAsyncAction, there are three other well-known interfaces that follow a similar pattern and offer different features for the callee to communicate information back to the caller. The table in Figure 1 provides a comparison of the four async interfaces.

Figure 1 A Comparison of Async Interfaces

Name                         Result  Progress
IAsyncAction                 No      No
IAsyncActionWithProgress     No      Yes
IAsyncOperation              Yes     No
IAsyncOperationWithProgress  Yes     Yes

In C++ terms, the interfaces can be expressed as shown in Figure 2.

Figure 2 The Async Interfaces Expressed in C++ Terms

namespace Windows::Foundation
{
  struct IAsyncAction;
  template <typename Progress>
  struct IAsyncActionWithProgress;
  template <typename Result>
  struct IAsyncOperation;
  template <typename Result, typename Progress>
  struct IAsyncOperationWithProgress;
}

IAsyncAction and IAsyncActionWithProgress can be waited upon to determine when the async method completes, but these interfaces don’t offer any observable result or return value directly. IAsyncOperation and IAsyncOperationWithProgress, on the other hand, expect the Result type parameter to indicate the type of result that can be expected when the async method completes successfully. Finally, IAsyncActionWithProgress and IAsyncOperationWithProgress expect the Progress type parameter to indicate the type of progress information that can be expected periodically for long-running operations up until the async method completes.
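
For example, concrete uses of the WithProgress variants might look something like this (hypothetical signatures, not part of the Sample class above):

struct Downloader
{
  // Completes with no result; reports progress as a percentage.
  static Windows::Foundation::IAsyncActionWithProgress<double> CopyWithProgressAsync();
  // Completes with an hstring result; reports bytes received so far.
  static Windows::Foundation::IAsyncOperationWithProgress<hstring, uint64_t> ReadWithProgressAsync();
};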

There are a few ways to wait upon the result of an async method. I won’t describe them all here as that would turn this into a very long article. While there are a variety of ways to handle async completion, there are only two that I recommend: the async.get method, which performs a blocking wait, and the co_await async expression, which performs a cooperative wait in the context of a coroutine. Neither is better than the other as they simply serve different purposes. Let’s look now at how to do a blocking wait.

As I mentioned, a blocking wait can be achieved using the get method as follows:

IAsyncAction async = Sample::CopyAsync();
async.get();

There’s seldom any value in holding on to the async object, so the following form is preferred:

Sample::CopyAsync().get();

It’s important to keep in mind that the get method will block the calling thread until the async method completes. As such, it’s not appropriate to use the get method on a UI thread because it may cause the app to become unresponsive. An assertion will fire in unoptimized builds if you attempt to do so. The get method is ideal for console apps or background threads where, for whatever reason, you may not want to use a coroutine.

Once the async method completes, the get method will return any result directly to the caller. In the case of IAsyncAction and IAsyncActionWithProgress, the return type is void. That might be useful for an async method that initiates a file-copy operation, but less so for something like an async method that reads the contents of a file. Let’s add another async method to the example:

struct Sample
{
  Sample() = delete;
  static Windows::Foundation::IAsyncAction CopyAsync();
  static Windows::Foundation::IAsyncOperation<hstring> ReadAsync();
};

In the case of ReadAsync, the get method will properly forward the hstring result to the caller once the operation completes:

Sample::CopyAsync().get();
hstring result = Sample::ReadAsync().get();

Assuming execution returns from the get method, the resulting string will hold whatever value was returned by the async method upon its successful completion. Execution may not return, for example, if an error occurred.
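
Any such error surfaces as an exception thrown from the get method. A minimal sketch of defensive calling code might look like this (the error handling shown here is an assumption and not part of the original example):

try
{
  hstring result = Sample::ReadAsync().get();
  printf("%ls\n", result.c_str());
}
catch (hresult_error const& e)
{
  // The failure HRESULT and message originate inside the async method.
  printf("error: %ls\n", e.message().c_str());
}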

The get method is limited in the sense that it can’t be used from a UI thread, nor does it exploit the full potential of the machine’s concurrency, since it holds the calling thread hostage until the async method completes. Using a coroutine allows the async method to complete without holding such a precious resource captive for some indeterminate amount of time.

Handling Async Completion

Now that you have a handle on async interfaces in general, let’s begin to drill down into how they work in a bit more detail. Assuming you’re not satisfied with the blocking wait provided by the get method, what other options are there? We’ll soon switch gears and focus entirely on coroutines but, for the moment, let’s take a closer look at those async interfaces to see what they offer. Both the coroutine support, as well as the get method I looked at previously, rely on the contract and state machine implied by those interfaces. I won’t go into too much detail because you really don’t need to know all that much about these, but let’s explore the basics so they’ll at least be familiar if you do ever have to dive in and use them directly for something more out of the ordinary.

All four of the async interfaces logically derive from the IAsyncInfo interface. There’s very little you can do with IAsyncInfo and it’s regrettable that it even exists because it adds a bit of overhead. The only IAsyncInfo members you should really consider are Status, which can tell you whether the async method has completed, and Cancel, which can be used to request cancellation of a long-running operation whose result is no longer needed. I nitpick this design because I really like the async pattern in general and just wish it were perfect because it’s so very close.

The Status member can be useful if you need to determine whether an async method has completed without actually waiting for it. Here’s an example:

auto async = ReadAsync();
if (async.Status() == AsyncStatus::Completed)
{
  auto result = async.GetResults();
  printf("%ls\n", result.c_str());
}
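
Cancellation follows a similar pattern. Here’s a hedged sketch: Cancel is merely a request, and if the async method honors it, the eventual wait typically surfaces the cancellation as an hresult_canceled exception:

auto async = Sample::CopyAsync();
// Sometime later, the result is no longer needed ...
async.Cancel();
try
{
  async.get();
}
catch (hresult_canceled const&)
{
  puts("canceled");
}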

Each of the four async interfaces, not IAsyncInfo itself, provides individual versions of the GetResults method that should be called only after you’ve determined that the async method has completed. Don’t confuse this with the get method provided by C++/WinRT. While GetResults is implemented by the async method itself, get is implemented by C++/WinRT. GetResults won’t block if the async method is still running and will likely throw an hresult_illegal_method_call exception if called prematurely. You can, no doubt, begin to imagine how the blocking get method is implemented. Conceptually, it looks something like this:

auto get() const
{
  if (Status() != AsyncStatus::Completed)
  {
    // Wait for completion somehow ...
  }
  return GetResults();
}

The actual implementation is a bit more complicated, but this captures the gist of it. The point here is that GetResults is called regardless of whether it’s an IAsyncOperation, which returns a value, or IAsyncAction, which doesn’t. The reason for this is that GetResults is responsible for propagating any error that may have occurred within the implementation of the async method and will rethrow an exception as needed.

The question that remains is how the caller can wait for completion. I’m going to write a non-member get function to show you what’s involved. I’ll start with this basic outline, inspired by the previous conceptual get method:

template <typename T>
auto get(T const& async)
{
  if (async.Status() != AsyncStatus::Completed)
  {
    // Wait for completion somehow ...
  }
  return async.GetResults();
}

I want this function template to work with all four of the async interfaces, so I’ll use the return statement unconditionally, even when GetResults returns void. The C++ language makes special provision for returning a void expression in generic code like this, and you can be thankful for that.

Each of the four async interfaces provides a unique Completed member that can be used to register a callback—called a delegate—that will be called when the async method completes. In most cases, C++/WinRT will automatically create the delegate for you. All you need to do is provide some function-like handler, and a lambda is usually the simplest:

async.Completed([](auto&& async, AsyncStatus status)
{
  // It's done!
});

The type of the delegate’s first parameter will be that of the async interface that just completed, but keep in mind that completion should be regarded as a simple signal. In other words, don’t stuff a bunch of code inside the Completed handler. Essentially, you should regard it as a noexcept handler because the async method won’t itself know what to do with any failure occurring inside this handler. So what can you do?

Well, you could simply notify a waiting thread using an event. Figure 3 shows what the get function might look like.

Figure 3 Notifying a Waiting Thread Using an Event

template <typename T>
auto get(T const& async)
{
  if (async.Status() != AsyncStatus::Completed)
  {
    handle signal{ CreateEvent(nullptr, true, false, nullptr) };
    async.Completed([&](auto&&, auto&&)
    {
      SetEvent(signal.get());
    });
    WaitForSingleObject(signal.get(), INFINITE);
  }
  return async.GetResults();
}

C++/WinRT’s get methods use a condition variable with a slim reader/writer lock because it’s slightly more efficient. Such a variant might look something like what’s shown in Figure 4.

Figure 4 Using a Condition Variable with a Slim Reader/Writer Lock

template <typename T>
auto get(T const& async)
{
  if (async.Status() != AsyncStatus::Completed)
  {
    slim_mutex m;
    slim_condition_variable cv;
    bool completed = false;
    async.Completed([&](auto&&, auto&&)
    {
      {
        slim_lock_guard const guard(m);
        completed = true;
      }
      cv.notify_one();
    });
    slim_lock_guard guard(m);
    cv.wait(m, [&] { return completed; });
  }
  return async.GetResults();
}

You can, of course, use the C++ standard library’s mutex and condition variable if you prefer. The point here is simply that the Completed handler is your hook to wiring up async completion and it can be done quite generically.
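
For completeness, here’s a sketch of the same wait using std::mutex and std::condition_variable (assuming <mutex> and <condition_variable> are included):

#include <mutex>
#include <condition_variable>
template <typename T>
auto get(T const& async)
{
  if (async.Status() != AsyncStatus::Completed)
  {
    std::mutex m;
    std::condition_variable cv;
    bool completed = false;
    async.Completed([&](auto&&, auto&&)
    {
      {
        std::lock_guard<std::mutex> const guard(m);
        completed = true;
      }
      cv.notify_one();
    });
    std::unique_lock<std::mutex> lock(m);
    cv.wait(lock, [&] { return completed; });
  }
  return async.GetResults();
}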

Naturally, there’s no reason for you to write your own get function, and more than likely coroutines will be much simpler and more versatile in general. Still, I hope this helps you appreciate some of the power and flexibility in the Windows Runtime.

Producing Async Objects

Now that we’ve explored the async interfaces and some completion mechanics in general, let’s turn our attention to creating or producing implementations of those four async interfaces. As you’ve already learned, implementing WinRT interfaces with C++/WinRT is very simple. I might, for example, implement IAsyncAction, as shown in Figure 5.

Figure 5 Implementing IAsyncAction

struct MyAsync : implements<MyAsync, IAsyncAction, IAsyncInfo>
{
  // IAsyncInfo members ...
  uint32_t Id() const;
  AsyncStatus Status() const;
  HRESULT ErrorCode() const;
  void Cancel() const;
  void Close() const;
  // IAsyncAction members ...
  void Completed(AsyncActionCompletedHandler const& handler) const;
  AsyncActionCompletedHandler Completed() const;
  void GetResults() const;
};

The difficulty comes when you consider how you might implement those methods. While it’s not hard to imagine some implementation, it’s almost impossible to do this correctly without first reverse-engineering how the existing language projections actually implement them. You see, the WinRT async pattern only works if everyone implements these interfaces using a very specific state machine and in exactly the same way. Each language projection makes the same assumptions about how this state machine is implemented and if you happen to implement it in a slightly different way, bad things will happen.

Thankfully, you don’t have to worry about this because each language projection, with the exception of C++/CX, already implements this correctly for you. Here’s a complete implementation of IAsyncAction thanks to C++/WinRT’s coroutine support:

IAsyncAction CopyAsync()
{
  co_return;
}

Now this isn’t a particularly interesting implementation, but it is very educational and a good example of just how much C++/WinRT is doing for you. Because this is a complete implementation, we can use it to exercise some of what we’ve learned thus far. The earlier CopyAsync function is a coroutine. The coroutine’s return type is used to stitch together an implementation of both IAsyncAction and IAsyncInfo, and the C++ compiler brings it to life at just the right moment. We’ll explore some of those details later, but for now let’s observe how this coroutine works. Consider the following console app:

IAsyncAction CopyAsync()
{
  co_return;
}
int main()
{
  IAsyncAction async = CopyAsync();
  async.get();
}

The main function calls the CopyAsync function, which returns an IAsyncAction. If you forget for a moment what the CopyAsync function’s body or definition looks like, it should be evident that it’s just a function that returns an IAsyncAction object. You can therefore use it in all the ways that you’ve already learned.

A coroutine (of this sort) must have a co_return statement or a co_await statement. It may, of course, have multiple such statements, but must have at least one of these in order to actually be a coroutine. As you might expect, a co_return statement doesn’t introduce any kind of suspension or asynchrony. Therefore, this CopyAsync function produces an IAsyncAction that completes immediately or synchronously. I can illustrate this as follows:

IAsyncAction Async()
{
  co_return;
}
int main()
{
  IAsyncAction async = Async();
  assert(async.Status() == AsyncStatus::Completed);
}

The assertion is guaranteed to be true. There’s no race here. Because Async, like CopyAsync before it, is just a function, the caller is blocked until it returns, and the first opportunity for it to return happens to be the co_return statement. What this means is that if you have some async contract that you need to implement, but the implementation doesn’t actually need to introduce any asynchrony, it can simply return the value directly, and without blocking or introducing a context switch. Consider a function that downloads and then returns a cached value, as shown in Figure 6.

Figure 6 A Function That Downloads and Returns a Cached Value

hstring m_cache;
IAsyncOperation<hstring> ReadAsync()
{
  if (m_cache.empty())
  {
    // Download and cache value ...
  }
  co_return m_cache;
}
int main()
{
  hstring message = ReadAsync().get();
  printf("%ls\n", message.c_str());
}

The first time ReadAsync is called, the cache is likely empty and the result is downloaded. Presumably this will suspend the coroutine itself while this takes place. Suspension implies that execution returns to the caller. The caller is handed an async object that has not, in fact, completed, hence the need to somehow wait for completion.
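
To make the suspension concrete, here’s a hypothetical sketch of what the elided download might look like, using HttpClient from Windows::Web::Http (an assumption on my part; the original example leaves the download unspecified):

#include "winrt/Windows.Foundation.h"
#include "winrt/Windows.Web.Http.h"
// m_cache as declared in Figure 6
IAsyncOperation<hstring> ReadAsync()
{
  if (m_cache.empty())
  {
    Windows::Web::Http::HttpClient client;
    Windows::Foundation::Uri uri(L"https://example.com/message");
    // This co_await is the suspension point: the coroutine returns an
    // incomplete IAsyncOperation to its caller while the download proceeds.
    m_cache = co_await client.GetStringAsync(uri);
  }
  co_return m_cache;
}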

The beauty of coroutines is that there’s a single abstraction both for producing async objects and for consuming those same async objects. An API or component author might implement an async method as described earlier, but an API consumer or app developer could also use coroutines to call and wait for its completion. Let’s now rewrite the main function from Figure 6 to use a coroutine to do the waiting:

IAsyncAction MainAsync()
{
  hstring result = co_await ReadAsync();
  printf("%ls\n", result.c_str());
}
int main()
{
  MainAsync().get();
}

I have essentially taken the body of the old main function and moved it into the MainAsync coroutine. The main function uses the get method to prevent the app from terminating while the coroutine completes asynchronously. The MainAsync function has something new and that’s the co_await statement. Rather than using the get method to block the calling thread until ReadAsync completes, the co_await expression waits for ReadAsync to complete in a cooperative, non-blocking manner. This is what I meant by a suspension point: the co_await expression represents one. This app only calls ReadAsync once, but you can imagine it being called multiple times in a more interesting app. The first time ReadAsync is called, the cache is empty and the MainAsync coroutine will actually suspend and return control to its caller. On subsequent calls, ReadAsync won’t suspend at all but will simply return the cached value directly.

Coroutines are very new to many C++ developers, so don’t feel bad if this still seems rather magical. These concepts will become quite clear as you start writing and debugging coroutines yourself. The good news is that you already know enough to begin to make effective use of coroutines to consume async APIs provided by Windows. For example, you should be able to reason about how the console app in Figure 7 works.

Figure 7 An Example Console App with Coroutines

#include "winrt/Windows.Web.Syndication.h"
using namespace winrt;
using namespace Windows::Foundation;
using namespace Windows::Web::Syndication;
IAsyncAction MainAsync()
{
  Uri uri(L"https://kennykerr.ca/feed");
  SyndicationClient client;
  SyndicationFeed feed = co_await client.RetrieveFeedAsync(uri);
  for (auto&& item : feed.Items())
  {
    hstring title = item.Title().Text();
    printf("%ls\n", title.c_str());
  }
}
int main()
{
  init_apartment();
  MainAsync().get();
}

Give it a try right now and see just how much fun it is to use modern C++ on Windows.

Coroutines and the Thread Pool

Creating a basic coroutine is trivial. You can very easily co_await some other async action or operation, simply co_return a value, or craft some combination of the two. Here’s a coroutine that’s not asynchronous at all:

IAsyncOperation<int> return_123()
{
  co_return 123;
}

Even though it executes synchronously, it still produces a completely valid implementation of the IAsyncOperation interface:

int main()
{
  int result = return_123().get();
  assert(result == 123);
}

Here’s one that will wait for five seconds before returning the value:

using namespace std::chrono;
IAsyncOperation<int> return_123_after_5s()
{
  co_await 5s;
  co_return 123;
}

The next one is ostensibly going to execute asynchronously and yet the main function remains largely unchanged, thanks to the get function’s blocking behavior:

int main()
{
  int result = return_123_after_5s().get();
  assert(result == 123);
}

The co_return statement in the last coroutine will execute on the Windows thread pool, because the co_await expression is a chrono duration that uses a thread pool timer. The co_await statement represents a suspension point and it should be apparent that a coroutine may resume on a completely different thread following suspension. You can also make this explicit using resume_background:

IAsyncOperation<int> background_123()
{
  co_await resume_background();
  co_return 123;
}

There’s no apparent delay this time, but the coroutine is guaranteed to resume on the thread pool. What if you’re not sure? You might have a cached value and only want to introduce a context switch if the value must be retrieved from latent storage. This is where it’s good to remember that a coroutine is also a function, so all the normal rules apply:

IAsyncOperation<int> background_123()
{
  static std::atomic<int> result{0};
  if (result == 0)
  {
    co_await resume_background();
    result = 123;
  }
  co_return result;
}

This will only conditionally introduce concurrency. Multiple threads could conceivably race in and call background_123, causing a few of them to resume on the thread pool, but eventually the atomic variable will be primed and the coroutine will begin to complete synchronously. That brief race is, of course, the worst case; once the value is cached, no context switch occurs at all.

Let’s imagine the value may only be read from storage once a signal is raised, indicating that the value is ready. We can use the two coroutines in Figure 8 to pull this off.

Figure 8 Reading a Value from Storage After a Signal Is Raised

handle m_signal{ CreateEvent(nullptr, true, false, nullptr) };
std::atomic<int> m_value{ 0 };
IAsyncAction prepare_result()
{
  co_await 5s;
  m_value = 123;
  SetEvent(m_signal.get());
}
IAsyncOperation<int> return_on_signal()
{
  co_await resume_on_signal(m_signal.get());
  co_return m_value;
}

The first coroutine artificially waits for five seconds, sets the value, and then signals the Win32 event. The second coroutine waits for the event to become signaled, and then simply returns the value. Once again, the thread pool is used to wait for the event, leading to an efficient and scalable implementation. Coordinating the two coroutines is straightforward:

int main()
{
  prepare_result();
  int result = return_on_signal().get();
  assert(result == 123);
}

The main function kicks off the first coroutine but doesn’t block waiting for its completion. The second coroutine immediately begins waiting for the value, blocking as it does so.

Coroutines and the Calling Context

Thus far, I’ve focused on the thread pool, or what might be called background threads. C++/WinRT loves the Windows thread pool, but invariably you need to get work back onto a foreground thread representing some user interaction. Let’s look at ways to take precise control over the execution context.

Because coroutines can be used to introduce concurrency or deal with latency in other APIs, some confusion may arise as to the execution context of a given coroutine at any particular point in time. Let’s clear a few things up.

Figure 9 shows a simple function that will print out some basic information about the calling thread.

Figure 9 Printing Out Basic Information About the Calling Thread

void print_context()
{
  printf("thread:%d apartment:", GetCurrentThreadId());
  APTTYPE type;
  APTTYPEQUALIFIER qualifier;
  HRESULT const result = CoGetApartmentType(&type, &qualifier);
  if (result == S_OK)
  {
    puts(type == APTTYPE_MTA ? "MTA" : "STA");
  }
  else
  {
    puts("N/A");
  }
}

This is by no means an exhaustive or foolproof dump of apartment information, but it’s good enough for our purposes today. For the COM nerds out there, N/A is “not applicable” and not the other NA you’re thinking of. Recall that there are two primary apartment models. A process has at most one multi-threaded apartment (MTA) and may have any number of single-threaded apartments (STA). Apartments are an unfortunate reality designed to accommodate COM’s remoting architecture but have traditionally been used to support COM objects that weren’t thread safe.

The STA is used by COM to ensure that objects are only ever called from the thread on which they’re created. Naturally, this implies some mechanism to marshal calls from other threads back on to the apartment thread. STAs typically use a message loop or dispatcher queue for this. The MTA is used by COM to indicate that no thread affinity exists, but that marshaling is required if a call originates in some other apartment. The relationship between objects and threads is complex, so I’ll save that for another day. The ideal scenario is when an object proclaims that it’s agile and thus free from apartment affinity.

Let’s use the print_context function to write a few interesting programs. Here’s one that calls print_context before and after using resume_background to move work on to a background (thread pool) thread:

IAsyncAction Async()
{
  print_context();
  co_await resume_background();
  print_context();
}

Consider also the following caller:

int main()
{
  init_apartment();
  Async().get();
}

The caller is important because it determines the original or calling context for the coroutine (or any function). In this case, init_apartment is called from main without any arguments. This means that the app’s primary thread will join the MTA. Thus, it doesn’t require a message loop or dispatcher of any kind. It also means that the thread can happily block execution as I’ve done here by using the blocking get function to wait for the coroutine to complete. Since the calling thread is an MTA thread, the coroutine begins execution on the MTA. The resume_background co_await expression suspends the coroutine momentarily, releasing the calling thread back to the caller; the coroutine is then free to resume, and continue executing concurrently, as soon as a thread becomes available from the thread pool. Once on the thread pool (otherwise known as a background thread), print_context is once again called. Here’s what you might see if you were to run this program:

thread:18924 apartment:MTA
thread:9568 apartment:MTA

The thread identifiers don’t matter. What matters is that they’re unique. Notice that the thread temporarily provided by the thread pool is also an MTA thread. How can this be if I didn’t call init_apartment on that thread? If you step into the print_context function, you’ll notice that the APTTYPEQUALIFIER distinguishes between these threads and identifies the thread association as being implicit. You can safely assume that a thread pool thread is an MTA thread in practice, provided the process is keeping the MTA alive by some other means.
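
If you’d like to see that qualifier for yourself, here’s a small extension of print_context (not part of the original) that also reports implicit MTA membership:

void print_context_with_qualifier()
{
  printf("thread:%d apartment:", GetCurrentThreadId());
  APTTYPE type;
  APTTYPEQUALIFIER qualifier;
  if (CoGetApartmentType(&type, &qualifier) == S_OK)
  {
    // A thread pool thread typically shows up as an implicit MTA member.
    printf("%s%s\n",
      type == APTTYPE_MTA ? "MTA" : "STA",
      qualifier == APTTYPEQUALIFIER_IMPLICIT_MTA ? " (implicit)" : "");
  }
  else
  {
    puts("N/A");
  }
}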

The key is that the resume_background co_await expression will effectively switch the execution context of a coroutine to the thread pool, regardless of which thread or apartment it was originally executing on. A coroutine should presume that its caller must not be blocked, and should therefore suspend before beginning any compute-bound work that could otherwise tie up the calling thread. That doesn’t mean resume_background should always be used. A coroutine might exist purely to aggregate some other set of coroutines, and in that case any co_await expression within the coroutine may provide the necessary suspension to ensure that the calling thread isn’t blocked.
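
Here’s a minimal sketch of such an aggregating coroutine, reusing the Sample class from earlier; the nested co_await expressions provide the suspension, so no explicit resume_background is needed:

IAsyncAction CopyThenReadAsync()
{
  // Each co_await suspends this coroutine rather than blocking the caller.
  co_await Sample::CopyAsync();
  hstring result = co_await Sample::ReadAsync();
  printf("%ls\n", result.c_str());
}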

Now consider what happens if I change the caller, the app’s main function, as follows:

int main()
{
  init_apartment(apartment_type::single_threaded);
  Async();
  MSG message;
  while (GetMessage(&message, nullptr, 0, 0))
  {
    DispatchMessage(&message);
  }
}

This seems reasonable enough. The primary thread becomes an STA thread, calls the Async function (without blocking in any way), and enters a message loop to ensure that the STA can service cross-apartment calls. The problem is that the MTA hasn’t been created, so the threads owned by the thread pool won’t join the MTA implicitly and thus can’t make use of COM services such as activation. Here’s what you might see if you were to run the program now:

thread:17552 apartment:STA
thread:19300 apartment:N/A

Notice that following the resume_background suspension, the coroutine no longer has an apartment context in which to execute code that relies on the COM runtime. If you really need an STA for your primary thread, this problem is easily solved by ensuring that the MTA is “always on” regardless, as shown in Figure 10.

Figure 10 Ensuring the MTA Is “Always On”

int main()
{
  CO_MTA_USAGE_COOKIE mta{};
  check_hresult(CoIncrementMTAUsage(&mta));
  init_apartment(apartment_type::single_threaded);
  Async();
  MSG message;
  while (GetMessage(&message, nullptr, 0, 0))
  {
    DispatchMessage(&message);
  }
}

The CoIncrementMTAUsage function will ensure that the MTA is created. The calling thread becomes an implicit member of the MTA until or unless an explicit choice is made to join some other apartment, as is the case here. If I were to run the program again, I’d get the desired result:

thread:11276 apartment:STA
thread:9412 apartment:MTA

This is essentially the environment in which most modern Windows apps find themselves, but now you know a bit more about how you can control or create that environment for yourself. Join me next time as we continue to explore coroutines.

Wrapping Up

Well, I’ve gone pretty deep on async and coroutines in C++. There’s always more to say about concurrency. It’s such a fascinating topic, and I hope this introduction will get you excited about how simple it is to deal with async in your C++ apps and components and the sheer power at your fingertips when you begin to use C++/WinRT.


Kenny Kerr is an author, systems programmer and the creator of C++/WinRT. He’s also an engineer on the Windows team at Microsoft where he’s designing the future of C++ for Windows, enabling developers to write beautiful high-performance apps and components with incredible ease.