December 2011

Volume 26 Number 12

Windows with C++ - Thread Pool Timers and I/O

By Kenny Kerr | December 2011

In this, my final installment on the Windows 7 thread pool, I’m going to cover the two remaining callback-generating objects provided by the API. There’s even more I could write about the thread pool, but after five articles that cover virtually all of its features, you should be comfortable using it to power your applications effectively and efficiently.

In my August (msdn.microsoft.com/magazine/hh335066) and November (msdn.microsoft.com/magazine/hh547107) columns, I described work and wait objects respectively. A work object allows you to submit work, in the form of a function, directly to the thread pool for execution. The function will execute at the earliest opportunity. A wait object tells the thread pool to wait for a kernel synchronization object on your behalf, and queue a function when it’s signaled. This is a scalable alternative to traditional synchronization primitives and an efficient alternative to polling. There are, however, many cases where timers are required to execute some code after a certain interval or at some regular period. This might be because of a lack of “push” support in some Web protocol or perhaps because you are implementing a UDP-style communications protocol and you need to handle retransmissions. Fortunately, the thread pool API provides a timer object to handle all of these scenarios in an efficient and now-familiar manner.

Timer Objects

The CreateThreadpoolTimer function creates a timer object. If the function succeeds, it returns an opaque pointer representing the timer object. If it fails, it returns a null pointer value and provides more information via the GetLastError function. Given a timer object, the CloseThreadpoolTimer function informs the thread pool that the object may be released. If you’ve been following along in the series, this should all sound quite familiar. Here’s a traits class that can be used with the handy unique_handle class template I introduced in my July 2011 column (msdn.microsoft.com/magazine/hh288076):

struct timer_traits
{
  static PTP_TIMER invalid() throw()
  {
    return nullptr;
  }
  static void close(PTP_TIMER value) throw()
  {
    CloseThreadpoolTimer(value);
  }
};
typedef unique_handle<PTP_TIMER, timer_traits> timer;

I can now use the typedef and create a timer object as follows:

void * context = ...
timer t(CreateThreadpoolTimer(its_time, context, nullptr));
check_bool(t);

As usual, the final parameter optionally accepts a pointer to an environment so you can associate the timer object with an environment, as I described in my September 2011 column (msdn.microsoft.com/magazine/hh416747). The first parameter is the callback function that will be queued to the thread pool each time the timer expires. The timer callback is declared as follows:

void CALLBACK its_time(PTP_CALLBACK_INSTANCE, void * context, PTP_TIMER);

To control when and how often the timer expires, you use the SetThreadpoolTimer function. Naturally, its first parameter provides the timer object but the second parameter indicates the due time at which the timer should expire. It uses a FILETIME structure to describe either absolute or relative time. If you’re not quite sure how this works, I encourage you to read last month’s column, where I described the semantics around the FILETIME structure in detail. Here’s a simple example where I set the timer to expire in five seconds:

union FILETIME64
{
  INT64 quad;
  FILETIME ft;
};
FILETIME relative_time(DWORD milliseconds)
{
  FILETIME64 ft = { -static_cast<INT64>(milliseconds) * 10000 };
  return ft.ft;
}
auto due_time = relative_time(5 * 1000);
SetThreadpoolTimer(t.get(), &due_time, 0, 0);

Again, if you’re unsure about how the relative_time function works, please read my November 2011 column. In this example, the timer will expire after five seconds, at which point the thread pool will queue an instance of the its_time callback function. Unless action is taken, no further callbacks will be queued.

You can also use SetThreadpoolTimer to create a periodic timer that will queue a callback on some regular interval. Here’s an example:

auto due_time = relative_time(5 * 1000);
SetThreadpoolTimer(t.get(), &due_time, 500, 0);

In this example, the timer’s callback is first queued after five seconds and then every half-second after that until the timer object is reset or closed. Unlike the due time, the period is simply specified in milliseconds. Keep in mind that a periodic timer will queue a callback after the given period elapses, regardless of how long it takes the callback to execute. This means it’s possible for multiple callbacks to run concurrently, or overlap, if the interval is small enough or the callbacks take a long enough time to execute.

If you need to ensure callbacks don’t overlap, and the precise start time for each period isn’t that important, then a different approach for creating a periodic timer might be appropriate. Instead of specifying a period in the call to SetThreadpoolTimer, simply reset the timer in the callback itself. In this way, you can ensure the callbacks will never overlap. If nothing else, this simplifies debugging. Imagine stepping through a timer callback in the debugger only to find that the thread pool has already queued a few more instances while you were analyzing your code (or refilling your coffee). With this approach, that will never happen. Here’s what it looks like:

void CALLBACK its_time(PTP_CALLBACK_INSTANCE, void *, PTP_TIMER timer)
{
  // Your code goes here
  auto due_time = relative_time(500);
  SetThreadpoolTimer(timer, &due_time, 0, 0);
}
auto due_time = relative_time(5 * 1000);
SetThreadpoolTimer(t.get(), &due_time, 0, 0);

As you can see, the initial due time is five seconds and then I reset the due time to 500 ms at the end of the callback. I have taken advantage of the fact that the callback signature provides a pointer to the originating timer object, making the job of resetting the timer very simple. You may also want to use RAII to ensure the call to SetThreadpoolTimer is reliably called before the callback returns.
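One way to apply that RAII suggestion is with a small guard object whose destructor reschedules the timer. This is just a sketch under my own naming (the timer_reset type isn’t part of the API), but it guarantees the reset happens on every exit path from the callback:

```cpp
// A minimal RAII sketch (timer_reset is my own name, not part of the API):
// the destructor reschedules the timer no matter how the callback returns.
class timer_reset
{
  PTP_TIMER m_timer;
  DWORD m_milliseconds;
public:
  timer_reset(PTP_TIMER timer, DWORD milliseconds) throw() :
    m_timer(timer),
    m_milliseconds(milliseconds)
  {
  }
  ~timer_reset() throw()
  {
    auto due_time = relative_time(m_milliseconds);
    SetThreadpoolTimer(m_timer, &due_time, 0, 0);
  }
};

void CALLBACK its_time(PTP_CALLBACK_INSTANCE, void *, PTP_TIMER timer)
{
  timer_reset reset(timer, 500);
  // Your code goes here; the timer is reset even on early returns.
}
```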

You can call SetThreadpoolTimer with a null pointer value for the due time to stop any future timer expirations that may result in further callbacks. You’ll also need to call WaitForThreadpoolTimerCallbacks to avoid any race conditions. Of course, timer objects work equally well with cleanup groups, as described in my October 2011 column.
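Putting those two calls together, stopping a timer safely might look like the following sketch. Passing TRUE as the second parameter asks the wait to also cancel callbacks that are queued but not yet executing:

```cpp
// Stop any future expirations, then wait for outstanding callbacks
// (TRUE also cancels callbacks that are queued but not yet running).
SetThreadpoolTimer(t.get(), nullptr, 0, 0);
WaitForThreadpoolTimerCallbacks(t.get(), TRUE);
```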

SetThreadpoolTimer’s final parameter can be a bit confusing because the documentation refers to a “window length” as well as a delay. What’s that all about? This is actually a feature that affects energy efficiency and helps reduce overall power consumption. It’s based on a technique called timer coalescing. Obviously, the best solution is to avoid timers altogether and use events instead. This allows the system’s processors the greatest amount of idle time, thereby encouraging them to enter their low-power idle states as much as possible. Still, if timers are necessary, timer coalescing can reduce the overall power consumption by reducing the number of timer interrupts that are required. Timer coalescing is based on the idea of a “tolerable delay” for the timer expirations. Given some tolerable delay, the Windows kernel may adjust the actual expiration time to coincide with any existing timers. A good rule of thumb is to set the delay to one-tenth of the period in use. For example, if the timer should expire in 10 seconds, use a one-second delay, depending on what’s appropriate for your application. The greater the delay, the more opportunity the kernel has to optimize its timer interrupts. On the other hand, anything less than 50 ms will not be of much use because it begins to encroach on the kernel’s default clock interval.
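Following that rule of thumb, a ten-second periodic timer with a one-second tolerable delay might be set like this:

```cpp
auto due_time = relative_time(10 * 1000);

// Period of 10 seconds with a 1-second window length: the kernel is free
// to fire the timer up to a second late in order to coalesce its
// expiration with other outstanding timer interrupts.
SetThreadpoolTimer(t.get(), &due_time, 10 * 1000, 1000);
```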

I/O Completion Objects

Now it’s time for me to introduce the gem of the thread pool API: the input/output (I/O) completion object, or simply the I/O object. Back when I first introduced the thread pool API, I mentioned that the thread pool is built on top of the I/O completion port API. Traditionally, implementing the most scalable I/O on Windows was possible only using the I/O completion port API. I have written about this API in the past. Although not particularly difficult to use, it was not always that easy to integrate with an application’s other threading needs. Thanks to the thread pool API, though, you have the best of both worlds with a single API for work, synchronization, timers and now I/O, too. The other benefit is that performing overlapped I/O completion with the thread pool is actually more intuitive than using the I/O completion port API, especially when it comes to handling multiple file handles and multiple overlapped operations concurrently.

As you might have guessed, the CreateThreadpoolIo function creates an I/O object and the CloseThreadpoolIo function informs the thread pool that the object may be released. Here’s a traits class for the unique_handle class template:

struct io_traits
{
  static PTP_IO invalid() throw()
  {
    return nullptr;
  }
  static void close(PTP_IO value) throw()
  {
    CloseThreadpoolIo(value);
  }
};
typedef unique_handle<PTP_IO, io_traits> io;

The CreateThreadpoolIo function accepts a file handle, implying that an I/O object is able to control the I/O for a single handle. Naturally, that handle needs to support overlapped I/O, but this includes popular resource types such as file system files, named pipes, sockets and so on. Let me demonstrate with a simple example of waiting to receive a UDP packet using a socket. To manage the socket, I’ll use unique_handle with the following traits class:

struct socket_traits
{
  static SOCKET invalid() throw()
  {
    return INVALID_SOCKET;
  }
  static void close(SOCKET value) throw()
  {
    closesocket(value);
  }
};
typedef unique_handle<SOCKET, socket_traits> socket;

Unlike the traits classes I’ve shown thus far, in this case the invalid function doesn’t return a null pointer value. This is because the WSASocket function, like the CreateFile function, uses an unusual value to indicate an invalid handle. Given this traits class and typedef, I can create a socket and I/O object quite simply:

socket s(WSASocket( ... , WSA_FLAG_OVERLAPPED));
check_bool(s);
void * context = ...
io i(CreateThreadpoolIo(reinterpret_cast<HANDLE>(s.get()), io_completion, context, nullptr));
check_bool(i);

The callback function that signals the completion of any I/O operation is declared as follows:

void CALLBACK io_completion(PTP_CALLBACK_INSTANCE, void * context, void * overlapped,
  ULONG result, ULONG_PTR bytes_copied, PTP_IO);

The unique parameters for this callback should be familiar if you’ve used overlapped I/O before. Because overlapped I/O is by nature asynchronous and allows overlapping I/O operations—hence the name overlapped I/O—there needs to be a way to identify the particular I/O operation that has completed. This is the purpose of the overlapped parameter. This parameter provides a pointer to the OVERLAPPED or WSAOVERLAPPED structure that was specified when a particular I/O operation was first initiated. The traditional approach of packing an OVERLAPPED structure into a larger structure to hang more data off this parameter can still be used. The overlapped parameter provides a way to identify the particular I/O operation that has completed, while the context parameter—as usual—provides a context for the I/O endpoint, regardless of any particular operation. Given these two parameters, you should have no trouble coordinating the flow of data through your application. The result parameter tells you whether the overlapped operation succeeded, with the usual ERROR_SUCCESS (zero) indicating success. Finally, the bytes_copied parameter obviously tells you how many bytes were actually read or written. A common mistake is to assume that the number of bytes requested was actually copied. Don’t make that mistake: it’s the very reason for this parameter’s existence.
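The packing technique mentioned above might be sketched as follows. The io_operation type and its members are illustrative, not part of the API; because the OVERLAPPED structure is the first member, the pointer handed to the callback coincides with a pointer to the enclosing structure:

```cpp
// Illustrative only: per-operation state packed around an OVERLAPPED.
struct io_operation
{
  OVERLAPPED overlapped; // Must be first so the two pointers coincide
  char buffer[1024];     // Per-operation data travels with the operation
};

void CALLBACK io_completion(PTP_CALLBACK_INSTANCE, void * context,
  void * overlapped, ULONG result, ULONG_PTR bytes_copied, PTP_IO)
{
  auto op = static_cast<io_operation *>(overlapped);

  if (ERROR_SUCCESS == result)
  {
    // Use op->buffer here, but only the first bytes_copied bytes.
  }
}
```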

The only part of the thread pool’s I/O support that’s slightly tricky is the handling of the I/O request itself. It takes some care to code this properly. Before calling a function to initiate some asynchronous I/O operation, such as ReadFile or WSARecvFrom, you must call the StartThreadpoolIo function to let the thread pool know that an I/O operation is about to start. The trick is that if the I/O operation happens to complete synchronously, then you must notify the thread pool of this by calling the CancelThreadpoolIo function. Keep in mind that I/O completion doesn’t necessarily equate to successful completion. An I/O operation might succeed or fail either synchronously or asynchronously. Either way, if the I/O operation will not notify the completion port of its completion, you need to let the thread pool know. Here’s what this might look like in the context of receiving a UDP packet:

StartThreadpoolIo(i.get());
auto result = WSARecvFrom(s.get(), ...
if (!result)
{
  result = WSA_IO_PENDING;
}
else
{
  result = WSAGetLastError();
}
if (WSA_IO_PENDING != result)
{
  CancelThreadpoolIo(i.get());
}

As you can see, I begin the process by calling StartThreadpoolIo to tell the thread pool that an I/O operation is about to begin. I then call WSARecvFrom to get things going. Interpreting the result is the crucial part. The WSARecvFrom function returns zero if the operation completed successfully, but the completion port will still be notified, so I change the result to WSA_IO_PENDING. Any other result from WSARecvFrom indicates failure, with the exception, of course, of WSA_IO_PENDING itself, which simply means that the operation has been successfully initiated but it will be completed later. Now, I simply call CancelThreadpoolIo if the result is not pending to keep the thread pool up to speed. Different I/O endpoints may provide different semantics. For example, file I/O can be configured to avoid notifying the completion port on synchronous completion. You would then need to call CancelThreadpoolIo as appropriate.
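The file I/O configuration I alluded to is the SetFileCompletionNotificationModes function. With FILE_SKIP_COMPLETION_PORT_ON_SUCCESS set, a synchronously successful operation will never reach the completion port, so CancelThreadpoolIo is required on that path, too. Here’s a sketch; the file, buffer, size and overlapped names are placeholders for your own:

```cpp
// With this mode, a synchronously completed operation (success or
// failure) will not notify the completion port.
check_bool(SetFileCompletionNotificationModes(file.get(),
  FILE_SKIP_COMPLETION_PORT_ON_SUCCESS));

StartThreadpoolIo(i.get());

if (ReadFile(file.get(), buffer, size, nullptr, &overlapped) ||
    ERROR_IO_PENDING != GetLastError())
{
  // Completed synchronously: tell the thread pool no callback is coming.
  CancelThreadpoolIo(i.get());
}
```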

Like the other callback-generating objects in the thread pool API, pending callbacks for I/O objects can be canceled using the WaitForThreadpoolIoCallbacks function. Just keep in mind that this will cancel any pending callbacks, but not cancel any pending I/O operations themselves. You will still need to use the appropriate function to cancel the operation to avoid any race conditions. This allows you to safely free any OVERLAPPED structures, and so forth.
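For example, shutting down the socket receive shown earlier might look like this: CancelIoEx cancels the pending operation itself, and the wait then drains the callbacks so any OVERLAPPED structures can be safely freed afterward:

```cpp
// Cancel the pending I/O on the handle itself, then wait for (and
// cancel) any outstanding callbacks before freeing per-operation state.
CancelIoEx(reinterpret_cast<HANDLE>(s.get()), nullptr);
WaitForThreadpoolIoCallbacks(i.get(), TRUE);
```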

And that’s it for the thread pool API. As I said, there’s more I could write about this powerful API, but given the detailed walk-through I’ve provided thus far, I’m sure you’re well on your way to using it to power your next application. Join me next month as I continue to explore Windows with C++.     


Kenny Kerr is a software craftsman with a passion for native Windows development. Reach him at kennykerr.ca.