Exercise 4: Working with the Concurrency Runtime Events (Background)

Figure 1

So far we have seen how to synchronize access to shared data using the critical_section and reader_writer_lock synchronization primitives. Next we will work with the Concurrency Runtime’s event class.

An event is a two-state synchronization object that, unlike critical_section or reader_writer_lock, does not protect access to shared data. Instead, events synchronize the flow of execution and use the Concurrency Runtime’s facilities to enable cooperative scheduling of work. They behave similarly to a Win32 manual-reset event. The main difference is that a Concurrency Runtime event is designed to yield cooperatively to other tasks in the runtime when it blocks, in addition to being subject to preemption, whereas Win32 events are purely preemptive by design.
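To make the manual-reset behavior concrete, here is a minimal portable sketch. It is not the runtime’s own class (that is concurrency::event, declared in concrt.h, with set, reset and wait members). The names manual_reset_event and release_waiters are hypothetical, and this stand-in blocks the OS thread preemptively rather than yielding cooperatively, so it illustrates only the two-state semantics.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a manual-reset event, built on the C++ standard
// library. Assumption: it mimics only the signaled / non-signaled semantics;
// unlike concurrency::event it does not cooperate with any scheduler.
class manual_reset_event {
public:
    // Move to the signaled state and wake every waiter.
    void set() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            signaled_ = true;
        }
        cond_.notify_all();
    }

    // Return to the non-signaled state; later waits block again.
    void reset() {
        std::lock_guard<std::mutex> lock(mutex_);
        signaled_ = false;
    }

    // Block until signaled. The state is NOT cleared on wake-up,
    // which is what "manual reset" means.
    void wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        cond_.wait(lock, [this] { return signaled_; });
    }

    bool is_set() {
        std::lock_guard<std::mutex> lock(mutex_);
        return signaled_;
    }

private:
    std::mutex mutex_;
    std::condition_variable cond_;
    bool signaled_ = false;
};

// Starts `count` waiter threads, signals the event once, and returns how
// many waiters were released by that single set() call.
int release_waiters(int count) {
    manual_reset_event ready;
    std::atomic<int> released{0};
    std::vector<std::thread> waiters;
    for (int i = 0; i < count; ++i) {
        waiters.emplace_back([&ready, &released] {
            ready.wait();   // blocks until set() has been called
            ++released;
        });
    }
    ready.set();            // one signal releases all waiters
    for (auto& t : waiters) t.join();
    return released.load();
}
```

Calling release_waiters(4) releases all four waiters with one set() call, and the event stays signaled until reset() is called. Note that no data is protected here: the event only controls when the waiting threads may proceed.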

User Mode Scheduling

Figure 2

One benefit of Concurrency Runtime events is that a task that blocks on an event and cannot proceed yields to other work. The Concurrency Runtime also takes advantage of User Mode Scheduling (UMS), a feature that is part of both Windows 7 (x64) and Windows Server 2008 R2.

As the name implies, UMS Threads are threads that are scheduled by a user-mode scheduler (like the Concurrency Runtime’s scheduler). Scheduling threads in user mode has a couple of advantages:

  1. A UMS Thread can be scheduled without a kernel transition, which can provide a performance boost.
  2. When a UMS Thread blocks on a system call or synchronization object, the scheduler can run another thread for the rest of the OS’s quantum, so the quantum is fully used.

To illustrate the benefits of UMS, let’s picture a very simple scheduler that has a single work queue. In this case, we are trying to schedule 100 tasks on a computer that has 2 CPUs, as shown in the diagram below:

Figure 3

Here we’ve started with 100 items in our task queue, and two threads have picked up Task 1 and Task 2 and are running them in parallel. Unfortunately, Task 2 is going to block on a critical section. We would like the scheduler (i.e. the Concurrency Runtime) to use CPU 2 to run the other queued tasks while Task 2 is blocked. With ordinary Win32 threads, however, the scheduler cannot tell the difference between a task that is performing a very long computation and a task that is simply blocked in the kernel. The end result is that until Task 2 unblocks, the Concurrency Runtime will not schedule any more tasks on CPU 2. Our 2-core machine just became a 1-core machine, and in the worst case, all 99 other tasks (Task 1 plus the 98 still queued) will be executed serially on CPU 1.

User Mode Scheduling (cont’d)

This situation can be improved somewhat by using the Concurrency Runtime’s cooperative synchronization primitives (critical_section, reader_writer_lock, event) instead of Win32’s kernel primitives. These runtime-aware primitives will cooperatively block a thread, informing the Concurrency Runtime that other work can be run on the CPU. In the above example, Task 2 will cooperatively block, but Task 3 can be run on another thread on CPU 2. All this involves several trips through the kernel to block one thread and unblock another, but it’s certainly better than wasting the CPU:

Figure 4
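The difference between the two cases can be sketched with a toy, single-threaded simulation. This is a simplified model and not how the Concurrency Runtime is implemented. The names Outcome and simulate are hypothetical, and the model assumes that every task needs exactly one tick of CPU, that Task 2 blocks as soon as it starts and needs one tick after it wakes up, and that a runnable (unblocked) task is preferred over a queued one.

```cpp
// Result of one simulated run (hypothetical type for this sketch).
struct Outcome {
    int total_ticks;      // ticks until every task has finished
    int cpu2_idle_ticks;  // ticks a CPU spent pinned behind blocked Task 2
};

// Simulates two CPUs draining one work queue of `num_tasks` tasks.
// cooperative == false: the CPU that started Task 2 stays pinned while
//                       Task 2 is blocked (ordinary Win32 threads).
// cooperative == true : that CPU is handed back to the scheduler and keeps
//                       running queued tasks (runtime-aware primitives).
Outcome simulate(int num_tasks, int block_ticks, bool cooperative) {
    int next_id = 1;           // id of the next task still in the queue
    int done = 0;              // number of finished tasks
    bool t2_blocked = false;   // Task 2 is currently waiting
    bool t2_runnable = false;  // Task 2 has woken up and needs a CPU
    int t2_wakeup = 0;         // tick at which Task 2 unblocks
    int held_cpu = -1;         // CPU pinned by Task 2 (non-cooperative only)
    int idle = 0;
    int tick = 0;

    for (; done < num_tasks; ++tick) {
        if (t2_blocked && tick >= t2_wakeup) {
            t2_blocked = false;
            t2_runnable = true;
        }
        for (int cpu = 0; cpu < 2; ++cpu) {
            if (cpu == held_cpu) {
                if (t2_blocked) { ++idle; continue; }  // wasted tick
                // Task 2 woke up on its own CPU: run its final tick.
                held_cpu = -1;
                t2_runnable = false;
                ++done;
                continue;
            }
            // Cooperative case: a runnable task goes before queued tasks.
            if (t2_runnable && held_cpu == -1) {
                t2_runnable = false;
                ++done;
                continue;
            }
            // Otherwise take the next task from the queue, if any.
            while (next_id <= num_tasks) {
                int id = next_id++;
                if (id == 2) {
                    t2_blocked = true;
                    t2_wakeup = tick + block_ticks;
                    if (!cooperative) { held_cpu = cpu; ++idle; break; }
                    continue;  // CPU is free again: take another task now
                }
                ++done;
                break;
            }
        }
    }
    return {tick, idle};
}
```

Under these assumptions, simulate(100, 40, false) finishes in 70 ticks with 40 wasted ticks on the pinned CPU, while simulate(100, 40, true) finishes in 50 ticks with none wasted, which is the ideal for 100 one-tick tasks on 2 CPUs. The model deliberately ignores the cost of the kernel transitions mentioned above.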

The situation is improved even further on Windows Server 2008 R2 with UMS threads. When Task 2 blocks, the OS gives control back to the Concurrency Runtime. The runtime can now make a scheduling decision and create a new thread to run Task 3 from the task queue. The new thread is scheduled in user mode by the Concurrency Runtime, not by the OS, so the switch is very fast.

Both CPU 1 and CPU 2 can now be kept busy with the remaining 99 non-blocking tasks in the queue. When Task 2 gets unblocked, Windows Server 2008 R2 places its host thread back on a runnable list so that the Concurrency Runtime can schedule it – again, from user-mode – and Task 2 can be continued on any available CPU:

Figure 5

Note:
The Concurrency Runtime looks for unblocked (runnable) tasks before queued tasks, so it is somewhat more likely that Task 2 will run before the next queued task (19).