3: Parallel Tasks

Chapter 2, "Parallel Loops," shows how you can use a parallel loop to apply a single operation to many data elements. This is data parallelism. This chapter explains what happens when there are distinct asynchronous operations that can run simultaneously. In this situation, you can temporarily fork a program's flow of control with tasks that can potentially execute in parallel. This is task parallelism. The Parallel Tasks pattern is sometimes known as the Fork/Join pattern or the Master/Worker pattern.

Parallel tasks are asynchronous operations that can run at the same time. This approach is also known as task parallelism.

Data parallelism and task parallelism are two ends of a spectrum. Data parallelism occurs when a single operation is applied to many inputs. Task parallelism uses multiple operations, each with its own input.

In the Parallel Patterns Library (PPL), tasks are started and managed by methods of the task_group class, which is declared in the ppl.h header file. The task group class’s run method creates and schedules new tasks. You can wait for all tasks created by the task group to complete by invoking the task group's wait method. If you think in terms of fork/join, the run method is the fork operation and the wait method is the join operation.

Parallel tasks in PPL are managed by the task_group class.

Scheduling is an important aspect of parallel tasks. Unlike threads, new tasks don't necessarily begin to execute immediately. Instead, they are placed in a work queue. Tasks run when their associated task scheduler removes them from the queue, usually as processor resources become available. As long as there are enough tasks and the tasks are sufficiently free of serializing dependencies, the program's performance scales with the number of available cores. In this way, tasks embody the concept of potential parallelism that was introduced in Chapter 1.

Another important aspect of task-based applications is how they handle exceptions. In PPL, an unhandled exception that occurs during the execution of a task is deferred for later observation. For example, the deferred exception is automatically observed at a later time when you call the task_group::wait method. At that time, the exception is rethrown in the calling context of the wait method. This allows you to use the same exception handling approach in parallel programs that you use in sequential programs.

Tasks in PPL defer exceptions and rethrow them when the task group’s wait method is invoked.

The Basics

Each task is a sequential operation; however, tasks can often run in parallel. Here's some sequential code.

  DoLeft();
  DoRight();

Let's assume that the methods DoLeft and DoRight are independent. This means that neither method writes to memory locations or files that the other method might read. Because the methods are independent, you can use the parallel_invoke function of the Concurrency namespace to run them in parallel. This is shown in the following code.

  parallel_invoke(
    []() { DoLeft(); },
    []() { DoRight(); }
  );

The parallel_invoke function is the simplest expression of the Parallel Tasks pattern. The function creates a new task group with new parallel tasks for each lambda expression in its argument list. The parallel_invoke function returns when all the tasks are finished. The arguments to parallel_invoke are known as work functions.

There are overloaded versions of the parallel_invoke function that accept up to ten work functions.

The parallel_invoke function in the Concurrency namespace creates a group of parallel tasks and waits for them all to complete.
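
For example, here is a hedged sketch that passes three independent work functions; DoCenter is a hypothetical third operation, assumed to be independent of DoLeft and DoRight.

  parallel_invoke(
    []() { DoLeft(); },
    []() { DoCenter(); },  // hypothetical third independent operation
    []() { DoRight(); }
  );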

You can't assume that all parallel tasks will immediately run. Depending on the current work load and system configuration, tasks might be scheduled to run one after another, or they might run at the same time. For more information about how tasks are scheduled, see "How Tasks Are Scheduled," later in this chapter.

The functions run by parallel_invoke can either complete normally or finish by throwing an exception. If an exception is thrown by one of the work functions during the execution of parallel_invoke, it will be deferred and rethrown when all tasks finish. If more than one of the work functions throws an exception, the runtime chooses one of the exceptions to be rethrown. The remaining exceptions will not be externally observed. For more information and a code example, see the section, "Handling Exceptions," later in this chapter.

Internally, parallel_invoke creates new tasks and waits for them. You can reproduce this functionality by creating a task group object and calling its run and wait methods. Here's an example.

  task_group tg;

  tg.run([](){ DoLeft(); });
  tg.run([](){ DoRight(); });
  tg.wait();

Use the task group’s run method to create a task and schedule its execution. Use the wait method to block the current context until all of the tasks in a task group have completed.

The run method of the task_group class creates and schedules a new task. The run method’s argument is a lambda expression, a pointer to a function, or a function object that will be invoked when the task eventually executes. In other words, the argument can be any object that supports the function call operator with the signature void operator()(). When you use the task_group::run method to create a task, the new task is added to a work queue for eventual execution, but it does not start to execute until its task scheduler takes it out of the work queue, which can happen immediately or can occur at some point in the future.

Note

If you pass a reference to an instance of a class that supports operator() as an argument to the task_group::run method, you must make sure to manage the memory of the function object. The function object can safely be destroyed only after the task group object’s wait method returns. Lambda expressions and pointers to static functions do not require explicit deletion and are therefore easier to use than class-type functors.

You can wait for the tasks of the task group to complete by calling the task group’s wait method.
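
For instance, the following hedged sketch passes a pointer to a function and a stack-allocated function object to the run method; RightFunctor is an illustrative name. Because the functor lives on the stack until after wait returns, no explicit deletion is needed.

  struct RightFunctor        // class type that supports operator()
  {
    void operator()() const { DoRight(); }
  };

  task_group tg;
  RightFunctor f;            // stack-allocated; must outlive wait

  tg.run(DoLeft);            // pointer to a function
  tg.run(f);                 // function object
  tg.wait();                 // f may safely be destroyed only after this returns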

It is also possible to combine the run and wait steps into a single operation. This is shown in the following code.

  task_group tg;

  tg.run([](){ DoLeft(); });
  tg.run_and_wait([](){ DoRight(); });  

Calling the task group’s run_and_wait method instead of the run method followed by a call to the wait method can result in slightly more efficient use of threads. The run_and_wait method acts as a hint to the task scheduler that it can reuse the current context to execute the new task.

The examples you've seen so far are simple, but they're powerful enough to handle many scenarios. For more ways to use tasks, see the section, "Variations," later in this chapter.

An Example

An example of task parallelism is an image processing application where images are created with layers. Separate images from different sources are processed independently and then combined with a process known as alpha blending. This process superimposes semitransparent layers to form a single image.

The source images that are combined are different, and different image processing operations are performed on each of them. This means that the image processing operations must be performed separately on each source image and must be complete before the images can be blended. In the example, there are only two source images, and the operations are simple: conversion to gray scale and rotation. In a more realistic example, there might be more source images and more complicated operations.

Here's the sequential code. The source code for the complete example is located at http://parallelpatternscpp.codeplex.com in the Chapter3\ImageBlender folder.

static void SequentialImageProcessing(
     Bitmap* const source1, Bitmap* const source2, 
     Bitmap* const layer1, Bitmap* const layer2, 
     Graphics* const blender)
{
  SetToGray(source1, layer1);
  Rotate(source2, layer2);
  Blend(layer1, layer2, blender);
}

In this example, source1 and source2 are bitmaps that are the original source images, layer1 and layer2 are bitmaps that have been prepared with additional information needed to blend the images, and blender is a Graphics instance that performs the blending and references the bitmap with the final blended image. Internally, SetToGray, Rotate, and Blend use methods from the platform’s Gdiplus namespace to perform the image processing.

The SetToGray and Rotate methods are entirely independent of each other. This means that you can execute them in separate tasks. If two or more cores are available, the tasks might run in parallel, and the image processing operations might complete in less elapsed time than a sequential version would.

The parallel_invoke function creates tasks and waits for them to complete before proceeding. This is shown in the following code.

static void ParallelInvokeImageProcessing(
     Bitmap* const source1, Bitmap* const source2, 
     Bitmap* layer1, Bitmap* layer2, Graphics* blender)
{
  parallel_invoke(
    [&source1, &layer1](){ SetToGray(source1, layer1); },
    [&source2, &layer2](){ Rotate(source2, layer2);} 
  );
  Blend(layer1, layer2, blender);
}

In this example, the tasks are identified implicitly by the arguments to parallel_invoke. This call does not return until all of the tasks complete.

Use the parallel_invoke function whenever tasks can be defined in a single lexical scope.

You can also create parallel tasks explicitly. This is shown in the following code.

static void ParallelTaskGroupImageProcessing(
    Bitmap* const source1, Bitmap* const source2,
    Bitmap* layer1, Bitmap* layer2, Graphics* blender)
{
  task_group tasks;
  tasks.run(
     [&source1, &layer1](){ SetToGray(source1, layer1);} 
  );
  tasks.run_and_wait(
     [&source2, &layer2](){ Rotate(source2, layer2); }
  );
  Blend(layer1, layer2, blender);
}

This code allocates a task group on the stack. It then calls the task group’s run and run_and_wait methods to create and run two tasks that execute SetToGray and Rotate. The example uses the run_and_wait method to create the second task and to wait for all tasks to finish before blending the processed images.

Variations

This section describes variations of PPL’s implementation of the Parallel Task pattern.

Coordinating Tasks with Cooperative Blocking

The classes and functions in the Concurrency namespace implement a task-coordination feature known as cooperative blocking. With cooperative blocking, your task can suspend its execution and relinquish control to the task scheduler until a specific condition is met. This usually occurs when another task performs an action that the first task needs. A typical example of a cooperative blocking operation is the task_group::wait method. If wait is invoked from within a running task, the task scheduler knows that the current task can’t continue until all the tasks of the specified task group run to completion. Cooperative blocking provides a robust and highly programmable way to coordinate tasks.

Cooperative blocking can improve the performance of a parallel application by enabling fuller use of processor resources. A cooperatively blocked task represents an opportunity for the task scheduler to apply processor resources to other tasks. If you are going to use PPL effectively, it is important to understand the interaction of cooperative blocking with the task scheduler. For more information, see the "Task Scheduler" section of Appendix A, "The Task Scheduler and Resource Manager."

Invoking cooperative blocking acts as a hint to the scheduler that other work may be started or resumed.

You can also invoke any of the synchronization features of the operating system from within tasks. Using lower-level blocking operations of the operating system is sometimes called noncooperative blocking. If you do any type of blocking, you will need to consider whether cooperative or noncooperative blocking is most appropriate. In general, cooperative blocking has the advantage of better coordination with the task scheduler.

The following list describes the most important operations that the runtime considers to be cooperative blocking operations. Note that these operations are not guaranteed to block every time they are called; instead, they have the potential to block if certain conditions exist. All of the classes are in the Concurrency namespace. (A short example of a cooperatively blocking lock appears after the list.)

- task_group::wait method: This method blocks the current task until the tasks of another task group have completed their work.

- critical_section::lock method: A critical section provides mutual exclusion. Acquiring the lock may block the current task if the lock is in use by another task. Only one task at a time can possess the lock.

- critical_section::scoped_lock class constructor: An exception-safe way to acquire and release a critical section within a block of code is to define a critical_section::scoped_lock at the beginning of that block.

- reader_writer_lock::lock method: This method acquires a reader/writer lock for concurrency-safe modification of shared data. Calling the lock method will block the current task if a reader or reader/writer lock is already held by another thread.

- reader_writer_lock::scoped_lock class constructor: An exception-safe way to acquire and release a reader/writer lock within a block of code is to define a reader_writer_lock::scoped_lock at the beginning of that block.

- reader_writer_lock::lock_read method: This method acquires a reader lock for concurrency-safe reading of shared data. The lock_read method may block the current thread if another thread holds the reader/writer lock. Unlike the reader/writer lock, more than one task may hold a reader lock at the same time.

- reader_writer_lock::scoped_lock_read class constructor: An exception-safe way to acquire and release a reader lock within a block of code is to define a reader_writer_lock::scoped_lock_read at the beginning of that block.

- event::wait method: This method is a cooperatively blocking equivalent of a manual reset event. Invoking the event::wait method may block if the event has not yet been set.

- agent::wait and agent::wait_for_* methods: These methods block the current task until the agent instance completes its work.

- wait(…) function: The Concurrency::wait function cooperatively blocks execution for a specified time interval.

- Context::Block method: This method suspends the current context until it is cooperatively reenabled by another task’s invocation of the Context::Unblock method. This operation is used by libraries to implement new task-coordination control structures. It is not normally used by application code.

- Context::Yield method: This method suspends the current thread so that another worker thread may be given the opportunity to resume execution. Although Yield potentially suspends execution of the current thread, it never results in a blocked context. Therefore, Yield can only loosely be considered a blocking operation.

- parallel_for, parallel_for_each, and parallel_invoke functions: PPL’s functions for parallel algorithms internally invoke blocking operations, such as the wait method.

- send and asend functions: These functions transmit data to messaging blocks. In Microsoft® Visual Studio® 2010 SP1, the implementations of the send and asend functions sometimes invoke cooperative blocking, depending on certain internal system details. The asend function is expected to be nonblocking in future versions of the runtime.

- receive function: This function gets a value from a messaging block. It may block if the messaging block does not yet have a value to provide.
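
As a brief, hedged illustration of cooperative blocking, the following sketch uses the critical_section::scoped_lock constructor to serialize updates to a shared counter. When one task holds the lock, the other may cooperatively block in the constructor, giving the task scheduler an opportunity to run other work.

critical_section cs;
int total = 0;               // shared data protected by cs

task_group tg;
tg.run([&cs, &total]()
{
  for (int i = 0; i < 100000; i++)
  {
    critical_section::scoped_lock lock(cs);  // may cooperatively block
    ++total;
  }
});
tg.run_and_wait([&cs, &total]()
{
  for (int i = 0; i < 100000; i++)
  {
    critical_section::scoped_lock lock(cs);
    ++total;
  }
});
wcout << L"Total: " << total << endl;        // always 200000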

Canceling a Task Group

You can signal cancellation by invoking the task group’s cancel method. Here's an example:

Use the task_group::cancel method to signal cancellation of all tasks in a task group.

task_group tg;
tg.run([](){ DoLeft(); });
tg.cancel();                   // could be called from any thread
wcout << L"  Is canceling: " << tg.is_canceling() << endl;

task_group_status status = tg.wait();
wcout << L"  Status:    " << status << endl;

Invoking a task group’s cancel method causes the task group to transition to a state where its is_canceling method will return true. Tasks in the task group that have not yet started are not allowed to run. New tasks that are added to the task group by the run method are ignored after the cancel method has been called.

Tasks in the task group that have started before cancellation is signaled continue to run, but their behavior may change. If a task of a task group that is being canceled invokes any function in the Concurrency namespace, an exception may be thrown. For example, if a running task of a task group that is being canceled makes a call to another task group's wait method, an exception may be thrown by the runtime. The specific set of functions in the Concurrency namespace that will throw exceptions when a task group is undergoing cancellation and the specific set of circumstances that will cause such exceptions to be thrown are intentionally not specified. Therefore, your application logic should not depend on whether any particular library function throws or does not throw an exception when a task group is being canceled.

Note

You must ensure that your task work functions are exception safe. Use the Resource Acquisition is Initialization (RAII) pattern for automatic cleanup.

In the current version of PPL, canceling a task group with cooperatively (or noncooperatively) blocked tasks may result in deadlock in certain cases. For example, consider the case where you create two tasks in a task group that share an instance E of the cooperatively blocking event class. One of the tasks calls the wait method of event E, and the other task calls the set method of event E. If the task group’s cancel method is called while the first task is blocked waiting for event E but before the second task has started to execute, there will be no way for the wait condition of the first task to be satisfied.
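
Here is a hedged sketch of that scenario; the timing is illustrative only, and the code is shown to demonstrate the hazard, not as a recommended pattern.

event e;
task_group tg;

tg.run([&e]() { e.wait(); });  // task 1: blocks until e is set
tg.run([&e]() { e.set(); });   // task 2: would unblock task 1

// If cancellation is processed while task 1 is blocked but before
// task 2 has been allowed to start, task 2 never runs, e is never
// set, and the call to wait below never returns.
tg.cancel();
tg.wait();                     // potential deadlock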

Note

Use extreme caution if you mix blocking operations other than task_group::wait with task group cancellation. In such cases you must ensure that all possible code paths are deadlock free.

Cancellation will automatically propagate across task groups in certain situations. For example, cancellation will be propagated if a task in task group A is waiting for the tasks of task group B to complete. In this situation, if task group A’s cancel method is called before the call to task group B’s wait method completes, then the runtime also invokes task group B’s cancel method. The task in task group A that is waiting on task group B remains blocked until task group B has no more running tasks. At that time, the call to task group B’s wait method will throw an exception. (Of course, if task group B is very fast, its wait function might return normally before there is a chance to propagate a cancellation from task group A.)

Note that the runtime only returns the enumerated value canceled for the wait method of the top-most task group in a tree of dependent task groups. The other stack frames will have an internal exception passed through them.

A cancellation request automatically propagates to another task group if a call to that group’s wait method is blocking any of the tasks of a task group that is being cancelled.
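
The following hedged sketch illustrates the propagation rule with two task groups; LongOperation is a hypothetical work function.

task_group tgA;
task_group tgB;

tgA.run([&tgB]()
{
  tgB.run([]() { LongOperation(); });  // hypothetical long-running work
  tgB.wait();   // if tgA is canceled while this call blocks, the
                // runtime also invokes tgB's cancel method
});

tgA.cancel();
task_group_status status = tgA.wait(); // top-most group reports canceled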

Long-running tasks can use the is_canceling method to poll their task group for its cancellation status and shut themselves down if cancellation has been requested. The is_canceling method might also be used if you need to perform local cleanup actions for a task that's in the process of being canceled. When the task group returns from a call to its wait method, its state is reset and its is_canceling method thereafter returns false.

Returning from the task_group::wait method returns a task group object to its initial, default state. The task group has the same behavior as if it were newly created.

Checking for cancellation within a loop that has many iterations that each performs a small amount of work can negatively affect your application's performance. On the other hand, checking only infrequently for cancellation can introduce unacceptable latency in your application's response to cancellation requests. For example, in an interactive GUI-based application, checking for cancellation more than once per second is probably a good idea. An application that runs in the background could poll for cancellation less frequently, perhaps every two to ten seconds. Profile your application to collect performance data that will help you determine the best places to test for cancellation requests in your code. In particular, look for places in your code where you can poll for cancellation at evenly spaced intervals. For more information about profiling, see Appendix B, "Profiling and Debugging."

Handling Exceptions

If the execution of a task’s work function throws an unhandled exception, the unhandled exception is temporarily unobserved by your application. The runtime catches the exception and records its details in internal data structures. Then, the runtime cancels the task group that contains the faulted task. See the previous section, "Canceling a Task Group," for more information about what happens during task group cancellation. Under certain conditions, the cancellation is automatically propagated to other task groups.

When a task of a task group throws an unhandled exception, the runtime cancels that task group. The task group’s is_canceling method returns true during the course of the shutdown.

Recovering a deferred exception and rethrowing it is known as "observing an unhandled task exception." When all of the tasks in the task group have completed, the task_group::wait method rethrows the faulted task’s exception in the runtime context of the thread that invoked the wait method. If there is more than one exception (that is, if more than one task threw an unhandled exception), the runtime will choose one of the exceptions to rethrow. The remaining exceptions will not be observed.

Note

Be aware that if more than one task throws an exception, only one of the exceptions will be observed by the task_group::wait method. You can’t control which exception will be rethrown.

If you need to record all of the exceptions, you can implement custom logging or tracing code.
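
As a hedged sketch of the pattern, assuming the chapter’s usual headers and using-directives; the exception type and message are illustrative.

task_group tg;
tg.run([]()
{
  throw runtime_error("Error in task!");  // unhandled task exception
});

try
{
  tg.wait();    // the deferred exception is rethrown here
}
catch (const runtime_error& e)
{
  wcout << L"Caught: " << e.what() << endl;
}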

Speculative Execution

Speculative execution occurs when you perform an operation in anticipation of a particular result. For example, you might predict that the current computation will produce the value 42. You start the next computation, which depends on the current computation's result, by using 42 as the input. If the first computation ends with 42, you've gained parallelism by successfully starting the dependent operation well in advance of when you otherwise could have. If the first computation results in something other than 42, you can restart the second operation using the correct value as the input.
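
For example, here is a hedged sketch of this kind of speculation; Compute and DependentComputation are hypothetical functions.

int predicted = 42;       // guess at Compute's result
int dependentResult;

task_group tg;
tg.run([&dependentResult, predicted]()
{
  // Start the dependent phase early, using the predicted input.
  dependentResult = DependentComputation(predicted);
});
int actual = Compute();   // the computation whose result was predicted
tg.wait();
if (actual != predicted)
{
  // The prediction missed; redo the dependent phase with the real value.
  dependentResult = DependentComputation(actual);
}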

Another example of speculative execution occurs when you execute more than one asynchronous operation in parallel but need just one of the operations to complete before proceeding. Imagine, for example, that you use three different search tasks to search for an item. After the fastest task finds the item, you don't need to wait for the other searches to complete. In cases like this you wait for the first task to complete and usually cancel the remaining tasks. However, you should always observe any exceptions that might have occurred in any of the tasks.

Use task cancellation as a way to wait for at least one task in a set of tasks to complete.

You can use the task_group::cancel method to implement speculative execution. Here's an example.

task_group tg;

tg.run([&tg](){ SearchLeft(tg); });
tg.run([&tg](){ SearchRight(tg); });
tg.run_and_wait([&tg](){ SearchCenter(tg); });

In this example, you perform three searches in parallel. You want to continue if any of the three functions completes. You don’t need to wait for all of them. To make this possible, the code passes a reference to the task group object to each of the work functions. Inside the work functions, the code cancels the task group when it completes its work. The following code shows how to accomplish this inside the SearchLeft function.

void SearchLeft(task_group& tg)
{
  bool result = 
              DoCpuIntensiveOperation(g_TaskMilliseconds/5, &tg);
  wcout << L"  Left search finished, completed = " << result 
        << endl;
  tg.cancel();
}

The long-running function, DoCpuIntensiveOperation, checks for the cancellation status. This is shown in the following code.

bool DoCpuIntensiveOperation(DWORD milliseconds, 
                             task_group* tg = nullptr)
{
  // ...
  int i = 0;
  DWORD checkInterval = ...
  while (true) 
  {
    if ((milliseconds == 0) || (++i % checkInterval == 0))
    {
      if (tg && tg->is_canceling())
        return false;
    }
    // ...
  }
}

The body of the while loop periodically checks to see if the task group, tg, has received a cancellation request. For performance reasons, the code polls for cancellation only once every checkInterval iterations of the loop.

Anti-Patterns

Here are some things to watch out for when you use task groups.

Variables Captured by Closures

In C++, a closure can be created using a lambda expression that represents an unnamed (anonymous) function. Closures can refer to variables defined outside of their lexical scope, such as local variables that were declared in a scope that contains the closure.

The semantics of closures in C++ may not be intuitive to some programmers, and it's easy to make mistakes. If you code your closure incorrectly, you may find that captured variables don't behave as you expect, especially in parallel programs.

Problems occur when you reference a variable without considering its scope. Here's an example.

  task_group tg;
  for (int i = 0; i < 4; i++)
  {
     // WARNING: BUGGY CODE, i has unexpected value
     tg.run([&i]() { wcout << i << endl; } );
  }
  tg.wait();

You might think that this code sample would print the numbers 0, 1, 2, 3 in some arbitrary order, but it can print other values, depending on how the threads happen to run. For example, you might see 4, 4, 4, 4. The reason is that the variable i is captured by reference and shared by all the closures created by the iterations of the for loop. By the time the tasks start, the value of the single, shared variable i will probably be different from the value of i when the task was created. (In fact, because i goes out of scope when the loop exits, the tasks may even read a dangling reference.)

The solution is to capture the variable by value in the appropriate scope.

  task_group tg;
  for (int i = 0; i < 4; i++)
  {
     tg.run([i]() { wcout << i << endl; } );
  }
  tg.wait();

This version prints the numbers 0, 1, 2, 3 in an arbitrary order, as was originally intended. The reason is that the value of the variable i is passed to the closure. Effectively, a new variable named i is instantiated with each iteration of the for loop.

Note

It’s a good idea to make sure that every lambda expression uses explicit capture instead of implicit capture.

Unintended Propagation of Cancellation Requests

If you use a library that is implemented with PPL, the API calls into that library may create task groups that are internal to the library. If you call that library’s APIs from a task context within your application, you might unintentionally create a situation where a task of your application’s task group is waiting on a task group inside of the library’s implementation. According to the behavior described in the "Canceling a Task Group" section, if you invoke the cancel method of a task group in your application, you may implicitly cause the cancellation of task groups that were created by the library you are calling. Transitive propagation of cancellation into another component’s internal task groups may not be the behavior you intend; in some cases, you may prefer that library functions run to completion even though a higher-level component is beginning to shut down.

Note

Be aware that implicit parent-child relationships can influence the behavior of cancellation and exception handling, especially with libraries.

You can avoid unintended propagation of cancellation requests by using a neutral, non-PPL thread context to call into any library functions that may depend on task group wait operations. For example, you could use a lightweight task to invoke library functions. A lightweight task is a lower-level type of task that does not result in the propagation of cancellation requests. Lightweight tasks are described in Appendix A.
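
A minimal, hedged sketch of this approach follows; CallLibraryFunction is a hypothetical library entry point, and the structure assumes the lightweight-task API described in Appendix A.

struct LibraryCall
{
  event done;             // signaled when the library call finishes
};

void __cdecl InvokeLibrary(void* data)
{
  LibraryCall* call = static_cast<LibraryCall*>(data);
  CallLibraryFunction();  // hypothetical; runs outside any task group
  call->done.set();
}

// In application code:
LibraryCall call;
CurrentScheduler::ScheduleTask(InvokeLibrary, &call);  // lightweight task
call.done.wait();         // no task group wait dependency is created,
                          // so cancellation does not propagate into
                          // the library's internal task groups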

The Cost of Synchronization

Locks and other synchronization operations are sometimes necessary in parallel programs. However, programmers often underestimate how much serializing operations can degrade performance.

Note

To avoid performance bottlenecks, review your use of locks and other synchronization operations carefully.

You may want to review the "Scalable Sharing of Data" section of Chapter 1, "Introduction" for guidance on how to factor synchronization into the design of your application. Well-designed applications require explicit synchronization operations less often than poorly designed applications.

Design Notes

This section describes some of the design considerations that were used to create the Parallel Patterns Library, along with some recommended coding practices.

Task Group Calling Conventions

Whenever you call a task group’s run method, you must subsequently call its wait method and allow that call to return before you destroy the task group. Otherwise, the runtime will throw a "missing wait" exception. The missing wait exception only occurs in the normal flow of execution; it will not be thrown if you unwind the stack due to an application exception. Therefore, you do not need an RAII wrapper that calls the wait method.

The task_group class’s methods are all concurrency safe, so there are many ways to invoke these methods. However, no matter which method you choose, you must make sure a return from a call to the task group’s wait method is the last operation that happens in the normal (no exceptions thrown) execution path before allowing a task group’s destructor to run.

If a task group’s wait method is called from multiple contexts, and you interleave these calls with calls to the task group’s run method, be aware that the results may vary from run to run. For example, it is possible to call a task group’s wait method concurrently from two different contexts. If there are pending tasks in the task group, both invocations of task_group::wait will return only after the task group has completed all of its pending tasks. However, if the task group is canceled while the tasks are running, only one of the wait functions will return the canceled status value. The other invocation will return a normal status, due to interleaving. (Returning from the wait method resets the task group’s is_canceling status as a side effect; whichever invocation returns first will perform the reset.)

Tasks and Threads

When a task begins to run, the applicable task scheduler invokes the task's work function in a thread of its choosing.

The task will not migrate among threads at run time. This is a useful guarantee because it lets you use thread-affine abstractions, such as critical sections, without having to worry, for example, that the EnterCriticalSection function will be executed in a different thread than the LeaveCriticalSection function.

In general, creating a new task is a much less resource-intensive operation than creating a new thread. It is possible for an application to create hundreds of thousands or even millions of tasks and still run efficiently.

You may want to profile your application as you experiment with strategies for using tasks in your application. If your tasks are too fine grained, you will incur overhead for task management that may hurt performance. For example, a task that performs a single arithmetic operation is not going to improve performance. However, if you decompose your application into too few tasks, you will not fully exploit the potential parallelism of the application.

How Tasks Are Scheduled

The techniques for scheduling tasks and scheduling threads reflect fundamentally different goals. The operating system generally schedules threads in a way that minimizes latency. Preemptive thread scheduling allows users to interact with a busy system with very little perceived delay, even on a system with only one core.

In contrast, the task scheduler tries to maximize overall throughput rather than ensure low latency for any given task. In other words, when you decompose a computationally intensive operation into tasks that can potentially run in parallel, you want to complete the overall operation as quickly as possible without concern for the scheduling delays that any given task might experience. For task-based systems such as PPL and the underlying Concurrency Runtime, the measure of success is the speed of the overall result. The goal is to optimize the use of processor resources.

For these reasons you should not expect that tasks in PPL are scheduled "fairly." Instead, a variety of techniques are used to improve throughput. These techniques mainly focus on keeping cores busy and on an effective use of system caches. For more information about scheduling, see Appendix A, "The Task Scheduler and Resource Manager."

There are a number of options available that allow you to control how the scheduler deals with tasks. See "Schedule Policies" in Appendix A for more information.

Structured Task Groups and Task Handles

In addition to the task_group class, PPL also provides a lower-level interface called the structured_task_group class, which is documented on MSDN®.

Structured task groups have lower overhead than task groups, but there are restrictions on how structured task groups can be used. These restrictions require stack-based work functions and strict nesting of subordinate structured task groups. Although task groups are recommended for most application programming, structured task groups are important for implementing parallel libraries. For example, the PPL parallel_invoke function is implemented with structured task groups. The parallel_invoke function is usually enough in cases of strict nested parallelism, and because it is much easier to use than structured task groups, you probably won’t ever need to use structured task groups directly.

PPL also includes the task_handle class, which encapsulates a work function used by a task. One of the overloaded versions of the task_group class’s run method accepts a task handle as its argument. Task handles are created by means of the make_task function. Most applications will never need access to task handles; however, you must use task handles with structured task groups. Unlike lambda expressions, task handles require explicit memory management by your application.
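
Here is a hedged sketch of the structured form, reusing DoLeft and DoRight from earlier in the chapter; the task handles are stack allocated, which satisfies the lifetime requirement.

auto t1 = make_task([]() { DoLeft(); });
auto t2 = make_task([]() { DoRight(); });

structured_task_group stg;
stg.run(t1);            // the task handles must outlive the group
stg.run_and_wait(t2);   // runs t2 and waits for all tasks to finish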

Lightweight Tasks

In addition to the task_group objects that were described in this chapter, the Concurrency Runtime provides lower-level APIs that may be useful to some programmers, especially those who are adapting existing applications that create many threads. However, in general it is recommended that programmers start with the Parallel Patterns Library (PPL) or the Asynchronous Agents Library.

See the "Lightweight Tasks" section of Appendix A, "The Task Scheduler and Resource Manager" for more information.

Exercises

  1. The image blender example in this chapter uses task parallelism: a different task processes each image layer. A typical strategy in image processing uses data parallelism: the same computation processes different portions of an image or different images. Is there a way to use data parallelism in the image blender example? If there is, what are the advantages and disadvantages, compared to the task parallelism discussed here?
  2. In the image blender sample, the image processing methods SetToGray and Rotate are void methods that do not return results, but they save their results by updating their second argument. Why don't they return their results?
  3. In the image blender sample that uses task_group::run method, what happens if one of the parallel tasks throws an exception? Answer the same question for the sample that uses the parallel_invoke function.

Further Reading

The paper by Leijen, Schulte, and Burckhardt discusses design considerations, including the concepts of task-based scheduling and work-stealing algorithms.

Code samples for the Concurrency Runtime and Parallel Patterns Library are available in the ConcRTExtras package on CodePlex.

Leijen, D., W. Schulte, and S. Burckhardt. "The Design of a Task Parallel Library." In S. Arora and G.T. Leavens, editors, OOPSLA 2009: Proceedings of the 24th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, pages 227–242. ACM, 2009.

ConcRTExtras software. "Code Samples for the Concurrency Runtime and Parallel Pattern Library in Visual Studio 2010." https://code.msdn.microsoft.com/concrtextras.
