Click to Rate and Give Feedback
MSDN
MSDN Library
Visual Studio
Visual C++
Concurrency Runtime
Collapse All/Expand All Collapse All
Visual Studio 2010 - Visual C++
Parallel Patterns Library (PPL)

[This documentation is for preview only, and is subject to change in later releases. Blank topics are included as placeholders.]

The Parallel Patterns Library (PPL) provides an imperative programming model that promotes scalability and ease-of-use for developing concurrent applications. The PPL builds on the scheduling and resource management components of the Concurrency Runtime. It raises the level of abstraction between your application code and the underlying threading mechanism by providing generic, type-safe algorithms and containers that act on data in parallel. The PPL also lets you develop applications that scale by providing alternatives to shared state.

The PPL provides the following features:

  • Task Parallelism: a mechanism to execute several work items (tasks) in parallel

  • Parallel algorithms: generic algorithms that act on collections of data in parallel

  • Parallel containers and objects: generic container types that provide safe concurrent access to their elements

The PPL provides a programming model that resembles the Standard Template Library (STL). For example, consider the following program that computes several Fibonacci numbers serially and in parallel. Both methods act on an array object. The example also prints to the console the time that is required to perform both computations.

The serial version uses the STL for_each algorithm to traverse the array and stores the results in a vector object. The parallel version performs the same task, but uses the PPL parallel_for_each algorithm and stores the results in a concurrent_vector object. The concurrent_vector class enables each loop iteration to concurrently add elements without the requirement to synchronize write access to the container.

Because parallel_for_each acts concurrently, the parallel version of this example must sort the concurrent_vector object to produce the same results as the serial version.

Note that the example uses a naïve method to compute the Fibonacci numbers; however, this method illustrates how the Concurrency Runtime can improve the performance of long computations.

Visual C++
// parallel-fibonacci.cpp
// compile with: /EHsc
#include <windows.h>
#include <ppl.h>
#include <concurrent_vector.h>
#include <array>
#include <vector>
#include <tuple>
#include <algorithm>
#include <iostream>

using namespace Concurrency;
using namespace std;
using namespace std::tr1;

// Calls the provided work function and returns the number of milliseconds 
// that it takes to call that function.
template <class Function>
__int64 time_call(Function&& f)
{
   __int64 begin = GetTickCount();
   f();
   return GetTickCount() - begin;
}

// Computes the nth Fibonacci number.
int fibonacci(int n)
{
   if(n < 2)
      return n;
   return fibonacci(n-1) + fibonacci(n-2);
}

int wmain()
{
   __int64 elapsed;

   // An array of Fibonacci numbers to compute.
   array<int, 4> a = { 41, 24, 26, 42 };

   // The results of the serial computation.
   vector<tuple<int,int>> results1;

   // The results of the parallel computation.
   concurrent_vector<tuple<int,int>> results2;

   // Use the for_each algorithm to compute the results serially.
   elapsed = time_call([&] 
   {
      for_each (a.begin(), a.end(), [&](int n) {
         results1.push_back(make_tuple(n, fibonacci(n)));
      });
   });   
   wcout << L"serial time: " << elapsed << L" ms" << endl;

   // Use the parallel_for_each algorithm to perform the same task.
   elapsed = time_call([&] 
   {
      parallel_for_each (a.begin(), a.end(), [&](int n) {
         results2.push_back(make_tuple(n, fibonacci(n)));
      });

      // Because parallel_for_each acts concurrently, the results do not 
      // have a pre-determined order. Sort the concurrent_vector object
      // so that the results match the serial version.
      sort(results2.begin(), results2.end());
   });   
   wcout << L"parallel time: " << elapsed << L" ms" << endl << endl;

   // Print the results.
   for_each (results2.begin(), results2.end(), [](tuple<int,int>& pair) {
      wcout << L"fib(" << get<0>(pair) << L"): " << get<1>(pair) << endl;
   });
}

The following sample output is for a computer that has four processors.

serial time: 9250 ms
parallel time: 5726 ms

fib(24): 46368
fib(26): 121393
fib(41): 165580141
fib(42): 267914296

Each iteration of the loop requires a different amount of time to finish. The performance of parallel_for_each is bounded by the operation that finishes last. Therefore, you should not expect linear performance improvements between the serial and parallel versions of this example.

The PPL supports both structured and unstructured parallelism. Each model has advantages over the other.

Structured parallelism refers to parallel code that is scheduled and finishes completely in the lexical scope from which it starts. Under the structured parallelism model, a task does not finish until its child tasks finish. The parallel algorithms follow the structured parallelism model. For example, parallel_for, parallel_for_each, and parallel_invoke are examples of structured parallelism because all parallel code finishes in the context of the function call. The structured_task_group class also follows the structured parallelism model because it can be called and waited on only by the thread that created it. In addition, the nested, or child, tasks of a structured_task_group object finish before the outer tasks finish.

Unstructured parallelism refers to parallel code that can finish in a different context from which it starts. For example, the task_group class enables you start tasks in one context, and then wait for or cancel those tasks in another context. In the Concurrency Runtime, types that follow the unstructured parallelism model (such as task_group) are often more flexible than types that follow the structured model (such as structured_task_group). However, types that follow the structured model could lead to better performing code.

For more information about parallel algorithms, see Parallel Algorithms. For more information about the differences between the task_group and structured_task_group classes, see Task Parallelism.

Task Parallelism

Describes how to use parallel task mechanisms such as structured and unstructured task groups.

Parallel Algorithms

Describes how to use parallel algorithms such as parallel_for and parallel_for_each.

Parallel Containers and Objects

Describes the various parallel containers and objects that are provided by the PPL.

Cancellation in the PPL

Explains how to cancel the work that is being performed by a parallel algorithm.

Concurrency Runtime

Describes the Concurrency Runtime, which simplifies parallel programming, and contains links to related topics.

© 2009 Microsoft Corporation. All rights reserved. Terms of Use | Trademarks | Privacy Statement | Site Feedback
Page view tracker