Export (0) Print
Expand All

How to: Use combinable to Improve Performance

This example shows how to use the concurrency::combinable class to compute the sum of the numbers in a std::array object that are prime. The combinable class improves performance by eliminating shared state.

Tip Tip

In some cases, parallel map (concurrency::parallel_transform) and reduce (concurrency:: parallel_reduce) can provide performance improvements over combinable. For an example that uses map and reduce operations to produce the same results as this example, see Parallel Algorithms.

The following example uses the std::accumulate function to compute the sum of the elements in an array that are prime. In this example, a is an array object and the is_prime function determines whether its input value is prime.

prime_sum = accumulate(begin(a), end(a), 0, [&](int acc, int i) {
   return acc + (is_prime(i) ? i : 0);
});

The following example shows a naïve way to parallelize the previous example. This example uses the concurrency::parallel_for_each algorithm to process the array in parallel and a concurrency::critical_section object to synchronize access to the prime_sum variable. This example does not scale because each thread must wait for the shared resource to become available.

critical_section cs;
prime_sum = 0;
parallel_for_each(begin(a), end(a), [&](int i) {
   cs.lock();
   prime_sum += (is_prime(i) ? i : 0);
   cs.unlock();
});

The following example uses a combinable object to improve the performance of the previous example. This example eliminates the need for synchronization objects; it scales because the combinable object enables each thread to perform its task independently.

A combinable object is typically used in two steps. First, produce a series of fine-grained computations by performing work in parallel. Next, combine (or reduce) the computations into a final result. This example uses the concurrency::combinable::local method to obtain a reference to the local sum. It then uses the concurrency::combinable::combine method and a std::plus object to combine the local computations into the final result.

combinable<int> sum;
parallel_for_each(begin(a), end(a), [&](int i) {
   sum.local() += (is_prime(i) ? i : 0);
});
prime_sum = sum.combine(plus<int>());

The following complete example computes the sum of prime numbers both serially and in parallel. The example prints to the console the time that is required to perform both computations.

// parallel-sum-of-primes.cpp 
// compile with: /EHsc
#include <windows.h>
#include <ppl.h>
#include <array>
#include <numeric>
#include <iostream>

using namespace concurrency;
using namespace std;

// Calls the provided work function and returns the number of milliseconds  
// that it takes to call that function. 
template <class Function>
__int64 time_call(Function&& f)
{
   __int64 begin = GetTickCount();
   f();
   return GetTickCount() - begin;
}

// Determines whether the input value is prime. 
bool is_prime(int n)
{
   if (n < 2)
      return false;
   for (int i = 2; i < n; ++i)
   {
      if ((n % i) == 0)
         return false;
   }
   return true;
}

int wmain()
{   
   // Create an array object that contains 200000 integers. 
   array<int, 200000> a;

   // Initialize the array such that a[i] == i.
   iota(begin(a), end(a), 0);

   int prime_sum;
   __int64 elapsed;

   // Compute the sum of the numbers in the array that are prime.
   elapsed = time_call([&] {
      prime_sum = accumulate(begin(a), end(a), 0, [&](int acc, int i) {
         return acc + (is_prime(i) ? i : 0);
      });
   });   
   wcout << prime_sum << endl;   
   wcout << L"serial time: " << elapsed << L" ms" << endl << endl;

   // Now perform the same task in parallel.
   elapsed = time_call([&] {
      combinable<int> sum;
      parallel_for_each(begin(a), end(a), [&](int i) {
         sum.local() += (is_prime(i) ? i : 0);
      });
      prime_sum = sum.combine(plus<int>());
   });
   wcout << prime_sum << endl;
   wcout << L"parallel time: " << elapsed << L" ms" << endl << endl;
}

The following sample output is for a computer that has four processors.

1709600813
serial time: 6178 ms

1709600813
parallel time: 1638 ms

To compile the code, copy it and then paste it in a Visual Studio project, or paste it in a file that is named parallel-sum-of-primes.cpp and then run the following command in a Visual Studio Command Prompt window.

cl.exe /EHsc parallel-sum-of-primes.cpp

For an example that uses map and reduce operations to produce the same results, see Parallel Algorithms.

Show:
© 2014 Microsoft