How to: Use combinable to Improve Performance
This example shows how to use the concurrency::combinable class to compute the sum of the numbers in a std::array object that are prime. The combinable class improves performance by eliminating shared state.
Tip |
---|
In some cases, parallel map (concurrency::parallel_transform) and reduce (concurrency:: parallel_reduce) can provide performance improvements over combinable. For an example that uses map and reduce operations to produce the same results as this example, see Parallel Algorithms. |
The following example uses the std::accumulate function to compute the sum of the elements in an array that are prime. In this example, a is an array object and the is_prime function determines whether its input value is prime.
prime_sum = accumulate(begin(a), end(a), 0, [&](int acc, int i) { return acc + (is_prime(i) ? i : 0); });
The following example shows a naïve way to parallelize the previous example. This example uses the concurrency::parallel_for_each algorithm to process the array in parallel and a concurrency::critical_section object to synchronize access to the prime_sum variable. This example does not scale because each thread must wait for the shared resource to become available.
critical_section cs;
prime_sum = 0;
parallel_for_each(begin(a), end(a), [&](int i) {
cs.lock();
prime_sum += (is_prime(i) ? i : 0);
cs.unlock();
});
The following example uses a combinable object to improve the performance of the previous example. This example eliminates the need for synchronization objects; it scales because the combinable object enables each thread to perform its task independently.
A combinable object is typically used in two steps. First, produce a series of fine-grained computations by performing work in parallel. Next, combine (or reduce) the computations into a final result. This example uses the concurrency::combinable::local method to obtain a reference to the local sum. It then uses the concurrency::combinable::combine method and a std::plus object to combine the local computations into the final result.
combinable<int> sum; parallel_for_each(begin(a), end(a), [&](int i) { sum.local() += (is_prime(i) ? i : 0); }); prime_sum = sum.combine(plus<int>());
The following complete example computes the sum of prime numbers both serially and in parallel. The example prints to the console the time that is required to perform both computations.
// parallel-sum-of-primes.cpp // compile with: /EHsc #include <windows.h> #include <ppl.h> #include <array> #include <numeric> #include <iostream> using namespace concurrency; using namespace std; // Calls the provided work function and returns the number of milliseconds // that it takes to call that function. template <class Function> __int64 time_call(Function&& f) { __int64 begin = GetTickCount(); f(); return GetTickCount() - begin; } // Determines whether the input value is prime. bool is_prime(int n) { if (n < 2) return false; for (int i = 2; i < n; ++i) { if ((n % i) == 0) return false; } return true; } int wmain() { // Create an array object that contains 200000 integers. array<int, 200000> a; // Initialize the array such that a[i] == i. iota(begin(a), end(a), 0); int prime_sum; __int64 elapsed; // Compute the sum of the numbers in the array that are prime. elapsed = time_call([&] { prime_sum = accumulate(begin(a), end(a), 0, [&](int acc, int i) { return acc + (is_prime(i) ? i : 0); }); }); wcout << prime_sum << endl; wcout << L"serial time: " << elapsed << L" ms" << endl << endl; // Now perform the same task in parallel. elapsed = time_call([&] { combinable<int> sum; parallel_for_each(begin(a), end(a), [&](int i) { sum.local() += (is_prime(i) ? i : 0); }); prime_sum = sum.combine(plus<int>()); }); wcout << prime_sum << endl; wcout << L"parallel time: " << elapsed << L" ms" << endl << endl; }
The following sample output is for a computer that has four processors.
1709600813 serial time: 6178 ms 1709600813 parallel time: 1638 ms
For an example that uses map and reduce operations to produce the same results, see Parallel Algorithms.