# How to: Write a parallel_for_each Loop

**Visual Studio 2013**

This example shows how to use the concurrency::parallel_for_each algorithm to compute the count of prime numbers in a std::array object in parallel.

The following example computes the count of prime numbers in an array two times. The example first uses the std::for_each algorithm to compute the count serially. The example then uses the **parallel_for_each** algorithm to perform the same task in parallel. The example also prints to the console the time that is required to perform both computations.

    // parallel-count-primes.cpp
    // compile with: /EHsc
    #include <windows.h>
    #include <ppl.h>
    #include <iostream>
    #include <algorithm>
    #include <array>

    using namespace concurrency;
    using namespace std;

    // Calls the provided work function and returns the number of milliseconds 
    // that it takes to call that function.
    template <class Function>
    __int64 time_call(Function&& f)
    {
       __int64 begin = GetTickCount();
       f();
       return GetTickCount() - begin;
    }

    // Determines whether the input value is prime.
    bool is_prime(int n)
    {
       if (n < 2)
          return false;
       for (int i = 2; i < n; ++i)
       {
          if ((n % i) == 0)
             return false;
       }
       return true;
    }

    int wmain()
    {
       // Create an array object that contains 200000 integers.
       array<int, 200000> a;

       // Initialize the array such that a[i] == i.
       int n = 0;
       generate(begin(a), end(a), [&] {
          return n++;
       });

       LONG prime_count;
       __int64 elapsed;

       // Use the for_each algorithm to count the number of prime numbers
       // in the array serially.
       prime_count = 0L;
       elapsed = time_call([&] {
          for_each (begin(a), end(a), [&](int n ) { 
             if (is_prime(n))
                ++prime_count;
          });
       });
       wcout << L"serial version: " << endl
             << L"found " << prime_count << L" prime numbers" << endl
             << L"took " << elapsed << L" ms" << endl << endl;

       // Use the parallel_for_each algorithm to count the number of prime numbers
       // in the array in parallel.
       prime_count = 0L;
       elapsed = time_call([&] {
          parallel_for_each (begin(a), end(a), [&](int n ) { 
             if (is_prime(n))
                InterlockedIncrement(&prime_count);
          });
       });
       wcout << L"parallel version: " << endl
             << L"found " << prime_count << L" prime numbers" << endl
             << L"took " << elapsed << L" ms" << endl << endl;
    }

The following sample output is for a computer that has four processors.

    serial version:
    found 17984 prime numbers
    took 6115 ms

    parallel version:
    found 17984 prime numbers
    took 1653 ms

The lambda expression that the example passes to the **parallel_for_each** algorithm uses the **InterlockedIncrement** function so that parallel iterations of the loop can safely increment the shared counter at the same time. If you use functions such as **InterlockedIncrement** to synchronize access to shared resources, you can introduce performance bottlenecks in your code, because every increment contends for the same memory location. You can use a lock-free synchronization mechanism, for example, the concurrency::combinable class, to eliminate simultaneous access to shared resources. For an example that uses the **combinable** class in this manner, see How to: Use combinable to Improve Performance.
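The idea behind avoiding the shared counter can be sketched in portable C++: each worker accumulates into its own private count, and the partial counts are combined once after all workers finish. This sketch is an illustration only. It uses std::thread from the C++ Standard Library rather than the PPL, and the helper name `count_primes_local` and its `workers` parameter are hypothetical names introduced here, not part of the original example. In the PPL, the concurrency::combinable class plays the corresponding role through its `local` and `combine` methods.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Same trial-division test as in the example above.
bool is_prime(int n)
{
   if (n < 2)
      return false;
   for (int i = 2; i < n; ++i)
   {
      if ((n % i) == 0)
         return false;
   }
   return true;
}

// Hypothetical helper (not part of the PPL): counts the primes in `values`
// by using `workers` threads, each with its own private counter.
long count_primes_local(const std::vector<int>& values, unsigned workers)
{
   workers = std::max(1u, workers);          // always use at least one worker
   std::vector<long> partial(workers, 0L);   // one result slot per worker
   std::vector<std::thread> pool;

   for (unsigned w = 0; w < workers; ++w)
   {
      pool.emplace_back([&values, &partial, w, workers] {
         long local = 0;                     // private to this worker
         // Strided iteration spreads the expensive large values
         // across all workers.
         for (std::size_t i = w; i < values.size(); i += workers)
         {
            if (is_prime(values[i]))
               ++local;
         }
         partial[w] = local;                 // one write per worker, no contention
      });
   }

   for (auto& t : pool)
      t.join();

   // Combine the per-worker counts after all workers have finished.
   return std::accumulate(partial.begin(), partial.end(), 0L);
}
```

A call such as `count_primes_local(values, std::thread::hardware_concurrency())` would use one worker per hardware thread. Because no two workers ever write to the same variable, no interlocked operation or lock is needed; the only synchronization is the final join.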