parallel_for_each Function (C++ AMP)

04/28/2015

Runs a function across the compute domain. For more information, see C++ AMP Overview.

template <
   int _Rank,
   typename _Kernel_type
>
void parallel_for_each(
   const extent<_Rank>& _Compute_domain,
   const _Kernel_type& _Kernel
);

template <
   int _Dim0,
   int _Dim1,
   int _Dim2,
   typename _Kernel_type
>
void parallel_for_each(
   const tiled_extent<_Dim0, _Dim1, _Dim2>& _Compute_domain,
   const _Kernel_type& _Kernel
);

template <
   int _Dim0,
   int _Dim1,
   typename _Kernel_type
>
void parallel_for_each(
   const tiled_extent<_Dim0, _Dim1>& _Compute_domain,
   const _Kernel_type& _Kernel
);

template <
   int _Dim0,
   typename _Kernel_type
>
void parallel_for_each(
   const tiled_extent<_Dim0>& _Compute_domain,
   const _Kernel_type& _Kernel
);

template <
   int _Rank,
   typename _Kernel_type
>
void parallel_for_each(
   const accelerator_view& _Accl_view,
   const extent<_Rank>& _Compute_domain,
   const _Kernel_type& _Kernel
);

template <
   int _Dim0,
   int _Dim1,
   int _Dim2,
   typename _Kernel_type
>
void parallel_for_each(
   const accelerator_view& _Accl_view,
   const tiled_extent<_Dim0, _Dim1, _Dim2>& _Compute_domain,
   const _Kernel_type& _Kernel
);

template <
   int _Dim0,
   int _Dim1,
   typename _Kernel_type
>
void parallel_for_each(
   const accelerator_view& _Accl_view,
   const tiled_extent<_Dim0, _Dim1>& _Compute_domain,
   const _Kernel_type& _Kernel
);

template <
   int _Dim0,
   typename _Kernel_type
>
void parallel_for_each(
   const accelerator_view& _Accl_view,
   const tiled_extent<_Dim0>& _Compute_domain,
   const _Kernel_type& _Kernel
);

Parameters

_Accl_view
The accelerator_view object to run the parallel computation on.
_Compute_domain
An extent or tiled_extent object that specifies the index space over which the kernel is invoked.
_Dim0
The length of the most significant dimension of the tile.
_Dim1
The length of the next-to-most significant dimension of the tile.
_Dim2
The length of the least significant dimension of the tile (three-dimensional tiles only).
_Kernel
A lambda or function object that performs the parallel computation. It takes one argument of type index<_Rank> for the non-tiled overloads, or a tiled_index object for the tiled overloads.
_Kernel_type
The type of the kernel: a lambda or function object.
_Rank
The rank of the extent.

Remarks

The parallel_for_each function starts data-parallel computations on accelerator devices. The basic behavior of parallel_for_each is like that of for_each, which runs a function on each element that's in a container. The basic components in a call to parallel_for_each are a compute domain, an index, and a kernel function. When parallel_for_each runs, a parallel activity is run for each index in the compute domain. You can use the parallel activity to access the elements in the input or output arrays. A call to parallel_for_each behaves as though it's synchronous. In practice, the call is asynchronous because it runs on a separate device. There are no guarantees about the order and concurrency of the parallel activities run by the non-tiled parallel_for_each. Activities communicate only by using atomic functions.
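The following is a minimal sketch of a non-tiled call, not a definitive implementation. The helper name square_all and the SKETCH_HAS_AMP macro are hypothetical, and the sequential fallback is an addition so that the sketch also compiles where amp.h is not available. It assumes the data is exposed to the kernel through an array_view whose extent serves as the compute domain.

```cpp
#include <vector>

// Use the real C++ AMP runtime where <amp.h> is available (MSVC). Elsewhere,
// fall back to a sequential loop so the sketch still compiles and runs.
#if defined(_MSC_VER) && __has_include(<amp.h>)
#define _SILENCE_AMP_DEPRECATION_WARNINGS  // amp.h is deprecated in newer Visual Studio releases
#include <amp.h>
#define SKETCH_HAS_AMP 1
#else
#define SKETCH_HAS_AMP 0
#endif

// Hypothetical helper: returns a copy of `data` with every element squared.
std::vector<int> square_all(std::vector<int> data) {
    if (data.empty()) return data;  // avoid building a zero-sized compute domain
#if SKETCH_HAS_AMP
    // The array_view wraps the host vector; its extent is the compute domain.
    concurrency::array_view<int, 1> av(static_cast<int>(data.size()), data);
    // One parallel activity runs for each index in av.extent.
    concurrency::parallel_for_each(av.extent,
        [=](concurrency::index<1> idx) restrict(amp) {
            av[idx] = av[idx] * av[idx];
        });
    av.synchronize();  // copy the results back into `data`
#else
    for (int& v : data) v = v * v;  // sequential stand-in for the kernel
#endif
    return data;
}
```

Calling square_all({1, 2, 3}) yields a vector holding 1, 4, 9. The explicit synchronize call makes the point at which results become visible on the host easy to see.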

The tiled version of parallel_for_each organizes the parallel activities into tiles that have a fixed size and 1, 2, or 3 dimensions, as specified in the tiled_extent argument. Threads in the same tile have access to any variables declared with the tile_static keyword. You can use the tile_barrier::wait method to synchronize access to variables declared with the tile_static keyword. The following restrictions apply to the tiled parallel_for_each:

- The product of the tile extent dimensions cannot exceed 1024:
  - 3D: D0 * D1 * D2 ≤ 1024, and D0 ≤ 64
  - 2D: D0 * D1 ≤ 1024
  - 1D: D0 ≤ 1024
- The compute domain passed to parallel_for_each must be evenly divisible, along each of its dimensions, by the corresponding tile extent.
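The restrictions above can be restated as compile-time checks. This is a sketch only: tile_limits_ok and divides_evenly are hypothetical helpers, not part of amp.h, and the requirement that every dimension be positive is an assumption added here.

```cpp
// Hypothetical helpers (not part of amp.h) that restate the documented limits.
// Requiring positive dimensions is an assumption of this sketch.

// 1D tile: D0 <= 1024
constexpr bool tile_limits_ok(long long d0) {
    return d0 > 0 && d0 <= 1024;
}

// 2D tile: D0 * D1 <= 1024
constexpr bool tile_limits_ok(long long d0, long long d1) {
    return d0 > 0 && d1 > 0 && d0 * d1 <= 1024;
}

// 3D tile: D0 * D1 * D2 <= 1024, and D0 <= 64
constexpr bool tile_limits_ok(long long d0, long long d1, long long d2) {
    return d0 > 0 && d1 > 0 && d2 > 0 && d0 <= 64 && d0 * d1 * d2 <= 1024;
}

// One dimension of the compute domain must be a whole multiple of the tile length.
constexpr bool divides_evenly(long long extent_length, long long tile_length) {
    return tile_length > 0 && extent_length % tile_length == 0;
}

static_assert(tile_limits_ok(16, 16), "a 16 x 16 tile has 256 threads");
static_assert(!tile_limits_ok(128, 2, 2), "D0 = 128 exceeds the 3D limit of 64");
static_assert(!divides_evenly(1000, 16), "1000 is not a multiple of 16");
```

A check such as divides_evenly(1000, 16) fails, which suggests padding the data or choosing a different tile size before making the tiled call.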

For more information, see Using Tiles.
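As an illustration of tile_static storage and tile_barrier::wait, the sketch below reverses the elements inside each tile of a one-dimensional array. The names reverse_within_tiles, kTileSize, and SKETCH_HAS_AMP are hypothetical, the tile length of 4 is arbitrary, and the sequential fallback is an addition for compilers without amp.h. Returning the input unchanged when its size is not a multiple of the tile length is a choice made for this sketch.

```cpp
#include <algorithm>
#include <vector>

#if defined(_MSC_VER) && __has_include(<amp.h>)
#define _SILENCE_AMP_DEPRECATION_WARNINGS  // amp.h is deprecated in newer Visual Studio releases
#include <amp.h>
#define SKETCH_HAS_AMP 1
#else
#define SKETCH_HAS_AMP 0
#endif

static const int kTileSize = 4;  // hypothetical tile length, well under the 1024 limit

// Hypothetical helper: reverses each consecutive group of kTileSize elements.
std::vector<int> reverse_within_tiles(std::vector<int> data) {
    // The compute domain must be divisible by the tile extent.
    if (data.empty() || data.size() % kTileSize != 0) return data;
#if SKETCH_HAS_AMP
    concurrency::array_view<int, 1> av(static_cast<int>(data.size()), data);
    concurrency::parallel_for_each(av.extent.tile<kTileSize>(),
        [=](concurrency::tiled_index<kTileSize> t_idx) restrict(amp) {
            tile_static int local[kTileSize];   // shared by the threads of one tile
            const int li = t_idx.local[0];      // position inside the tile
            local[li] = av[t_idx.global];
            t_idx.barrier.wait();               // every thread has stored its element
            av[t_idx.global] = local[kTileSize - 1 - li];
        });
    av.synchronize();
#else
    // Sequential stand-in: reverse each tile-sized chunk in place.
    for (auto first = data.begin(); first != data.end(); first += kTileSize)
        std::reverse(first, first + kTileSize);
#endif
    return data;
}
```

Without the barrier, a thread could read a slot of local that its neighbor has not written yet, which is the kind of access the wait call is meant to order.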

The parallel_for_each code runs on an accelerator, usually a GPU device. You can pass this accelerator explicitly to parallel_for_each as an optional accelerator_view parameter. Otherwise, the target accelerator is chosen from the objects of type array<T,N> that are captured in the kernel function. If the captured arrays aren't all bound to the same accelerator, an exception is thrown. In the tiled overloads, the tiled_index argument passed to the kernel contains a collection of indexes, including those that are relative to the current tile.
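A sketch of the overload that takes an explicit accelerator_view follows. It assumes the default accelerator is an acceptable target. The names increment_on_default_view and SKETCH_HAS_AMP are hypothetical, and the sequential fallback exists only so the code compiles without amp.h.

```cpp
#include <vector>

#if defined(_MSC_VER) && __has_include(<amp.h>)
#define _SILENCE_AMP_DEPRECATION_WARNINGS  // amp.h is deprecated in newer Visual Studio releases
#include <amp.h>
#define SKETCH_HAS_AMP 1
#else
#define SKETCH_HAS_AMP 0
#endif

// Hypothetical helper: adds 1 to every element on an explicitly chosen view.
std::vector<int> increment_on_default_view(std::vector<int> data) {
    if (data.empty()) return data;
#if SKETCH_HAS_AMP
    concurrency::accelerator acc;                           // the default accelerator
    concurrency::accelerator_view view = acc.default_view;  // its default view
    concurrency::array_view<int, 1> av(static_cast<int>(data.size()), data);
    // The accelerator_view is passed as the first argument, before the compute domain.
    concurrency::parallel_for_each(view, av.extent,
        [=](concurrency::index<1> idx) restrict(amp) {
            av[idx] += 1;
        });
    av.synchronize();
#else
    for (int& v : data) v += 1;  // sequential stand-in for the kernel
#endif
    return data;
}
```

Passing the view explicitly can be useful when the kernel captures only array_view objects, which are not bound to a single accelerator the way array<T,N> objects are.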

The _Kernel parameter of the parallel_for_each function must be a lambda or function object. To run on an accelerator, the lambda must include the restrict(amp) clause, although it can have additional restrictions. The restriction clause imposes several restrictions on the kernel function. For more information, see restrict (C++ AMP).

You must be able to invoke the _Kernel argument by using one of the following argument types:

- Non-tiled: index<N>, where N must match the rank of the extent<N> used in the call to parallel_for_each.
- Tiled: A tiled_index object whose dimensions match those of the tiled_extent object used in the call to parallel_for_each.

The kernel function must return void.

Because the kernel function doesn't take any other arguments, all other data operated on by the kernel must be captured in the lambda or function object. All captured data must be passed by value, except for array<T,N> objects, which must be captured by reference or pointer. Several restrictions also apply to the object types that can be captured. For more information, see restrict (C++ AMP).
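The capture rules can be sketched as follows: a scalar is captured by value and a concurrency::array is captured by reference. The names add_offset and SKETCH_HAS_AMP are hypothetical, and the sequential fallback is an addition for compilers without amp.h.

```cpp
#include <vector>

#if defined(_MSC_VER) && __has_include(<amp.h>)
#define _SILENCE_AMP_DEPRECATION_WARNINGS  // amp.h is deprecated in newer Visual Studio releases
#include <amp.h>
#define SKETCH_HAS_AMP 1
#else
#define SKETCH_HAS_AMP 0
#endif

// Hypothetical helper: adds `offset` to every element of `data`.
std::vector<int> add_offset(std::vector<int> data, int offset) {
    if (data.empty()) return data;
#if SKETCH_HAS_AMP
    // The array owns storage on the accelerator; this constructor copies the host data in.
    concurrency::array<int, 1> dev(static_cast<int>(data.size()),
                                   data.begin(), data.end());
    // `offset` is captured by value ([=]); the array is captured by reference (&dev).
    concurrency::parallel_for_each(dev.extent,
        [=, &dev](concurrency::index<1> idx) restrict(amp) {
            dev[idx] += offset;
        });
    concurrency::copy(dev, data.begin());  // copy the results back to the host
#else
    for (int& v : data) v += offset;  // sequential stand-in for the kernel
#endif
    return data;
}
```

Because the array is the only accelerator-bound object captured here, it also determines which accelerator the kernel runs on.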

If an error occurs when trying to start the parallel_for_each call, the runtime throws an exception. Exceptions can be thrown for the following reasons:

- Failure to create the shader.
- Failure to create buffers.
- Invalid extent passed.
- Mismatched accelerators.
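One way to handle such failures is to wrap the call in a try block and catch concurrency::runtime_exception, as in the sketch below. The names try_negate and SKETCH_HAS_AMP are hypothetical, the up-front rejection of empty input is a choice made for this sketch, and the sequential fallback (which cannot throw) exists only so the code compiles without amp.h.

```cpp
#include <string>
#include <vector>

#if defined(_MSC_VER) && __has_include(<amp.h>)
#define _SILENCE_AMP_DEPRECATION_WARNINGS  // amp.h is deprecated in newer Visual Studio releases
#include <amp.h>
#define SKETCH_HAS_AMP 1
#else
#define SKETCH_HAS_AMP 0
#endif

// Hypothetical helper: negates `data` in place. Returns false and fills `error`
// if the input is rejected or the runtime reports a failure.
bool try_negate(std::vector<int>& data, std::string& error) {
    if (data.empty()) {
        error = "empty input";  // rejected here rather than passed to the runtime
        return false;
    }
#if SKETCH_HAS_AMP
    try {
        concurrency::array_view<int, 1> av(static_cast<int>(data.size()), data);
        concurrency::parallel_for_each(av.extent,
            [=](concurrency::index<1> idx) restrict(amp) {
                av[idx] = -av[idx];
            });
        av.synchronize();
    } catch (const concurrency::runtime_exception& ex) {
        error = ex.what();  // message supplied by the C++ AMP runtime
        return false;
    }
#else
    for (int& v : data) v = -v;  // sequential stand-in for the kernel
#endif
    error.clear();
    return true;
}
```

The synchronize call sits inside the try block because the call behaves as though it's synchronous but runs asynchronously on the device, so some errors may surface only when results are copied back.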

Requirements

Header: amp.h

Namespace: Concurrency
