parallel_for_each Function (C++ AMP)

Visual Studio 11

[This documentation is for preview only, and is subject to change in later releases. Blank topics are included as placeholders.]

Runs a function across the compute domain. For more information, see Overview of C++ Accelerated Massive Parallelism (C++ AMP).


          
template <
   int _Rank,
   typename _Kernel_type
>
void parallel_for_each(
   const extent<_Rank>& _Compute_domain,
   const _Kernel_type& _Kernel
);

template <
   int _Dim0,
   int _Dim1,
   int _Dim2,
   typename _Kernel_type
>
void parallel_for_each(
   const tiled_extent<_Dim0, _Dim1, _Dim2>& _Compute_domain,
   const _Kernel_type& _Kernel
);

template <
   int _Dim0,
   int _Dim1,
   typename _Kernel_type
>
void parallel_for_each(
   const tiled_extent<_Dim0, _Dim1>& _Compute_domain,
   const _Kernel_type& _Kernel
);

template <
   int _Dim0,
   typename _Kernel_type
>
void parallel_for_each(
   const tiled_extent<_Dim0>& _Compute_domain,
   const _Kernel_type& _Kernel
);

template <
   int _Rank,
   typename _Kernel_type
>
void parallel_for_each(
   const accelerator_view& _Accl_view,
   const extent<_Rank>& _Compute_domain,
   const _Kernel_type& _Kernel
);

template <
   int _Dim0,
   int _Dim1,
   int _Dim2,
   typename _Kernel_type
>
void parallel_for_each(
   const accelerator_view& _Accl_view,
   const tiled_extent<_Dim0, _Dim1, _Dim2>& _Compute_domain,
   const _Kernel_type& _Kernel
);

template <
   int _Dim0,
   int _Dim1,
   typename _Kernel_type
>
void parallel_for_each(
   const accelerator_view& _Accl_view,
   const tiled_extent<_Dim0, _Dim1>& _Compute_domain,
   const _Kernel_type& _Kernel
);

template <
   int _Dim0,
   typename _Kernel_type
>
void parallel_for_each(
   const accelerator_view& _Accl_view,
   const tiled_extent<_Dim0>& _Compute_domain,
   const _Kernel_type& _Kernel
);
        
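_Kernel_type

The type of the kernel: a lambda or functor.

_Rank

The rank of the extent.

_Dim0, _Dim1, _Dim2

The tile extents of the tiled_extent object, listed from the most significant dimension to the least significant.

_Accl_view

The accelerator_view object on which to run the parallel computation.

_Compute_domain

An extent or tiled_extent object that defines the compute domain.

_Kernel

A lambda or function object that performs the computation for one index of the compute domain.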

The parallel_for_each function starts data-parallel computations on accelerator devices. The basic behavior of parallel_for_each is like that of for_each, which runs a function on each element that's in a container.

The basic components in a call to parallel_for_each are a compute domain, an index, and a kernel function.

When parallel_for_each runs, a parallel activity is run for each index in the compute domain. You can use the parallel activity to access the elements in the input or output arrays.
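As a rough mental model of "one parallel activity per index", the following host-only sketch in standard C++ walks every index of a two-dimensional compute domain and calls a kernel once per index. It does not use amp.h: Index2, model_for_each_index, and model_add are hypothetical names introduced here, and the loop runs sequentially on the CPU, whereas the real function makes no promise about order or concurrency.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical host-side stand-in for a rank-2 index.
// This is NOT the C++ AMP index<2> type.
struct Index2 {
    int row;
    int col;
};

// Sequential CPU model (an assumption of this sketch): invokes `kernel`
// once for every index in a rows x cols compute domain.
template <typename Kernel>
void model_for_each_index(int rows, int cols, const Kernel& kernel) {
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            kernel(Index2{r, c});
        }
    }
}

// Element-wise addition of two row-major matrices. Each activity reads and
// writes only the element that its own index names.
inline std::vector<int> model_add(const std::vector<int>& a,
                                  const std::vector<int>& b,
                                  int rows, int cols) {
    std::vector<int> sum(a.size());
    model_for_each_index(rows, cols, [&](Index2 idx) {
        const std::size_t flat =
            static_cast<std::size_t>(idx.row) * static_cast<std::size_t>(cols) +
            static_cast<std::size_t>(idx.col);
        sum[flat] = a[flat] + b[flat];
    });
    return sum;
}
```

Because each invocation touches only the element named by its own index, the result does not depend on the order in which the indexes are visited.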

A call to parallel_for_each behaves as though it's synchronous. In practice, the call is asynchronous because it runs on a separate device.

The non-tiled parallel_for_each makes no guarantees about the order or concurrency of the parallel activities that it runs. These activities can communicate only by using atomic operations.

The tiled version of parallel_for_each organizes the parallel activities into tiles that have a fixed size and 1, 2, or 3 dimensions, as specified in the tiled_extent argument. You can't create tiles with more than three dimensions. Threads in the same tile have access to any variables declared with the tile_static keyword. You can use tiled_index::barrier.wait to synchronize access to variables declared with the tile_static keyword. The following restrictions apply to the tiled parallel_for_each:

  • The product of the tile extent dimensions cannot exceed 1024.

    • 3D: D0 * D1 * D2 ≤ 1024; and D0 ≤ 64

    • 2D: D0 * D1 ≤ 1024

    • 1D: D0 ≤ 1024

  • The compute domain passed to parallel_for_each must be divisible, along each of its dimensions, by the corresponding tile extent.
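The tile restrictions above can be checked ahead of time on the host. The sketch below is plain standard C++ and is not part of amp.h; the function and constant names are hypothetical, and the requirement that every extent be positive is an assumption added here.

```cpp
// Hypothetical helpers; none of these names come from amp.h.
constexpr long long kMaxTileSize = 1024;  // limit on the product of tile extents
constexpr long long kMaxD0In3D = 64;      // extra limit on D0 for 3D tiles

// 1D: D0 <= 1024
constexpr bool valid_tile_1d(long long d0) {
    return d0 > 0 && d0 <= kMaxTileSize;
}

// 2D: D0 * D1 <= 1024
constexpr bool valid_tile_2d(long long d0, long long d1) {
    return d0 > 0 && d1 > 0 && d0 * d1 <= kMaxTileSize;
}

// 3D: D0 * D1 * D2 <= 1024, and D0 <= 64
constexpr bool valid_tile_3d(long long d0, long long d1, long long d2) {
    return d0 > 0 && d1 > 0 && d2 > 0 &&
           d0 <= kMaxD0In3D && d0 * d1 * d2 <= kMaxTileSize;
}

// The compute domain must be a whole number of tiles along each dimension.
constexpr bool divides_evenly(long long domain_extent, long long tile_extent) {
    return tile_extent > 0 && domain_extent % tile_extent == 0;
}

// Compile-time usage: a 16 x 16 tile over a 512 x 512 domain passes both checks.
static_assert(valid_tile_2d(16, 16), "tile too large");
static_assert(divides_evenly(512, 16), "domain is not a multiple of the tile");
```

For example, a 128 x 2 x 2 tile has only 512 threads but still fails the 3D check because D0 exceeds 64.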

The parallel_for_each code runs on an accelerator, usually a GPU device. You can pass this accelerator explicitly to parallel_for_each as an optional accelerator_view parameter. Otherwise, the target accelerator is chosen from the objects of type array<T,N> that are captured in the kernel function. If the captured arrays aren't all bound to the same accelerator, an exception is thrown. The tiled_index argument passed to the kernel contains a collection of indexes, including those that are relative to the current tile.
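To make the "collection of indexes" concrete, the following host-only sketch shows how a one-dimensional global index can be split into a tile number and a position within that tile. It is standard C++, not amp.h. The struct, its member names (global, local, tile, tile_origin), and make_tiled_index are assumptions of this sketch rather than a statement of the library's interface.

```cpp
// Hypothetical host-side model of the indexes carried by a 1D tiled index.
struct TiledIndex1DModel {
    int global;       // position within the whole compute domain
    int local;        // position relative to the start of the current tile
    int tile;         // which tile the activity belongs to
    int tile_origin;  // global position of the first element of that tile
};

// Assumes global >= 0 and tile_extent > 0.
constexpr TiledIndex1DModel make_tiled_index(int global, int tile_extent) {
    return TiledIndex1DModel{
        global,
        global % tile_extent,
        global / tile_extent,
        (global / tile_extent) * tile_extent};
}
```

In this model, global always equals tile_origin + local, so a kernel can use local to address per-tile storage and global to address the full data set.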

The _Kernel parameter of the parallel_for_each function must be a lambda or functor. To run on an accelerator, the lambda must include the restrict(amp) clause, although it can have additional restrictions. The call graph emanating from the kernel must also be visible for inlining. You can also use a custom function object, but the function call operator must still be marked restrict(amp). The functor can't have a function call operator that's overloaded, and the function call operator must be inline. For more information, see Restriction Clause (C++ AMP).

You must be able to invoke the _Kernel argument by using one of the following argument types:

  • Non-tiled: index<N>, where N must be the same rank as the extent<N> used in the parallel_for_each.

  • Tiled: tiled_index<[[Z,] Y,] X>, where the tile extents must match those of the tiled_extent object used in the parallel_for_each.

The kernel function must return void.

Because the kernel function doesn't take any other arguments, all other data operated on by the kernel must be captured in the lambda or function object. All captured data must be passed by value, except for array<T,N> objects, which must be captured by reference or pointer. The following restrictions also apply on the object types that can be captured:

  • No pointers or references (exception: array<T,N>)

  • No char, short, or long long types

  • No bool types

  • Structs or classes that contain only supported types are allowed

  • No objects that contain virtual functions or virtual bases

  • Structs or classes must be PODs (more formally, they must be copyable by blitting)
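Some of the capture restrictions above can be approximated at compile time with standard type traits. The sketch below is only that, an approximation: is_any_of and is_capturable_sketch are hypothetical names, the array<T,N> exception is not modeled, the members of a struct are not inspected, and treating the signed and unsigned variants of char, short, and long long as excluded is an assumption.

```cpp
#include <type_traits>

// Hypothetical helper: true if T is the same type as any of Us.
template <typename T, typename... Us>
struct is_any_of : std::false_type {};

template <typename T, typename U, typename... Us>
struct is_any_of<T, U, Us...>
    : std::integral_constant<bool,
          std::is_same<T, U>::value || is_any_of<T, Us...>::value> {};

// Hypothetical trait: a rough approximation of the capture rules.
template <typename T>
struct is_capturable_sketch
    : std::integral_constant<bool,
          !std::is_pointer<T>::value &&            // no pointers
          !std::is_reference<T>::value &&          // no references
          !std::is_polymorphic<T>::value &&        // no virtual functions
          std::is_trivially_copyable<T>::value &&  // copyable by blitting
          !is_any_of<typename std::remove_cv<T>::type,
                     bool, char, signed char, unsigned char,
                     short, unsigned short,
                     long long, unsigned long long>::value> {};

// Illustrative types, defined only for this sketch.
struct PlainParams {
    int width;
    float scale;
};

struct WithVirtual {
    virtual ~WithVirtual() {}
    int id;
};

static_assert(is_capturable_sketch<PlainParams>::value, "plain struct is accepted");
static_assert(!is_capturable_sketch<WithVirtual>::value, "virtual functions are rejected");
```

A class with a virtual base is also rejected here, because such a class is not trivially copyable.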

If an error occurs when the runtime tries to start the parallel_for_each, the runtime throws an exception. Exceptions can be thrown for the following reasons:

  • Failure to create the shader.

  • Failure to create buffers.

  • Invalid extent passed.

  • Mismatched accelerators.

Header: amp.h

Namespace: Concurrency
