Graphics (C++ AMP)

C++ AMP contains several APIs in the Concurrency::graphics namespace that you can use to access the texture support on GPUs. Some common scenarios are:

  • You can use the texture class as a data container for computation and exploit the spatial locality of the texture cache and layouts of GPU hardware. Spatial locality is the property of data elements being physically close to each other.

  • The runtime provides efficient interoperability with non-compute shaders. Pixel, vertex, tessellation, and hull shaders frequently consume or produce textures that you can use in your C++ AMP computations.

  • The graphics APIs in C++ AMP provide alternative ways to access sub-word packed buffers. Textures that have formats that represent texels (texture elements) that are composed of 8-bit or 16-bit scalars allow access to such packed data storage.

Note

The C++ AMP APIs do not provide texture sampling and filtering functionality. You must use the interoperability features of C++ AMP and then write the code in DirectCompute and HLSL.

The norm and unorm types are scalar types that limit the range of float values; this is known as clamping. These types can be explicitly constructed from other scalar types. In casting, the value is first cast to float and then clamped to the range that's allowed by norm [-1.0, 1.0] or unorm [0.0, 1.0]. Casting from +/- infinity returns +/-1. Casting from NaN is undefined. A norm can be implicitly constructed from a unorm, and there is no loss of data. The implicit conversion operator to float is defined on these types. Binary operators are defined between these types and other built-in scalar types such as float and int: +, -, *, /, ==, !=, >, <, >=, <=. The compound assignment operators are also supported: +=, -=, *=, /=. The unary negation operator (-) is defined for norm types.
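The clamping rule can be sketched in portable standard C++. The helper names clamp_to_norm and clamp_to_unorm below are hypothetical and only mimic the documented ranges; they are not the norm and unorm classes from amp_graphics.h. NaN input is rejected by an assertion because the text above leaves its result undefined.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: mimics clamping to the norm range [-1.0, 1.0].
inline float clamp_to_norm(float value) {
    assert(!std::isnan(value));  // NaN behavior is undefined, so it is excluded here.
    if (value < -1.0f) return -1.0f;
    if (value > 1.0f) return 1.0f;
    return value;
}

// Hypothetical helper: mimics clamping to the unorm range [0.0, 1.0].
inline float clamp_to_unorm(float value) {
    assert(!std::isnan(value));
    if (value < 0.0f) return 0.0f;
    if (value > 1.0f) return 1.0f;
    return value;
}
```

Because the unorm range lies inside the norm range, passing an already clamped unorm value through clamp_to_norm leaves it unchanged, which is consistent with the lossless unorm-to-norm conversion described above.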

The Short Vector Library provides some of the functionality of the Vector Type that's defined in HLSL and is typically used to define texels. A short vector is a data structure that holds one to four values of the same type. The supported types are double, float, int, norm, uint, and unorm. The type names are shown in the following table. For each type, there is also a corresponding typedef that doesn't have an underscore in the name. The types that have the underscores are in the Concurrency::graphics Namespace. The types that don't have the underscores are in the Concurrency::graphics::direct3d Namespace so that they are clearly separated from the similarly-named fundamental types such as __int8 and __int16.

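Type      Length 2               Length 3               Length 4

double    double_2, double2      double_3, double3      double_4, double4
float     float_2, float2        float_3, float3        float_4, float4
int       int_2, int2            int_3, int3            int_4, int4
norm      norm_2, norm2          norm_3, norm3          norm_4, norm4
uint      uint_2, uint2          uint_3, uint3          uint_4, uint4
unorm     unorm_2, unorm2        unorm_3, unorm3        unorm_4, unorm4
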
If an operator is defined between two short vectors, then it is also defined between a short vector and a scalar. Also, one of these must be true:

  • The scalar’s type must be the same as the short vector’s element type.

  • The scalar’s type can be implicitly converted to the vector’s element type by using only one user-defined conversion.

The operation is carried out component-wise between each component of the short vector and the scalar. Here are the valid operators:

Operator type                        Valid types

Binary operators                     Valid on all types: +, -, *, /
                                     Valid on integer types: %, ^, |, &, <<, >>
                                     The two vectors must have the same size, and the result is a vector of the same size.

Relational operators                 Valid on all types: == and !=

Compound assignment operator         Valid on all types: +=, -=, *=, /=
                                     Valid on integer types: %=, ^=, |=, &=, <<=, >>=

Increment and decrement operators    Valid on all types: ++, --
                                     Both prefix and postfix are valid.

Bitwise NOT operator (~)             Valid on integer types.

Unary - operator                     Valid on all types except unorm and uint.
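The component-wise rule for vector-vector and vector-scalar operators can be illustrated with a small stand-in type. This is a minimal sketch in portable C++; the struct Int2 is hypothetical and is not the int_2 type from amp_graphics.h, which defines these operators itself.

```cpp
#include <cassert>

// Hypothetical stand-in for a two-component integer short vector.
struct Int2 {
    int x;
    int y;
};

// Vector-vector addition: operands have the same size, and so does the result.
inline Int2 operator+(Int2 a, Int2 b) { return Int2{a.x + b.x, a.y + b.y}; }

// Vector-scalar addition: the scalar is applied to every component.
inline Int2 operator+(Int2 a, int s) { return Int2{a.x + s, a.y + s}; }

// Compound assignment with a scalar, also component-wise.
inline Int2& operator*=(Int2& a, int s) {
    a.x *= s;
    a.y *= s;
    return a;
}

// Relational operator: equal only if every component matches.
inline bool operator==(Int2 a, Int2 b) { return a.x == b.x && a.y == b.y; }

// Usage sketch that checks the behavior described in the table.
inline void short_vector_operator_demo() {
    Int2 v{1, 2};
    assert((v + Int2{10, 20}) == (Int2{11, 22}));
    assert((v + 5) == (Int2{6, 7}));
    v *= 3;
    assert(v == (Int2{3, 6}));
}
```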

The Short Vector Library supports the vector_type.identifier accessor construct to access the components of a short vector. The identifier, which is known as a swizzling expression, specifies the components of the vector. The expression can be an l-value or an r-value. Individual characters in the identifier may be: x, y, z, and w; or r, g, b, and a. "x" and "r" mean the zero-th component, "y" and "g" mean the first component, and so on. (Notice that "x" and "r" cannot be used in the same identifier.) Therefore, "rgba" and "xyzw" return the same result. Single-component accessors such as "x" and "y" are scalar value types. Multi-component accessors are short vector types. For example, if you construct an int_4 vector that's named fourInts and has the values 2, 4, 6, and 8, then fourInts.y returns the integer 4 and fourInts.rg returns an int_2 object that has the values 2 and 4.
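The fourInts example can be mimicked in portable C++ as follows. This is only a sketch: the structs Int2 and Int4 are hypothetical, and the multi-component swizzles are modeled as member functions (rg(), xy()) rather than as the property-style accessors that the real short vector types expose.

```cpp
#include <cassert>

// Hypothetical stand-ins for int_2 and int_4.
struct Int2 {
    int x;
    int y;
};

struct Int4 {
    int x;
    int y;
    int z;
    int w;

    // "r" and "g" alias components 0 and 1, just as "x" and "y" do.
    int r() const { return x; }
    int g() const { return y; }

    // Multi-component accessors return a shorter vector.
    Int2 xy() const { return Int2{x, y}; }
    Int2 rg() const { return Int2{x, y}; }
};

// Usage sketch that mirrors the fourInts example in the text.
inline void swizzle_demo() {
    Int4 fourInts{2, 4, 6, 8};
    assert(fourInts.y == 4);
    Int2 firstTwo = fourInts.rg();
    assert(firstTwo.x == 2 && firstTwo.y == 4);
}
```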

Many GPUs have hardware and caches that are optimized to fetch pixels and texels and to render images and textures. The texture<T,N> class, which is a container class for texel objects, exposes the texture functionality of these GPUs. A texel can be:

  • An int, uint, float, double, norm, or unorm scalar.

  • A short vector that has two or four components. The only exception is double_4, which is not allowed.

The texture object can have a rank of 1, 2, or 3. The texture object can be captured only by reference in the lambda of a call to parallel_for_each. The texture is stored on the GPU as Direct3D texture objects. For more information about textures and texels in Direct3D, see Introduction to Textures in Direct3D 11.

The texel type you use might be one of the many texture formats that are used in graphics programming. For example, an RGBA format could use 32 bits, with 8 bits each for the R, G, B, and A scalar elements. The texture hardware of a graphics card can access the individual elements based on the format. For example, if you are using the RGBA format, the texture hardware can extract each 8-bit element into a 32-bit form. In C++ AMP, you can set the bits per scalar element of your texel so that you can automatically access the individual scalar elements in the code without using bit-shifting.
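For comparison, here is what the manual bit-shifting looks like when a 32-bit RGBA value is unpacked on the CPU. This is a minimal sketch in portable C++; the names UInt4 and unpack_rgba8 are hypothetical, and the channel order (R in the lowest byte, A in the highest) is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a uint_4 texel: each channel is widened to 32 bits.
struct UInt4 {
    std::uint32_t r;
    std::uint32_t g;
    std::uint32_t b;
    std::uint32_t a;
};

// Extract four 8-bit channels from one packed 32-bit value.
// Assumption: R is stored in bits 0-7, G in 8-15, B in 16-23, A in 24-31.
inline UInt4 unpack_rgba8(std::uint32_t packed) {
    return UInt4{
        packed & 0xFFu,
        (packed >> 8) & 0xFFu,
        (packed >> 16) & 0xFFu,
        (packed >> 24) & 0xFFu
    };
}

// Usage sketch.
inline void unpack_demo() {
    UInt4 color = unpack_rgba8(0x44332211u);
    assert(color.r == 0x11u && color.g == 0x22u);
    assert(color.b == 0x33u && color.a == 0x44u);
}
```

Setting the bits per scalar element on a texture lets the texture hardware perform the equivalent extraction when a texel is read.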

You can declare a texture object without initialization. The following code example declares several texture objects.

#include <amp.h>
#include <amp_graphics.h>
using namespace concurrency;
using namespace concurrency::graphics;

void declareTextures() {

    // Create a 16-texel texture of int. 
    texture<int, 1> intTexture1(16);  
    texture<int, 1> intTexture2(extent<1>(16)); 

    // Create a 16 x 32 texture of float_2.  
    texture<float_2, 2> floatTexture1(16, 32);  
    texture<float_2, 2> floatTexture2(extent<2>(16, 32));   

    // Create a 2 x 4 x 8 texture of uint_4. 
    texture<uint_4, 3> uintTexture1(2, 4, 8);  
    texture<uint_4, 3> uintTexture2(extent<3>(2, 4, 8));
}

You can also use a constructor to declare and initialize a texture object. The following code example instantiates a texture object from a vector of int_4 objects. The bits per scalar element is set to the default. You cannot use this constructor with norm, unorm, or the short vectors of norm and unorm, because they do not have a default bits per scalar element.

#include <amp.h>
#include <amp_graphics.h>
#include <vector>
using namespace concurrency;
using namespace concurrency::graphics;

void initializeTexture() {

    // Build the source data: one int_4 per texel.
    std::vector<int_4> texels;
    for (int i = 0; i < 768 * 1024; i++) {
        int_4 i4(i, i, i, i);
        texels.push_back(i4);
    }

    texture<int_4, 2> aTexture(768, 1024, texels.begin(), texels.end());
}

You can also declare and initialize a texture object by using a constructor overload that takes a pointer to the source data, the size of source data in bytes, and the bits per scalar element.

void createTextureWithBPC() {
    // Create the source data.
    float source[1024 * 2]; 
    for (int i = 0; i < 1024 * 2; i++) {
        source[i] = (float)i;
    }

    // Initialize the texture by using the size of source in bytes
    // and bits per scalar element.
    texture<float_2, 1> floatTexture(1024, source, (unsigned int)sizeof(source), 32U); 
}
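The relationship between the extent, the texel type, the bits per scalar element, and the size of the source data in bytes can be checked with simple arithmetic. The helper packed_texture_bytes below is hypothetical and assumes tightly packed source data with no padding; it is not part of the C++ AMP API.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: bytes occupied by tightly packed texel data.
// texel_count is the total number of texels (the product of all extents).
inline std::size_t packed_texture_bytes(std::size_t texel_count,
                                        unsigned int components,
                                        unsigned int bits_per_scalar_element) {
    return texel_count * components * bits_per_scalar_element / 8u;
}

// Usage sketch: 1024 float_2 texels at 32 bits per scalar element
// occupy the same number of bytes as float source[1024 * 2].
inline void packed_size_demo() {
    assert(packed_texture_bytes(1024, 2, 32) == sizeof(float) * 1024 * 2);
}
```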

The textures in these examples are created on the default view of the default accelerator. You can use other overloads of the constructor if you want to specify an accelerator_view object. You cannot create a texture object on a CPU accelerator.

There are limits on the size of each dimension of the texture object, as shown in the following table. A run-time error is generated if you exceed the limits.


Size limitation







You can read from a texture object by using texture::operator[], texture::operator(), or the texture::get method. Both operators return a value, not a reference. Therefore, you cannot write to a texture object by using texture::operator[].

void readTexture() {
    std::vector<int_2> src;    
    for (int i = 0; i < 16 *32; i++) {
        int_2 i2(i, i);
        src.push_back(i2);
    }

    std::vector<int_2> dst(16 * 32);  
    array_view<int_2, 2> arr(16, 32, dst);  

    const texture<int_2, 2> tex9(16, 32, src.begin(), src.end());  
    parallel_for_each(tex9.extent, [=, &tex9] (index<2> idx) restrict(amp) {          
        // Use the subscript operator.
        arr[idx].x += tex9[idx].x; 
        // Use the function () operator.
        arr[idx].x += tex9(idx).x; 
        // Use the get method.
        arr[idx].y += tex9.get(idx).y; 
        // Use the function () operator with explicit integer indices.
        arr[idx].y += tex9(idx[0], idx[1]).y; 
    });

    // Copy the results back to dst on the host.
    arr.synchronize();
}


The following code example demonstrates how to store texture channels in a short vector, and then access the individual scalar elements as properties of the short vector.

void UseBitsPerScalarElement() {
    // Create the image data. 
    // Each unsigned int (32-bit) represents four 8-bit scalar elements (r, g, b, a values).
    const int image_height = 16;
    const int image_width = 16;
    std::vector<unsigned int> image(image_height * image_width);

    extent<2> image_extent(image_height, image_width);

    // By using uint_4 and 8 bits per channel, each 8-bit channel in the data source is 
    // stored in one 32-bit component of a uint_4.
    texture<uint_4, 2> image_texture(image_extent, image.data(), image_extent.size() * 4U, 8U);

    // You can access the RGBA values of the source data by using swizzling expressions of the uint_4.
    parallel_for_each(image_extent,
         [&image_texture](index<2> idx) restrict(amp) 
    {
        // 4 bytes are automatically extracted when reading.
        uint_4 color = image_texture[idx]; 
        unsigned int r = color.r; 
        unsigned int g = color.g; 
        unsigned int b = color.b; 
        unsigned int a = color.a; 
    });
}

The following table lists the valid bits per scalar element for each short vector type.

Texture data type                Valid bits per scalar element

int, int_2, int_4                8, 16, 32
uint, uint_2, uint_4

float, float_2, float_4          16, 32

double, double_2                 64

norm, norm_2, norm_4             8, 16
unorm, unorm_2, unorm_4
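The table can be expressed as a small lookup function, for example to validate a value before passing it to a texture constructor. This is a sketch in portable C++; the enum ScalarKind and the function is_valid_bits_per_scalar_element are hypothetical names and are not part of the C++ AMP API. The value 64 for double is an assumption based on the size of a double.

```cpp
#include <cassert>

// Hypothetical tag for the scalar element type of a texel.
enum class ScalarKind { Int, UInt, Float, Double, Norm, UNorm };

// Returns true if the given bits per scalar element is listed as valid
// for the scalar kind. Short vectors follow their element type.
inline bool is_valid_bits_per_scalar_element(ScalarKind kind, unsigned int bits) {
    switch (kind) {
    case ScalarKind::Int:
    case ScalarKind::UInt:
        return bits == 8 || bits == 16 || bits == 32;
    case ScalarKind::Float:
        return bits == 16 || bits == 32;
    case ScalarKind::Double:
        return bits == 64;  // Assumption: a double is always stored as 64 bits.
    case ScalarKind::Norm:
    case ScalarKind::UNorm:
        return bits == 8 || bits == 16;
    }
    return false;
}

// Usage sketch.
inline void valid_bits_demo() {
    assert(is_valid_bits_per_scalar_element(ScalarKind::UInt, 8));
    assert(!is_valid_bits_per_scalar_element(ScalarKind::Float, 8));
}
```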

Use the texture::set method to write to texture objects. A texture object can be readonly or read/write. For a texture object to be readable and writeable, the following conditions must be true:

  • T has only one scalar component. (Short vectors are not allowed.)

  • T is not double, norm, or unorm.

  • The texture::bits_per_scalar_element property is 32.

If any of these conditions is not met, then the texture object is readonly. The first two conditions are checked during compilation. A compilation error is generated if you have code that tries to write to a readonly texture object. The condition for texture::bits_per_scalar_element is detected at run time, and the runtime generates the unsupported_feature exception if you try to write to a readonly texture object.
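The three conditions can be summarized as a single predicate. The sketch below is portable C++ with hypothetical names (ScalarKind, is_read_write_texture); in C++ AMP itself the first two conditions are enforced by the compiler and the third by the runtime, not by a function like this.

```cpp
#include <cassert>

// Hypothetical tag for the scalar element type of a texel.
enum class ScalarKind { Int, UInt, Float, Double, Norm, UNorm };

// Returns true only if all three documented conditions hold:
// one scalar component, a type other than double/norm/unorm, and 32 bits.
inline bool is_read_write_texture(ScalarKind kind,
                                  unsigned int components,
                                  unsigned int bits_per_scalar_element) {
    const bool single_component = (components == 1);
    const bool allowed_type = (kind == ScalarKind::Int ||
                               kind == ScalarKind::UInt ||
                               kind == ScalarKind::Float);
    return single_component && allowed_type && bits_per_scalar_element == 32;
}

// Usage sketch: texture<int, 1> with the default 32 bits is writable,
// whereas a two-component int_2 texture is not.
inline void read_write_demo() {
    assert(is_read_write_texture(ScalarKind::Int, 1, 32));
    assert(!is_read_write_texture(ScalarKind::Int, 2, 32));
}
```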

The following code example writes values to a texture object.

void writeTexture() {
    texture<int, 1> tex1(16); 
    parallel_for_each(tex1.extent, [&tex1] (index<1> idx) restrict(amp) {    
        tex1.set(idx, 0); 
    });
}


The writeonly_texture_view class provides a writeonly view of a texture object. The writeonly_texture_view object must be captured by value in the lambda expression. The following code example uses a writeonly_texture_view object to write to a texture object that has two components (int_2).

void write2ComponentTexture() {
    texture<int_2, 1> tex4(16); 
    writeonly_texture_view<int_2, 1> wo_tv4(tex4); 
    parallel_for_each(extent<1>(16), [=] (index<1> idx) restrict(amp) {   
        wo_tv4.set(idx, int_2(1, 1)); 
    });
}

You can copy data into and out of texture objects by using the copy function or the copy_async function, as shown in the following code example.

void copyHostArrayToTexture() {
    // Copy from source array to texture object by using the copy function.
    float floatSource[1024 * 2]; 
    for (int i = 0; i < 1024 * 2; i++) {
        floatSource[i] = (float)i;
    }
    texture<float_2, 1> floatTexture(1024);
    copy(floatSource, (unsigned int)sizeof(floatSource), floatTexture); 

    // Copy from source array to texture object by using the copy function.
    char charSource[16 * 16]; 
    for (int i = 0; i < 16 * 16; i++) {
        charSource[i] = (char)i;
    }
    texture<int, 2> charTexture(16, 16, 8U);
    copy(charSource, (unsigned int)sizeof(charSource), charTexture); 

    // Copy from texture object to source array by using the copy function.
    copy(charTexture, charSource, (unsigned int)sizeof(charSource)); 
}

You can also copy from one texture to another by using the texture::copy_to method. The two textures can be on different accelerator_views. When you copy to a writeonly_texture_view object, the data is copied to the underlying texture object. The bits per scalar element and the extent must be the same on the source and destination texture objects. If those requirements are not met, the runtime throws an exception.

The C++ AMP runtime supports interoperability between texture<T,1> and the ID3D11Texture1D interface, between texture<T,2> and the ID3D11Texture2D interface, and between texture<T,3> and the ID3D11Texture3D interface. The get_texture method takes a texture object and returns an IUnknown interface. The make_texture method takes an IUnknown interface and an accelerator_view object and returns a texture object.

© 2018 Microsoft