_mm256_permute2_ps

Visual Studio 2010

Visual Studio 2010 SP1 is required.

Microsoft Specific

Generates the XOP YMM instruction vpermil2ps to select floating-point values from its first two sources, with optional zeroing.

__m256 _mm256_permute2_ps (
   __m256 src1,
   __m256 src2,
   __m256i selector,
   int control
); 

[in] src1

A 256-bit parameter that contains eight 32-bit floating-point values.

[in] src2

A 256-bit parameter that contains eight 32-bit floating-point values.

[in] selector

A 256-bit parameter that contains eight 32-bit integer selector values.

[in] control

A 32-bit integer parameter that determines when values in the result are set to zero. Its value must be 0, 1, 2, or 3.

Returns a 256-bit result r that contains eight 32-bit floating-point values.

Each value in the high 128 bits of the result is either zero or a value chosen from the eight 32-bit floating-point values in the high 128 bits of src1 and src2 (four from each source). Each value in the low 128 bits of the result is either zero or a value chosen from the eight 32-bit floating-point values in the low 128 bits of src1 and src2.

Intrinsic             Architecture

_mm256_permute2_ps    XOP

Header file <intrin.h>

Each of the four doublewords in the high 128 bits of selector selects the value for its corresponding doubleword of the result from one of the eight 32-bit floating-point values in the high 128 bits of src1 and src2. This value may be replaced by zero before being written to the result, depending on the value of control and the value of bit 3 of the selector doubleword. Similarly, each of the four doublewords in the low 128 bits of selector selects a value from one of the eight 32-bit floating-point values in the low 128 bits of src1 and src2, and this value may also be replaced by zero.

For each doubleword in the high 128 bits of selector, the three low-order bits select one of the floating-point values in src1 or src2, with values 0 through 3 selecting src1[4] through src1[7] and values 4 through 7 selecting src2[4] through src2[7]. For each doubleword in the low 128 bits of selector, the three low-order bits select one of the floating-point values in src1 or src2, with values 0 through 3 selecting src1[0] through src1[3] and values 4 through 7 selecting src2[0] through src2[3].

The next bit (bit 3) of each doubleword in selector is referred to below as the "match" bit. The high-order 28 bits of each doubleword in selector are ignored.
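The selector layout described above can be sketched as a few small helpers. The names make_selector, selector_source, selector_index, and selector_match are hypothetical and are not part of <intrin.h>; they only illustrate how the low four bits of a selector doubleword are laid out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not an intrinsic): build one selector doubleword.
   index:    element 0..3 within the 128-bit half (bits 0-1)
   use_src2: 0 selects from src1, 1 selects from src2 (bit 2)
   match:    the "match" bit (bit 3) */
int32_t make_selector(int use_src2, int index, int match)
{
    assert(index >= 0 && index <= 3);
    return (int32_t)((index & 3) | ((use_src2 & 1) << 2) | ((match & 1) << 3));
}

/* Hypothetical helpers that split a selector doubleword into its fields.
   Bits 4 through 31 are ignored, as the text above states. */
int selector_index(int32_t sel)  { return (int)((uint32_t)sel & 3u); }
int selector_source(int32_t sel) { return (int)(((uint32_t)sel >> 2) & 1u); }
int selector_match(int32_t sel)  { return (int)(((uint32_t)sel >> 3) & 1u); }
```

Under this sketch, make_selector(1, 1, 0) yields 5 (element 1 of src2 within the half, match bit clear), and make_selector(0, 1, 1) yields 9, which is the same value as writing 1 + 8.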

The fourth source, control, determines the conditions under which result values are set to zero. The value of control must be 0, 1, 2, or 3. If control is 0 or 1, the selected floating-point value is always written to the destination, regardless of the match bit. If control is 2, the selected floating-point value is written to the destination if the corresponding match bit in selector is 0, and zero is written if the match bit is 1. If control is 3, the selected floating-point value is written to the destination if the corresponding match bit is 1, and zero is written if the match bit is 0.
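The selection and zeroing rules above can be modeled in plain scalar C, which is convenient for checking expected results on a machine without XOP support. This is a minimal sketch of the described behavior, not the intrinsic itself; permute2_ps_model is a hypothetical name, and arrays of eight floats stand in for the __m256 operands.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of the behavior described above.
   Elements 0..3 form the low 128 bits, elements 4..7 the high 128 bits. */
void permute2_ps_model(const float src1[8], const float src2[8],
                       const int32_t selector[8], int control,
                       float result[8])
{
    int i;
    assert(control >= 0 && control <= 3);
    for (i = 0; i < 8; i++) {
        int half_base = (i / 4) * 4;              /* 0 for low half, 4 for high half */
        uint32_t sel  = (uint32_t)selector[i];
        int index     = (int)(sel & 3u);          /* bits 0-1: element within the half */
        int from_src2 = (int)((sel >> 2) & 1u);   /* bit 2: 0 = src1, 1 = src2 */
        int match     = (int)((sel >> 3) & 1u);   /* bit 3: the "match" bit */
        float value   = from_src2 ? src2[half_base + index]
                                  : src1[half_base + index];

        /* control 2 zeroes on match == 1; control 3 zeroes on match == 0;
           control 0 and 1 never zero. */
        if ((control == 2 && match == 1) || (control == 3 && match == 0))
            value = 0.0f;
        result[i] = value;
    }
}
```

Because selection never crosses between the low and high halves, a selector doubleword in element 4 through 7 can only reach src1[4] through src1[7] and src2[4] through src2[7].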

The vpermil2ps instruction is part of the XOP family of instructions. Before you use this intrinsic, you must ensure that the processor supports this instruction. To determine hardware support for this instruction, call the __cpuid intrinsic with InfoType = 0x80000001 and check bit 11 of CPUInfo[2] (ECX). This bit is 1 when the instruction is supported, and 0 otherwise.
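The hardware check described above might be wrapped as follows. This is a sketch under stated assumptions: xop_bit_set and processor_reports_xop are hypothetical names, the query of leaf 0x80000000 to confirm that leaf 0x80000001 exists is an added precaution not mentioned above, and the __cpuid path is compiled only under the Microsoft compiler for x86 or x64.

```c
#include <assert.h>
#include <stddef.h>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>
#endif

/* Hypothetical helper: interpret CPUInfo as filled in by
   __cpuid(CPUInfo, 0x80000001). Bit 11 of CPUInfo[2] (ECX) reports XOP. */
int xop_bit_set(const int cpu_info[4])
{
    assert(cpu_info != NULL);
    return (int)(((unsigned int)cpu_info[2] >> 11) & 1u);
}

/* Hypothetical wrapper: returns 1 if the processor reports XOP, 0 otherwise.
   With any other compiler or target, this sketch conservatively returns 0. */
int processor_reports_xop(void)
{
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
    int cpu_info[4];

    /* Assumption: first confirm that extended leaf 0x80000001 is available. */
    __cpuid(cpu_info, (int)0x80000000);
    if ((unsigned int)cpu_info[0] < 0x80000001u)
        return 0;

    __cpuid(cpu_info, (int)0x80000001);
    return xop_bit_set(cpu_info);
#else
    return 0;
#endif
}
```

A caller would test processor_reports_xop() once at startup and fall back to a non-XOP code path when it returns 0.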

#include <stdio.h>
#include <intrin.h>
int main()
{
    __m256 a, b, d;
    __m256i select;
    int i;
    for (i = 0; i < 8; i++) {
        a.m256_f32[i] = (float)i;       // a = 0, 1, ..., 7
        b.m256_f32[i] = (float)(i + 8); // b = 8, 9, ..., 15
    }
    select.m256i_i32[0] = 5;
    select.m256i_i32[1] = 1 + 8; // turn on match bit
    select.m256i_i32[2] = 2;
    select.m256i_i32[3] = 6 + 8; // turn on match bit
    select.m256i_i32[4] = 5 + 8; // turn on match bit
    select.m256i_i32[5] = 1;
    select.m256i_i32[6] = 2 + 8; // turn on match bit
    select.m256i_i32[7] = 6;

    d = _mm256_permute2_ps(a, b, select, 0); // just select, don't zero
    for (i = 0; i < 8; i++) printf_s(" %6.3f", d.m256_f32[i]);
    printf_s("\n");
    d = _mm256_permute2_ps(a, b, select, 2); // zero if match is 1
    for (i = 0; i < 8; i++) printf_s(" %6.3f", d.m256_f32[i]);
    printf_s("\n");
    d = _mm256_permute2_ps(a, b, select, 3); // zero if match is 0
    for (i = 0; i < 8; i++) printf_s(" %6.3f", d.m256_f32[i]);
    printf_s("\n");
}

  9.000  1.000  2.000 10.000 13.000  5.000  6.000 14.000
  9.000  0.000  2.000  0.000  0.000  5.000  0.000 14.000
  0.000  1.000  0.000 10.000 13.000  0.000  6.000  0.000
