Visual Studio 2010

Visual Studio 2010 SP1 is required.

Microsoft Specific

Generates the XOP instruction vpshaw to perform an arithmetic shift of each 16-bit word in its first source by the amount specified for that word in its second source.

__m128i _mm_sha_epi16 (
   __m128i src,
   __m128i counts
);

[in] src

A 128-bit parameter that contains eight 16-bit signed integers.

[in] counts

A 128-bit parameter that contains sixteen 8-bit signed integers.

A 128-bit result r that contains eight 16-bit signed integers.

r[i] := (counts[2*i] >= 0) ? (src[i] << counts[2*i]) :
                           (src[i] >> -counts[2*i]);





Header file <intrin.h>

Each 16-bit signed integer value in src is shifted by the number of bits given by its shift count, and the 16-bit signed integer result is stored as the corresponding value in the destination. The shift count for the 16-bit value at index i in src is the 8-bit value at index 2*i in counts, that is, the low-order byte of the corresponding 16-bit word of counts. If the shift count is non-negative, the shift is to the left (toward the most significant bit) and zeros are shifted in at the right end; otherwise, the shift is to the right and copies of the sign bit are shifted in at the left end. If a shift count is greater than 15, the corresponding result value is 0; if a shift count is less than -15, the result is -1 if the value in src is negative and 0 otherwise. The odd-numbered 8-bit values in counts are ignored.
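The rules above can be restated as a scalar reference model, which may be useful for checking results on hardware without XOP. This is a sketch based only on the description in this topic, not on the instruction's implementation; the names sha_epi16_lane and sha_epi16_ref are hypothetical and are not part of <intrin.h>.

```c
#include <stdint.h>

/* Hypothetical helper: scalar model of one 16-bit lane, following the
   shift rules described in this topic. */
static int16_t sha_epi16_lane(int16_t src, int8_t count)
{
    if (count >= 0) {
        if (count > 15)
            return 0;                       /* all bits shifted out */
        /* Shift as unsigned to avoid undefined behavior, then keep 16 bits. */
        return (int16_t)(uint16_t)((uint16_t)src << count);
    }
    if (count < -15)
        return (int16_t)((src < 0) ? -1 : 0);   /* only sign bits remain */

    {
        int n = -count;                     /* 1..15 */
        int v = src;
        /* Portable arithmetic right shift: shift the complement for
           negative inputs so that only non-negative values are shifted. */
        if (v < 0)
            return (int16_t)~(~v >> n);
        return (int16_t)(v >> n);
    }
}

/* Hypothetical helper: applies the lane model to all eight words.
   Only the even-numbered bytes of counts are read. */
static void sha_epi16_ref(const int16_t src[8], const int8_t counts[16],
                          int16_t r[8])
{
    int i;
    for (i = 0; i < 8; i++)
        r[i] = sha_epi16_lane(src[i], counts[2 * i]);
}
```

The casts from uint16_t back to int16_t assume the usual two's-complement wraparound, which mainstream compilers provide. Counts outside the range -15 to 15 are handled explicitly because shifting by the operand width or more is undefined in C.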

The vpshaw instruction is part of the XOP family of instructions. Before you use this intrinsic, you must ensure that the processor supports this instruction. To determine hardware support for this instruction, call the __cpuid intrinsic with InfoType = 0x80000001 and check bit 11 of CPUInfo[2] (ECX). This bit is 1 when the instruction is supported, and 0 otherwise.
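The check described above might be wrapped as follows. This is a minimal sketch: the helper names xop_bit_set and cpu_supports_xop are hypothetical, and the query of leaf 0x80000000 to confirm that leaf 0x80000001 exists is an extra precaution that this topic does not require. The part that calls __cpuid is compiled only with the Microsoft compiler on x86 or x64.

```c
#include <stdint.h>

#define XOP_ECX_BIT 11   /* bit 11 of ECX for InfoType 0x80000001 */

/* Hypothetical helper: returns 1 if the given ECX value reports XOP. */
static int xop_bit_set(uint32_t ecx)
{
    return (int)((ecx >> XOP_ECX_BIT) & 1u);
}

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>

/* Hypothetical helper: returns 1 if the processor reports XOP support. */
static int cpu_supports_xop(void)
{
    int CPUInfo[4];

    /* Assumption: first confirm that extended leaf 0x80000001 exists. */
    __cpuid(CPUInfo, 0x80000000);
    if ((unsigned int)CPUInfo[0] < 0x80000001u)
        return 0;

    __cpuid(CPUInfo, 0x80000001);
    return xop_bit_set((uint32_t)CPUInfo[2]);   /* CPUInfo[2] is ECX */
}
#endif
```

A program would typically call cpu_supports_xop once at startup and fall back to a non-XOP code path when it returns 0.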

#include <stdio.h>
#include <intrin.h>
int main()
{
    __m128i a, b, d;
    int i;
    /* Build eight test words and one shift count per word.
       Only the even-numbered bytes of b are set; the odd-numbered
       bytes are ignored by the instruction. */
    for (i = 0; i < 8; i++) {
        a.m128i_u16[i] = (2*(i+1)) << 12 | (15 - 2*(i+1)) << 8 |
                          2*i << 4 | (15 - 2*i);
        b.m128i_i8[2*i] = 3*i - 12;
    }
    printf_s("data:       ");
    for (i = 0; i < 8; i++) printf_s(" %04x", a.m128i_u16[i]);
    printf_s("\nshifted by  ");
    for (i = 0; i < 8; i++) printf_s(" %4d", b.m128i_i8[2*i]);
    d = _mm_sha_epi16(a, b);
    printf_s("\ngives       ");
    for (i = 0; i < 8; i++) printf_s(" %04x", d.m128i_u16[i]);
    printf_s("\n");
    return 0;
}

data:        2d0f 4b2d 694b 8769 a587 c3a5 e1c3 ffe1
shifted by    -12   -9   -6   -3    0    3    6    9
gives        0002 0025 01a5 f0ed a587 1d28 70c0 c200

