_mm_dp_ps
Microsoft Specific
Emits the Streaming SIMD Extensions 4.1 (SSE4.1) instruction dpps, which computes the dot product of packed single-precision floating-point values.
__m128 _mm_dp_ps( __m128 a, __m128 b, const int mask );
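Parameters a and b each contain four 32-bit single-precision floating-point values. Parameter mask is a constant (an 8-bit immediate) that selects which products are summed and which result elements are written. The return value is a 128-bit value that contains the 32-bit results of the dot product.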
The result can be expressed with the following equations:
tmp0 := (mask4 == 1) ? (a0 * b0) : +0.0
tmp1 := (mask5 == 1) ? (a1 * b1) : +0.0
tmp2 := (mask6 == 1) ? (a2 * b2) : +0.0
tmp3 := (mask7 == 1) ? (a3 * b3) : +0.0
tmp4 := tmp0 + tmp1 + tmp2 + tmp3
r0 := (mask0 == 1) ? tmp4 : +0.0
r1 := (mask1 == 1) ? tmp4 : +0.0
r2 := (mask2 == 1) ? tmp4 : +0.0
r3 := (mask3 == 1) ? tmp4 : +0.0
Bits 4-7 of the immediate mask select which element pairs of a and b are multiplied and added to the dot product. Bits 0-3 select which elements of the result receive the dot product. If a mask bit is 0, the corresponding product or result element is +0.0.
r0-r3, a0-a3, and b0-b3 are the sequentially ordered 32-bit components of return value r and parameters a and b, respectively. r0, a0, and b0 are the least significant 32 bits.
maski is bit i of parameter mask, where bit 0 is the least significant bit.
Before you use this intrinsic, make sure that the processor supports SSE4.1, the instruction set extension that provides dpps.
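#include <stdio.h>
#include <smmintrin.h>   // SSE4.1 intrinsics, including _mm_dp_ps

// Note: the m128_f32 member and printf_s are Microsoft-specific.
int main ()
{
    __m128 a, b;

    // 0x55 = 0101 0101b: bits 4 and 6 multiply pairs 0 and 2,
    // bits 0 and 2 write the sum to r0 and r2.
    const int mask = 0x55;

    a.m128_f32[0] = 1.5;
    a.m128_f32[1] = 10.25;
    a.m128_f32[2] = -11.0625;
    a.m128_f32[3] = 81.0;

    b.m128_f32[0] = -1.5;
    b.m128_f32[1] = 3.125;
    b.m128_f32[2] = -50.5;
    b.m128_f32[3] = 100.0;

    __m128 res = _mm_dp_ps(a, b, mask);

    printf_s("Original a: %f\t%f\t%f\t%f\nOriginal b: %f\t%f\t%f\t%f\n",
        a.m128_f32[0], a.m128_f32[1], a.m128_f32[2], a.m128_f32[3],
        b.m128_f32[0], b.m128_f32[1], b.m128_f32[2], b.m128_f32[3]);

    // Expected: 556.406250  0.000000  556.406250  0.000000,
    // because (1.5 * -1.5) + (-11.0625 * -50.5) = 556.40625.
    printf_s("Result res: %f\t%f\t%f\t%f\n",
        res.m128_f32[0], res.m128_f32[1], res.m128_f32[2], res.m128_f32[3]);

    return 0;
}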