_mm_lddqu_si128
Microsoft Specific
Generates the lddqu instruction.
__m128i _mm_lddqu_si128(
__m128i const* Data
);
Parameters
- [in] Data
A pointer to the 128 bits of data to load. The address does not need to be 16-byte aligned.
Requirements
Intrinsic | Architecture |
---|---|
_mm_lddqu_si128 | Intel SSE3 |
Header file <intrin.h>
Return Value
The return value represents a register containing the loaded data.
Remarks
The lddqu instruction loads 128 bits of data and uses special alignment handling to avoid cache line splits. If the address of Data is 16-byte aligned, the data is loaded as is. If the address is not 16-byte aligned, the load begins at the highest 16-byte-aligned address less than the address of Data: a 32-byte block is loaded, and the requested 16 bytes are extracted from it. This intrinsic is recommended for improved performance over ordinary 128-bit unaligned loads, particularly when the data may cross a cache line boundary.
This routine is only available as an intrinsic.
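The load behavior described in the Remarks can be modeled in plain C. The following is a minimal scalar sketch, not the hardware implementation: the function name model_lddqu and its parameters are hypothetical, memory is represented as a byte array, and the address of Data is represented as an offset into that array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical scalar model of the load described in the Remarks.
   "mem" stands in for memory and "offset" for the address of Data.
   Illustration only; this is not how the processor is implemented. */
static void model_lddqu(const unsigned char *mem, size_t mem_size,
                        size_t offset, unsigned char out[16])
{
    /* Highest 16-byte-aligned position at or below offset. */
    size_t base = offset & ~(size_t)15;

    if (base == offset) {
        /* Aligned case: the 16 bytes are loaded as is. */
        assert(offset + 16 <= mem_size);
        memcpy(out, mem + offset, 16);
        return;
    }

    /* Unaligned case: fetch the enclosing 32-byte block,
       then extract the 16 requested bytes from it. */
    unsigned char block[32];
    assert(base + 32 <= mem_size);
    memcpy(block, mem + base, 32);
    memcpy(out, block + (offset - base), 16);
}
```

For an offset of 7, the model reads the 32-byte block that starts at position 0 and returns bytes 7 through 22. For an offset of 16, it returns bytes 16 through 31 directly.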
Example
// processor: x86 with SSE3
// Execute the lddqu instruction using the intrinsic _mm_lddqu_si128
#include <stdio.h>
#include <string.h>   // memcpy
#include <intrin.h>
#pragma intrinsic( _mm_lddqu_si128 )

int main( )
{
    unsigned char pBuf[32];
    __m128i i;
    __m128i j;

    i.m128i_i32[0] = 1;
    i.m128i_i32[1] = 2;
    i.m128i_i32[2] = 3;
    i.m128i_i32[3] = 4;

    // Copy the 16 bytes to an offset that is typically not 16-byte aligned.
    memcpy(pBuf+7, &i, 16);

    printf_s("load unaligned data using _mm_lddqu_si128 for best performance\n");

    // Load from the same unaligned address the data was copied to.
    j = _mm_lddqu_si128( (const __m128i*)(pBuf+7) );
    return 0;
}
load unaligned data using _mm_lddqu_si128 for best performance