Microsoft Specific

Generates the lddqu instruction.

__m128i _mm_lddqu_si128(
   __m128i const* Data
);

[in] Data

The data to load.

Intrinsic           Architecture

_mm_lddqu_si128     Intel SSE3

Header file <intrin.h>

The return value represents a register containing the loaded data.

The lddqu instruction loads 128 bits of data and handles alignment specially to avoid cache-line splits. If the address of the data is 16-byte aligned, the data is loaded as is. If the address is not 16-byte aligned, the load begins at the highest 16-byte-aligned address below the address of Data; a 32-byte block is loaded, and the requested 16 bytes are extracted from it. This intrinsic is recommended for improved performance over ordinary 128-bit unaligned loads, particularly when the data crosses a cache-line boundary.
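The address arithmetic described above can be modeled in plain C. The following is a minimal sketch of that description, not the processor's actual implementation: model_lddqu, its parameters, and the simulated memory array are hypothetical names introduced for illustration, and byte offsets into the array stand in for real addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: models the load described in the text on a
   simulated memory array. `addr` is a byte offset into `mem` that
   stands in for the address of Data; `mem_len` is the array size. */
static void model_lddqu(const unsigned char *mem, size_t mem_len,
                        size_t addr, unsigned char out[16])
{
    size_t base = addr & ~(size_t)15;  /* round down to a 16-byte boundary */
    size_t skew = addr - base;         /* distance past that boundary, 0..15 */

    if (skew == 0) {
        /* Aligned case: the 16 bytes are loaded as is. */
        assert(addr + 16 <= mem_len);
        memcpy(out, mem + addr, 16);
        return;
    }

    /* Unaligned case: load the 32-byte block that starts at the aligned
       address, then extract the 16 requested bytes from it. */
    assert(base + 32 <= mem_len);
    unsigned char block[32];
    memcpy(block, mem + base, 32);
    memcpy(out, block + skew, 16);
}
```

For example, with a 64-byte array in which byte k holds the value k, calling model_lddqu(mem, sizeof mem, 7, out) fills out with bytes 7 through 22 of the array.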

This routine is only available as an intrinsic.

// processor: x86 with SSE3
// Execute the lddqu instruction using the intrinsic
// _mm_lddqu_si128

#include <stdio.h>
#include <string.h>   // memcpy
#include <intrin.h>

#pragma intrinsic( _mm_lddqu_si128 )

int main( )
{
    unsigned char pBuf[32];
    __m128i i;
    __m128i j;

    i.m128i_i32[0] = 1;
    i.m128i_i32[1] = 2;
    i.m128i_i32[2] = 3;
    i.m128i_i32[3] = 4;

    // Copy the 16 bytes to an offset that is generally not 16-byte aligned.
    memcpy(pBuf+7, &i, 16);

    printf_s("load unaligned data using _mm_lddqu_si128 for best performance\n");

    // Load from the same unaligned address the data was copied to.
    j = _mm_lddqu_si128( (const __m128i*)(pBuf+7) );

    return 0;
}


load unaligned data using _mm_lddqu_si128 for best performance
