

Microsoft Specific

Generates the lddqu instruction.

__m128i _mm_lddqu_si128(
   __m128i const* Data
);

[in] Data

The data to load.




Intel SSE3

Header file <intrin.h>

The return value represents a register containing the loaded data.

The lddqu instruction loads 128 bits of data and handles unaligned addresses in a way that avoids cache line splits. If the address of the data is 16-byte aligned, the data is loaded as is. If the address is not 16-byte aligned, the load begins at the highest 16-byte-aligned address less than the address of Data. A 32-byte block is loaded, and the requested 16 bytes are extracted from it. This intrinsic is recommended over ordinary 128-bit unaligned loads when the data may cross a cache line boundary, where it can improve performance.
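The address arithmetic described above can be sketched in portable C. This is only a model of the documented behavior, not how the hardware is implemented: the helper name model_lddqu and its parameters are hypothetical, and mem stands in for a region of memory whose index 0 is assumed to be 16-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of any library): models the load
   described in the remarks. Index 0 of `mem` is assumed to sit on a
   16-byte boundary, so `offset` plays the role of the address of Data. */
void model_lddqu(const uint8_t *mem, size_t mem_len, size_t offset,
                 uint8_t out[16])
{
    /* Highest 16-byte-aligned index that is not above `offset`. */
    size_t base = offset & ~(size_t)15;
    /* Distance from that boundary to the requested data (0..15). */
    size_t skew = offset - base;
    uint8_t block[32];

    if (skew == 0) {
        /* Aligned case: the 16 bytes are loaded as is. */
        assert(offset + 16 <= mem_len);
        memcpy(out, mem + offset, 16);
        return;
    }

    /* Unaligned case: read the 32-byte block that starts at the
       aligned boundary, then extract the requested 16 bytes. */
    assert(base + 32 <= mem_len);
    memcpy(block, mem + base, 32);
    memcpy(out, block + skew, 16);
}
```

For example, with offset 7 the model reads indexes 0 through 31 and returns indexes 7 through 22. The asserts exist only because this model reads the wider block itself; code that calls the intrinsic needs only the 16 requested bytes to be valid.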

This routine is only available as an intrinsic.

// processor: x86 with SSE3
// Execute the lddqu instruction using the intrinsic 
// _mm_lddqu_si128

#include <stdio.h>
#include <string.h>   // memcpy
#include <intrin.h>

#pragma intrinsic( _mm_lddqu_si128 )

int main( )
{
    unsigned char pBuf[32];
    __m128i i;
    __m128i j;

    i.m128i_i32[0] = 1;
    i.m128i_i32[1] = 2;
    i.m128i_i32[2] = 3;
    i.m128i_i32[3] = 4;

    // Place the 16 bytes at an offset that is generally not 16-byte aligned.
    memcpy(pBuf+7, &i, 16);

    printf_s("load unaligned data using _mm_lddqu_si128 for best performance\n");

    // Load from the same address the data was copied to.
    j = _mm_lddqu_si128( (const __m128i*)(pBuf+7) );

    return 0;
}
load unaligned data using _mm_lddqu_si128 for best performance
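
The following is a minimal sketch of how the loaded value might be checked against an ordinary unaligned load. The helper names load16_lddqu and load16_loadu are hypothetical. The sketch assumes an x86 or x64 target, and it uses <pmmintrin.h> with a target attribute on GCC and Clang in place of <intrin.h>, which is an assumption outside the Microsoft-specific documentation above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(_MSC_VER)
#include <intrin.h>
#define SSE3_TARGET
#else
#include <pmmintrin.h>   /* SSE3 intrinsics on GCC and Clang */
#define SSE3_TARGET __attribute__((target("sse3")))
#endif

/* Hypothetical helper: load 16 bytes from buf + offset with lddqu and
   store them to `out` so they can be inspected byte by byte. */
SSE3_TARGET
void load16_lddqu(const unsigned char *buf, size_t offset,
                  unsigned char out[16])
{
    __m128i v = _mm_lddqu_si128((const __m128i *)(buf + offset));
    _mm_storeu_si128((__m128i *)out, v);
}

/* Hypothetical helper: the same load using the ordinary unaligned
   load intrinsic, for comparison. */
SSE3_TARGET
void load16_loadu(const unsigned char *buf, size_t offset,
                  unsigned char out[16])
{
    __m128i v = _mm_loadu_si128((const __m128i *)(buf + offset));
    _mm_storeu_si128((__m128i *)out, v);
}
```

Both helpers are expected to return the same 16 bytes for any valid offset; the two intrinsics differ in how the load is performed, not in the value that is loaded. Comparing the two outputs with memcmp is one way to confirm this on a given machine.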
© 2015 Microsoft