This topic describes the internal design of the DirectXMath library.
- Calling Conventions
- Graphics Library Type Equivalence
- Global Constants in the DirectXMath Library
- Windows SSE versus SSE2
- Routine Variants
- Platform Inconsistencies
- Platform-specific Extensions
Calling Conventions
To enhance portability and optimize data layout, you need to use the appropriate calling conventions for each platform supported by the DirectXMath Library. Specifically, when you pass XMVECTOR objects (which are defined as aligned on a 16-byte boundary) as parameters, the calling requirements differ depending on the target platform:
For 32-bit Windows
To make full use of the available SSE/SSE2 functionality when you pass __m128 values (the type that implements XMVECTOR on that platform) to a routine, do the following:
- Use the __fastcall calling convention to pass the first three __m128 values (XMVECTOR instances) used as function arguments in SSE/SSE2 registers.
- Pass any remaining __m128 function arguments on the stack.
For 64-bit editions of Windows
The 64-bit editions of the Windows operating system do not support register passing of SSE/SSE2 types, so __m128 values must be passed using a stack-based __fastcall convention.
For Windows RT
The Windows RT operating system supports passing the first four __m128 values (XMVECTOR instances) in-register.
The FXMVECTOR, GXMVECTOR, and CXMVECTOR aliases support these conventions:
- Use the FXMVECTOR alias to pass up to the first three instances of XMVECTOR used as arguments to a function.
- Use the GXMVECTOR alias to pass the 4th instance of an XMVECTOR used as an argument to a function.
- Use the CXMVECTOR alias to pass any further instances of XMVECTOR used as arguments.
Note For output parameters, always use XMVECTOR* or XMVECTOR&. Output parameters do not count toward the positions in the preceding rules for input parameters.
The following are example declarations that illustrate this convention:
XMMATRIX XMMatrixLookAtLH(FXMVECTOR EyePosition, FXMVECTOR FocusPosition, FXMVECTOR UpDirection);
XMMATRIX XMMatrixTransformation2D(FXMVECTOR ScalingOrigin, float ScalingOrientation, FXMVECTOR Scaling, FXMVECTOR RotationOrigin, float Rotation, GXMVECTOR Translation);
void XMVectorSinCos(XMVECTOR* pSin, XMVECTOR* pCos, FXMVECTOR V);
XMVECTOR XMVectorHermiteV(FXMVECTOR Position0, FXMVECTOR Tangent0, FXMVECTOR Position1, GXMVECTOR Tangent1, CXMVECTOR T);
For 32-bit Windows
typedef const XMVECTOR FXMVECTOR;
typedef const XMVECTOR& GXMVECTOR;
typedef const XMVECTOR& CXMVECTOR;
For 64-bit editions of Windows
typedef const XMVECTOR& FXMVECTOR;
typedef const XMVECTOR& GXMVECTOR;
typedef const XMVECTOR& CXMVECTOR;
For Windows RT
typedef const XMVECTOR FXMVECTOR;
typedef const XMVECTOR GXMVECTOR;
typedef const XMVECTOR& CXMVECTOR;
Note All of these functions are declared inline, and in many cases the compiler does not need to apply a calling convention to them. However, the compiler may decide that it is more efficient not to inline a function, and in those cases the aliases ensure the best available calling convention for each platform.
Graphics Library Type Equivalence
To support the use of the DirectXMath Library, many DirectXMath Library types and structures are equivalent to the Windows implementations of the D3DDECLTYPE and D3DFORMAT types, as well as the DXGI_FORMAT types.
| DirectXMath | D3DDECLTYPE | D3DFORMAT | DXGI_FORMAT |
|---|---|---|---|
| XMBYTE2 | | | DXGI_FORMAT_R8G8_SINT |
| XMBYTE4 | D3DDECLTYPE_BYTE4 (Xbox Only) | D3DFMT_x8x8x8x8 | DXGI_FORMAT_x8x8x8x8_SINT |
| XMBYTEN2 | | D3DFMT_V8U8 | DXGI_FORMAT_R8G8_SNORM |
| XMBYTEN4 | D3DDECLTYPE_BYTE4N (Xbox Only) | D3DFMT_x8x8x8x8 | DXGI_FORMAT_x8x8x8x8_SNORM |
| XMCOLOR | D3DDECLTYPE_D3DCOLOR | D3DFMT_A8R8G8B8 | DXGI_FORMAT_B8G8R8A8_UNORM (DXGI 1.1+) |
| XMDEC4 | D3DDECLTYPE_DEC4 (Xbox Only), D3DDECLTYPE_DEC3 (Xbox Only) | | |
| XMDECN4 | D3DDECLTYPE_DEC4N (Xbox Only), D3DDECLTYPE_DEC3N (Xbox Only) | | |
| XMFLOAT2 | D3DDECLTYPE_FLOAT2 | D3DFMT_G32R32F | DXGI_FORMAT_R32G32_FLOAT |
| XMFLOAT2A | D3DDECLTYPE_FLOAT2 | D3DFMT_G32R32F | DXGI_FORMAT_R32G32_FLOAT |
| XMFLOAT3 | D3DDECLTYPE_FLOAT3 | | DXGI_FORMAT_R32G32B32_FLOAT |
| XMFLOAT3A | D3DDECLTYPE_FLOAT3 | | DXGI_FORMAT_R32G32B32_FLOAT |
| XMFLOAT3PK | | | DXGI_FORMAT_R11G11B10_FLOAT |
| XMFLOAT3SE | | | DXGI_FORMAT_R9G9B9E5_SHAREDEXP |
| XMFLOAT4 | D3DDECLTYPE_FLOAT4 | D3DFMT_A32B32G32R32F | DXGI_FORMAT_R32G32B32A32_FLOAT |
| XMFLOAT4A | D3DDECLTYPE_FLOAT4 | D3DFMT_A32B32G32R32F | DXGI_FORMAT_R32G32B32A32_FLOAT |
| XMHALF2 | D3DDECLTYPE_FLOAT16_2 | D3DFMT_G16R16F | DXGI_FORMAT_R16G16_FLOAT |
| XMHALF4 | D3DDECLTYPE_FLOAT16_4 | D3DFMT_A16B16G16R16F | DXGI_FORMAT_R16G16B16A16_FLOAT |
| XMINT2 | | | DXGI_FORMAT_R32G32_SINT |
| XMINT3 | | | DXGI_FORMAT_R32G32B32_SINT |
| XMINT4 | | | DXGI_FORMAT_R32G32B32A32_SINT |
| XMSHORT2 | D3DDECLTYPE_SHORT2 | D3DFMT_V16U16 | DXGI_FORMAT_R16G16_SINT |
| XMSHORTN2 | D3DDECLTYPE_SHORT2N | D3DFMT_V16U16 | DXGI_FORMAT_R16G16_SNORM |
| XMSHORT4 | D3DDECLTYPE_SHORT4 | D3DFMT_x16x16x16x16 | DXGI_FORMAT_R16G16B16A16_SINT |
| XMSHORTN4 | D3DDECLTYPE_SHORT4N | D3DFMT_x16x16x16x16 | DXGI_FORMAT_R16G16B16A16_SNORM |
| XMUBYTE2 | | | DXGI_FORMAT_R8G8_UINT |
| XMUBYTEN2 | | D3DFMT_A8P8, D3DFMT_A8L8 | DXGI_FORMAT_R8G8_UNORM |
| XMUINT2 | | | DXGI_FORMAT_R32G32_UINT |
| XMUINT3 | | | DXGI_FORMAT_R32G32B32_UINT |
| XMUINT4 | | | DXGI_FORMAT_R32G32B32A32_UINT |
| XMU555 | | D3DFMT_X1R5G5B5, D3DFMT_A1R5G5B5 | DXGI_FORMAT_B5G5R5A1_UNORM |
| XMU565 | | D3DFMT_R5G6B5 | DXGI_FORMAT_B5G6R5_UNORM |
| XMUBYTE4 | D3DDECLTYPE_UBYTE4 | D3DFMT_x8x8x8x8 | DXGI_FORMAT_x8x8x8x8_UINT |
| XMUBYTEN4 | D3DDECLTYPE_UBYTE4N | D3DFMT_x8x8x8x8 | DXGI_FORMAT_x8x8x8x8_UNORM |
| XMUDEC4 | D3DDECLTYPE_UDEC4 (Xbox Only), D3DDECLTYPE_UDEC3 (Xbox Only) | D3DFMT_A2R10G10B10, D3DFMT_A2B10G10R10 | DXGI_FORMAT_R10G10B10A2_UINT |
| XMUDECN4 | D3DDECLTYPE_UDEC4N (Xbox Only), D3DDECLTYPE_UDEC3N (Xbox Only) | D3DFMT_A2R10G10B10, D3DFMT_A2B10G10R10 | DXGI_FORMAT_R10G10B10A2_UNORM |
| XMUNIBBLE4 | | D3DFMT_A4R4G4B4, D3DFMT_X4R4G4B4 | DXGI_FORMAT_B4G4R4A4_UNORM (DXGI 1.2+) |
| XMUSHORT2 | D3DDECLTYPE_USHORT2 | D3DFMT_G16R16 | DXGI_FORMAT_R16G16_UINT |
| XMUSHORTN2 | D3DDECLTYPE_USHORT2N | D3DFMT_G16R16 | DXGI_FORMAT_R16G16_UNORM |
| XMUSHORT4 | D3DDECLTYPE_USHORT4 (Xbox Only) | D3DFMT_x16x16x16x16 | DXGI_FORMAT_R16G16B16A16_UINT |
| XMUSHORTN4 | D3DDECLTYPE_USHORT4N | D3DFMT_x16x16x16x16 | DXGI_FORMAT_R16G16B16A16_UNORM |
Global Constants in the DirectXMath Library
To reduce the size of the data segment, the DirectXMath Library uses the XMGLOBALCONST macro to make use of a number of global internal constants in its implementation. By convention, such internal global constants are prefixed by g_XM. Typically, they are one of the following types: XMVECTORU32, XMVECTORF32, or XMVECTORI32.
These internal global constants are subject to change in future revisions of the DirectXMath Library. Use public functions that encapsulate the constants when possible rather than direct use of g_XM global values. You can also declare your own global constants using XMGLOBALCONST.
Windows SSE versus SSE2
The SSE instruction set provides support only for single-precision floating-point vectors. DirectXMath must make use of the SSE2 instruction set to provide integer vector support. SSE2 is supported by all Intel processors since the introduction of the Pentium 4, all AMD K8 and later processors, and all x64-capable processors.
Note Windows 8 for x86 requires support for SSE2. All versions of Windows x64 require support for SSE2. Windows RT (also known as Windows on ARM) requires ARM-NEON.
Routine Variants
There are several variants of DirectXMath functions that make it easier to do your work:
- Comparison functions that let you create complicated conditional branching based on a smaller number of vector comparison operations. The names of these functions end in "R", such as XMVector3InBoundsR. The functions return a comparison record as a UINT return value or as a UINT out parameter. You can use the XMComparison* macros to test the value.
- Batch functions for performing batch-style operations on larger vector arrays. The names of these functions end in "Stream", such as XMVector3TransformStream. The functions operate on an array of inputs and generate an array of outputs. Typically, they take an input stride and an output stride.
- Estimation functions that compute a faster estimate instead of a slower, more accurate result. The names of these functions end in "Est", such as XMVector3NormalizeEst. The quality and performance impact of using estimation varies from platform to platform, but we recommend that you use estimation variants for performance-sensitive code.
Platform Inconsistencies
The DirectXMath library is intended for use in performance-sensitive graphics applications and games. Therefore, the implementation is designed for optimal speed during normal processing on all supported platforms. Results at boundary conditions, particularly those that generate floating-point specials, are likely to vary from target to target. This behavior also depends on other run-time settings, such as the x87 control word for the Windows 32-bit no-intrinsics target or the SSE control word for both Windows 32-bit and 64-bit. Furthermore, boundary-condition results differ between CPU vendors.
Don't use DirectXMath in scientific or other applications where numerical accuracy is paramount. This focus is also reflected in the lack of support for double or other extended-precision computations.
Note The _XM_NO_INTRINSICS_ scalar code paths generally are written for compliance, not performance. Their boundary-condition results also will vary.
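To make the boundary-condition concern concrete, the following standalone sketch shows how a floating-point special can arise and one way to guard against it. It deliberately does not use DirectXMath: Vec3, NormalizeNaive, NormalizeOr, and the 1e-6 threshold are all assumptions chosen for illustration, and the NaN behavior described assumes IEEE 754 arithmetic.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical 3-component vector used only for this sketch.
struct Vec3 { float x, y, z; };

// Naive normalize: a zero-length input divides 0 by 0,
// which yields NaN components under IEEE 754 arithmetic.
inline Vec3 NormalizeNaive(Vec3 v)
{
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return { v.x / len, v.y / len, v.z / len };
}

// Guarded normalize: inputs whose length is not above the threshold
// (including NaN lengths) return the caller-supplied fallback instead.
inline Vec3 NormalizeOr(Vec3 v, Vec3 fallback, float minLength = 1e-6f)
{
    assert(minLength >= 0.0f);
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    if (!(len > minLength))
        return fallback;
    return { v.x / len, v.y / len, v.z / len };
}
```

Handling the boundary case explicitly in application code means the result no longer depends on how a particular platform treats the special value.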
Platform-specific Extensions
The DirectXMath library is intended to simplify C++ SIMD programming by providing excellent support for x86, x64, and Windows RT platforms using broadly supported intrinsic instructions (SSE2 and ARM-NEON).
There are times, however, when platform-specific instructions may prove beneficial. Due to the way DirectXMath is implemented, in many cases it is trivial to use DirectXMath types directly in standard compiler-supported intrinsics statements, and to use DirectXMath as the fallback path for platforms that don't support the extended instruction.
The following simplified example leverages the SSE 4.1 dot-product instruction. Note that you must explicitly guard the code path to avoid generating invalid instruction exceptions at run time. Ensure that the code paths do enough work to justify the additional cost of branching, the complexity of maintaining multiple code paths, and so on.
#include <windows.h>
#include <stdio.h>
#include <DirectXMath.h>
#include <intrin.h>
#include <smmintrin.h>

using namespace DirectX;

bool g_bSSE41 = false;

void DetectCPUFeatures()
{
#ifndef _M_ARM
    // See __cpuid documentation on MSDN for more information
    int CPUInfo[4] = {-1};
    __cpuid( CPUInfo, 0 );
    if ( CPUInfo[0] >= 1 )
    {
        __cpuid( CPUInfo, 1 );
        // Bit 19 of ECX (CPUInfo[2]) reports SSE 4.1 support
        if ( CPUInfo[2] & 0x80000 )
            g_bSSE41 = true;
    }
#endif
}

int main()
{
    if ( !XMVerifyCPUSupport() )
        return -1;

    DetectCPUFeatures();
    ...
    XMVECTORF32 v1 = { 1.f, 2.f, 3.f, 4.f };
    XMVECTORF32 v2 = { 5.f, 6.f, 7.f, 8.f };
    XMVECTOR r2, r3, r4;

    if ( g_bSSE41 )
    {
#ifndef _M_ARM
        // High nibble of the mask selects the components to multiply;
        // low nibble 0xf broadcasts the sum to all four lanes
        r2 = _mm_dp_ps( v1, v2, 0x3f );
        r3 = _mm_dp_ps( v1, v2, 0x7f );
        r4 = _mm_dp_ps( v1, v2, 0xff );
#endif
    }
    else
    {
        // Fallback path for processors without SSE 4.1
        r2 = XMVector2Dot( v1, v2 );
        r3 = XMVector3Dot( v1, v2 );
        r4 = XMVector4Dot( v1, v2 );
    }
    ...
    return 0;
}
Build date: 11/14/2012