# 3DNow! Intrinsics

**Visual Studio .NET 2003**

**Microsoft Specific**

This topic contains the 3DNow! intrinsics. For each intrinsic, the header file mm3dnow.h is required.

Additional information about 3DNow! intrinsics can be found at http://www.amd.com/devconn/3dsdk/msvc.html.

The following table lists the 3DNow! intrinsics alphabetically.

Intrinsic | Use |
---|---|
_m_femms | Clears the architectural state when switching between MMX and floating-point instructions. |
_m_from_float | Returns a 64-bit MMX value where the lower half is set to the floating-point, single-precision value from the source operand and the upper half is zero. There is no error return. |
_m_pavgusb | Calculates the rounded averages of eight unsigned 8-bit integer values. |
_m_pf2id | Converts packed floating-point, single-precision values to packed 32-bit integer values. |
_m_pf2iw | Converts packed floating-point, single-precision values to packed 16-bit signed integer values using truncation. |
_m_pfacc | Performs packed floating-point, single-precision accumulation. |
_m_pfadd | Performs packed floating-point, single-precision addition. |
_m_pfcmpeq | Compares packed floating-point, single-precision values for equality and sets each corresponding element of the return value to all ones if the comparison is true, or to all zeros otherwise. |
_m_pfcmpge | Tests whether each packed floating-point, single-precision value in the first operand is greater than or equal to the corresponding value in the second operand and sets each corresponding element of the return value to all ones if the comparison is true, or to all zeros otherwise. |
_m_pfcmpgt | Tests whether each packed floating-point, single-precision value in the first operand is greater than the corresponding value in the second operand and sets each corresponding element of the return value to all ones if the comparison is true, or to all zeros otherwise. |
_m_pfmax | Returns the larger of the two packed floating-point, single-precision values. |
_m_pfmin | Returns the smaller of the two packed floating-point, single-precision values. |
_m_pfmul | Performs packed floating-point, single-precision multiplication. |
_m_pfnacc | Performs packed floating-point, single-precision negative accumulation. |
_m_pfpnacc | Performs packed floating-point, single-precision positive-negative accumulation. |
_m_pfrcp | Performs scalar floating-point, low-precision reciprocal approximation. |
_m_pfrcpit1 | Performs the first intermediate step in the Newton-Raphson iteration to refine the reciprocal approximation produced by the _m_pfrcp intrinsic function. |
_m_pfrcpit2 | Performs the second and final step in the Newton-Raphson iteration to refine the reciprocal or reciprocal square root approximation produced by the _m_pfrcp or _m_pfrsqrt intrinsic functions, respectively. |
_m_pfrsqit1 | Performs the first intermediate step in the Newton-Raphson iteration to refine the reciprocal square root approximation produced by the _m_pfrsqrt intrinsic function. |
_m_pfrsqrt | Performs scalar floating-point, low-precision reciprocal square root approximation. |
_m_pfsub | Performs packed floating-point, single-precision subtraction. |
_m_pfsubr | Performs packed floating-point, single-precision reverse subtraction. |
_m_pi2fd | Converts packed 32-bit integer values to packed floating-point, single-precision values. |
_m_pi2fw | Converts packed 16-bit signed integer values to packed floating-point, single-precision values. |
_m_pmulhrw | Multiplies four signed 16-bit integer values in the source operand by four signed 16-bit integer values in the destination operand. |
_m_prefetch | Loads a 32-byte cache line into L1 data cache and sets the cache line state to exclusive. |
_m_prefetchw | Loads a 32-byte cache line into L1 data cache and sets the cache line state to modified. |
_m_pswapd | Swaps upper and lower halves of the source operand. |
_m_to_float | Returns the floating-point, single-precision value from the lower half of the 64-bit MMX value in the source operand. There is no error return. The compiler ensures that an implicit FEMMS is issued before any attempt to use the result of the _m_to_float operation. |

void _m_femms( void );

Clears the architectural state when switching between MMX and floating-point instructions. The contents of MMX registers and floating-point stack are undefined after the function is executed.

__m64 _m_from_float( float f );

Returns a 64-bit MMX value where the lower half is set to the floating-point, single-precision value from the source operand and the upper half is zero. There is no error return.
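The layout described above can be pictured with a small portable model. The sketch below is not the intrinsic itself: `from_float_model` and `to_float_model` are hypothetical names, a `uint64_t` stands in for the 64-bit MMX value, and a 32-bit IEEE-754 `float` is assumed.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model: place the bit pattern of f in the lower 32 bits
   and leave the upper 32 bits zero. Assumes a 32-bit IEEE-754 float. */
static uint64_t from_float_model(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* copy the raw bits, no conversion */
    return (uint64_t)bits;
}

/* Hypothetical model of the reverse direction: read the lower half back. */
static float to_float_model(uint64_t m)
{
    uint32_t bits = (uint32_t)m;      /* keep only the lower 32 bits */
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Under these assumptions, `from_float_model(1.0f)` yields `0x000000003F800000`, and passing that value to `to_float_model` returns `1.0f` again.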

__m64 _m_pavgusb( __m64 m1, __m64 m2 );

Calculates the rounded averages of eight unsigned 8-bit integer values. The **_m_pavgusb** function takes two 64-bit MMX values and returns a 64-bit MMX value. There is no error return.
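As an illustration of what "rounded average" means here, the following portable sketch models the per-byte arithmetic as `(a + b + 1) >> 1`. It is a scalar model, not the intrinsic; the helper names are hypothetical and the eight byte lanes are represented as plain arrays.

```c
#include <stdint.h>

/* Hypothetical helper: rounded average of one pair of unsigned bytes. */
static uint8_t avg_round_u8(uint8_t a, uint8_t b)
{
    /* Widen before adding so the +1 rounding term cannot overflow. */
    return (uint8_t)(((unsigned)a + (unsigned)b + 1u) >> 1);
}

/* Hypothetical model: apply the rounded average to all eight byte lanes. */
static void pavgusb_model(const uint8_t m1[8], const uint8_t m2[8],
                          uint8_t out[8])
{
    for (int i = 0; i < 8; ++i)
        out[i] = avg_round_u8(m1[i], m2[i]);
}
```

The `+ 1` term is what makes the average round up on ties: averaging 1 and 2 gives 2 rather than 1.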

__m64 _m_pf2id( __m64 m );

Converts packed floating-point, single-precision values to packed 32-bit integer values. The **_m_pf2id** function returns a 64-bit MMX value. There is no error return.

__m64 _m_pf2iw( __m64 m );

Converts packed floating-point, single-precision values to packed 16-bit signed integer values using truncation. The **_m_pf2iw** function returns a 64-bit MMX value. There is no error return.
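Truncation means the fractional part is discarded, so 2.9 becomes 2 and -2.9 becomes -2. The sketch below models one lane of the conversion in portable C. The function name is hypothetical, and the clamping to the 16-bit range reflects an assumption about how out-of-range inputs are treated rather than anything stated in this topic; NaN inputs are not handled.

```c
#include <stdint.h>

/* Hypothetical model of one lane: truncate toward zero, clamp to the
   signed 16-bit range (assumed behavior), and return the result
   sign-extended to 32 bits. NaN inputs are outside this sketch. */
static int32_t pf2iw_lane_model(float x)
{
    if (x >= 32767.0f)
        return 32767;
    if (x <= -32768.0f)
        return -32768;
    return (int32_t)x;   /* C float-to-int conversion truncates toward zero */
}
```

The range checks come first because converting an out-of-range `float` to an integer type is undefined behavior in C.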

__m64 _m_pfacc( __m64 m1, __m64 m2 );

Performs packed floating-point, single-precision accumulation. The **_m_pfacc** function returns a 64-bit MMX value. There is no error return.
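"Accumulation" here is a horizontal add: the two halves of each operand are summed, and the two sums are packed into the result. The following portable sketch models that arithmetic. `pair_f32` and `pfacc_model` are hypothetical names, and the placement of the first operand's sum in the lower half is an assumption based on AMD's description of the underlying PFACC instruction.

```c
/* Hypothetical stand-in for a 64-bit MMX value holding two floats. */
typedef struct {
    float lo;   /* lower 32 bits */
    float hi;   /* upper 32 bits */
} pair_f32;

/* Hypothetical model of the accumulation: each operand is summed
   horizontally; m1's sum lands in the lower half (assumed ordering). */
static pair_f32 pfacc_model(pair_f32 m1, pair_f32 m2)
{
    pair_f32 r;
    r.lo = m1.lo + m1.hi;
    r.hi = m2.lo + m2.hi;
    return r;
}
```

Applying the model twice, first to two pairs and then to the result with itself, leaves the sum of all four input values in the lower half.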

__m64 _m_pfadd( __m64 m1, __m64 m2 );

Performs packed floating-point, single-precision addition. The **_m_pfadd** function returns a 64-bit MMX value. There is no error return.

__m64 _m_pfcmpeq( __m64 m1, __m64 m2 );

Compares packed floating-point, single-precision values for equality and sets each corresponding element of the return value to all ones if the comparison is true, or to all zeros otherwise. The **_m_pfcmpeq** function returns a 64-bit MMX value. There is no error return.

__m64 _m_pfcmpge( __m64 m1, __m64 m2 );

Tests whether each packed floating-point, single-precision value in the first operand is greater than or equal to the corresponding value in the second operand and sets each corresponding element of the return value to all ones if the comparison is true, or to all zeros otherwise. The **_m_pfcmpge** function returns a 64-bit MMX value. There is no error return.

__m64 _m_pfcmpgt( __m64 m1, __m64 m2 );

Tests whether each packed floating-point, single-precision value in the first operand is greater than the corresponding value in the second operand and sets each corresponding element of the return value to all ones if the comparison is true, or to all zeros otherwise. The **_m_pfcmpgt** function returns a 64-bit MMX value. There is no error return.
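The comparison intrinsics return bit masks rather than Boolean values, which makes them suitable for branch-free selection. The sketch below models one 32-bit lane of each comparison in portable C and shows how such a mask can choose between two bit patterns. All function names are hypothetical, and the models do not attempt to reproduce hardware behavior for special values such as NaN.

```c
#include <stdint.h>

/* Hypothetical per-lane models: all ones if true, all zeros otherwise. */
static uint32_t pfcmpeq_lane_model(float a, float b)
{
    return (a == b) ? 0xFFFFFFFFu : 0u;
}

static uint32_t pfcmpge_lane_model(float a, float b)
{
    return (a >= b) ? 0xFFFFFFFFu : 0u;
}

static uint32_t pfcmpgt_lane_model(float a, float b)
{
    return (a > b) ? 0xFFFFFFFFu : 0u;
}

/* Branch-free select: take the bits of x where the mask is set,
   and the bits of y elsewhere. */
static uint32_t select_bits(uint32_t mask, uint32_t x, uint32_t y)
{
    return (x & mask) | (y & ~mask);
}
```

Because each lane of the mask is either all ones or all zeros, `select_bits` returns `x` unchanged when the comparison held and `y` unchanged when it did not.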

__m64 _m_pfmax( __m64 m1, __m64 m2 );

Returns the larger of the two packed floating-point, single-precision values. The **_m_pfmax** function returns a 64-bit MMX value. There is no error return.

__m64 _m_pfmin( __m64 m1, __m64 m2 );

Returns the smaller of the two packed floating-point, single-precision values. The **_m_pfmin** function returns a 64-bit MMX value. There is no error return.

__m64 _m_pfmul( __m64 m1, __m64 m2 );

Performs packed floating-point, single-precision multiplication. The **_m_pfmul** function returns a 64-bit MMX value. There is no error return.

__m64 _m_pfnacc( __m64 m1, __m64 m2 );

Performs packed floating-point, single-precision negative accumulation. The **_m_pfnacc** function returns a 64-bit MMX value. There is no error return.

__m64 _m_pfpnacc( __m64 m1, __m64 m2 );

Performs packed floating-point, single-precision positive-negative accumulation. The **_m_pfpnacc** function returns a 64-bit MMX value. There is no error return.
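The negative and positive-negative accumulations differ only in the sign applied within each operand. The following portable sketch models both. The names are hypothetical, and the operand-to-half mapping is an assumption based on AMD's description of the underlying PFNACC and PFPNACC instructions.

```c
/* Hypothetical stand-in for a 64-bit MMX value holding two floats. */
typedef struct {
    float lo;   /* lower 32 bits */
    float hi;   /* upper 32 bits */
} pair_f32;

/* Hypothetical model of negative accumulation:
   both halves of the result are differences (lower minus upper). */
static pair_f32 pfnacc_model(pair_f32 m1, pair_f32 m2)
{
    pair_f32 r;
    r.lo = m1.lo - m1.hi;
    r.hi = m2.lo - m2.hi;
    return r;
}

/* Hypothetical model of positive-negative accumulation:
   the lower half is a difference, the upper half is a sum. */
static pair_f32 pfpnacc_model(pair_f32 m1, pair_f32 m2)
{
    pair_f32 r;
    r.lo = m1.lo - m1.hi;
    r.hi = m2.lo + m2.hi;
    return r;
}
```

A mixed difference and sum of this kind is the shape needed for complex multiplication, where the real part is a difference of products and the imaginary part is a sum.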

__m64 _m_pfrcp( __m64 m );

Performs scalar floating-point, low-precision reciprocal approximation. The 14-bit accurate result is duplicated in both high and low halves of the return value. The **_m_pfrcp** function returns a 64-bit MMX value. There is no error return.

__m64 _m_pfrcpit1( __m64 m1, __m64 m2 );

Performs the first intermediate step in the Newton-Raphson iteration to refine the reciprocal approximation produced by the **_m_pfrcp** intrinsic function. The **_m_pfrcpit1** function returns a 64-bit MMX value. There is no error return.

__m64 _m_pfrcpit2( __m64 m1, __m64 m2 );

Performs the second and final step in the Newton-Raphson iteration to refine the reciprocal or reciprocal square root approximation produced by the **_m_pfrcp** or **_m_pfrsqrt** intrinsic functions, respectively. The **_m_pfrcpit2** function returns a 64-bit MMX value. There is no error return.
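The refinement these two steps carry out is, mathematically, one Newton-Raphson iteration for 1/b: x1 = x0 * (2 - b * x0). The sketch below shows that iteration in ordinary C arithmetic. It illustrates the mathematics only and does not reproduce how the hardware divides the work between the two steps; `refine_reciprocal` is a hypothetical name.

```c
/* One Newton-Raphson step for the reciprocal of b:
       x1 = x0 * (2 - b * x0)
   x0 is an initial estimate of 1/b; each step roughly doubles the
   number of correct bits.

   With the intrinsics, the sequence commonly described for this is
   (an assumption, not stated in this topic):
       x0 = _m_pfrcp(b);
       x1 = _m_pfrcpit1(b, x0);
       x2 = _m_pfrcpit2(x1, x0);                                   */
static float refine_reciprocal(float b, float x0)
{
    return x0 * (2.0f - b * x0);
}
```

For example, starting from the estimate 0.33 for 1/3, a single step yields approximately 0.3333, reducing the error by about two orders of magnitude.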

__m64 _m_pfrsqit1( __m64 m1, __m64 m2 );

Performs the first intermediate step in the Newton-Raphson iteration to refine the reciprocal square root approximation produced by the **_m_pfrsqrt** intrinsic function. The **_m_pfrsqit1** function returns a 64-bit MMX value. There is no error return.

__m64 _m_pfrsqrt( __m64 m );

Performs scalar floating-point, low-precision reciprocal square root approximation. The 15-bit accurate result is duplicated in both high and low halves of the return value. The **_m_pfrsqrt** function returns a 64-bit MMX value. There is no error return.
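A low-precision reciprocal square root estimate is typically refined with one Newton-Raphson iteration: x1 = x0 * (3 - b * x0 * x0) / 2. The sketch below shows that iteration in ordinary C arithmetic. It illustrates the mathematics only, not the exact operations the refinement intrinsics perform; `refine_rsqrt` is a hypothetical name.

```c
/* One Newton-Raphson step for the reciprocal square root of b:
       x1 = x0 * (3 - b * x0 * x0) / 2
   x0 is an initial estimate of 1/sqrt(b). */
static float refine_rsqrt(float b, float x0)
{
    return 0.5f * x0 * (3.0f - b * x0 * x0);
}
```

Starting from the estimate 0.49 for 1/sqrt(4), one step gives approximately 0.4997, much closer to the exact value 0.5.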

__m64 _m_pfsub( __m64 m1, __m64 m2 );

Performs packed floating-point, single-precision subtraction. The **_m_pfsub** function returns a 64-bit MMX value. There is no error return.

__m64 _m_pfsubr( __m64 m1, __m64 m2 );

Performs packed floating-point, single-precision reverse subtraction. The **_m_pfsubr** function returns a 64-bit MMX value. There is no error return.
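Reverse subtraction swaps the roles of the operands. The portable sketch below contrasts the two forms. The names are hypothetical, and it assumes that the first argument corresponds to the destination operand of the underlying instruction, so that the forward form computes m1 - m2 and the reverse form computes m2 - m1.

```c
/* Hypothetical stand-in for a 64-bit MMX value holding two floats. */
typedef struct {
    float lo;   /* lower 32 bits */
    float hi;   /* upper 32 bits */
} pair_f32;

/* Hypothetical model of subtraction: m1 - m2 in each half. */
static pair_f32 pfsub_model(pair_f32 m1, pair_f32 m2)
{
    pair_f32 r;
    r.lo = m1.lo - m2.lo;
    r.hi = m1.hi - m2.hi;
    return r;
}

/* Hypothetical model of reverse subtraction: m2 - m1 in each half
   (assumed operand order). */
static pair_f32 pfsubr_model(pair_f32 m1, pair_f32 m2)
{
    pair_f32 r;
    r.lo = m2.lo - m1.lo;
    r.hi = m2.hi - m1.hi;
    return r;
}
```

The reverse form is useful when the value to be subtracted from is the one that would otherwise need to be copied first.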

__m64 _m_pi2fd( __m64 m );

Converts packed 32-bit integer values to packed floating-point, single-precision values. The **_m_pi2fd** function returns a 64-bit MMX value. There is no error return.

__m64 _m_pi2fw( __m64 m );

Converts packed 16-bit signed integer values to packed floating-point, single-precision values. The **_m_pi2fw** function returns a 64-bit MMX value. There is no error return.

__m64 _m_pmulhrw( __m64 m1, __m64 m2 );

Multiplies four signed 16-bit integer values in the source operand by four signed 16-bit integer values in the destination operand. Each high-order 16-bit result is rounded. The **_m_pmulhrw** function returns a 64-bit MMX value. There is no error return.
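The rounding of the high-order result can be modeled by adding 0x8000 to the 32-bit product before taking its upper 16 bits. The sketch below shows one lane in portable C. The function name is hypothetical, and the code assumes that right-shifting a negative signed value is an arithmetic shift, which holds on mainstream compilers but is implementation-defined in C.

```c
#include <stdint.h>

/* Hypothetical model of one 16-bit lane: form the full 32-bit product,
   add 0x8000 to round, and keep the high-order 16 bits.
   Assumes >> on a negative int32_t is an arithmetic shift. */
static int16_t pmulhrw_lane_model(int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;
    return (int16_t)((product + 0x8000) >> 16);
}
```

The effect of rounding shows with 0x4000 * 3 = 0xC000: a plain high-half multiply would return 0, whereas the rounded form returns 1.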

void _m_prefetch( void* p );

Loads a 32-byte cache line into L1 data cache and sets the cache line state to **exclusive**. If the line is already in the cache or if a memory fault is detected, then the intrinsic function has no effect. The variable `p` specifies the address of the cache line to be loaded.

void _m_prefetchw( void* p );

Loads a 32-byte cache line into L1 data cache and sets the cache line state to **modified**. If the line is already in the cache or if a memory fault is detected, then the intrinsic function has no effect. The variable `p` specifies the address of the cache line to be loaded.

__m64 _m_pswapd( __m64 m );

Swaps upper and lower halves of the source operand and returns a 64-bit MMX value. There is no error return.
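The swap can be modeled on a plain 64-bit integer as a rotation by 32 bits. The sketch below is a portable model, not the intrinsic; `pswapd_model` is a hypothetical name and a `uint64_t` stands in for the 64-bit MMX value.

```c
#include <stdint.h>

/* Hypothetical model: exchange the upper and lower 32-bit halves. */
static uint64_t pswapd_model(uint64_t m)
{
    return (m << 32) | (m >> 32);
}
```

Applying the swap twice returns the original value.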

float _m_to_float( __m64 m );

Returns the floating-point, single-precision value from the lower half of the 64-bit MMX value in the source operand. There is no error return.

The compiler ensures that an implicit FEMMS is issued before any attempt to use the result of the **_m_to_float** operation.
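The following sketch shows how the conversion intrinsics might be combined with **_m_pfadd** to add two scalar values. It is a minimal illustration under stated assumptions: `USE_3DNOW` is a hypothetical opt-in macro, `add_low_lanes` is a hypothetical helper, and the 3DNow! path is only meaningful with a 32-bit x86 Microsoft compiler on a processor that supports these instructions. A plain C fallback is provided so that the sketch builds elsewhere.

```c
/* USE_3DNOW is a hypothetical opt-in macro, not one the compiler defines. */
#if defined(_MSC_VER) && defined(_M_IX86) && defined(USE_3DNOW)
#include <mm3dnow.h>

static float add_low_lanes(float a, float b)
{
    __m64 va  = _m_from_float(a);   /* lower half = a, upper half = 0 */
    __m64 vb  = _m_from_float(b);   /* lower half = b, upper half = 0 */
    __m64 sum = _m_pfadd(va, vb);   /* packed single-precision addition */

    /* Per the note above, the compiler issues FEMMS before the result
       of _m_to_float is used, so no explicit _m_femms call appears here. */
    return _m_to_float(sum);
}
#else
/* Portable fallback for builds where 3DNow! is not available. */
static float add_low_lanes(float a, float b)
{
    return a + b;
}
#endif
```

Code that mixes other MMX or 3DNow! intrinsics with floating-point arithmetic, without ending in **_m_to_float**, would still need an explicit **_m_femms** call at the transition.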

**END Microsoft Specific**