# Intrinsics Overview

**Visual Studio 2010**

Microsoft Specific

3DNow! technology provides up to 26 additional instructions to support high-performance 3D graphics and audio processing. 3DNow! instructions are SIMD vector instructions that operate on the 64-bit MMX registers; most of them operate on pairs of 32-bit single-precision, floating-point values. See 3DNow! Intrinsics for the reference documentation for the AMD intrinsics.

Vector instructions operate in parallel on two pairs of 32-bit single-precision, floating-point values. Scalar instructions operate on a single pair of 32-bit operands, taken from the low halves of the two 64-bit operands.
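
To make the vector/scalar distinction concrete, the following portable C sketch models a 64-bit register as two 32-bit lanes. The type `v2f` and the functions `model_pfadd` and `model_scalar_recip` are hypothetical names for illustration, not the 3DNow! intrinsics. The scalar model assumes that the result computed from the low half is written to both halves (as PFRCP does) and uses an exact division in place of the hardware approximation.

```c
/* Hypothetical model of a 64-bit register as two 32-bit float lanes.
   lo is the low half (bits 0-31), hi is the high half (bits 32-63). */
typedef struct {
    float lo;
    float hi;
} v2f;

/* Vector operation modeled on PFADD: both lanes are computed
   independently and in parallel. */
v2f model_pfadd(v2f dst, v2f src)
{
    v2f r = { dst.lo + src.lo, dst.hi + src.hi };
    return r;
}

/* Scalar-style operation: only the low half of the source is read.
   Assumption: the single result is written to both lanes, as with
   PFRCP; the exact 1/x stands in for the hardware approximation. */
v2f model_scalar_recip(v2f src)
{
    float r = 1.0f / src.lo;
    v2f out = { r, r };
    return out;
}
```

For example, `model_scalar_recip` applied to lanes {0.5, 4.0} yields {2.0, 2.0}: the high lane of the input is ignored.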

The 3DNow! single-precision, floating-point format is compatible with the IEEE 754 single-precision format. This format comprises a 1-bit sign, an 8-bit biased exponent, and a 23-bit significand with one hidden integer bit for a total of 24 bits in the significand. The bias of the exponent is 127, consistent with the IEEE single-precision standard. The significands are normalized to be within the range of [1,2).
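
A minimal sketch of decoding these fields in C, assuming the host `float` type is IEEE 754 single precision (true for the x86 compilers this documentation targets). `sp_fields`, `decode_sp`, and `normal_value` are hypothetical helpers, and `normal_value` is only meaningful for normal numbers (biased exponent 01h through FEh).

```c
#include <stdint.h>
#include <string.h>

/* The three fields of a single-precision value. */
typedef struct {
    unsigned sign;        /* 1 bit */
    unsigned exponent;    /* 8-bit biased exponent, bias 127 */
    uint32_t significand; /* 23 stored bits; the integer bit is hidden */
} sp_fields;

sp_fields decode_sp(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits); /* reinterpret the bits safely */
    sp_fields f;
    f.sign = (unsigned)(bits >> 31);
    f.exponent = (unsigned)((bits >> 23) & 0xFFu);
    f.significand = bits & 0x7FFFFFu;
    return f;
}

/* Value of a normal number: (-1)^s * 2^(e - 127) * (1 + m / 2^23).
   The leading 1 is the hidden integer bit, so the significand
   lies in [1, 2). */
double normal_value(sp_fields f)
{
    double m = 1.0 + (double)f.significand / 8388608.0; /* 2^23 */
    int e = (int)f.exponent - 127;
    double scale = 1.0;
    for (; e > 0; --e) scale *= 2.0; /* exact powers of two */
    for (; e < 0; ++e) scale /= 2.0;
    return (f.sign ? -m : m) * scale;
}
```

For example, `decode_sp(-2.5f)` gives sign 1, biased exponent 128 (80h), and significand 200000h, because 2.5 = 1.25 * 2^1 and the fraction 0.25 occupies the top bit but one of the stored significand.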

In contrast to the IEEE standard, which specifies four rounding modes, 3DNow! technology supports a single rounding mode, either round-to-nearest or round-to-zero (truncation), depending on the hardware implementation. The AMD processors implement round-to-nearest mode. Regardless of the rounding mode used, the floating-point-to-integer and integer-to-floating-point conversion instructions, PF2ID and PI2FD, always use round-to-zero (truncation).
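
The following C sketch models the truncating conversions in software. `model_pf2id` and `model_pi2fd` are hypothetical names. Saturating out-of-range inputs to `INT32_MAX`/`INT32_MIN` in `model_pf2id` is an assumption of this sketch, not something stated above. `model_pi2fd` truncates by clearing the integer bits that do not fit in the 24-bit significand before converting, so the conversion itself is exact.

```c
#include <stdint.h>

/* Scalar model of PF2ID's round-to-zero float -> int32 conversion.
   Assumption: out-of-range inputs saturate. NaN is not handled
   (NaNs are not supported as 3DNow! operands). */
int32_t model_pf2id(float x)
{
    if (x >= 2147483648.0f) return INT32_MAX;  /* >= 2^31 */
    if (x <= -2147483648.0f) return INT32_MIN; /* <= -2^31 */
    return (int32_t)x; /* C conversion truncates toward zero */
}

/* Scalar model of PI2FD's round-to-zero int32 -> float conversion. */
float model_pi2fd(int32_t i)
{
    /* Magnitude as unsigned; also correct for INT32_MIN. */
    uint32_t mag = i < 0 ? 0u - (uint32_t)i : (uint32_t)i;
    int n = 0;
    for (uint32_t t = mag; t != 0; t >>= 1) ++n; /* bit length */
    if (n > 24) /* drop bits below the 24-bit significand */
        mag &= ~((1u << (n - 24)) - 1u);
    float f = (float)mag; /* now exactly representable */
    return i < 0 ? -f : f;
}
```

For example, `model_pf2id(-2.7f)` returns -2, and `model_pi2fd(16777219)` (2^24 + 3) yields 16777218.0f, whereas C's default round-to-nearest conversion `(float)16777219` yields 16777220.0f.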

The largest representable normal number in magnitude for this precision has a biased exponent of FEh and a significand of 7FFFFFh (hexadecimal), with a numerical value of $2^{127}(2 - 2^{-23})$. All results that overflow above the maximum representable positive value are saturated either to this maximum representable normal number or to positive infinity. Similarly, all results that overflow below the minimum representable negative value are saturated either to the most negative representable normal number or to negative infinity.
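
One way to check these values is to assemble the bit pattern from its fields and compare it with `FLT_MAX` from `<float.h>`. `sp_from_fields` and `largest_normal_formula` are hypothetical helpers in this sketch, which assumes IEEE 754 single-precision `float`.

```c
#include <stdint.h>
#include <string.h>
#include <float.h> /* FLT_MAX, used for comparison */

/* Build a float from sign, biased exponent, and 23-bit significand. */
float sp_from_fields(uint32_t sign, uint32_t biased_exp, uint32_t significand)
{
    uint32_t bits = (sign << 31)
                  | ((biased_exp & 0xFFu) << 23)
                  | (significand & 0x7FFFFFu);
    float x;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* 2^127 * (2 - 2^-23), computed exactly in double. */
double largest_normal_formula(void)
{
    double p = 1.0;
    for (int i = 0; i < 127; ++i) p *= 2.0;
    return p * (2.0 - 1.0 / 8388608.0); /* 8388608 = 2^23 */
}
```

With exponent FEh and significand 7FFFFFh, `sp_from_fields(0, 0xFE, 0x7FFFFF)` equals both `FLT_MAX` and the value of the formula above.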

The hardware implementation of 3DNow! technology determines how arithmetic overflow is handled: results saturate either to properly signed maximum or minimum representable normal numbers or to properly signed infinities. The AMD processors generate properly signed maximum or minimum representable normal numbers.
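
A minimal scalar model of this overflow behavior, assuming the saturate-to-normal choice described for AMD processors. `model_saturate` and `model_pfmul_lane` are hypothetical. The product of two `float` values is exact in `double` (48 significant bits fit in 53), so the only rounding happens in the final conversion to `float`, which uses the host's default round-to-nearest mode.

```c
#include <float.h>

/* Clamp an exactly computed result into single-precision range,
   saturating to the properly signed largest normal number instead
   of producing an infinity. */
float model_saturate(double exact)
{
    if (exact > FLT_MAX) return FLT_MAX;
    if (exact < -FLT_MAX) return -FLT_MAX;
    return (float)exact; /* round-to-nearest on the host */
}

/* One lane of a PFMUL-style multiply with saturation on overflow. */
float model_pfmul_lane(float a, float b)
{
    return model_saturate((double)a * (double)b);
}
```

For example, `model_pfmul_lane(FLT_MAX, 2.0f)` returns `FLT_MAX` rather than positive infinity.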

Infinities and NaNs are not supported as operands to 3DNow! instructions.

The smallest representable normal number in magnitude for this precision has a biased exponent of 01h and a significand of 000000h (hexadecimal), with a numerical value of $2^{-126}$. Accordingly, all results below this minimum representable value in magnitude are held to zero. The following table shows the exponent ranges supported by the 3DNow! technology.
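
The following sketch models holding tiny results to zero. `model_flush_to_zero` and `two_pow_neg126` are hypothetical helpers. Comparing against `FLT_MIN` (which equals $2^{-126}$ for IEEE single precision) is an assumption of the model, and the sign of the resulting zero is not modeled.

```c
#include <float.h>

/* 2^-126, computed exactly in double. */
double two_pow_neg126(void)
{
    double p = 1.0;
    for (int i = 0; i < 126; ++i) p /= 2.0;
    return p;
}

/* Results smaller in magnitude than the smallest normal number
   are held to zero instead of becoming denormals. */
float model_flush_to_zero(double exact)
{
    if (exact > -FLT_MIN && exact < FLT_MIN) return 0.0f;
    return (float)exact;
}
```

For example, `model_flush_to_zero(FLT_MIN / 2.0)` returns 0.0f, where ordinary IEEE arithmetic would produce a denormal.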

Biased exponent | Description |
---|---|
FFh | Unsupported. Unsupported numbers can be used as operands. The results of operations with unsupported numbers are undefined. |
00h | Zero. |
00h<x<FFh | Normal. |
01h | $2^{1-127}$, the lowest possible exponent. |
FEh | $2^{254-127}$, the largest possible exponent. |

Like MMX instructions, 3DNow! instructions do not generate numeric exceptions or set any status flags. It is the user's responsibility to ensure that in-range data is provided to 3DNow! instructions and that all computations remain within valid ranges (or are held as expected).

The register operations of all 3DNow! floating-point instructions are executed by either the register X unit or the register Y unit. One operation can be issued to each register unit at each clock cycle for a maximum issue and execution rate of two 3DNow! operations per cycle.

In high-performance 3DNow! code, 3DNow! instructions are normally scheduled apart from each other to avoid delays caused by contention for execution resources, while also taking dependencies and execution latencies into account.

For further information regarding code optimization on the AMD-K6 processor, see the *AMD-K6 Processor Code Optimization Application Note*, order number 21924. This document provides in-depth discussions of code optimization techniques for the processor.

For execution resources information on the AMD Athlon processor, refer to the *AMD Athlon Processor x86 Code Optimization Guide*, order number 22007.

The 3DNow! performance enhancement instructions for AMD processors are summarized in the following tables.

Operation | Function | Opcode |
---|---|---|
PAVGUSB | Packed 8-bit unsigned integer averaging | BFh |
PFADD | Packed floating-point addition | 9Eh |
PFSUB | Packed floating-point subtraction | 9Ah |
PFSUBR | Packed floating-point reverse subtraction | AAh |
PFACC | Packed floating-point accumulate | AEh |
PFCMPGE | Packed floating-point comparison, greater or equal | 90h |
PFCMPGT | Packed floating-point comparison, greater | A0h |
PFCMPEQ | Packed floating-point comparison, equal | B0h |
PFMIN | Packed floating-point minimum | 94h |
PFMAX | Packed floating-point maximum | A4h |
PI2FD | Packed 32-bit integer to floating-point conversion | 0Dh |
PF2ID | Packed floating-point to 32-bit integer conversion | 1Dh |
PFRCP | Packed floating-point reciprocal approximation | 96h |
PFRSQRT | Packed floating-point reciprocal square root approximation | 97h |
PFMUL | Packed floating-point multiplication | B4h |
PFRCPIT1 | Packed floating-point reciprocal first iteration step | A6h |
PFRSQIT1 | Packed floating-point reciprocal square root first iteration step | A7h |
PFRCPIT2 | Packed floating-point reciprocal/reciprocal square root second iteration step | B6h |
PMULHRW | Packed 16-bit integer multiply with rounding | B7h |
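
PFRCP returns only an approximation, and the PFRCPIT1 and PFRCPIT2 steps refine it. The sketch below shows the textbook Newton-Raphson iteration $x_1 = x_0(2 - b x_0)$ that this kind of refinement is based on; it does not reproduce how the work is split between PFRCPIT1 and PFRCPIT2, nor the accuracy of the hardware estimate. `coarse_recip`, `nr_recip_step`, and `recip_error` are hypothetical helpers, and `coarse_recip` is only a stand-in for PFRCP.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for PFRCP: 1/b with the low 12 of the 23 stored
   significand bits cleared, i.e. a 12-bit estimate. */
float coarse_recip(float b)
{
    float r = 1.0f / b;
    uint32_t bits;
    memcpy(&bits, &r, sizeof bits);
    bits &= ~(uint32_t)0xFFFu;
    memcpy(&r, &bits, sizeof r);
    return r;
}

/* One Newton-Raphson step for 1/b. If x0 = (1 - d) / b, then
   x1 = (1 - d*d) / b: the relative error is squared, ignoring
   rounding. */
float nr_recip_step(float b, float x0)
{
    return x0 * (2.0f - b * x0);
}

/* |x*b - 1|, a simple relative-error measure for a reciprocal. */
float recip_error(float b, float x)
{
    float e = x * b - 1.0f;
    return e < 0.0f ? -e : e;
}
```

For b = 3, the 12-bit estimate has a relative error of roughly 2.4e-4; one step brings it to roughly the limit of single precision, which is why a single refinement pass is enough for a reasonably accurate starting estimate.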

Operation | Function | Opcode second byte |
---|---|---|
FEMMS | Faster entry/exit of the MMX or floating-point state. | 0Eh |
PREFETCH/PREFETCHW | Prefetches at least a 32-byte line into the L1 data cache (Dcache). The AMD-K6-2 and AMD-K6-III processors execute the PREFETCHW instruction identically to the PREFETCH instruction. On the AMD Athlon processor, PREFETCHW can increase performance by hinting to the processor that the cache line is going to be modified. | 0Dh |

Operation | Function | Opcode / imm8 |
---|---|---|
PF2IW | Packed floating-point to integer word conversion with sign extend | 0Fh 0Fh / 1Ch |
PFNACC | Packed floating-point negative accumulate | 0Fh 0Fh / 8Ah |
PFPNACC | Packed floating-point mixed positive-negative accumulate | 0Fh 0Fh / 8Eh |
PI2FW | Packed integer word to floating-point conversion | 0Fh 0Fh / 0Ch |
PSWAPD | Packed swap doubleword | 0Fh 0Fh / BBh |
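
The accumulate and swap instructions mix the two halves of their operands. A scalar C model follows; the lane assignments reflect our reading of AMD's instruction descriptions and should be checked against the AMD manuals. `v2f` and the `model_*` functions are hypothetical, PSWAPD (which swaps doublewords of any type) is modeled on float lanes only, and PFACC from the first table is included for comparison.

```c
/* Hypothetical two-lane model: lo = bits 0-31, hi = bits 32-63. */
typedef struct {
    float lo;
    float hi;
} v2f;

/* PFACC: low = dst.lo + dst.hi, high = src.lo + src.hi */
v2f model_pfacc(v2f d, v2f s)
{
    v2f r = { d.lo + d.hi, s.lo + s.hi };
    return r;
}

/* PFNACC: low = dst.lo - dst.hi, high = src.lo - src.hi */
v2f model_pfnacc(v2f d, v2f s)
{
    v2f r = { d.lo - d.hi, s.lo - s.hi };
    return r;
}

/* PFPNACC: low = dst.lo - dst.hi, high = src.lo + src.hi */
v2f model_pfpnacc(v2f d, v2f s)
{
    v2f r = { d.lo - d.hi, s.lo + s.hi };
    return r;
}

/* PSWAPD: the two halves of the source, exchanged */
v2f model_pswapd(v2f s)
{
    v2f r = { s.hi, s.lo };
    return r;
}
```

Mixed positive-negative accumulation of this kind is useful when one half of a result needs a difference and the other a sum, as in complex multiplication.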

Operation | Function | Opcode |
---|---|---|
MASKMOVQ | Streaming (cache bypass) store using byte mask | 0Fh F7h |
MOVNTQ | Streaming (cache bypass) store | 0Fh E7h |
PAVGB | Packed average of unsigned byte | 0Fh E0h |
PAVGW | Packed average of unsigned word | 0Fh E3h |
PEXTRW | Extract word into integer register | 0Fh C5h |
PINSRW | Insert word from integer register | 0Fh C4h |
PMAXSW | Packed maximum signed word | 0Fh EEh |
PMAXUB | Packed maximum unsigned byte | 0Fh DEh |
PMINSW | Packed minimum signed word | 0Fh EAh |
PMINUB | Packed minimum unsigned byte | 0Fh DAh |
PMOVMSKB | Move byte mask to integer register | 0Fh D7h |
PMULHUW | Packed multiply high unsigned word | 0Fh E4h |
PREFETCHNTA | Move data closer to the processor using the NTA reference | 0Fh 18h 0* |
PREFETCHT0 | Move data closer to the processor using the T0 reference | 0Fh 18h 1* |
PREFETCHT1 | Move data closer to the processor using the T1 reference | 0Fh 18h 2* |
PREFETCHT2 | Move data closer to the processor using the T2 reference | 0Fh 18h 3* |
PSADBW | Packed sum of absolute byte differences | 0Fh F6h |
PSHUFW | Packed shuffle word | 0Fh 70h |
SFENCE | Store fence | 0Fh AEh / 7h |

*The number after the opcode is the value of the reg field of the ModR/M byte, which selects the prefetch mode.
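
A byte-level C model of two of the integer instructions above, PAVGB and PSADBW. `model_pavgb` and `model_psadbw` are hypothetical names. The rounding rule for PAVGB (add 1 before halving) and the placement of the PSADBW sum in the low word of the 64-bit result, with the upper bits cleared, are details this sketch assumes; the table above does not state them.

```c
#include <stdint.h>

/* PAVGB: per unsigned byte, (a + b + 1) >> 1, so halves round up.
   The sum is computed in unsigned int, so it cannot overflow. */
void model_pavgb(const uint8_t a[8], const uint8_t b[8], uint8_t out[8])
{
    for (int i = 0; i < 8; ++i)
        out[i] = (uint8_t)(((unsigned)a[i] + b[i] + 1u) >> 1);
}

/* PSADBW: sum of |a[i] - b[i]| over all 8 bytes. The maximum,
   8 * 255 = 2040, fits in the low 16-bit word; the rest is zero. */
uint64_t model_psadbw(const uint8_t a[8], const uint8_t b[8])
{
    unsigned sum = 0;
    for (int i = 0; i < 8; ++i)
        sum += a[i] > b[i] ? (unsigned)(a[i] - b[i])
                           : (unsigned)(b[i] - a[i]);
    return (uint64_t)sum;
}
```

Summing absolute byte differences this way is the core of block-matching loops, such as motion estimation in video encoding, which is a typical use of PSADBW.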

For further information regarding code optimization on the AMD-K6-2 processor, see the AMD-K6-2 Processor Code Optimization Application Note, order number 21924. This document provides in-depth discussions of code optimization techniques for the AMD-K6 family processor.

For execution resources information on the AMD Athlon processor, refer to the AMD Athlon Processor x86 Code Optimization Guide, order number 22007. This document provides in-depth discussions of code optimization techniques for the AMD Athlon processor.

See http://go.microsoft.com/fwlink/?LinkID=95131 for the online versions of these documents.