3DNow! technology provides up to 26 additional instructions to support high-performance 3D graphics and audio processing. The instructions are SIMD vector instructions that operate on 64-bit registers, with each instruction operating on pairs of 32-bit values. See 3DNow! Intrinsics for the reference documentation for the AMD intrinsics.
Vector instructions operate in parallel on two sets of 32-bit single-precision, floating-point words. Scalar instructions operate on a single set of 32-bit operands (from the low halves of the two 64-bit operands).
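The difference between the vector and scalar forms can be illustrated with a small software model. This is a minimal sketch, not AMD code: the helper names (`pack_pair`, `unpack_pair`, `vector_op`, `scalar_op`) are hypothetical, and the model assumes the low element occupies bits 31:0 of the 64-bit operand.

```python
import operator
import struct

def to_f32(x):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def pack_pair(low, high):
    """Pack two singles into a 64-bit integer (assumed layout: low in bits 31:0)."""
    return int.from_bytes(struct.pack("<ff", low, high), "little")

def unpack_pair(reg):
    """Return the (low, high) single-precision elements of a 64-bit integer."""
    return struct.unpack("<ff", reg.to_bytes(8, "little"))

def vector_op(op, a, b):
    """Vector form: apply op to both 32-bit element pairs of a and b."""
    a_lo, a_hi = unpack_pair(a)
    b_lo, b_hi = unpack_pair(b)
    return pack_pair(to_f32(op(a_lo, b_lo)), to_f32(op(a_hi, b_hi)))

def scalar_op(op, a, b):
    """Scalar form: apply op to the low halves only and return that one result."""
    return to_f32(op(unpack_pair(a)[0], unpack_pair(b)[0]))

a = pack_pair(1.0, 2.0)
b = pack_pair(0.5, 0.25)
print(unpack_pair(vector_op(operator.add, a, b)))  # both element pairs are added
print(scalar_op(operator.add, a, b))               # only the low halves are added
```

The model returns the scalar result as a single value because this section does not say what a scalar instruction places in the upper half of its destination.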
The 3DNow! single-precision, floating-point format is compatible with the IEEE 754 single-precision format. This format comprises a 1-bit sign, an 8-bit biased exponent, and a 23-bit significand with one hidden integer bit for a total of 24 bits in the significand. The bias of the exponent is 127, consistent with the IEEE single-precision standard. The significands are normalized to be within the range of [1,2).
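The field layout described above can be checked with a short script. This is an illustrative sketch only; the function names `decode_single` and `normal_value` are hypothetical, and `normal_value` handles normal numbers only.

```python
import struct

def decode_single(x):
    """Split a single-precision value into (sign, biased exponent, fraction)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    sign = bits >> 31                 # 1-bit sign
    biased_exp = (bits >> 23) & 0xFF  # 8-bit biased exponent
    fraction = bits & 0x7FFFFF        # 23 stored significand bits
    return sign, biased_exp, fraction

def normal_value(sign, biased_exp, fraction):
    """Rebuild a normal number from its fields (exponent bias is 127)."""
    significand = 1 + fraction / 2**23  # hidden integer bit, range [1, 2)
    return (-1) ** sign * significand * 2.0 ** (biased_exp - 127)

print(decode_single(-6.5))                   # sign 1, exponent 129, fraction 0x500000
print(normal_value(*decode_single(-6.5)))    # reconstructs -6.5
```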
In contrast to the IEEE standard, which dictates four rounding modes, 3DNow! technology supports a single rounding mode: either round-to-nearest or round-to-zero (truncation). The hardware implementation of 3DNow! technology determines which of the two is used; AMD processors implement round-to-nearest. Regardless of the rounding mode used, the floating-point-to-integer and integer-to-floating-point conversion instructions, PF2ID and PI2FD, always use round-to-zero (truncation).
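The effect of round-to-zero on the two conversion directions can be sketched per element as follows. The function names are hypothetical, and the model covers only the rounding direction; how out-of-range inputs to the real instructions are handled is not modeled here.

```python
import math
import struct

def to_f32(x):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def float_to_int_truncate(x):
    """Floating-point to integer, rounding toward zero (PF2ID-style rounding)."""
    return math.trunc(x)

def int_to_float_truncate(n):
    """Integer to single precision, rounding toward zero (PI2FD-style rounding)."""
    f = to_f32(float(n))  # round-to-nearest result first
    if abs(f) > abs(n):
        # Nearest rounding moved away from zero: step back one representable value.
        bits = struct.unpack("<I", struct.pack("<f", f))[0]
        f = struct.unpack("<f", struct.pack("<I", bits - 1))[0]
    return f

print(float_to_int_truncate(2.7), round(2.7))    # truncation vs. nearest
print(float_to_int_truncate(-2.7), round(-2.7))
print(int_to_float_truncate(16777219))           # not exactly representable in 24 bits
```

Integers with more than 24 significant bits are the only inputs for which the integer-to-floating-point direction has to round at all.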
The largest representable normal number in magnitude for this precision in hexadecimal has an exponent of FEh and a significand of 7FFFFFh, with a numerical value of 2^127 × (2 - 2^-23). All results that overflow above the maximum representable positive value are saturated to either this maximum representable normal number or to positive infinity. Similarly, all results that overflow below the minimum representable negative value are saturated to either this minimum representable normal number or to negative infinity.
The implementation of 3DNow! technology determines how arithmetic overflow is handled, as either properly signed maximum or minimum representable normal numbers or properly signed infinities. The processor generates properly signed maximum or minimum representable normal numbers.
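The saturating overflow behavior described above, in the variant that produces signed maximum normal numbers, can be sketched as a clamp. The names `MAX_NORMAL`, `saturate`, and `saturating_mul` are hypothetical and chosen for illustration.

```python
import struct

# Exponent FEh, significand 7FFFFFh: 2^127 * (2 - 2^-23)
MAX_NORMAL = 2.0**127 * (2 - 2.0**-23)

def to_f32(x):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def saturate(x):
    """Clamp an overflowing result to the properly signed maximum normal number."""
    if x > MAX_NORMAL:
        return MAX_NORMAL
    if x < -MAX_NORMAL:
        return -MAX_NORMAL
    return x

def saturating_mul(a, b):
    """Multiply two values and saturate instead of producing an infinity."""
    return to_f32(saturate(a * b))

print(hex(struct.unpack("<I", struct.pack("<f", MAX_NORMAL))[0]))  # bit pattern of MAX_NORMAL
print(saturating_mul(1e30, 1e30) == MAX_NORMAL)
print(saturating_mul(-1e30, 1e30) == -MAX_NORMAL)
```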
Infinities and NaNs are not supported as operands to 3DNow! instructions.
The smallest representable normal number in magnitude for this precision in hexadecimal has an exponent of 01h and a significand of 000000h, with a numerical value of 2^-126. Accordingly, all results below this minimum representable value in magnitude are held to zero. The following table shows the exponent ranges supported by the 3DNow! technology.
Unsupported: unsupported numbers can be used as operands, but the results of operations with unsupported numbers are undefined.
2^(1-127): lowest possible exponent.
2^(254-127): largest possible exponent.
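The exponent ranges and the hold-to-zero rule can be sketched as follows. The names are hypothetical. Two points are assumptions rather than statements from this section: the biased exponent FFh is treated as the unsupported class because infinities and NaNs use that exponent in IEEE 754, and a biased exponent of 00h is treated as zero.

```python
MIN_BIASED_EXP = 0x01  # 2^(1-127)
MAX_BIASED_EXP = 0xFE  # 2^(254-127)
MIN_NORMAL = 2.0 ** (MIN_BIASED_EXP - 127)  # 2^-126

def classify_exponent(biased_exp):
    """Classify an 8-bit biased exponent (FFh and 00h handling are assumptions)."""
    if biased_exp == 0xFF:
        return "unsupported"
    if biased_exp == 0x00:
        return "zero"
    return "normal"

def hold_to_zero(x):
    """Replace any result smaller in magnitude than 2^-126 with zero.

    The sign of the resulting zero is not modeled here.
    """
    return 0.0 if abs(x) < MIN_NORMAL else x

print(MIN_BIASED_EXP - 127, MAX_BIASED_EXP - 127)  # unbiased exponent range
print(hold_to_zero(2.0**-127))                     # below the minimum: held to zero
print(hold_to_zero(MIN_NORMAL) == MIN_NORMAL)      # the minimum itself is kept
```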
Like MMX instructions, 3DNow! instructions do not generate numeric exceptions or set any status flags. It is the user's responsibility to ensure that in-range data is provided to 3DNow! instructions and that all computations remain within valid ranges (or are held as expected).
The register operations of all 3DNow! floating-point instructions are executed by either the register X unit or the register Y unit. One operation can be issued to each register unit at each clock cycle for a maximum issue and execution rate of two 3DNow! operations per cycle.
In high-performance 3DNow! code, 3DNow! instructions are normally scheduled apart from one another to avoid delays caused by execution-resource contention, while also taking dependencies and execution latencies into account.
For further information regarding code optimization on the AMD-K6 processor, see the AMD-K6 Processor Code Optimization Application Note, order number 21924. This document provides in-depth discussions of code optimization techniques for the processor.
For execution resources information on the AMD Athlon processor, refer to the AMD Athlon Processor x86 Code Optimization Guide, order number 22007.
The 3DNow! performance enhancement instructions for AMD processors are summarized in the following tables.
Packed 8-bit unsigned integer averaging
Packed floating-point addition
Packed floating-point subtraction
Packed floating-point reverse subtraction
Packed floating-point accumulate
Packed floating-point comparison, greater or equal
Packed floating-point comparison, greater
Packed floating-point comparison, equal
Packed floating-point minimum
Packed floating-point maximum
Packed 32-bit integer to floating-point conversion
Packed floating-point to 32-bit integer
Packed floating-point reciprocal approximation
Packed floating-point reciprocal square root approximation
Packed floating-point multiplication
Packed floating-point reciprocal first iteration step
Packed floating-point reciprocal square root first iteration step
Packed floating-point reciprocal/reciprocal square root second iteration step
Packed 16-bit integer multiply with rounding
Opcode second byte
Faster entry/exit of the MMX or floating-point state.
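The reciprocal entries in the table above (an approximation followed by first and second iteration steps) suggest an approximate-then-refine scheme. The sketch below shows the generic Newton-Raphson refinement for a reciprocal. It is not a bit-accurate model of the 3DNow! instructions: `rough_reciprocal` is a hypothetical stand-in for a low-precision hardware estimate, the 14-bit starting precision is an assumption, and `refine` performs one textbook Newton-Raphson step.

```python
import math

def rough_reciprocal(b, bits=14):
    """Hypothetical low-precision estimate: 1/b truncated to `bits` significand bits."""
    mantissa, exponent = math.frexp(1.0 / b)
    scale = 2**bits
    return math.ldexp(math.trunc(mantissa * scale) / scale, exponent)

def refine(b, x):
    """One Newton-Raphson step for 1/b: x_next = x * (2 - b * x)."""
    return x * (2.0 - b * x)

b = 3.0
x0 = rough_reciprocal(b)
x1 = refine(b, x0)
print(abs(x0 - 1.0 / b))  # error of the rough estimate
print(abs(x1 - 1.0 / b))  # error after one refinement step
```

Each Newton-Raphson step roughly squares the relative error, which is why a single refinement of a modest estimate is enough to approach single-precision accuracy.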
Opcode / imm8
Packed floating-point to integer word conversion with sign extend: 0Fh 0Fh / 1Ch
Packed floating-point negative accumulate: 0Fh 0Fh / 8Ah
Packed floating-point mixed positive-negative accumulate: 0Fh 0Fh / 8Eh
Packed integer word to floating-point conversion: 0Fh 0Fh / 0Ch
Packed swap doubleword: 0Fh 0Fh / BBh
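The accumulate and swap operations listed above can be sketched on (low, high) pairs. The function names and the element ordering follow the commonly documented behavior of the PFNACC, PFPNACC, and PSWAPD instructions, which this section does not spell out, so treat the mapping as an assumption. Rounding of results to single precision is omitted for brevity.

```python
def negative_accumulate(dst, src):
    """PFNACC-style: low = dst.low - dst.high, high = src.low - src.high."""
    return (dst[0] - dst[1], src[0] - src[1])

def mixed_accumulate(dst, src):
    """PFPNACC-style: low = dst.low - dst.high, high = src.low + src.high."""
    return (dst[0] - dst[1], src[0] + src[1])

def swap_doublewords(src):
    """PSWAPD-style: exchange the low and high 32-bit elements."""
    return (src[1], src[0])

dst = (5.0, 2.0)  # (low, high)
src = (1.0, 4.0)
print(negative_accumulate(dst, src))
print(mixed_accumulate(dst, src))
print(swap_doublewords(src))
```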
Opcode / imm8
Streaming (cache bypass) store using byte mask
Streaming (cache bypass) store
Packed average of unsigned byte
Packed average of unsigned word
Extract word into integer register
Insert word from integer register
Packed maximum signed word
Packed maximum unsigned byte
Packed minimum signed word
Packed minimum unsigned byte
Move byte mask to integer register
Packed multiply high unsigned word
Move data closer to the processor using the NTA reference: 0Fh 18h 0*
Move data closer to the processor using the T0 reference: 0Fh 18h 1*
Move data closer to the processor using the T1 reference: 0Fh 18h 2*
Move data closer to the processor using the T2 reference: 0Fh 18h 3*
Packed sum of absolute byte differences
Packed shuffle word
0Fh AEh / 7h
* The number after the opcode selects the prefetch mode in the ModR/M byte.
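Two of the packed integer operations in the table above, the unsigned byte average and the sum of absolute byte differences, can be sketched on 8-byte operands as follows. The function names are hypothetical, and the round-up behavior of the average, (a + b + 1) >> 1, follows the commonly documented definition rather than anything stated in this section.

```python
def average_unsigned_bytes(a, b):
    """Per-byte unsigned average with rounding up: (x + y + 1) >> 1."""
    return bytes((x + y + 1) >> 1 for x, y in zip(a, b))

def sum_absolute_differences(a, b):
    """Sum of |x - y| over all unsigned byte pairs."""
    return sum(abs(x - y) for x, y in zip(a, b))

a = bytes([0, 1, 2, 3, 250, 251, 254, 255])
b = bytes([1, 1, 5, 3, 255, 0, 255, 255])
print(list(average_unsigned_bytes(a, b)))
print(sum_absolute_differences(a, b))
```

The intermediate sum in the average is computed at full width, so the result never wraps even when both inputs are 255.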
For further information regarding code optimization on the AMD-K6-2 processor, see the AMD-K6-2 Processor Code Optimization Application Note, order number 21924. This document provides in-depth discussions of code optimization techniques for the AMD-K6 family processor.
For execution resources information on the AMD Athlon processor, refer to the AMD Athlon Processor x86 Code Optimization Guide, order number 22007. This document provides in-depth discussions of code optimization techniques for the AMD Athlon processor.
See http://go.microsoft.com/fwlink/?LinkID=95131 for the online versions of these documents.