Microsoft Visual C++ Floating-Point Optimization
Eric Fleegal
Microsoft Corporation
Revised June 2004
Applies To
Microsoft® Visual C++®
Summary: Get a handle on optimizing floating-point code using the Microsoft Visual C++ 2005 (formerly known as Visual C++ "Whidbey") method of managing floating-point semantics. Create fast programs while ensuring that only safe optimizations are performed on floating-point code. (33 printed pages)
Contents
Optimization of Floating-Point Code in C++
The fp:precise Mode for Floating-Point Semantics
The fp:fast Mode for Floating-Point Semantics
The fp:strict Mode for Floating-Point Semantics
The fenv_access Pragma
The fp_contract Pragma
The float_control Pragma
Enabling Floating-Point Exception Semantics
Related Books
Optimization of Floating-Point Code in C++
An optimizing C++ compiler not only translates source code into machine code, it also arranges the machine instructions in such a way as to improve efficiency and/or reduce size. Unfortunately, many common optimizations are not necessarily safe when applied to floating-point computations. A good example of this can be seen with the following summation algorithm [1]:
float KahanSum( const float A[], int n )
{
    float sum=0, C=0, Y, T;
    for (int i=0; i<n; i++) {
        Y = A[i] - C;
        T = sum + Y;
        C = T - sum - Y;
        sum = T;
    }
    return sum;
}
This function adds n float values in the array vector A. Within the loop body, the algorithm computes a "correction" value which is then applied to the next step of the summation. This method greatly reduces cumulative rounding errors as compared to a simple summation, while retaining O(n) time complexity.
A naïve C++ compiler might assume that floating-point arithmetic follows the same algebraic rules as real-number arithmetic. Such a compiler might then erroneously conclude that

C = T - sum - Y  ==>  (sum+Y) - sum - Y  ==>  0;
That is, that the perceived value of C is always a constant zero. If this constant value is then propagated into subsequent expressions, the loop body is reduced to a simple summation. To be precise,
Y = A[i] - C   ==>   Y = A[i]
T = sum + Y    ==>   T = sum + A[i]
sum = T        ==>   sum = sum + A[i]
Thus, to the naïve compiler, a logical transformation of the KahanSum function would be:
float KahanSum( const float A[], int n )
{
    float sum=0;   // C, Y & T are now unused
    for (int i=0; i<n; i++)
        sum = sum + A[i];
    return sum;
}
Although the transformed algorithm is faster, it is not at all an accurate representation of the programmer's intention. The carefully crafted error correction has been removed entirely, and we're left with a simple, direct summation algorithm with all its associated error.
Of course, a sophisticated C++ compiler would know that the algebraic rules of real arithmetic do not generally apply to floating-point arithmetic. However, even a sophisticated C++ compiler might still incorrectly interpret the programmer's intention.
Consider a common optimization that attempts to hold as many values in registers as possible (called "enregistering" a value). In the KahanSum example, this optimization might attempt to enregister the variables C, Y and T since they're used only within the loop body. If the register precision is 52 bits (double) instead of 23 bits (single), this optimization effectively type-promotes C, Y and T to type double. If the sum variable is not likewise enregistered, it will remain encoded in single precision. This transforms the semantics of KahanSum to the following:
float KahanSum( const float A[], int n )
{
    float sum=0;
    double C=0, Y, T;   // now held in-register
    for (int i=0; i<n; i++) {
        Y = A[i] - C;
        T = sum + Y;
        C = T - sum - Y;
        sum = (float) T;
    }
    return sum;
}
Although Y, T and C are now computed at a higher precision, this new encoding may produce a less accurate result depending on the values in A[]. Thus even seemingly harmless optimizations may have negative consequences.
These kinds of optimization problems aren't restricted to "tricky" floating-point code. Even simple floating-point algorithms can fail when optimized incorrectly. Consider a simple, direct-summation algorithm:
float Sum( const float A[], int n )
{
    float sum=0;
    for (int i=0; i<n; i++)
        sum = sum + A[i];
    return sum;
}
Because some floating-point units are capable of performing multiple operations simultaneously, a compiler might choose to engage a scalar reduction optimization. This optimization effectively transforms the simple Sum function from above into the following:
float Sum( const float A[], int n )
{
    int n4 = n - n%4;   // or n4 = n & (~3)
    int i;
    float sum=0, sum1=0, sum2=0, sum3=0;
    for (i=0; i<n4; i+=4) {
        sum  = sum  + A[i];
        sum1 = sum1 + A[i+1];
        sum2 = sum2 + A[i+2];
        sum3 = sum3 + A[i+3];
    }
    sum = sum + sum1 + sum2 + sum3;
    for (; i<n; i++)
        sum = sum + A[i];
    return sum;
}
The function now maintains four separate summations, which can be processed simultaneously at each step. Although the optimized function is now much faster, the optimized results can be quite different from the non-optimized results. In making this change, the compiler assumed that floating-point addition is associative; that is, that the two expressions (a+b)+c and a+(b+c) are equivalent. However, associativity does not always hold true for floating-point numbers. Instead of computing the summation as:
sum = A[0]+A[1]+A[2]+...+A[n-1]
the transformed function now computes the result as
sum = (A[0]+A[4]+A[8]+...)
    + (A[1]+A[5]+A[9]+...)
    + (A[2]+A[6]+A[10]+...)
    + (A[3]+A[7]+A[11]+...)
    + ...
For some values of A[], this different ordering of addition operations may produce unexpected results. To further complicate matters, some programmers may choose to anticipate such optimizations and compensate for them appropriately. In this case, a program can construct the array A in a different order so that the optimized sum produces the expected results. Moreover, in many circumstances the accuracy of the optimized result may be "close enough". This is especially true when the optimization provides compelling speed benefits. Video games, for example, require as much speed as possible but don't often require highly accurate floating-point computations. Compiler makers must therefore provide a mechanism for programmers to control the often disparate goals of speed and accuracy.
Some compilers resolve the tradeoff between speed and accuracy by providing a separate "switch" for each type of optimization. This allows developers to disable optimizations that are causing changes to floating-point accuracy for their particular application. While this solution may offer a high degree of control over the compiler, it introduces several additional problems:
- It is often unclear which switches to enable or disable.
- Disabling any single optimization may adversely affect the performance of non-floating-point code.
- Each additional switch introduces many new switch combinations; the number of combinations quickly becomes unwieldy.
So while providing separate switches for each optimization may seem appealing, using such compilers can be cumbersome and unreliable.
Many C++ compilers offer a "consistency" floating-point model (through a /Op or /fltconsistency switch) which enables a developer to create programs compliant with strict floating-point semantics. When engaged, this model prevents the compiler from using most optimizations on floating-point computations while allowing those optimizations for non-floating-point code. The consistency model, however, has a dark side. In order to return predictable results on different FPU architectures, nearly all implementations of /Op round intermediate expressions to the user-specified precision; for example, consider the following expression:
float a, b, c, d, e;
. . .
a = b*c + d*e;
In order to produce consistent and repeatable results under /Op, this expression gets evaluated as if it were implemented as follows:
float x = b*c;
float y = d*e;
a = x + y;
The final result now suffers from single-precision rounding errors at each step in evaluating the expression. Although this interpretation doesn't strictly break any C++ semantics rules, it's almost never the best way to evaluate floating-point expressions. It is generally more desirable to compute the intermediate results in as high a precision as is practical. For instance, it would be better to compute the expression a = b*c + d*e in a higher precision, as in:
double x = b*c;
double y = d*e;
double z = x + y;
a = (float)z;
or better yet
long double x = b*c;
long double y = d*e;
long double z = x + y;
a = (float)z;
When the intermediate results are computed in a higher precision, the final result is significantly more accurate. Ironically, by adopting the consistency model, the likelihood of error is increased precisely when the user is trying to reduce error by disabling unsafe optimizations. Thus the consistency model can seriously reduce efficiency while simultaneously providing no guarantee of increased accuracy. To serious numerical programmers, this doesn't seem like a very good tradeoff, and it is the primary reason that the model is not generally well received.
Beginning with version 8.0 (Visual C++® 2005), the Microsoft C++ compiler provides a much better alternative. It allows programmers to select one of three general floatingpoint modes: fp:precise, fp:fast and fp:strict.
- Under fp:precise, only safe optimizations are performed on floating-point code and, unlike /Op, intermediate computations are consistently performed at the highest practical precision.
- fp:fast mode relaxes floating-point rules, allowing for more aggressive optimization at the expense of accuracy.
- fp:strict mode provides all the general correctness of fp:precise while enabling floating-point exception semantics and preventing illegal transformations in the presence of FPU environment changes (e.g. changes to the register precision, rounding direction, etc.).
Floating-point exception semantics can be controlled independently by either a command-line switch or a compiler pragma; by default, floating-point exception semantics are disabled under fp:precise and enabled under fp:strict. The compiler also provides control over FPU environment sensitivity and certain floating-point specific optimizations, such as contractions. This straightforward model gives developers a great deal of control over the compilation of floating-point code without the burden of too many compiler switches or the prospect of undesirable side-effects.
The fp:precise Mode for Floating-Point Semantics
The default floating-point semantics mode is fp:precise. When this mode is selected, the compiler strictly adheres to a set of safety rules when optimizing floating-point operations. These rules permit the compiler to generate efficient machine code while maintaining the accuracy of floating-point computations. To facilitate the production of fast programs, the fp:precise model disables floating-point exception semantics (although they can be explicitly enabled). Microsoft has selected fp:precise as the default floating-point mode because it creates both fast and accurate programs.
To explicitly request the fp:precise mode using the command-line compiler, use the /fp:precise switch:
cl /fp:precise source.cpp
This instructs the compiler to use fp:precise semantics when generating code for the source.cpp file. The fp:precise model can also be invoked on a function-by-function basis using the float_control compiler pragma.
Under the fp:precise mode, the compiler never performs any optimizations that perturb the accuracy of floating-point computations. The compiler will always round correctly at assignments, typecasts and function calls, and intermediate rounding will be consistently performed at the same precision as the FPU registers. Safe optimizations, such as contractions, are enabled by default. Exception semantics and FPU environment sensitivity are disabled by default.
fp:precise Floating-Point Semantics | Explanation
Rounding Semantics | Explicit rounding at assignments, typecasts, and function calls. Intermediate expressions are evaluated at register precision.
Algebraic Transformations | Strict adherence to non-associative, non-distributive floating-point algebra, unless a transformation is guaranteed to always produce the same results.
Contractions | Permitted by default (see also The fp_contract Pragma).
Order of Floating-Point Evaluation | The compiler may reorder the evaluation of floating-point expressions provided that the final results are not altered.
FPU Environment Access | Disabled by default (see also The fenv_access Pragma). The default precision and rounding mode is assumed.
Floating-Point Exception Semantics | Disabled by default (see also the /fp:except switch).
Rounding Semantics for Floating-Point Expressions Under fp:precise
The fp:precise model always performs intermediate computations at the highest practical precision, explicitly rounding only at certain points in expression evaluation. Rounding to the user-specified precision always occurs in four places: (a) when an assignment is made, (b) when a typecast is performed, (c) when a floating-point value is passed as an argument to a function, and (d) when a floating-point value is returned from a function. Because intermediate computations are always performed at register precision, the accuracy of intermediate results is platform-dependent (though the precision will always be at least as high as the user-specified precision).
Consider the assignment expression in the following code. The expression on the right-hand side of the assignment operator '=' will be computed at register precision, and then explicitly rounded to the type of the left-hand side of the assignment.
float a, b, c, d;
double x;
...
x = a*b + c*d;
is computed as
float a, b, c, d;
double x;
...
register tmp1 = a*b;
register tmp2 = c*d;
register tmp3 = tmp1 + tmp2;
x = (double) tmp3;
To explicitly round an intermediate result, introduce a typecast. For example, if the previous code is modified by adding an explicit typecast, the intermediate expression (c*d) will be rounded to the type of the typecast.
float a, b, c, d;
double x;
. . .
x = a*b + (float)(c*d);
is computed as
float a, b, c, d;
double x;
. . .
register tmp1 = a*b;
float tmp2 = c*d;
register tmp3 = tmp1 + tmp2;
x = (double) tmp3;
One implication of this rounding method is that some seemingly equivalent transformations don't actually have identical semantics. For instance, the following transformation splits a single assignment expression into two assignment expressions.
float a, b, c, d;
. . .
a = b*(c+d);
is NOT equivalent to
float a, b, c, d;
. . .
a = c+d;
a = b*a;
Likewise:
a = b*(c+d);
is NOT equivalent to
a = b*(a=c+d);
These encodings do not have equivalent semantics because the second encodings have each introduced an additional assignment operation, and hence an additional rounding point.
When a function returns a floating-point value, the value will be rounded to the type of the function. When a floating-point value is passed as a parameter to a function, the value will be rounded to the type of the parameter. For example:
float sumsqr(float a, float b)
{
    return a*a + b*b;
}
is computed as
float sumsqr(float a, float b)
{
    register tmp3 = a*a;
    register tmp4 = b*b;
    register tmp5 = tmp3 + tmp4;
    return (float) tmp5;
}
Likewise:
float w, x, y, z;
double c;
...
c = sumsqr(w*x + y, z);
is computed as
float w, x, y, z;
double c;
...
register tmp1 = w*x;
register tmp2 = tmp1 + y;
float tmp3 = tmp2;
c = sumsqr(tmp3, z);
Architecture-specific rounding under fp:precise
Processor | Rounding Precision for Intermediate Expressions
x86 | Intermediate expressions are computed at the default 53-bit precision with an extended range provided by a 16-bit exponent. When these 53:16 values are "spilled" to memory (as can happen during a function call), the extended exponent range is narrowed to 11 bits. That is, spilled values are cast to the standard double-precision format with only an 11-bit exponent. A user may switch to extended 64-bit precision for intermediate rounding by altering the floating-point control word using _controlfp and by enabling FPU environment access (see The fenv_access Pragma). However, when extended-precision register values are spilled to memory, the intermediate results will still be rounded to double precision. This particular semantic is subject to change.
ia64 | Intermediate expressions are always computed at 64-bit precision (that is, extended precision).
amd64 | FP semantics on amd64 are somewhat different from other platforms. For performance reasons, intermediate operations are computed at the widest precision of either operand instead of at the widest precision available. To force computations to be performed at a wider precision than the operands, users need to introduce a cast operation on at least one operand in a subexpression. This particular semantic is subject to change.
Algebraic Transformations Under fp:precise
When the fp:precise mode is enabled, the compiler will never perform algebraic transformations unless the final result is provably identical. Many of the familiar algebraic rules for real-number arithmetic do not always hold for floating-point arithmetic. For example, the following expressions are equivalent for reals, but not necessarily for floats.
Form | Description
(a+b)+c = a+(b+c) | Associative rule for addition
(a*b)*c = a*(b*c) | Associative rule for multiplication
a*(b+c) = a*b + a*c | Distribution of multiplication over addition
(a+b)*(a-b) = a*a - b*b | Algebraic factoring
a/b = a*(1/b) | Division by multiplicative inverse
a*1.0 = a | Multiplicative identity
As depicted in the introductory example with the function KahanSum, the compiler might be tempted to perform various algebraic transformations in order to produce considerably faster programs. Although optimizations dependent on such algebraic transformations are almost always incorrect, there are occasions for which they are perfectly safe. For instance, it is sometimes desirable to replace division by a constant value with multiplication by the multiplicative inverse of the constant:
const double four = 4.0;
double a, b;
...
a = b/four;
May be transformed into
const double four = 4.0;
const double tmp0 = 1/4.0;
double a, b;
...
a = b*tmp0;
This is a safe transformation because the optimizer can determine at compile time that x/4.0 == x*(1/4.0) for all floating-point values of x, including infinities and NaN. By replacing a division operation with a multiplication, the compiler can save several cycles—especially on FPUs that don't directly implement division, but require the compiler to generate a combination of reciprocal-approximation and multiply-add instructions. The compiler may perform such an optimization under fp:precise only when the replacement multiplication produces the exact same result as the division. The compiler may also perform trivial transformations under fp:precise, provided the results are identical. These include:
Form | Description
(a+b) == (b+a) | Commutative rule for addition
(a*b) == (b*a) | Commutative rule for multiplication
1.0*x*y == x*1.0*y == x*y*1.0 == x*y | Multiplication by 1.0
x/1.0*y == x*y/1.0 == x*y | Division by 1.0
2.0*x == x+x | Multiplication by 2.0
Contractions Under fp:precise
A key architectural feature of many modern floating-point units is the ability to perform a multiplication followed by an addition as a single operation with no intermediate round-off error. For example, Intel's Itanium architecture provides instructions to combine each of these ternary operations, (a*b+c), (a*b-c) and (c-a*b), into a single floating-point instruction (fma, fms and fnma, respectively). These single instructions are faster than executing separate multiply and add instructions, and are more accurate since there is no intermediate rounding of the product. This optimization can significantly speed up functions containing several interleaved multiply and add operations. For example, consider the following algorithm, which computes the dot product of two n-dimensional vectors.
float dotProduct( float x[], float y[], int n )
{
    float p = 0.0;
    for (int i=0; i<n; i++)
        p += x[i]*y[i];
    return p;
}
This computation can be performed as a series of multiply-add instructions of the form p = p + x[i]*y[i].
The contraction optimization can be independently controlled using the fp_contract compiler pragma. By default, the fp:precise model allows contractions, since they improve both accuracy and speed. Under fp:precise, the compiler will never contract an expression with explicit rounding.
Examples
float a, b, c, d, e, t;
...
d = a*b + c;        // may be contracted
d += a*b;           // may be contracted
d = a*b + e*d;      // may be contracted into a multiply followed by a multiply-add, etc.
d = (float)a*b + c; // won't be contracted because of explicit rounding
t = a*b;            // (this assignment rounds a*b to float)
d = t + c;          // won't be contracted because of rounding of a*b
Order of Floating-Point Expression Evaluation Under fp:precise
Optimizations that preserve the order of floating-point expression evaluation are always safe and are therefore permitted under the fp:precise mode. Consider the following function, which computes the dot product of two n-dimensional vectors in single precision. The first code block below is the original function as it might be encoded by a programmer, followed by the same function after a partial loop-unrolling optimization.
// original function
float dotProduct( float x[], float y[], int n )
{
    float p = 0;
    for (int i=0; i<n; i++)
        p += x[i]*y[i];
    return p;
}

// after a partial loop-unrolling
float dotProduct( float x[], float y[], int n )
{
    int n4 = n/4*4;   // or n4 = n&(~3);
    float p = 0;
    int i;
    for (i=0; i<n4; i+=4) {
        p += x[i]*y[i];
        p += x[i+1]*y[i+1];
        p += x[i+2]*y[i+2];
        p += x[i+3]*y[i+3];
    }
    // last n%4 elements
    for (; i<n; i++)
        p += x[i]*y[i];
    return p;
}
The primary benefit of this optimization is that it reduces the number of conditional loop branches by as much as 75%. Also, by increasing the number of operations within the loop body, the compiler may now have more opportunities to optimize further. For instance, some FPUs may be able to perform the multiply-add in p += x[i]*y[i] while simultaneously fetching the values for x[i+1] and y[i+1] for use in the next step. This type of optimization is perfectly safe for floating-point computations because it preserves the order of operations.
It's often advantageous for the compiler to reorder entire operations in order to produce faster code. Consider the following code:
double a, b, c, d;
double x, y, z;
...
x = a*a*a + b*b*b + c*c*c;
...
y = a*a + b*b + c*c;
...
z = a + b + c;
C++ semantic rules indicate that the program should produce results as if it first computed x, then y and finally z. Suppose the compiler has only four available floatingpoint registers. If the compiler is forced to compute x, y and z in order, it might choose to generate code with the following semantics:
double a, b, c, d;
double x, y, z;
register r0, r1, r2, r3;
...
// Compute x
r0 = a;          // r1 = a*a*a
r1 = r0*r0;
r1 = r1*r0;
r0 = b;          // r2 = b*b*b
r2 = r0*r0;
r2 = r2*r0;
r0 = c;          // r3 = c*c*c
r3 = r0*r0;
r3 = r3*r0;
r0 = r1 + r2;
r0 = r0 + r3;
x = r0;          // x = r1+r2+r3
. . .
// Compute y
r0 = a;          // r1 = a*a
r1 = r0*r0;
r0 = b;          // r2 = b*b
r2 = r0*r0;
r0 = c;          // r3 = c*c
r3 = r0*r0;
r0 = r1 + r2;
r0 = r0 + r3;
y = r0;          // y = r1+r2+r3
. . .
// Compute z
r1 = a;
r2 = b;
r3 = c;
r0 = r1 + r2;
r0 = r0 + r3;
z = r0;          // z = r1+r2+r3
There are several clearly redundant operations in this encoding. If the compiler strictly follows C++ semantic rules, this ordering is necessary because the program might access the FPU environment in between each assignment. However, the default settings for fp:precise allow the compiler to optimize as though the program doesn't access the environment, allowing it to reorder these expressions. It's then free to remove the redundancies by computing the three values in reverse order, as follows:
double a, b, c, d;
double x, y, z;
register r0, r1, r2, r3;
...
// Compute z
r1 = a;
r2 = b;
r3 = c;
r0 = r1 + r2;
r0 = r0 + r3;
z = r0;
...
// Compute y
r1 = r1*r1;
r2 = r2*r2;
r3 = r3*r3;
r0 = r1 + r2;
r0 = r0 + r3;
y = r0;
...
// Compute x
r0 = a;
r1 = r1*r0;
r0 = b;
r2 = r2*r0;
r0 = c;
r3 = r3*r0;
r0 = r1 + r2;
r0 = r0 + r3;
x = r0;
This encoding is clearly superior, having reduced the number of floating-point instructions by almost 40%. The results for x, y and z are the same as before, but computed with less overhead.
Under fp:precise, the compiler may also interlace common subexpressions so as to produce faster code. For example, code to compute the roots of a quadratic equation might be written as follows:
double a, b, c, root0, root1;
...
root0 = (-b + sqrt(b*b - 4*a*c))/(2*a);
root1 = (-b - sqrt(b*b - 4*a*c))/(2*a);
Although these expressions only differ by a single operation, the programmer may have written it this way to guarantee that each root value will be computed in the highest practical precision. Under fp:precise, the compiler is free to interlace the computation of root0 and root1 to remove common subexpressions without losing precision. For instance, the following has removed several redundant steps while producing the exact same answer.
double a, b, c, root0, root1;
...
register tmp0 = -b;
register tmp1 = sqrt(b*b - 4*a*c);
register tmp2 = 2*a;
root0 = (tmp0 + tmp1)/tmp2;
root1 = (tmp0 - tmp1)/tmp2;
Other optimizations may attempt to move the evaluation of certain independent expressions. Consider the following algorithm, which contains a conditional branch within a loop body.
vector<double> a(n);
double d, s;
. . .
for (int i=0; i<n; i++) {
    if (abs(d) > 1.0)
        s = s + a[i]/d;
    else
        s = s + a[i]*d;
}
The compiler may detect that the value of the expression (abs(d)>1) is invariant within the loop body. This enables the compiler to "hoist" the if statement outside the loop body, transforming the above code into the following:
vector<double> a(n);
double d, s;
. . .
if (abs(d) > 1.0)
    for (int i=0; i<n; i++)
        s = s + a[i]/d;
else
    for (int i=0; i<n; i++)
        s = s + a[i]*d;
After the transformation, there is no longer a conditional branch in either of the loop bodies, thus greatly improving the overall performance of the loop. This type of optimization is perfectly safe because the evaluation of the expression (abs(d)>1.0) is independent of the other expressions.
In the presence of FPU environment access or floating-point exceptions, these types of optimization are contraindicated because they change the semantic flow. Such optimizations are only available under the fp:precise mode because FPU environment access and floating-point exception semantics are disabled by default. Functions that access the FPU environment can explicitly disable such optimizations by using the fenv_access compiler pragma. Likewise, functions using floating-point exceptions should use the float_control(except…) compiler pragma (or use the /fp:except command-line switch).
In summary, the fp:precise mode allows the compiler to reorder the evaluation of floating-point expressions on the condition that the final results are not altered and that the results are not dependent on the FPU environment or on floating-point exceptions.
FPU Environment Access Under fp:precise
When the fp:precise mode is enabled, the compiler assumes that a program does not access or alter the FPU environment. As stated earlier, this assumption enables the compiler to reorder or move floating-point operations to improve efficiency under fp:precise.
Some programs may alter the floating-point rounding direction by using the _controlfp function. For instance, some programs compute upper and lower error bounds on arithmetic operations by performing the same computation twice, first while rounding towards negative infinity, then while rounding towards positive infinity. Since the FPU provides a convenient way to control rounding, a programmer may choose to change the rounding mode by altering the FPU environment. The following code computes an exact error bound of a floating-point multiplication by altering the FPU environment.
double a, b, cLower, cUpper;
. . .
_controlfp( _RC_DOWN, _MCW_RC );   // round toward -∞
cLower = a*b;
_controlfp( _RC_UP, _MCW_RC );     // round toward +∞
cUpper = a*b;
_controlfp( _RC_NEAR, _MCW_RC );   // restore rounding mode
Under fp:precise, the compiler always assumes the default FPU environment, so the optimizer is free to ignore the calls to _controlfp and reduce the above assignments to cUpper = cLower = a*b; this would clearly yield incorrect results. To prevent such optimizations, enable FPU environment access by using the fenv_access compiler pragma.
Other programs may attempt to detect certain floating-point errors by checking the FPU's status word. For example, the following code checks for the divide-by-zero and inexact conditions:
double a, b, c, r;
float x;
. . .
_clearfp();
r = (a*b + sqrt(b*b - 4*a*c))/(2*a);
if (_statusfp() & _SW_ZERODIVIDE)
    ; // handle divide-by-zero as a special case
_clearfp();
x = r;
if (_statusfp() & _SW_INEXACT)
    ; // handle inexact error as a special case
// etc...
Under fp:precise, optimizations that reorder expression evaluation may change the points at which certain errors occur. Programs accessing the status word should enable FPU environment access by using the fenv_access compiler pragma.
See pragma fenv_access for more information.
Floating-Point Exception Semantics Under fp:precise
By default, floating-point exception semantics are disabled under fp:precise. Most C++ programmers prefer to handle exceptional floating-point conditions without using system or C++ exceptions. Moreover, as stated earlier, disabling floating-point exception semantics allows the compiler greater flexibility when optimizing floating-point operations. Use either the /fp:except switch or the float_control pragma to enable floating-point exception semantics when using the fp:precise model.
See Also
Enabling Floating-Point Exception Semantics
The fp:fast Mode for Floating-Point Semantics
When the fp:fast mode is enabled, the compiler relaxes the rules that fp:precise uses when optimizing floating-point operations. This mode allows the compiler to further optimize floating-point code for speed at the expense of floating-point accuracy and correctness. Programs that do not rely on highly accurate floating-point computations may experience a significant speed improvement by enabling the fp:fast mode.
The fp:fast floating-point mode is enabled using a command-line compiler switch as follows:
cl /fp:fast source.cpp
This example instructs the compiler to use fp:fast semantics when generating code for the source.cpp file. The fp:fast model can also be invoked on a function-by-function basis using the float_control compiler pragma.
See Also
The float_control Pragma
Under the fp:fast mode, the compiler may perform optimizations that alter the accuracy of floating-point computations. The compiler may not round correctly at assignments, typecasts or function calls, and intermediate rounding will not always be performed. Floating-point specific optimizations, such as contractions, are always enabled. Floating-point exception semantics and FPU environment sensitivity are disabled and unavailable.
fp:fast Floating-Point Semantics | Explanation
Rounding Semantics | Explicit rounding at assignments, typecasts, and function calls may be ignored. Intermediate expressions may be rounded at less than register precision according to performance requirements.
Algebraic Transformations | The compiler may transform expressions according to real-number associative and distributive algebra; these transformations are not guaranteed to be either accurate or correct.
Contractions | Always enabled; cannot be disabled by pragma fp_contract.
Order of Floating-Point Evaluation | The compiler may reorder the evaluation of floating-point expressions, even when such changes may alter the final results.
FPU Environment Access | Disabled; not available.
Floating-Point Exception Semantics | Disabled; not available.
Rounding Semantics for Floating-Point Expressions Under fp:fast
Unlike the fp:precise model, the fp:fast model performs intermediate calculations at the most convenient precision. Rounding at assignments, typecasts and function calls may not always occur. For instance, the first function below introduces three single-precision variables (C, Y and T). The compiler may choose to enregister these variables, in effect type-promoting C, Y and T to double precision.
Original function:
float KahanSum( const float A[], int n )
{
    float sum=0, C=0, Y, T;
    for (int i=0; i<n; i++) {
        Y = A[i] - C;
        T = sum + Y;
        C = T - sum - Y;
        sum = T;
    }
    return sum;
}
Variables enregistered:
float KahanSum( const float A[], int n )
{
    float sum=0;
    double C=0, Y, T;   // now held in-register
    for (int i=0; i<n; i++) {
        Y = A[i] - C;
        T = sum + Y;
        C = T - sum - Y;
        sum = (float) T;
    }
    return sum;
}
In this example, fp:fast has subverted the intent of the original function: with C, Y, and T promoted to double precision, the correction term no longer captures the single-precision rounding error, so the final result held in the variable sum may be quite perturbed from the correct result.
Under fp:fast, the compiler will typically attempt to maintain at least the precision specified by the source code. However, in some instances the compiler may choose to evaluate intermediate expressions at a lower precision than specified in the source code. For example, the first code block below calls the double-precision version of the square-root function. Under fp:fast, the compiler may choose to replace the call to the double-precision sqrt with a call to a single-precision sqrt function. This has the effect of introducing additional lower-precision rounding at the point of the function call.
Original function:

double sqrt(double);
. . .
float a, b, c;
. . .
double d = a * b;
float c = (float) sqrt(d);
Optimized function:

float sqrtf(float);
. . .
float a, b, c;
. . .
double d = a * b;         // up-convert to double precision by C/C++ promotion rules
float tmp0 = (float) d;   // down-convert to single precision by explicit cast
float c = sqrtf(tmp0);    // square root computed in single precision
Although less accurate, this optimization may be especially beneficial when targeting processors that provide single-precision, intrinsic versions of functions such as sqrt. Precisely when the compiler will use such optimizations is both platform and context dependent.
Furthermore, there is no guaranteed consistency for the precision of intermediate computations, which may be performed at any precision level available to the compiler. Although the compiler will attempt to maintain at least the level of precision as specified by the code, fp:fast allows the optimizer to downcast intermediate computations in order to produce faster or smaller machine code. For instance, the compiler may further optimize the code from above to round the intermediate multiplication to single precision.
float sqrtf(float);
. . .
float a, b, c;
. . .
// in this simple example, the store to d is optimized away
float tmp1 = a * b;       // converts have been removed
float c = sqrtf(tmp1);    // square root computed in single precision
This kind of additional rounding may result from using a lower-precision floating-point unit, such as SSE2, to perform some of the intermediate computations. The accuracy of fp:fast rounding is therefore platform dependent; code that compiles well for one processor may not necessarily work well for another processor. It's left to the user to determine whether the speed benefits outweigh any accuracy problems.
If fp:fast optimization is particularly problematic for a specific function, the floating-point mode can be locally switched to fp:precise using the float_control compiler pragma.
Algebraic Transformations Under fp:fast
The fp:fast mode enables the compiler to perform certain unsafe algebraic transformations on floating-point expressions. For example, the following unsafe optimizations may be employed under fp:fast.
Original code:

double a, b, c, x, y, z;
. . .
y = a + b;
z = y - a - b;
c = x*z + (x - z);

Step #1: the compiler concludes that z is always zero and propagates the constant:

c = x*0 + (x - 0);

Step #2: the compiler simplifies the arithmetic against the constant zero:

c = x;

In step 1, the compiler observes that z = y - a - b is always equal to zero. Although this is technically an invalid observation, it is permitted under fp:fast. The compiler then propagates the constant value zero to every subsequent use of the variable z. In step 2, the compiler further optimizes by observing that x-0==x, x*0==0, etc. Again, even though these observations are not strictly valid, they are permitted under fp:fast. The optimized code is now much faster, but may also be considerably less accurate or even incorrect.
Any of the following (unsafe) algebraic rules may be employed by the optimizer when the fp:fast mode is enabled:
Form | Description
(a+b)+c = a+(b+c) | Associative rule for addition
(a*b)*c = a*(b*c) | Associative rule for multiplication
a*(b+c) = a*b + a*c | Distribution of multiplication over addition
(a+b)*(a-b) = a*a - b*b | Algebraic factoring
a/b = a*(1/b) | Division by multiplicative inverse
a*1.0 = a, a/1.0 = a | Multiplicative identity
a±0.0 = a, 0.0-a = -a | Additive identity
a/a = 1.0, a-a = 0.0 | Cancellation
If fp:fast optimization is particularly problematic for a particular function, the floating-point mode can be locally switched to fp:precise using the float_control compiler pragma.
Order of Floating-Point Expression Evaluation Under fp:fast
Unlike fp:precise, fp:fast allows the compiler to reorder floating-point operations so as to produce faster code, even though the reordering may alter the final result. Thus, some optimizations under fp:fast may not preserve the intended order of expressions. For instance, consider the following function that computes the dot product of two n-dimensional vectors.
float dotProduct( float x[], float y[], int n )
{
    float p=0;
    for (int i=0; i<n; i++)
        p += x[i]*y[i];
    return p;
}
Under fp:fast, the optimizer may perform a scalar reduction of the dotProduct function effectively transforming the function as follows:
float dotProduct( float x[], float y[], int n )
{
    int n4 = n/4*4;   // or n4 = n&(~3);
    float p=0, p2=0, p3=0, p4=0;
    int i;
    for (i=0; i<n4; i+=4) {
        p  += x[i]*y[i];
        p2 += x[i+1]*y[i+1];
        p3 += x[i+2]*y[i+2];
        p4 += x[i+3]*y[i+3];
    }
    p += p2+p3+p4;
    // last n%4 elements
    for (; i<n; i++)
        p += x[i]*y[i];
    return p;
}
In the optimized version of the function, four separate product summations are computed simultaneously and then added together. This optimization can speed up the computation of dotProduct by as much as a factor of four, depending on the target processor, but the final result may be so inaccurate as to render it useless. If such optimizations are particularly problematic for a single function or translation unit, the floating-point mode can be locally switched to fp:precise using the float_control compiler pragma.
The fp:strict Mode for Floating-Point Semantics
When the fp:strict mode is enabled, the compiler adheres to all the same rules that fp:precise uses when optimizing floating-point operations. This mode also enables floating-point exception semantics and sensitivity to the FPU environment, and disables certain optimizations such as contractions. It is the strictest mode of operation.
The fp:strict floating-point mode is enabled using a command-line compiler switch as follows:

    cl -fp:strict source.cpp    or    cl /fp:strict source.cpp

This example instructs the compiler to use fp:strict semantics when generating code for the source.cpp file. The fp:strict model can also be invoked on a function-by-function basis using the float_control compiler pragma.
See also:
The float_control Pragma
Under the fp:strict mode, the compiler never performs any optimizations that perturb the accuracy of floating-point computations. The compiler will always round correctly at assignments, typecasts, and function calls, and intermediate rounding will be consistently performed at the same precision as the FPU registers. Floating-point exception semantics and FPU environment sensitivity are enabled by default. Certain optimizations, such as contractions, are disabled because the compiler cannot guarantee correctness in every case.
fp:strict floating-point semantics | Explanation
Rounding Semantics | Explicit rounding at assignments, typecasts, and function calls. Intermediate expressions are evaluated at register precision. Same as fp:precise.
Algebraic Transformations | Strict adherence to non-associative, non-distributive floating-point algebra, unless a transformation is guaranteed to always produce the same results. Same as fp:precise.
Contractions | Always disabled.
Order of Floating-point Evaluation | The compiler will not reorder the evaluation of floating-point expressions.
FPU Environment Access | Always enabled.
Floating-point Exception Semantics | Enabled by default.
Floating-Point Exception Semantics Under fp:strict
By default, floating-point exception semantics are enabled under the fp:strict model. To disable these semantics, use either the /fp:except- switch or introduce a float_control(except, off) pragma.
See also:
Enabling Floating-Point Exception Semantics
The float_control Pragma
The fenv_access Pragma
Usage:
#pragma fenv_access( [ on | off ] )
The fenv_access pragma determines whether the compiler may perform certain optimizations that would subvert FPU flag tests and FPU mode changes. When fenv_access is disabled, the compiler can assume that the default FPU modes are in effect and that FPU flags are not tested. By default, environment access is disabled for the fp:precise mode, though it may be explicitly enabled using this pragma. Under fp:strict, fenv_access is always enabled and cannot be disabled. Under fp:fast, fenv_access is always disabled and cannot be enabled.
As described in the fp:precise section, some programmers may alter the floating-point rounding direction using the _controlfp function. For example, to compute the upper and lower error bounds on arithmetic operations, some programs perform the same computation twice: first while rounding toward negative infinity, then while rounding toward positive infinity. Since the FPU provides a convenient way to control rounding, a programmer may choose to change the rounding mode by altering the FPU environment. The following code computes an exact error bound of a floating-point multiplication by altering the FPU environment.
double a, b, cLower, cUpper;
. . .
_controlfp( _RC_DOWN, _MCW_RC );   // round toward -∞
cLower = a*b;
_controlfp( _RC_UP, _MCW_RC );     // round toward +∞
cUpper = a*b;
_controlfp( _RC_NEAR, _MCW_RC );   // restore rounding mode
When disabled, the fenv_access pragma allows the compiler to assume the default FPU environment; thus the optimizer is free to ignore the calls to _controlfp and reduce the above assignments to cUpper = cLower = a*b. When enabled, however, fenv_access prevents such optimizations.
Programs may also check the FPU status word to detect certain floating-point errors. For example, the following code checks for the divide-by-zero and inexact conditions:
double a, b, c, r;
float x;
. . .
_clearfp();
r = (a*b + sqrt(b*b - 4*a*c))/(2*a);
if (_statusfp() & _SW_ZERODIVIDE)
    ... handle divide by zero as a special case ...
_clearfp();
x = (a*b + sqrt(b*b - 4*a*c))/(2*a);
if (_statusfp() & _SW_INEXACT)
    ... handle inexact error as a special case ...
etc...
When fenv_access is disabled, the compiler might rearrange the execution order of the floating-point expressions, thus possibly subverting the FPU status checks. Enabling fenv_access prevents such optimizations.
The fp_contract Pragma
Usage:
#pragma fp_contract( [ on | off ] )
As described in the fp:precise section, contraction is a fundamental architectural feature of many modern floating-point units. Contractions provide the ability to perform a multiplication followed by an addition as a single operation with no intermediate round-off error. For example, Intel's Itanium architecture provides instructions that combine each of the ternary operations (a*b+c), (a*b-c), and (c-a*b) into a single floating-point instruction (fma, fms, and fnma, respectively). These single instructions are faster than executing separate multiply and add instructions, and they are more accurate since there is no intermediate rounding of the product. A contracted operation computes the value of (a*b+c) as if both operations were computed to infinite precision and then rounded to the nearest floating-point number. This optimization can significantly speed up functions containing several interleaved multiply and add operations. For example, consider the following algorithm, which computes the dot product of two n-dimensional vectors.
float dotProduct( float x[], float y[], int n )
{
    float p=0.0;
    for (int i=0; i<n; i++)
        p += x[i]*y[i];
    return p;
}
This computation can be performed as a series of multiply-add instructions of the form p = p + x[i]*y[i].
The fp_contract pragma specifies whether floating-point expressions can be contracted. By default, the fp:precise mode allows contractions, since they improve both accuracy and speed. Contractions are always enabled under the fp:fast mode. However, because contractions can subvert the explicit detection of error conditions, contractions are always disabled under the fp:strict mode. Examples of expressions that may be contracted when the fp_contract pragma is enabled:
float a, b, c, d, e, t;
. . .
d = a*b + c;        // may be contracted
d += a*b;           // may be contracted
d = a*b + e*d;      // may be contracted into a multiply followed by a multiply-add
etc...
d = (float)a*b + c; // won't be contracted because of explicit rounding
t = a*b;            // (this assignment rounds a*b to float)
d = t + c;          // won't be contracted because of rounding of a*b
The float_control Pragma
The /fp:precise, /fp:fast, /fp:strict and /fp:except switches control floatingpoint semantics on a filebyfile basis. The float_control pragma provides such control on a functionbyfunction basis.
Usage:
#pragma float_control(push)
#pragma float_control(pop)
#pragma float_control( precise, on | off [, push] )
#pragma float_control( except, on | off [, push] )
The pragmas float_control(push) and float_control(pop) respectively push and pop the current state of the floating-point mode and the exception option onto and off of a stack. Note that the states of the fenv_access and fp_contract pragmas are not affected by float_control(push/pop).
The pragma float_control(precise, on | off) enables or disables precise-mode semantics. Likewise, the pragma float_control(except, on | off) enables or disables exception semantics. Exception semantics can only be enabled when precise semantics are also enabled. When the optional push argument is present, the current states of the float_control options are pushed before the semantics are changed.
Setting the Floating-Point Semantic Mode on a Function-by-Function Basis
The command-line switches are in fact shorthand for setting the four different floating-point pragma options. To explicitly choose a particular floating-point semantic mode on a function-by-function basis, set each of the four floating-point option pragmas as described in the following table:
Mode | float_control(precise) | float_control(except) | fp_contract | fenv_access
fp:strict | on | on | off | on
fp:strict /fp:except- | on | off | off | on
fp:precise | on | off | on | off
fp:precise /fp:except | on | on | on | off
fp:fast | off | off | on | off
For example, the following explicitly enables fp:fast semantics.
#pragma float_control( except, off )  // disable exception semantics
#pragma float_control( precise, off ) // disable precise semantics
#pragma fp_contract(on)               // enable contractions
#pragma fenv_access(off)              // disable FPU environment sensitivity
Note: Exception semantics must be turned off before precise semantics are turned off.
Enabling Floating-Point Exception Semantics
Certain exceptional floating-point conditions, such as dividing by zero, can cause the FPU to signal a hardware exception. Floating-point exceptions are disabled by default; they are enabled by modifying the FPU control word with the _controlfp function. For example, the following code enables the divide-by-zero floating-point exception:
_clearfp();   // always call _clearfp before enabling/unmasking an FPU exception
_controlfp( ~_EM_ZERODIVIDE, _MCW_EM );   // clear the zero-divide mask bit; a set mask bit suppresses the exception
When the divide-by-zero exception is enabled, any division operation with a denominator equal to zero will cause an FPU exception to be signaled.
To restore the FPU control word to the default mode, call _controlfp(_CW_DEFAULT, ~0).
Enabling floating-point exception semantics with the /fp:except flag is not the same as enabling floating-point exceptions. When floating-point exception semantics are enabled, the compiler must account for the possibility that any floating-point operation might throw an exception. Because the FPU is a separate processor unit, instructions executing on the FPU can be performed concurrently with instructions on other units.
When a floating-point exception is enabled, the FPU will halt execution of the offending instruction and then signal an exceptional condition by setting the FPU status word. When the CPU reaches the next floating-point instruction, it first checks for any pending FPU exceptions. If there is a pending exception, the processor traps it by calling an exception handler provided by the Operating System. This means that when a floating-point operation encounters an exceptional condition, the corresponding exception won't be detected until the next floating-point operation is executed. For example, the following code traps a divide-by-zero exception:
double a, b, c;
. . .
// ...unmasking of FPU exceptions omitted...
__try
{
    a = b/c;   // assume c==0.0
    printf("This line shouldn't be reached when c==0.0\n");
    c = 2.0*b;
}
__except( EXCEPTION_EXECUTE_HANDLER )
{
    printf("SEH Exception Detected\n");
}
If a divide-by-zero condition occurs in the expression a=b/c, the FPU won't trap/raise the exception until the next floating-point operation, in the expression 2.0*b. This results in the following output:
This line shouldn't be reached when c==0.0
SEH Exception Detected
The printf corresponding to the first line of the output should not have been reached; it was reached because the floating-point exception caused by the expression b/c wasn't raised until execution reached 2.0*b. To raise the exception just after executing b/c, the compiler must introduce a "wait" instruction:
. . .
__try
{
    a = b/c;   // assume this expression will cause a "divide-by-zero" exception
    __asm fwait;
    printf("This line shouldn't be reached when c==0.0\n");
    c = 2.0*b;
}
. . .
This "wait" instruction forces the processor to synchronize with the state of the FPU and handle any pending exceptions. The compiler will only generate these "wait" instructions when floating-point exception semantics are enabled. When these semantics are disabled, as they are by default, programs may encounter synchronization errors, similar to the one above, when using floating-point exceptions.
When floating-point exception semantics are enabled, the compiler not only introduces "wait" instructions, it also refrains from illegally optimizing floating-point code in the presence of possible exceptions. This includes any transformations that would alter the points at which exceptions are thrown. Because of these factors, enabling floating-point exception semantics may considerably reduce the efficiency of the generated machine code, thus degrading an application's performance.
Floating-point exception semantics are enabled by default under the fp:strict mode. To enable these semantics in the fp:precise mode, add the /fp:except switch to the compiler command line. Floating-point exception semantics can also be enabled and disabled on a function-by-function basis using the float_control pragma.
Floating-Point Exceptions as C++ Exceptions
As with all hardware exceptions, floating-point exceptions do not intrinsically cause a C++ exception, but instead trigger a structured exception. To map floating-point structured exceptions to C++ exceptions, users can introduce a custom SEH exception translator. First, introduce a C++ exception class corresponding to each floating-point exception:
class float_exception : public std::exception {};
class fe_denormal_operand : public float_exception {};
class fe_divide_by_zero : public float_exception {};
class fe_inexact_result : public float_exception {};
class fe_invalid_operation : public float_exception {};
class fe_overflow : public float_exception {};
class fe_stack_check : public float_exception {};
class fe_underflow : public float_exception {};
Then, introduce a translation function that will detect a floating-point SEH exception and throw the corresponding C++ exception. To use this function, set the structured-exception handler translator for the current process thread with the _set_se_translator(...) function from the runtime library.
void se_fe_trans_func( unsigned int u, EXCEPTION_POINTERS* pExp )
{
    switch (u)
    {
    case STATUS_FLOAT_DENORMAL_OPERAND:
        throw fe_denormal_operand();
    case STATUS_FLOAT_DIVIDE_BY_ZERO:
        throw fe_divide_by_zero();
    // etc...
    };
}
. . .
_set_se_translator(se_fe_trans_func);
Once this mapping is initialized, floatingpoint exceptions will behave as though they are C++ exceptions. For example:
try
{
    // floating-point code that might throw divide-by-zero
    // or another floating-point exception
}
catch (fe_divide_by_zero)
{
    cout << "fe_divide_by_zero exception detected" << endl;
}
catch (float_exception)
{
    cout << "float_exception exception detected" << endl;
}
Related Books
Methods and Applications of Interval Analysis
Improving FloatingPoint Programming
References
[1] Goldberg, David. "What Every Computer Scientist Should Know About Floating-Point Arithmetic." Computing Surveys, March 1991, pg. 203.
About the author
Eric Fleegal works in the Visual C++ group at Microsoft, where he is responsible for determining the quality of floating-point code generation. His interests include validated numerical computing and generative programming. His most recent research involves validating floating-point optimizations using interval analysis.