This documentation is archived and is not being maintained.

Optimizing Your Code with Visual C++

Visual Studio .NET 2003

Mark Lacey
Microsoft Corporation

April 2003

Summary: This document describes the code optimization features available in the Visual C++® .NET 2003 product. Additionally, for those readers not already familiar with the advancements made in Visual C++® .NET 2002, a short section describes the new Whole Program Optimization feature that was introduced there. Finally, some "best practices" for optimizing are discussed, as well as general enhancements to the Visual C++ compiler. (8 printed pages)

Applies to:
   Visual C++ .NET 2003


Visual C++ .NET 2003
Visual C++ .NET 2002
Best Practices for Optimizing
Enhanced Compiler Options in Visual C++ .NET


It's always frustrating to get a new tool and not feel confident that you're using it in the best possible way. This white paper attempts to alleviate the concerns you might have with the Visual C++ optimizer so that you feel comfortable that you're getting the most you can out of it.

Visual C++ .NET 2003

The Visual C++ .NET 2003 release adds two new performance-related compiler options and additionally includes improvements to several of the optimizations that were shipped as part of Visual C++ .NET 2002.

The first of the new performance-related options is /G7. This tells the compiler to optimize code for the Intel Pentium 4 and AMD Athlon processors.

The performance improvement achieved by compiling an application with /G7 varies, but when comparing to code generated by Visual C++ .NET 2002, it's not unusual to see 5-10 percent reduction in execution time for typical programs, and even 10-15 percent for programs that contain a lot of floating-point code. The range of improvement can vary greatly, and in some cases users will see over 20 percent improvement when compiling with /G7 and running on the latest generation processors.

Using /G7 does not mean that the compiler will produce code that only runs on the Intel Pentium 4 and AMD Athlon processors. Code compiled with /G7 will continue to run on older generations of these processors, although there might be some minor performance penalty. In addition, we've observed some cases where compiling with /G7 produces code that runs slower on the AMD Athlon.

When no /Gx option is specified, the compiler defaults to /GB, the "blended" optimization mode. In both the 2002 and 2003 releases of Visual C++ .NET, /GB is equivalent to /G6, which optimizes code for the Intel Pentium Pro, Pentium II, and Pentium III processors.

One example of an improvement when using /G7 is better choice of instructions for Intel Pentium 4 when doing integer multiplication by a constant. For example, consider the following code:

int i;
// Do something that assigns a value to i. 
return i*15;

When compiling with /G6, the default, we produce:

mov   eax, DWORD PTR _i$[esp-4]
imul   eax, 15

When compiling with /G7, we produce the faster (but longer) sequence that avoids using the imul instruction, which has a 14-cycle latency on the Intel Pentium 4:

mov   ecx, DWORD PTR _i$[esp-4]
mov   eax, ecx
shl   eax, 4
sub   eax, ecx
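In C++ terms, the /G7 sequence computes i*16 - i using a shift and a subtract. A minimal sketch (the function name is invented for this illustration) confirming the two forms agree:

```cpp
#include <cassert>

// The shift-and-subtract form the compiler emits under /G7:
// (i << 4) - i computes i*16 - i, which equals i*15, without
// the high-latency imul instruction.
int times15(int i) {
    return (i << 4) - i;
}
```

Note that a left shift of a negative value is implementation-defined in older C++ standards, which is one reason the compiler, not the programmer, should make this substitution.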

The second performance-related option is /arch:[argument], which takes an argument of SSE or SSE2. This option allows the compiler to make use of Streaming SIMD Extensions (SSE) and Streaming SIMD Extensions 2 (SSE2) instructions, as well as other new instructions that are available on processors that support SSE and/or SSE2. When compiling with /arch:SSE, the resulting code will only run on processors that support the SSE instructions as well as CMOV, FCOMI, FCOMIP, FUCOMI, and FUCOMIP. Similarly, when compiling with /arch:SSE2, the resulting code will only run on processors that support the SSE2 instructions.

As with /G7, the performance improvement from compiling an application with /arch:SSE or /arch:SSE2 varies. The typical gain is 2-3 percent reduction in execution time, though in some outlying cases over 5 percent reduction in execution time has been measured.

The /arch:SSE option has the following specific effects:

  • Making use of SSE instructions for single precision floating-point (float) variables when it provides a performance improvement.
  • Making use of the CMOV instruction, which was first introduced in the Intel Pentium Pro processor.
  • Making use of the FCOMI, FCOMIP, FUCOMI, and FUCOMIP instructions, which were also first introduced in the Pentium Pro processor.
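As a sketch of the kind of code these effects apply to (the function names are invented for this illustration), the single-precision arithmetic below is a candidate for scalar SSE instructions under /arch:SSE, and the branchless select is a candidate for the CMOV instruction:

```cpp
#include <cassert>

// Single-precision math like this can be compiled to scalar SSE
// instructions (rather than x87 code) under /arch:SSE.
float axpy(float a, float x, float y) {
    return a * x + y;
}

// A simple select like this can be compiled to CMOV, avoiding a
// conditional branch.
int imax(int a, int b) {
    return a > b ? a : b;
}
```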

The /arch:SSE2 option includes all of the effects of the /arch:SSE option and additionally has the following effects:

  • Making use of SSE2 instructions for double precision floating-point (double) variables when it provides a performance improvement.
  • Making use of SSE2 instructions for 64-bit shifts.
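A 64-bit shift is a multi-instruction sequence in plain x86 code; under /arch:SSE2 the compiler can use SSE2 instructions instead. A minimal sketch of such a shift (the function name is invented for this illustration):

```cpp
#include <cassert>
#include <cstdint>

// A variable 64-bit shift: a candidate for SSE2 code under
// /arch:SSE2 instead of a shld/shl/branch sequence on x86.
std::uint64_t shl64(std::uint64_t v, int n) {
    return v << n;
}
```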

In addition to these benefits, when doing builds using the /GL (Whole Program Optimization) option with either /arch:SSE or /arch:SSE2, the compiler will use custom calling conventions for functions with floating-point arguments and return values.

Finally, several of the optimizations introduced in previous versions of the product were enhanced in Visual C++ .NET 2003. One of the enhancements is the ability to eliminate the passing of "dead" parameters — those parameters that are not referenced in the called function. For example:

int m;

int f1(int i, int j, int k)
{
   return i+k;
}

int main()
{
   // Assume a, b, c, and d are defined and assigned elsewhere.
   extern int a, b, c, d;

   int n = a+b+c+d;

   m = f1(3,n,4);
   return 0;
}

In function f1(), the second parameter is never used. When compiling with the /GL — Whole Program Optimization — option, the compiler will produce something like the following for the call to f1() in main():

mov   eax, 4
mov   ecx, 3
call   ?f1@@YAHHHH@Z
mov   DWORD PTR ?m@@3HA, eax

In this example, the calculation for the value of 'n' is never performed, and only the two parameters referenced in f1() are passed to f1() (and they are passed in registers rather than on the stack). Also, this example was compiled with inlining disabled, because with inlining enabled the call is entirely optimized away and the remaining code sets 'm' to the value 7.

Visual C++ .NET 2002

The 2002 release of Visual C++ .NET introduced the ability to do Whole Program Optimization (WPO) via the /GL compiler option. This is done by storing an intermediate representation of the program (rather than object code) in the object files that are created by the compiler, and then having the linker invoke the optimizer and code generator during link-time to make use of information about the entire program when optimizing. This is all done in a very transparent way involving minimal changes to the build process.

One of the major benefits of WPO is the ability to inline functions across source modules. This greatly enhances the ability of the inliner to improve performance by inlining small functions across an entire executable. Additional benefits of WPO include the ability to more accurately track memory aliasing and register usage to eliminate stores and reloads around function calls.

The following example shows some of the improvements in generated code that can happen as a result of using WPO.

// File 1
extern void func(int *, int *);

int g, h;

int main()
{
   int i = 0;
   int j = 1;

   g = 5;
   h = 6;

   func(&i, &j);

   g = g + i;
   h = h + i;

   return 0;
}

// File 2

extern int g;
extern int h;

void func(int *pi, int *pj)
{
   *pj = g;
   h = *pi;
}
When compiling this example without the /GL option, you will see code similar to the following produced for File 1:

sub   esp, 8
lea   eax, DWORD PTR _j$[esp+8]
push   eax
lea   ecx, DWORD PTR _i$[esp+12]
push   ecx
mov   DWORD PTR _i$[esp+16], 0
mov   DWORD PTR _j$[esp+16], 1
mov   DWORD PTR ?g@@3HA, 5
mov   DWORD PTR ?h@@3HA, 6
call   ?func@@YAXPAH0@Z
mov   eax, DWORD PTR _i$[esp+16]
mov   edx, DWORD PTR ?g@@3HA
mov   ecx, DWORD PTR ?h@@3HA
add   edx, eax
add   ecx, eax
mov   DWORD PTR ?g@@3HA, edx
mov   DWORD PTR ?h@@3HA, ecx
xor   eax, eax
add   esp, 16
ret   0

When compiling with /GL, you will see code similar to the following. Note that quite a few instructions have been deleted. It should also be noted that this manufactured example was compiled with /Ob0 (no inlining) because with inlining enabled, virtually the entire example gets optimized away.

sub   esp, 8
lea   ecx, DWORD PTR _j$[esp+8]
lea   edx, DWORD PTR _i$[esp+8]
mov   DWORD PTR _i$[esp+8], 0
mov   DWORD PTR ?g@@3HA, 5
mov   DWORD PTR ?h@@3HA, 6
call   ?func@@YAXPAH0@Z
mov   DWORD PTR ?g@@3HA, 5
xor   eax, eax
add   esp, 8
ret   0

In addition to the new /GL option, the inliner in the optimizer was enhanced so that, when compiling with /Ox, /O1, or /O2, the compiler by default chooses which functions are candidates for inlining. This is in contrast to previous versions of Visual C++, where by default only functions marked "inline" or "__inline" were considered. To obtain the old behavior, add the /Ob1 option after the primary optimization choice (/Ox, /O1, or /O2).
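A small sketch of the distinction (the function names are invented for this illustration): under /Ob1 only the explicitly marked function is an inline candidate, while under the newer default (/Ob2) the compiler may also inline the unmarked one.

```cpp
#include <cassert>

// Under /Ob1, only square() is an inline candidate because it is
// marked "inline". Under /Ob2 (the default at /Ox, /O1, and /O2 in
// Visual C++ .NET), the compiler may also choose to inline cube().
inline int square(int x) { return x * x; }

int cube(int x) { return x * square(x); }
```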

Best Practices for Optimizing

The Visual C++ compiler has two primary optimization options, /O1 and /O2. The /O1 option means "minimize size"; it translates to turning on each of the following options individually:

  • /Og – Enable global optimization
  • /Os – Favor small code
  • /Oy – Frame-pointer omission
  • /Ob2 – Let the compiler choose inline candidates
  • /GF – Enable read-only string pooling
  • /Gy – Enable function-level linking

The /O2 option means "maximize speed" and is similar to the /O1 option with the exception that /Ot ("favor fast code") is specified rather than /Os, and /Oi ("expand intrinsics inline") is also enabled.

Generally speaking, small applications should be compiled with /O2 and large applications with /O1, because very large applications can put a lot of stress on the processor's instruction cache, and this can lead to worse performance. Using /O1 reduces the "code bloat" introduced by the optimizer through transformations such as loop unrolling or the selection of larger, faster code sequences.

Generally, the optimizations disabled by /O1 are beneficial for small sequences of code, but used without restriction across a large application they can increase code size by enough to be an overall loss due to the instruction cache issue.
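Loop unrolling is a typical example of this size-for-speed trade-off. A sketch (written by hand here for illustration; the optimizer performs the equivalent transformation automatically) of a simple sum loop and a four-way unrolled version that computes the same result with more code:

```cpp
#include <cassert>

// Straightforward sum loop: compact code.
int sum(const int* a, int n) {
    int s = 0;
    for (int i = 0; i < n; ++i)
        s += a[i];
    return s;
}

// Four-way unrolled equivalent: fewer branches per element, but
// more instructions, i.e. more instruction-cache pressure. This is
// the kind of expansion /Os discourages and /Ot permits.
int sum_unrolled4(const int* a, int n) {
    int s = 0;
    int i = 0;
    for (; i + 4 <= n; i += 4)
        s += a[i] + a[i + 1] + a[i + 2] + a[i + 3];
    for (; i < n; ++i)       // handle the leftover elements
        s += a[i];
    return s;
}

// Demo data for the usage example below.
static const int kData[7] = {1, 2, 3, 4, 5, 6, 7};
```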

After choosing the primary optimization option, it's always a good idea to profile code looking for hotspots, and then experiment with different optimization options on that code. In particular, if you choose to use /O1 as your primary optimization option for all files, it's useful to fine-tune hot functions so that they are compiled to "maximize speed."

The Visual C++ compiler supports an optimization pragma to allow fine-tuning of optimization control around specific functions.

For example, let's say you've compiled your application with /O1 and then profiling your application shows that function fiddle() is one of the "hot" functions of the program. You'll want to do something like:

#pragma optimize("t", on)
int fiddle(S *p)
{
   // ... body of fiddle() ...
}
#pragma optimize("", on)

This wraps the function with a pair of optimize pragmas, the first to change the optimization mode to "maximize speed" and the second to revert to the optimization mode specified on the command-line.

Although Visual C++ does not ship with a profiler, as of the time of this writing the Compuware Corporation DevPartner Profiler Community Edition is available as a free download from Compuware.
In addition to the /O1 and /O2 options, there is the /Ox option which behaves similarly to /O2, and can be combined with /Os to get a result similar to /O1. It's recommended that you use /O1 and /O2 rather than using /Ox, as /O1 and /O2 provide more benefit.

Earlier in this article the /G7, /arch, and /GL compiler options were discussed.

In addition to these auxiliary optimization options, Visual C++ also provides:

  • /GA, which optimizes static thread-local storage (TLS) accesses for EXEs.
    Note   Do not use this when compiling code that will end up in a DLL.
  • /Gr, which enables the __fastcall calling convention by default, which means the first two parameters will be passed in registers (if they fit).

These two options provide additional benefits for optimized builds of your code — it's worth trying them out (where appropriate) and measuring the results you get.

One additional switch worth considering is the /opt:ref option that can be passed to the linker. This option enables the elimination of unreferenced functions and data as well as enabling /opt:icf which tries to combine identical functions (such as those you might get from the expansion of templates) to further reduce code size.

Enhanced Compiler Options in Visual C++ .NET

There are three important compiler options that most projects should use when building with Visual C++ .NET 2003. Each of these was also present in Visual C++ .NET 2002, although there have been enhancements made in the 2003 release of the product.

The following table briefly describes the intended use of each of these compiler options. Refer to the Visual C++ documentation for further details.

Option   Use
/RTC1    Used in debug (non-optimized) builds. Inserts run-time checks to help you track down coding mistakes (such as using uninitialized memory, or mixing __stdcall and __cdecl calling conventions).
/GS      Inserts checks to detect buffer overruns that overwrite the return address of the currently executing function, and attempts to thwart the "hijacking" of the return address by malicious code.
         Note   This is not a magic bullet or a substitute for good coding and good testing. You should audit your code to ensure that buffer overruns cannot occur.
/Wp64    Detects 64-bit portability problems in your code. Using this compiler option now will allow for an easier transition to a 64-bit system later.
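As a sketch of the pattern /GS guards against (the function names are invented for this illustration), an unbounded copy into a fixed-size stack buffer can run past the end of the buffer and overwrite the return address; a bounded copy cannot:

```cpp
#include <cassert>
#include <cstring>
#include <string>

// The classic risk: strcpy performs no bounds check, so a long
// src can overrun dst. /GS inserts a run-time check to detect
// when such an overrun has clobbered the return address.
void copy_unbounded(char* dst, const char* src) {
    std::strcpy(dst, src);
}

// A bounded copy that cannot overrun dst (cap = total buffer size).
void copy_bounded(char* dst, std::size_t cap, const char* src) {
    std::strncpy(dst, src, cap - 1);
    dst[cap - 1] = '\0';   // strncpy may not null-terminate
}

// Helper for the usage example: copies src into a small stack
// buffer with the bounded copy and returns the (possibly
// truncated) result.
std::string demo_bounded(const char* src) {
    char buf[8];
    copy_bounded(buf, sizeof buf, src);
    return std::string(buf);
}
```

Even with /GS, the bounded form is what the code audit the note above recommends should produce; /GS only detects the overrun after the fact.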


Visual C++ .NET 2003 contains two new performance-enhancing optimization options and improvements to several of the optimizations that were shipped as part of Visual C++ .NET 2002. Understanding these, as well as three compiler options that should probably be used on every project compiled by Visual C++, will help you optimize builds of your code.