Profile-guided optimization lets you optimize an output file by using data collected from test runs of the .exe or .dll file. The data represents how the program is likely to perform in a production environment.
Profile-guided optimizations are only available for native targets: x86, Itanium Processor Family (IPF), or x64. Profile-guided optimizations are not available for output files that will run on the common language runtime. Even if you produce an assembly with mixed native and managed code (compile with /clr), you cannot use profile-guided optimization on just the native code.
Information that is gathered from profiling test runs will override optimizations that would otherwise be in effect if you specify /Ob, /Os, or /Ot. For more information, see /Ob (Inline Function Expansion) and /Os, /Ot (Favor Small Code, Favor Fast Code).
The following is an overview of the process of using profile-guided optimizations:
Compile one or more source code files with /GL.
Each module built with /GL can be examined during profile-guided optimization test runs to capture run-time behavior. Not every module in a profile-guided optimization build has to be compiled with /GL. However, only those modules compiled with /GL will be instrumented and later available for profile-guided optimizations.
Link with /LTCG:PGINSTRUMENT.
/LTCG:PGINSTRUMENT creates an empty .pgd file. After test-run data is added to the .pgd file, it can be used as input to the next link step (creating the optimized image). When specifying /LTCG:PGINSTRUMENT, you can optionally specify /PGD with a nondefault name or location for the .pgd file.
Profile the application.
Each time a profiled EXE session ends or a profiled DLL is unloaded, an appname!#.pgc file is created. A .pgc file contains information about a particular application test run. # is a number starting with 1 that is incremented based on the number of other appname!#.pgc files in the directory. You can delete a .pgc file if the test run does not represent a scenario you want to optimize.
During a test run, you can force closure of the currently open .pgc file and the creation of a new .pgc file with the pgosweep utility (for example, when the end of a test scenario does not coincide with application shutdown).
Link with /LTCG:PGOPTIMIZE.
/LTCG:PGOPTIMIZE creates the optimized image. This step takes as input the .pgd file. For more information, see /LTCG:PGOPTIMIZE.
It is also possible to create the optimized output file and later determine that additional profiling would be useful to create a more optimized image. If the instrumented image and its .pgd file are available, you can do additional test runs and rebuild the optimized image with the newer .pgd file.
The following is a list of the profile-guided optimizations:
Inlining – For example, if there exists a function A that frequently calls function B, and function B is relatively small, then profile-guided optimizations will inline function B in function A.
Virtual Call Speculation – If a virtual call, or other call through a function pointer, frequently targets a certain function, a profile-guided optimization can insert a conditionally-executed direct call to the frequently-targeted function, and the direct call can be inlined.
Register Allocation – Optimizing with profile data results in better register allocation.
Basic Block Optimization – Basic block optimization allows commonly executed basic blocks that temporally execute within a given frame to be placed in the same set of pages (locality). This minimizes the number of pages used, thus minimizing memory overhead.
Size/Speed Optimization – Functions where the program spends a lot of time can be optimized for speed.
Function Layout – Based on the call graph and profiled caller/callee behavior, functions that tend to be along the same execution path are placed in the same section.
Conditional Branch Optimization – Using value probes, profile-guided optimizations can find out whether a given value in a switch statement is used more often than other values. This value can then be pulled out of the switch statement. The same can be done with if/else statements, where the optimizer can order the if/else so that either the if block or the else block is placed first, depending on which block is executed more frequently.
Dead Code Separation – Code that is not called during profiling is moved to a special section that is appended to the end of the set of sections. This effectively keeps this section out of the often-used pages.
EH Code Separation – Exception handling (EH) code, which executes only rarely, can often be moved to a separate section when profile-guided optimizations can determine that the exceptions occur only under exceptional conditions.
Memory Intrinsics – The decision whether to expand an intrinsic can be made more accurately when it is known whether the intrinsic is called frequently. An intrinsic can also be optimized based on the block size of moves or copies.
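Two of the optimizations in this list, virtual call speculation and conditional branch optimization, can be pictured as source-level rewrites. The following is a minimal hand-written sketch, not the optimizer's actual output. The types `Shape`, `Triangle`, and `Square`, the opcode values, and all function names are hypothetical, and the sketch assumes that profiling showed `Triangle` and the value 7 to be the frequent cases. The compiler performs the equivalent check on the call target rather than with `typeid`.

```cpp
#include <cassert>
#include <typeinfo>

// Hypothetical class hierarchy, used only for illustration.
struct Shape {
    virtual ~Shape() = default;
    virtual int sides() const = 0;
};
struct Triangle final : Shape {
    int sides() const override { return 3; }
};
struct Square final : Shape {
    int sides() const override { return 4; }
};

// Original form: an ordinary virtual call.
int sides_plain(const Shape& s) { return s.sides(); }

// Hand-written analogue of virtual call speculation, assuming the profile
// showed that Triangle is the frequent target. The guard stands in for the
// target check that the optimizer would insert.
int sides_speculated(const Shape& s) {
    if (typeid(s) == typeid(Triangle)) {
        // Qualified call: bound at compile time, so it can be inlined.
        return static_cast<const Triangle&>(s).Triangle::sides();
    }
    return s.sides();  // Fallback: normal virtual dispatch.
}

// Original form: every value is handled inside the switch.
int cost_plain(int opcode) {
    switch (opcode) {
        case 0:  return 10;
        case 1:  return 20;
        case 7:  return 5;
        default: return 100;
    }
}

// Hand-written analogue of pulling a frequent value out of the switch,
// assuming the profile showed that opcode 7 dominates.
int cost_hoisted(int opcode) {
    if (opcode == 7) return 5;  // Hot value is tested first.
    switch (opcode) {
        case 0:  return 10;
        case 1:  return 20;
        default: return 100;
    }
}

// Checks that the rewritten forms return the same results as the originals.
void check_equivalence() {
    Triangle t;
    Square q;
    assert(sides_speculated(t) == sides_plain(t));
    assert(sides_speculated(q) == sides_plain(q));
    for (int op = -2; op <= 9; ++op) {
        assert(cost_hoisted(op) == cost_plain(op));
    }
}
```

In both rewrites the behavior is unchanged and only the order of the checks differs, which is why such transformations pay off only when the profile data reflects the real workload.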
For more information, see Walkthrough: Using Profile-Guided Optimizations.