Advanced Basics

Scaling Up: The Very Busy Background Compiler

Matthew Gertz

Contents

Rich Information Leads to Rich Features
Scaling Up the Background Compiler
Performance Tips for Earlier Versions
Going Forward

One of the features that distinguishes Visual Basic® from the other languages in Visual Studio® is its use of a background compiler (BC). The BC runs from the moment you start up a Visual Basic project until the time you close the last one down. Whenever you type in a line of code and commit it, the BC picks up the change and compiles it, adding the resulting information to the internal representation of the application you're building. The BC reacts to external changes as well, keeping its compiled state up-to-date as you add or remove project references, change project-level settings (such as Option Strict or Option Explicit), or check out a more recent copy of a project file from source control.

The most obvious benefits of the BC are that you can launch into debugging or build an executable for distribution quite quickly. Because the application is already "built," all that remains is for you to load it into the debugger or drop it to disk. But the BC also has a much broader impact on your development process in Visual Studio that is not so obvious—nearly all of the Visual Basic IDE features in Visual Studio .NET 2002 and Visual Studio .NET 2003 rely on the BC for information. Without the BC, these features would either not function at all or would be seriously diminished.

This added functionality does not come without costs, of course. Since the Visual Basic UI features generally need information from the compiler in order to function, the UI will occasionally block on background compilation in order to wait for a complete representation of the symbols. In general this delay is not noticeable, but for large solutions the delay can be quite apparent, especially when working with large blocks of code all at once. This trade-off between functionality and performance has been a focus of the Visual Studio development team. Several of the improvements resulting from their efforts are found in Visual Studio 2005.

In this column, I'll discuss how the BC is used by the various Visual Basic Editor features, what Microsoft has done to improve their performance, and steps you can take to minimize the impact of the BC in your development projects.

Rich Information Leads to Rich Features

The reason that the Visual Basic Editor features can provide you with such rich information is that they all have access to the internal compiler symbols generated by the BC. For example, IntelliSense® can distinguish between two similarly named variables in different or nested scopes, regardless of whether or not they exist in the current project or in some other part of the solution, because the compiler makes it very clear which one is which. Without that information, IntelliSense would have to parse text and make guesses about the type to which a particular variable binds.

Different editor features require different levels of information from the background compiler. Because of this, the BC exposes methods by which a given feature can wait for a specific state to be reached before proceeding, without having to wait for a complete compilation. Figure 1 lists the different states of the compiler.

Figure 1 Compiler States

State | Description
No State | Essentially, no compilation information is available at this time.
Declared | Everything is parsed and it is clear which code is a keyword and which code is a variable. The boundaries between blocks of code are also clearly delineated. Although it is clear that something is a variable, its type is not understood yet, nor is it known whether it's the same as some similarly named variable.
Bound | In this state it is clear what the variables mean in the context of other variables and references.
Types Emitted | A more esoteric state, but the upshot is that now a variety of metadata about the to-be-compiled app can be made available. (From a UI point of view, this is generally not an important state.)
Compiled | All of the intermediate language is generated and ready to go.

For example, pretty-listing, auto-complete, and smart indentation only need the code to be parsed in order to understand what it looks like, so they require only the information generated during the transition to the Declared state (and don't even need the compiler to be fully at Declared to operate correctly). IntelliSense always requires "bound" information because it needs to fully understand what the various types mean. However, IntelliSense works on whatever information was generated during the last "commit" (carriage return, arrow down, paste, focus change, and so on), so it is not blocked from operating while you type. Errors can be generated in any state, depending on the type of error; they are produced as a consequence of the compiler moving to a different state, unless compilation is changed or restarted for any reason. The listing of errors to the task list or error list, however, is specifically coded to wait for keystrokes to end if the user continues typing (since you would just be recompiling anyway on the next commit), so again, nothing is blocked. Figure 2 provides some examples of features and the compiler states that they rely on.

Figure 2 Some Compiler Features
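
The idea of a feature waiting only for the state it needs can be sketched in a few lines. The following is a toy model in Python, not the actual compiler code: the class BackgroundCompiler, its methods set_state and wait_for, and the helper compile_in_background are all hypothetical names invented for illustration. Only the five state names come from Figure 1.

```python
import threading
from enum import IntEnum


class CompilerState(IntEnum):
    """States from Figure 1, ordered from least to most complete."""
    NO_STATE = 0
    DECLARED = 1
    BOUND = 2
    TYPES_EMITTED = 3
    COMPILED = 4


class BackgroundCompiler:
    """Toy stand-in for the BC (hypothetical API): tracks one state."""

    def __init__(self):
        self._state = CompilerState.NO_STATE
        self._cond = threading.Condition()

    @property
    def state(self):
        with self._cond:
            return self._state

    def set_state(self, new_state):
        # Called by the compile thread on every transition, up or down.
        with self._cond:
            self._state = new_state
            self._cond.notify_all()

    def wait_for(self, needed, timeout=None):
        """Block until the state is at least `needed`; True if it was reached."""
        with self._cond:
            return self._cond.wait_for(lambda: self._state >= needed, timeout)


def compile_in_background(bc):
    # Hypothetical compile pass: move upward through each state in order.
    for state in list(CompilerState)[1:]:
        bc.set_state(state)


bc = BackgroundCompiler()
worker = threading.Thread(target=compile_in_background, args=(bc,))
worker.start()
# A feature that needs symbol identity waits for Bound, not for Compiled.
reached_bound = bc.wait_for(CompilerState.BOUND, timeout=5)
worker.join()
```

In this sketch a feature such as smart indentation would call wait_for with DECLARED, while a symbol-based feature would ask for BOUND. Neither has to wait for a complete compilation.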


To improve performance and functionality, the members of the compiler team sometimes modify existing features to use information from the BC. In Visual Studio .NET 2003, dropdowns, the object browser, and the class view began using the BC information. Previously, in Visual Studio .NET 2002, these features were supported by a slow XML-based library manager that essentially duplicated what the compiler was doing (with much less accuracy). After the switch to compiler symbols, the performance and accuracy of those features improved dramatically. Based on this success, the team (the Visual Basic Compiler, Editor, and Debugger team) also changed the code model engine to use the BC information in Visual Studio 2005, and this has resulted in substantial savings for the new class designer and various forms and macros. All of these features now require "bound" information, since they need to know the precise identity of an object in order to function correctly.

New features are also being developed to take advantage of the information in the BC. Error correction is a brand new feature for Visual Basic 2005 that uses compiler information to suggest fixes to the user. Although errors can happen at any state, the current set of error corrections relates to errors generated during the Declared and Bound compilations. Error correction is therefore dependent on those states: first to recognize that there are indeed errors, and then to determine which fixes to suggest. Edit and Continue, which is being reintroduced in Visual Basic 2005, relies heavily on compiler information to detect which edits would require a reset of the debugger, as well as to generate the actual code to be injected and verify that it creates no errors before committing it. The Find Symbol and Rename Symbol features, which are also new in Visual Basic 2005, rely on Bound state information to correctly identify symbols.

Scaling Up the Background Compiler

One of the biggest changes the team made involves how we handle decompilation—that is, resetting the compiler to an earlier state based on some change that the customer has made. Decompilation, followed by recompilation, is required to some extent anytime a change is made to the project, but sometimes it can be taken too far. For example, deleting a resource file should not take your project down to No State—the information collected through Bound state is still going to be valid. Although decompiling to No State on a change is always totally safe, too much decompilation can severely impact the performance of the compiler. Visual Studio .NET 2002 sometimes does too much decompilation, and therefore runs into this problem. In order to address it, the team made major modifications all across our architecture to be more efficient about when we decompile, and how far we need to go when we do. For example, sometimes we decompile to No State in Visual Studio .NET 2002 when you add another carriage return after an End Sub, under the assumption that you've modified the procedure (since the change was on the boundary of the procedure and may have redefined it). That issue, along with many others, was fixed in Visual Studio .NET 2003, so when making sweeping changes in large solutions, the response is much faster.
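
The principle of decompiling only as far as a change requires can be illustrated with a small sketch. This is Python written purely for illustration, and it does not reflect the compiler's real rules: the table STILL_VALID_AFTER, its keys, and the function decompile_target are hypothetical. Only the resource-file case is taken from the discussion above; the other entries are guesses that make the example concrete.

```python
from enum import IntEnum


class CompilerState(IntEnum):
    NO_STATE = 0
    DECLARED = 1
    BOUND = 2
    TYPES_EMITTED = 3
    COMPILED = 4


# Hypothetical table: kind of change -> highest state whose information
# is still valid afterward. Only the resource-file entry comes from the
# text; the other two entries are illustrative assumptions.
STILL_VALID_AFTER = {
    "resource_file_deleted": CompilerState.BOUND,
    "method_body_edited": CompilerState.DECLARED,
    "project_reference_added": CompilerState.NO_STATE,
}


def decompile_target(current, change_kind):
    """Return the state to reset to; never higher than the current state."""
    # Unknown changes fall back to No State, which is always safe but slowest.
    still_valid = STILL_VALID_AFTER.get(change_kind, CompilerState.NO_STATE)
    return min(current, still_valid)
```

With this table, a fully compiled project that loses a resource file drops back only to Bound, while an unrecognized change takes the safe but expensive route to No State.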

Many more changes of this sort have been made for Visual Studio 2005, along with more streamlined handling of references to reduce startup and operation time.

Performance Tips for Earlier Versions

Now you may be wondering how to improve the performance of your Visual Basic .NET 2002 and Visual Basic .NET 2003 development sessions. While you can't turn off the background compiler, there are other steps you can take to improve performance.

Probably the most important step is to avoid complicated references. For any set of projects in a solution that reference each other, the BC needs to wait until all projects are at the same state before moving on to the next one. Thus, if project A references project B, which references C, which references D, and changes are made to D, then A, B, and C will all need to be decompiled to some level in order to incorporate the new information from D. More complicated dependency graphs cause even further delays.
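
To see how a change ripples through a chain of references, consider the following sketch. It is plain Python rather than anything from the compiler, and the names references and projects_to_decompile are hypothetical. Project E is an added, unrelated project that shows what stays untouched.

```python
from collections import deque

# Hypothetical solution: each project maps to the projects it references.
references = {
    "A": ["B"],
    "B": ["C"],
    "C": ["D"],
    "D": [],
    "E": [],
}


def projects_to_decompile(references, changed):
    """Return every project that directly or indirectly references `changed`."""
    # Invert the graph so we can walk from a project to its dependents.
    dependents = {}
    for project, refs in references.items():
        for ref in refs:
            dependents.setdefault(ref, []).append(project)

    affected = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, []):
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected
```

Calling projects_to_decompile(references, "D") yields A, B, and C, which matches the chain described above, while a change to A or E affects no other project. The deeper and more tangled the graph, the larger this set becomes.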

The way you reference code in other assemblies is also important. Visual Basic supports two kinds of references (excluding Web references, which are another issue entirely): file references, which are simply references to assemblies on disk, and project-to-project references, which are references to another project within the same solution. In Visual Studio .NET 2002 and 2003, both types of references can be added by right-clicking on the References node in the Solution Explorer and selecting Add Reference. This will bring up the Add Reference dialog, which has tabs for .NET file references, COM file references, and project references (see Figure 3).

Figure 3 Add Reference Dialog


You should use project-to-project references only when you are developing both the application's project and its referenced project at the same time. If, however, you have already finalized a referenced assembly, then a file reference to it would be a more appropriate reference type to use, since you'd have no need for the BC to even be thinking about recompiling the reference; in fact, there's no need for the source of the referenced assembly to be in the solution at all. (Note that in Visual Studio 2005, the BC is much more clever about handling project-to-project references, but as a rule it's going to be faster when you don't load another project unnecessarily.)

I don't like to recommend that people turn off features to improve performance (and my team is working hard to ensure that that's never necessary), but in large-scale applications you can turn off whatever features you don't need to use at that moment. For example, you may not need the pretty-lister when you quickly need to modify large blocks of code (pasting in a large chunk of code or modifying large If blocks, for instance). Most of these features don't require the BC to even be at Declared state, so technically this is not a BC issue, but these other operations do take some processing time. The various editing options can be controlled from Tools | Options. IntelliSense is controlled from the General page (where it is called Statement Completion, as shown in Figure 4).

Figure 4 IntelliSense Options


Smart indentation (the only indentation choice that relies on Declared information) can be changed from the Tabs page, shown in Figure 5. Pretty-listing and other code-formatting options can be controlled from the Visual Basic page, shown in Figure 6.

Figure 5 Smart Indenting in Tabs Page in Tools


Avoiding situations that result in decompilation is also a big time-saver. Decompilation in Visual Basic happens per-file, which generally translates to per-class for most people, and so the more referenced code that you have between your various classes and structures in different files, the more likely it is that decompilation will occur. (Note that Visual Studio 2005 will support partial classes—that is, classes defined across multiple files—and in such cases all relevant files may need to decompile to update a given class.) A general rule of thumb is to organize classes and structures so that the info requests are all one way—when class A makes a request on class B, class B shouldn't have to call back into class A to complete its job (which would create a dependency on the compilation state of A's code).
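
One rough way to check for the two-way requests described above is to list which classes call into which, then look for pairs that call each other. The following Python sketch assumes you have assembled that map by hand. The class names and the function two_way_pairs are hypothetical, and the check only catches direct callbacks, not longer cycles.

```python
# Hypothetical map: each class (one per file) -> the classes it calls into.
calls = {
    "OrderForm": {"Order", "Customer"},
    "Order": {"Customer"},
    "Customer": {"Order"},  # calls back into Order: a two-way dependency
}


def two_way_pairs(calls):
    """Return sorted pairs of classes that call into each other directly."""
    pairs = set()
    for source, targets in calls.items():
        for target in targets:
            if source != target and source in calls.get(target, set()):
                # Sort each pair so (A, B) and (B, A) count once.
                pairs.add(tuple(sorted((source, target))))
    return sorted(pairs)
```

Here Order and Customer are reported as a pair because each calls into the other. OrderForm is not reported, since its requests flow one way only.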

Figure 6 Visual Basic Editing Features


The factoring of your code is another important consideration for editor performance, as certain features (such as the pretty-lister) are geared towards well-factored code. Adding or removing an If construct that spans 5,000 lines of code is definitely going to slow down the pretty-lister, as the indentations and scopes for the altered block of code must be recalculated and relisted in the editor. (The compiler team actually disabled pretty-listing at about 8,000 affected lines for this reason, even if the pretty-listing option was turned on.) Well-factored code will always respond better than poorly factored code, since there will be less affected code in any block when listing back (especially important in those rare cases when the pretty-lister needs "bound" information).

Going Forward

High-performance code is a journey for my team, not a destination, and although we've made great strides in improving the performance of the BC and the features affected by it, both in Visual Studio .NET 2003 and in the upcoming Visual Studio 2005 release, we are continuing to research and implement improvements in our codebase. The goal is to make the performance implications of background compilation so minimal that users don't even realize it's happening. Ideas we're exploring include delay-starting of the BC (to improve load time for solutions), tightening up decompilation so that even less code needs to be reset on any given change, and better caching of information that is likely not to change during compilation/decompilation (removing the need to block on state in certain cases).

Send your questions and comments to basics@microsoft.com.

Matthew Gertz has worked at Microsoft for over 10 years as a developer on ActiveX Control Pad, Visual InterDev, Visual Studio Enterprise Edition, and Visual Basic. He is currently the Development Lead for the Microsoft Visual Basic Compiler, Editor, and Debugger team.