This article discusses the prerelease version of the Microsoft .NET Framework 4.5. All related information is subject to change.
On the Microsoft .NET Framework team, we’ve always understood that improving performance is at least as valuable to developers as adding new runtime features and library APIs. The .NET Framework 4.5 includes significant investments in performance, benefiting all application scenarios. In addition, because .NET 4.5 is an update to .NET 4, even your .NET 4 applications can enjoy many of the performance improvements to existing .NET 4 features.
When it comes to enabling developers to deliver satisfying application experiences, startup time (see msdn.microsoft.com/magazine/cc337892), memory usage (see msdn.microsoft.com/magazine/dd882521), throughput and responsiveness really matter. We set goals on improving these metrics for the different application scenarios, and then we design changes to meet or exceed them. In this article, I’ll provide a high-level overview of some of the key performance improvements we made in the .NET Framework 4.5.
In this release, we focused on: exploiting multiple processor cores to improve performance, reducing latency in the garbage collector and improving the code quality of native images. Following are some of the key performance improvement features.
Multicore Just-in-Time (JIT) We continually monitor low-level hardware advancements and work with chip vendors to achieve the best hardware-assisted performance. In particular, we’ve had multicore chips in our performance labs since they were available and have made appropriate changes to exploit that particular hardware change; however, those changes benefited very few customers at first.
At this point, nearly every PC has at least two cores, such that new features that require more than one core are immediately broadly useful. Early in the development of .NET 4.5, we set out to determine if it was reasonable to use multiple processor cores to share the task of JIT compilation—specifically as part of application startup—to speed up the overall experience. As part of that investigation, we discovered that enough managed apps have a minimum threshold number of JIT-compiled methods to make the investment worthwhile.
The feature works by JIT compiling methods likely to be executed on a background thread, which on a multicore machine will run on another core, in parallel. In the ideal case, the second core quickly gets ahead of the mainline execution of the application, so most methods are JIT compiled by the time they’re needed. In order to know which methods to compile, the feature generates profile data that keeps track of the methods that are executed and then is guided by this profile data on a later run. This requirement of generating profile data is the primary way in which you interact with this feature.
With a minimal addition of code, you can use this feature of the runtime to significantly improve the startup times of both client applications and Web sites. In particular, you need to make straightforward calls to two static methods on the ProfileOptimization class, in the System.Runtime namespace. See the MSDN documentation for more information. Note that this feature is enabled by default for ASP.NET 4.5 applications and Silverlight 5 applications.
Optimized Native Images For several releases, we have enabled you to precompile code to native images via a tool called Native Image Generation (NGen). The consequent native images typically result in significantly faster application startup than is seen with JIT compilation. In this release we introduced a supplemental tool called Managed Profile Guided Optimization (MPGO), which optimizes the layout of native images for even greater performance. MPGO uses a profile-guided optimization technology, very similar in concept to multicore JIT described earlier. The profile data for the app includes a representative scenario or set of scenarios, which can be used to reorder the layout of a native image such that methods and other data structures needed at startup are colocated densely within one part of the native image, resulting in shorter startup time and less working set (an application’s memory usage). In our own tests and experience, we typically see a benefit from MPGO with larger managed applications (for example, large interactive GUI applications), and we recommend its use largely along those lines.
The MPGO tool generates profile data for an intermediate language (IL) DLL and adds the profile as a resource to the IL DLL. The NGen tool is used to precompile IL DLLs after profiling, and it performs additional optimization due to the presence of the profile data. Figure 1 visualizes the process flow.
Figure 1 Process Flow with the MPGO Tool
Large Object Heap (LOH) Allocator Many .NET developers requested a solution for the LOH fragmentation issue or a way to force compaction of the LOH. You can read more about how the LOH works in the June 2008 CLR Inside Out column by Maoni Stephens at msdn.microsoft.com/magazine/cc534993. To summarize, any object of size 85,000 bytes or more gets allocated on the LOH. Currently, the LOH isn’t compacted. Compacting the LOH would consume a lot of time because the garbage collector would have to move large objects and, hence, is an expensive proposition. When objects on the LOH get collected, they leave free spaces in between objects that survive the collection, which leads to the fragmentation issue.
To explain a bit more, the CLR makes a free list out of the dead objects, allowing them to be reused later to satisfy large object allocation requests; adjacent dead objects are made into one free object. Eventually a program can end up in a situation where these free memory fragments between live large objects aren’t large enough to make further object allocations in the LOH, and because compaction isn’t an option, we quickly run into problems. This leads to applications becoming unresponsive and eventually leads to out-of-memory exceptions.
In .NET 4.5 we’ve made some changes to make efficient use of memory fragments in the LOH, particularly concerning the way we manage the free list. The changes apply to both workstation and server garbage collection (GC). Please note that this doesn’t change the 85,000-byte limit for LOH objects.
Background GC for Server In .NET 4 we enabled background GC for workstation GC. Since that time, we’ve seen a greater frequency of machines with upper-end heap sizes that range from a few gigabytes to tens of gigabytes. Even an optimized parallel collector such as ours can take seconds to collect such large heaps, thus blocking application threads for seconds. Background GC for server introduces support for concurrent collections to our server collector. It minimizes long blocking collections while continuing to maintain high application throughput.
If you’re using server GC, you don’t need to do anything to take advantage of this new feature; the server background GC will automatically happen. The high-level background GC characteristics are the same for client and server GC:
A new asynchronous programming model was introduced as part of the Visual Studio Async CTP and is now an important part of .NET 4.5. These new language features in .NET 4.5 enable you to productively write asynchronous code. Two new language keywords in C# and Visual Basic called “async” and “await” enable this new model. .NET 4.5 has also been updated to support asynchronous applications that use these new keywords.
The Visual Studio Asynchronous Programming portal on MSDN (msdn.microsoft.com/vstudio/async) is a great resource for samples, white papers and talks on the new language features and support.
A number of improvements were made to the parallel computing libraries (PCLs) in .NET 4.5 to enhance the existing APIs.
Faster Lighter-Weight Tasks The System.Threading.Tasks.Task and Task<TResult> classes were optimized to use less memory and execute faster in key scenarios. Specifically, cases related to creating Tasks and scheduling continuations saw performance improvements of up to 60 percent.
More PLINQ Queries Execute in Parallel PLINQ falls back to sequential execution when it thinks that it would do more harm (make things slower) by parallelizing a query. These decisions are educated guesses and not always perfect, and in .NET 4.5, PLINQ will recognize more classes of queries that it can successfully parallelize.
Faster Concurrent Collections A number of tweaks were made to System.Collections.Concurrent.ConcurrentDictionary<TKey, TValue> to make it faster for certain scenarios.
For more details on these changes, check out the Parallel Computing Platform team blog at blogs.msdn.com/b/pfxteam.
Null Bit Compression Row Support Null data is especially common for customers leveraging the SQL Server 2008 sparse columns feature. Customers leveraging the sparse columns feature can potentially produce result sets that contain a large number of null columns. For this scenario, row null-bit-compression (SQLNBCROW token or, simply, NBCROW) was introduced. This reduces the space used by the result set rows sent from the server with large numbers of columns by compressing multiple columns with NULL values into a bit mask. This significantly helps the tabular data stream (TDS) protocol’s compression of data where there are many nulled columns in the data.
Auto-Compiled LINQ Queries When you write a LINQ to Entities query today, the Entity Framework walks over the expression tree generated by the C#/Visual Basic compiler and translates (or compiles) that into SQL, as shown in Figure 2.
Figure 2 A LINQ to Entities Query Translated into SQL
Compiling the expression tree into SQL involves some overhead, though, particularly for more complex queries. In previous versions of the Entity Framework, if you wanted to avoid having to pay this performance penalty every time a LINQ query was executed, you had to use the CompiledQuery class.
This new version of the Entity Framework supports a new feature called Auto-Compiled LINQ Queries. Now every LINQ to Entities query that you execute automatically gets compiled and placed in the Entity Framework query plan cache. Each additional time you run the query, the Entity Framework will find it in its query cache and won’t have to go through the whole compilation process again. You can read more about it at bit.ly/iCaM2b.
The Windows Communication Foundation (WCF) and Windows Workflow Foundation (WF) team has also done a bunch of performance improvements in this release, such as:
You can find more details on the Workflow Team blog at blogs.msdn.com/b/workflowteam and in the MSDN Library article at bit.ly/n5VCtU.
Improving site density (also defined as “per-site memory consumption”) and cold startup time of sites in the case of shared hosting have been two key performance goals for the ASP.NET team for .NET 4.5.
In shared hosting scenarios, many sites share the same machine. In such environments traffic is usually low. Data provided by some hosting companies shows that most of the time the request per second is below 1 rps, with occasional peaks of 2 rps or more. This means that many worker processes will probably die when they’re idle for a long time (20 minutes by default in IIS 7 and later). Thus startup time becomes very important. In ASP.NET, that’s the time it takes a Web site to receive a request and respond to it, when the worker process was down versus when the Web site was already compiled.
We implemented several features in this release to improve startup time for the shared hosting scenarios. The features used are:
<!-- ... -->
<compilation profileGuidedOptimizations="None" />
<!-- ... -->
sc config sysmain start=auto
reg add "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\PrefetchParameters" /v EnablePrefetcher /t REG_DWORD /d 2 /f
reg add "HKEY_LOCAL_MACHINE\Software\Microsoft\Windows NT\CurrentVersion\Prefetcher" /v MaxPrefetchFiles /t REG_DWORD /d 8192 /f
net start sysmain
<!-- ... -->
<!-- ... -->
<!-- ... -->
<!-- ... -->
More details on ASP.NET performance improvements can be found in the “Getting Started with the Next Version of ASP.NET” white paper at bit.ly/A66I7R.
The list presented here isn’t exhaustive. There are more minor changes to improve performance that were omitted to keep the scope of this article limited to the major features. Apart from this, the .NET Framework performance teams have also been busy working on performance improvements specific to Windows 8 Managed Metro-style applications. Once you download and try the .NET Framework 4.5 and Visual Studio 11 beta for Windows 8, please let us know if you have any feedback or suggestions for the releases ahead.
Shared Hosting: Also known as “shared Web hosting,” high-density Web hosting enables hundreds—if not thousands—of Web sites to run on the same server. By sharing hardware costs, each site can be maintained for a lower cost. This technique has considerably lowered the barrier to entry for Web site owners.
Cold Startup: Cold startup is the time taken to launch when your application wasn’t already present in memory. You can experience cold startup by starting an application after a system reboot. For large applications, cold startup can take several seconds because the required pages (code, static data, registry and so on) aren’t present in memory and expensive disk accesses are required to bring the pages into memory.
Warm Startup: Warm startup is the time taken to start an application that’s already present in memory. For example, if an application was launched a few seconds earlier, it’s likely that most of the pages are already loaded in memory and the OS will reuse them, saving expensive disk-access time. This is why an application is much faster to start up the second time you run it (or why a second .NET app starts faster than the first, because parts of .NET will already be loaded in memory).
Native Image Generation, or NGen: Refers to the process of precompiling Intermediate Language (IL) executables into machine code prior to execution time. This results in two primary performance benefits. First, it reduces application startup time by avoiding the need to compile code at run time. Second, it improves memory usage by allowing for code pages to be shared across multiple processes. There’s also a tool, NGen.exe, that creates native images and installs them into the Native Image Cache (NIC) on the local computer. The runtime loads native images when they’re available.
Profile Guided Optimization: Profile guided optimization has been proven to enhance the startup and execution times of native and managed applications. Windows provides the toolset and infrastructure to perform profile guided optimization for native assemblies, whereas the CLR provides the toolset and infrastructure to perform profile guided optimizations for managed assemblies (called Managed Profile Guided Optimization, or MPGO). These technologies are used by many teams within Microsoft to improve the performance of their applications. For example, the CLR performs profile guided optimization of native assemblies (C++ profile guided optimization) and managed assemblies (using MPGO).
Garbage Collector: The .NET runtime supports automatic memory management. It tracks every memory allocation made by the managed program and periodically calls a garbage collector that finds memory no longer in use and reuses it for new allocations. An important optimization the garbage collector performs is that it doesn’t search the whole heap every time, but rather partitions the heap into three generations (generation 0, generation 1 and generation 2). For more information on the garbage collector, please read the June 2009 CLR Inside Out column at msdn.microsoft.com/magazine/dd882521.
Compacting: In the context of garbage collection, when the heap reaches a state where it’s sufficiently fragmented, the garbage collector compacts the heap by moving live objects close to each other. The primary goal of compacting the heap is to make larger blocks of memory available in which to allocate more objects.
Ashwin Kamath is a program manager on the CLR team of .NET and drove the performance and reliability features for the .NET Framework 4.5. He’s currently working on diagnostics features for the Windows Phone development platform.
Thanks to the following technical experts for reviewing this article: Surupa Biswas, Eric Dettinger, Wenlong Dong, Layla Driscoll, Dave Hiniker, Piyush Joshi, Ashok Kamath, Richard Lander, Vance Morrison, Subramanian Ramaswamy, Jose Reyes, Danny Shih and Bill Wert
Nice article Ashwin. Could you also share(Or a link) to the details about the LOH free space optimizations ?
Hello Ashwin, Can you do these changes in context of SharePoint 2007, 2010 and get the performance improvements ?
Here is another article about MPGO: http://blogs.msdn.com/b/dotnet/archive/2012/03/20/improving-launch-per...
Any documentation with example on how to use MGPO/Multicore JIT?
Here is more information on native image generation that also talks about Automatic Ngen: http://msdn.microsoft.com/en-us/library/hh691758(v=vs.110).aspx
@James079, I will pass on your request for more info to the Ngen team. Yes, Automatic Ngen is restricted to .NET 4.5 for Windows 8 only.
Why is the Multicore Just-In-Time feature not activated by default in .NET 4.5? Why do we need to call methods on the ProfileOptimization class to achieve this?
Thanks for the information - curious to know more about LOH fragmentation issue though, so essentially what we are saying is that with the changes done around LOH in .net 4.5, there will be less probable chances of going out of memory? The design issue though still exists, right?
Where can I find more info about automatic ngen ? Will it be restricted to Windows 8 ? Thanks
Good Article. Thanks Ashwin for the summary of performance improvements. --Austin.
I'm still waiting for auto property initialization ... http://connect.microsoft.com/VisualStudio/feedback/details/361647/add-initialization-to-automatic-properties-in-c
Great article. Thanks!
Automatic Ngen is present for .NET 4.5 on Windows 8 Consumer Preview. Thanks Ashwin Kamath
I seem to remember something about automatic ngen in one of the build talks. Since you didn't mention it in the article, does that mean it will not be included in .Net 4.5?
More MSDN Magazine Blog entries >
Browse All MSDN Magazines
Subscribe to MSDN Flash newsletter
Receive the MSDN Flash e-mail newsletter every other week, with news and information personalized to your interests and areas of focus.