
Boosting the Performance of the Microsoft .NET Framework

 

Allan McNaughton
Intel Corporation

August 2002

Summary: Three technologies that directly improve Microsoft .NET Framework application performance on the Intel architecture are profiled: Just-In-Time compilation, garbage collection, and Hyper-Threading technology. (4 printed pages)

With the release of its new .NET Framework, Microsoft® introduced the common language runtime, a platform abstraction layer that provides an optimized environment for application execution.

The runtime is the execution engine for the Microsoft .NET Framework and is responsible for compiling managed code, verifying type safety, managing the thread pool, executing code, and handling memory management issues, among other things. The functionality embodied in this layer hides substantial complexity and allows the developer to focus on solving the task at hand. It is this layer that is aware of the target Intel® processor version and can best take advantage of the processor’s unique characteristics.

We will look at three technologies that directly improve .NET Framework application performance on the Intel architecture: Just-In-Time compilation, garbage collection, and Hyper-Threading technology.

Just-In-Time Means Run-It-Fast

A key component of the runtime that results in fast execution of .NET Framework applications is the Microsoft Just-In-Time compiler (JIT). The JIT takes MSIL (Microsoft Intermediate Language) code that is produced by Microsoft Visual Studio® .NET, the Microsoft .NET Framework SDK, or other tools that target the .NET Framework, and uses it to generate native x86 instructions. Intel processors have evolved rapidly from the early Pentium® processor design to the latest multiprocessor-based systems using the Intel Xeon™ processor. When the JIT encounters a genuine Intel processor, the code it produces takes advantage of Intel NetBurst™ architecture innovations and Hyper-Threading technology. The JIT compiler version 1.1, due to ship later this year, will also take advantage of Streaming SIMD Extensions 2 (SSE2).

As the JIT is activated at run time, it is able to utilize information that isn't available to a regular compiler. This information can be used to provide further performance enhancements by allowing for increasingly sophisticated optimizations. The runtime effect of these additional JIT-only optimizations can be quite dramatic. The JIT can dynamically remove levels of indirection and inline frequently called methods, as it knows what code paths are actually executing. These optimizations result in a more responsive .NET application due to fewer cache misses and reduced branching. It is possible for all this to be accomplished without a perceptible delay in application execution due to the raw speed of the latest Xeon and Pentium 4 processors and the Mobile Intel Pentium 4 processor - M. (These latest processors currently run with clock rates up to 2.4 GHz, and it's reasonable to expect that they'll go even faster in the future.)
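The payoff from removing indirection can be sketched with a toy example. Since the article contains no code of its own, the sketch below uses Python as a stand-in for managed code, and all names in it are hypothetical: sum_indirect pays a dispatch cost on every iteration, while sum_inlined is roughly what a JIT produces once it inlines the hot method at the call site.

```python
import timeit

# A small call chain: each call goes through a level of indirection
# that a JIT could remove once it sees the path is hot.
class Accumulator:
    def add(self, x, y):
        return self._add_impl(x, y)   # extra hop per call

    def _add_impl(self, x, y):
        return x + y

def sum_indirect(acc, n):
    total = 0
    for i in range(n):
        total = acc.add(total, i)     # dispatch on every iteration
    return total

def sum_inlined(n):
    # What inlining effectively produces: the method body is
    # substituted at the call site, so no dispatch occurs per iteration.
    total = 0
    for i in range(n):
        total = total + i
    return total

acc = Accumulator()
n = 100_000
assert sum_indirect(acc, n) == sum_inlined(n)  # same result, fewer hops
t_indirect = timeit.timeit(lambda: sum_indirect(acc, n), number=20)
t_inlined = timeit.timeit(lambda: sum_inlined(n), number=20)
print(f"indirect: {t_indirect:.3f}s  inlined: {t_inlined:.3f}s")
```

The JIT performs this transformation automatically, and only for code paths it has observed executing, which is information a static compiler does not have.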

Intelligent Garbage Collection

The .NET Framework is unique in its ability to support code running in both managed and native (unmanaged) models, the latter being the case for C++. While both managed and native code can execute in the runtime environment, only managed code contains the information that allows the runtime to guarantee safe execution. (In essence, unmanaged code predates the .NET Framework, such as code written using the Microsoft Foundation Classes, whereas managed code targets the .NET Framework specifically.) By writing for the managed model, a developer is freed from the error-prone task of managing application memory. A common misconception is that code that relies on garbage collection will be slower than native C++ code with explicit object allocation and deallocation. This may be true of other garbage collection implementations, but not of Microsoft's, which is designed to incorporate information about processor cache size, dynamic workload, object locality, and multiprocessor configuration.

The runtime garbage collection implementation uses a generational, mark-and-compact collector that delivers excellent performance. It assumes that newly created objects are small and tend to be accessed often, whereas older objects are larger and accessed less frequently. The algorithm divides the heap into several sections, called generations, which allows it to spend as little time as possible collecting. Generation 0 contains young, frequently accessed objects; generations 1 and 2 hold larger, older objects that can be collected less often.
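Generational collection is not unique to the runtime; CPython's cycle collector happens to use the same three-generation scheme, which makes it a convenient way to watch the idea in action. This is a conceptual illustration only, not the CLR's mark-and-compact collector:

```python
import gc

# CPython also keeps three generations; thresholds govern how many
# allocations (gen 0) or collections (gens 1-2) trigger a sweep.
print(gc.get_threshold())   # a 3-tuple, one threshold per generation

# Allocation counts per generation since the last collection.
print(gc.get_count())

# Collect only the youngest generation: cheap and frequent, which is
# exactly the bet a generational collector makes -- most garbage is young.
freed = gc.collect(0)
print(f"gen-0 collection freed {freed} unreachable objects")

# A full collection walks all generations and is correspondingly rarer.
gc.collect()
```

Objects that survive a collection are promoted to the next generation, so the collector spends most of its time sweeping only the small, young heap.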

This design improves utilization of the processor cache over what is possible with native memory management. A processor such as the Intel Xeon has up to 512 KB of L2 Advanced Transfer Cache that resides on-chip and is connected to the core via a 256-bit (32-byte) interface, which transfers data to the core at rates far greater than any system can replenish the cache. The garbage collector detects the L2 cache size and sizes the Generation 0 heap accordingly. This optimization greatly increases the probability that a referenced object will reside in cache and can be accessed quickly by the processor core.

In traditional malloc libraries, memory is placed at the address of the last freed block of sufficient size. This behavior results in poor locality, as the objects needed for a method can be dispersed anywhere in the address space. The runtime instead keeps objects ordered by time of allocation in a contiguous block of memory. This design results in excellent locality and fewer cache misses, as objects that are referenced within a method have a high probability of residing on the same page. Even with the Pentium 4 and Xeon processors' advanced prefetch capability, the cost to bring in an object from off-chip memory is high. These extremely fast processors benefit greatly from keeping the pipeline full of user instructions and data.

Running in Parallel

In early 2002, Intel introduced an architectural innovation that results in even better performance of .NET Framework applications. With Hyper-Threading technology, one physical Xeon processor can be viewed as two logical processors each with its own state. The performance improvements due to this design arise from two factors: an application can schedule threads to execute simultaneously on the logical processors in a physical processor, and on-chip execution resources are utilized at a higher level than when only a single thread is consuming the execution resources.

The runtime is designed to fully exploit parallelism with support for Hyper-Threading and multiprocessor systems. A developer writing a .NET Framework application does not need additional code to take advantage of these features since the runtime manages all the runtime details with its thread pool. The pool can spawn new threads and block others to optimize runtime CPU utilization. It also recycles threads when they are done, starting them up again without the overhead of killing and creating new ones. The thread pool is also aware of issues specific to managed code, such as garbage collection cycles, and adjusts its scheduling logic accordingly.
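Thread recycling is easy to observe with any pooled executor. The Python sketch below is only an analogy for the runtime's thread pool, but it shows the same economy: many work items served by a handful of reused threads rather than one thread created and destroyed per item.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def task(i):
    # Report which pool thread ran this work item.
    return threading.get_ident()

with ThreadPoolExecutor(max_workers=4) as pool:
    idents = list(pool.map(task, range(100)))

# 100 work items, but at most 4 distinct threads ever existed: the pool
# recycles threads instead of paying creation/teardown cost per item.
print(f"{len(idents)} tasks ran on {len(set(idents))} threads")
```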

.NET Framework applications designated for deployment on multiprocessor machines, such as those using the Intel Xeon MP processors, should make use of the server-optimized runtime implementation (MSCorSvr.dll). This version of the runtime has a modified garbage collector that splits the managed heap into several sections, with one per CPU. Each CPU has its own garbage-collection thread, so they can all participate simultaneously in the collection cycle.

Getting a Jump on the Competition

.NET Framework developers interested in getting early availability and discounts on the latest Intel solutions and software tools should consider the Intel Early Access Program. This program jumpstarts a development team intending to build world-class .NET Framework solutions on an Intel platform. It also provides a great outlet for your work by offering co-marketing and sales development opportunities with Intel.

Resources

Intel Developer Services

Performance Considerations for Runtime Technologies in the .NET Framework

Garbage Collection: Automatic Memory Management in the Microsoft .NET Framework

Hyper-Threading 2 for 1: Hyper-Threading Technology Makes Intel Xeon Processors Do Double Duty

Intel NetBurst Architecture

Intel Early Access Program
