CLR Inside Out
Inside Diagnostic Tools for .NET
Many diagnostic tools use the CLR Profiling API, even those that aren't strictly profilers. So if you've ever wondered how these tools work, a look at the Profiling API is a good start. In this column, you'll see how such tools use the API and pick up some useful tips and tricks along the way. You'll also find some essential resources in the "Other Profiling Resources" sidebar.
To use the CLR Profiling API, you create a DLL in an unmanaged language, typically C++. You then set some environment variables that instruct the common language runtime (CLR) to load the DLL and allow it to use the Profiling API. Once loaded, this DLL effectively becomes an extension of the CLR itself, receiving callbacks, requesting information, and making changes deep within the implementation of the CLR. The Profiling API can provide notification of many activities within the CLR and managed code, including the creation and destruction of appdomains, the loading and unloading of assemblies, the JIT compilation and execution of functions, the throwing and catching of exceptions, and garbage collections. Using the Profiling API, you can get information about parts of the application, such as the names and locations of assemblies, descriptions of types and functions, and the locations and layout of objects in memory. Finally, you can use the Profiling API to modify settings, instructions, and the like, including disabling optimizations in the JIT compiler, changing the intermediate language (IL) for a function, or even creating new types and functions.
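For the .NET Framework, the runtime checks two environment variables at startup to decide whether to load a profiler. Here is a minimal sketch; the CLSID and application name are placeholders, and the CLSID must match the one under which your profiler DLL is registered:

```shell
:: Tell the CLR to load a profiler into any managed process started
:: from this command prompt. The GUID shown is a placeholder.
set COR_ENABLE_PROFILING=1
set COR_PROFILER={12345678-1234-1234-1234-123456789ABC}

:: Launch the target application in this environment.
MyManagedApp.exe
```

Because the variables are inherited by child processes, anything launched from this prompt runs under the profiler; unset COR_ENABLE_PROFILING when you're done.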
As you'll see, the combination of these capabilities can power a broad range of diagnostic tools.
Performance profilers are diagnostic tools that show how an application spends its time. If some operations in an application are running too slowly, a profiler can identify where that's happening so you can focus the optimization effort.
There are two main categories of performance profilers: tracing and sampling. The basic task of a tracing profiler is to log the time of every function call or return as an application runs. From this log, the profiler knows how control flow moves through the application, and how the time is spent. In the Visual Studio® 2005 Performance Tools, choosing Instrumentation from the Performance Wizard specifies a tracing profiler (see Figure 1).
Figure 1 Performance Profiling in Visual Studio 2005
In practice, especially in a managed-code app, there are a lot of function calls. Even if the profiler can very quickly log each function entry and exit, the overhead adds up and can cause the application to run very slowly during profiling. Not only does this make profiling painful for the developer, it also drastically changes the timing of the app, which can reveal or hide subtle race conditions or even cause timeouts. As a result, tracing profilers generally allow you to choose which modules or functions are interesting to profile.
At a minimum, a tracing profiler logs the name of the function being entered or exited, and some measure of the time. That might be the clock time (or "wall-clock time") or it might be the actual number of CPU cycles spent. The profiler can also log other information, such as the arguments passed to the function, to help you better understand the application. Of course, the more information it logs, the longer the logging takes, and the greater the overhead.
There are many ways to write a tracing profiler for .NET. Some, which don't use the Profiling API at all, open the assemblies on disk and insert log-writing code at the beginning and end of functions. This approach is analogous to the way tracing profilers were written for unmanaged code.
Even profilers that do use the Profiling API have options. With a couple of function calls, a profiler can ask the CLR for callbacks every time a function is called or returns, and every time an exception passes through a function. The profiler can even choose to receive these callbacks only for certain functions, or to obtain information about the functions' arguments and return values. The profiler then does its logging from inside these callbacks.
If the built-in enter/leave callbacks aren't powerful enough, a profiler can also use the Profiling API to modify a function's code before it gets JIT-compiled. This approach is similar to modifying the assemblies on disk, except that it occurs at run time, in memory, and doesn't require the extra step of creating instrumented assemblies before the profiling run.
Sampling profilers use simple statistics to enable the profiling of an entire application without the high overhead of a tracing profiler. As an application runs, a sampling profiler periodically checks (or samples) to see what the app is doing. The simplest sample just records which function is executing at the time. At the end, with hundreds or thousands of samples accumulated, the profiler analyzes how many samples appear in each function and extrapolates from that to estimate how much time is being spent in those functions. For example, if 25 percent of the samples occurred in function Foo, it's likely that the application spent 25 percent of its time running Foo. With enough samples, such estimates can be fairly accurate, and if the act of taking a sample is fast enough, this kind of profiling will not slow down the application very much.
More advanced sampling profilers also record the call stack at the time of the sample, to give you a better idea of context. Choosing Sampling from the Visual Studio 2005 Performance Wizard lets you use a sampling profiler.
Sampling profilers use various techniques to actually take the sample. Some create a new thread that sits in a loop, sleeping and periodically waking to see what the other threads are doing. Others work with the operating system kernel to receive an interrupt on a regular basis.
Regardless of how the profiler gets control, it finds the current native-code instruction pointer, uses the CLR Profiling API to determine whether that instruction is actually in JIT-compiled managed code and, if so, which function it's in. Profilers that take call stacks for samples also use the Profiling API to find the managed-code functions on the stack.
Memory profilers examine an application's memory usage. In unmanaged code, most memory profilers focus on finding instances where developers had forgotten to free memory or where they'd written to memory they hadn't allocated. While the CLR's built-in bounds checking and garbage collector make such problems disappear, an application's memory usage can still lead to performance problems, especially if it forces the garbage collector to work in suboptimal ways. In fact, when investigating a performance problem in a managed application, it is often best to start by looking at its memory usage rather than how it spends its time.
Managed memory profilers typically keep track of memory usage over the lifetime of an application. They track objects from their creation, as they're moved around by various garbage collections, until they are finally released. A developer can use this data to recognize problematic allocation patterns and clean up the application's memory usage. The Visual Studio 2005 Performance Tools can profile memory usage while performing a sampling or tracing profile; in the property pages for your performance project there are checkboxes for enabling memory profiling (see Figure 2). CLR Profiler is a dedicated memory-profiling tool, and since it comes with source code, it's also a great example profiler.
Figure 2 Memory Profiling in Visual Studio 2005
Managed memory profilers lean heavily on the CLR Profiling API, which notifies them of object allocations, enables them to discover the size and layout of objects, and tells them how objects are moved and deleted during garbage collections.
Occasionally you'll need to get a piece of information that's not easily obtainable from commonly available profilers. Sometimes, you can just put some tracing statements into your code, but that can be prohibitively expensive for large applications. If you're familiar with the CLR Profiling API, however, you can often build a custom profiler to answer a specific question. Here are some questions your profiler can answer:
- Are any objects being finalized after they've already been disposed? Are there disposable objects being finalized without being disposed?
- What does the call graph of my program look like?
- Who is throwing all the handled exceptions?
The power of the Profiling API, particularly its ability to change a function's code or introduce whole new types, is the backbone of many tools beyond just profilers. In fact, the API should probably have been given a much more generic name. I hear about new types of tools being built with the Profiling API on a regular basis; here are a few of my favorites.
Fault Injectors Much of an application's code is devoted to handling errors and corner cases. Correspondingly, a large part of testing is devoted to exercising that error-handling code, which is difficult because error conditions are often hard to produce on demand. Fault injectors help exercise failure paths in an application by forcing failures.
There are many ways to build a fault injector. One way is to use the Profiling API's ability to change the code of a function before it's JIT-compiled. For example, the fault injector can simulate an out-of-memory condition by changing a function to throw an OutOfMemoryException instead of allocating an object, or it can simulate a network outage by replacing a call to a network function with a throw of a TimeoutException. Figure 3 shows an example.
Figure 3 Typical Fault Injection Modification

Before Fault Injection:
StringBuilder sb = null;
sb = new StringBuilder();

After Fault Injection:
StringBuilder sb = null;
throw new OutOfMemoryException();
Code-Coverage Tools Code-coverage tools are similar to tracing profilers, but instead of measuring time, code coverage tools merely record which chunks of code are executed. Since so much of testing is about exercising corner cases, measuring the code coverage of an application's test suite is one way of estimating thoroughness of testing. If a function is never executed during an entire test run, then the function can't have been tested. CoverageEye.Net, available from a GotDotNet Workspace, is a code-coverage tool built on the CLR Profiling API (see Figure 4), and its accompanying source code is another great sample of Profiling API usage.
Some code-coverage tools only want to know which functions were executed, and can thus use the Profiling API's built-in enter/leave callbacks. Others go deeper, recording which paths have been taken through a function by inserting logging statements into the function's code.
Figure 4 Code Coverage Data from CoverageEye
Flight Recorders Flight recorders are similar to tracing profilers and coverage tools, but their goal is more far-reaching: to record as much information about the execution of an application as possible, producing a log of activity. You can then peruse this log after a failure occurs, looking for clues as to its cause.
Flight recorders will typically take information from wherever they can get it: tracing infrastructures like Event Tracing for Windows® (ETW), the system event log, window messages, and so on. The Profiling API, too, provides a treasure trove of information about an application's execution.
Aspect Weavers Aspect-oriented programming (AOP) lets you reduce the number of places in your code where you need to repeat the same tasks. For example, if many functions in an application need to write log entries, in a traditional programming language you would copy and paste logging code into each of those functions. Even if the logging code makes use of a logging utility class, certain kinds of updates would require you to touch every function that contains logging code.
In an aspect-oriented approach, you can place a logging aspect on any function that needs logging. In a managed language, the logging aspect may be a new attribute type, say LogAttribute. Then you write code just once to implement the aspect. This aspect-implementation code typically works with a tool called an aspect weaver to insert the logging code into any function with a LogAttribute.
One way to build an aspect weaver is to use the Profiling API's ability to add new types and to modify the code for functions prior to JIT compilation. I tend not to recommend this approach, because only one DLL can use the Profiling API at a time. If that single DLL is an aspect weaver, you can't simultaneously use a performance profiler to measure your aspect-oriented application.
Profiling API Tips and Tricks
1. Have the build register your profiler Before you can run your profiler, you have to put some information about it in the registry. This is easy to forget, so it's best to configure the Visual C++® build process to do it for you. In the property pages of your project, set a Post-Build Event that runs regsvr32 /s "$(TargetPath)".
2. Start your profiler with F5 in Visual Studio 2005 To start an application under a profiler, you set a couple of environment variables that specify which profiler to use and then run the application in that environment. If you're using Visual C++ 2005 to develop your profiler, you can speed up the build/run/debug cycle by having Visual Studio set these environment variables for you when you launch an application from within the IDE. In the property pages of your C++ project, in the Debugging pane, enter information about the target application and the profiler environment variables. Make sure you set Debugger Type to Native Only.
3. Strange behavior? Stop when Win32 exceptions are thrown When a profiler DLL gets loaded into a process, it effectively becomes an extension of the CLR. While the CLR always erects barriers before transitioning into managed code, it seldom does anything special before calling a function in a profiler DLL; such functions are effectively treated like functions within the CLR itself!
When an exception, say an access violation (AV), comes out of a profiler function, it emerges in the middle of a function within the CLR that may not have been expecting an exception at that point. That can lead to a variety of strange behaviors, including drastic behavior changes in the application being profiled.
So, if you run into strange behavior when you're running an application under your profiler, set the debugger to stop when native exceptions are thrown by opening the Exceptions dialog and clicking the Thrown checkbox next to Win32 Exceptions.
Now when you're running your profiler under the Visual Studio debugger, it will stop as soon as it sees an AV or any other Win32® exception. If you see one of those getting raised inside your profiler, or inside a function called by your profiler, you've probably found the cause of the weird behavior.
In managed code in particular, a Win32 AV doesn't always indicate a big problem. For example, in many cases NullReferenceExceptions begin life as Win32 AVs. Since you're debugging in native-only mode, you won't see the managed exceptions, and it may be difficult to tell whether an AV is coming from a bug in your profiler. As a result, you may want to use this trick only when you run into strange behaviors that might be caused by exceptions in profiler code.
4. Implement ICorProfilerCallback2 If you're writing a profiler for the .NET Framework 2.0 or the .NET Framework 3.0 (which contains the former), make sure your profiler class inherits from ICorProfilerCallback2 (even if all the methods just return E_NOTIMPL), and that the QueryInterface method responds to requests for IID_ICorProfilerCallback2. The CLR 2.0 has a few concepts that didn't exist in the CLR 1.x, such as generics, and the way a profiler signals to the CLR that it can handle those concepts is by responding to a QueryInterface for the new interface.
Send your questions and comments to
Jonathan Keljo is a Program Manager on the CLR team. During his time with the CLR, he's worked on the Debugging API, exception system, threading APIs, Managed Debugging Assistants, and memory dump support. His current responsibilities include error reporting, the Profiling API, and special projects.