CLR Inside Out - Production Diagnostics Improvements in CLR 4
By Jon Langdon | May 2010
On the Common Language Runtime (CLR) team, we have a group whose focus is providing APIs and services that enable others to build diagnostics tools for managed code. The two biggest components we own (in terms of engineering resources dedicated) are the managed debugging and profiling APIs (ICorDebug* and ICorProfiler*, respectively).
Like the rest of the CLR and framework teams, our value is realized only through the applications built on top of our contributions. For example, Visual Studio teams consume these debugging and profiling APIs for their managed debugger and performance profiling tools, and a number of third-party developers are building tools with the profiling API.
For the past 10 years, much of the focus in this area, for both the CLR and Visual Studio, has been on enabling developer desktop scenarios: source-stepping in a debugger to find code errors; launching an application under a performance profiler to help pinpoint slow code paths; edit-and-continue to help reduce time spent in the edit-build-debug cycle; and so on. These tools can be useful for finding bugs in your applications after they’ve been installed on a user’s machine or deployed to a server (both cases hereafter referred to as production), and we do have a number of third-party vendors building world-class production diagnostics tools on top of our work.
However, we consistently get feedback from customers and these vendors stressing the importance of making it even easier to find bugs throughout the life of an application. After all, software bugs are generally considered to be more costly to fix the later they’re found in the application lifecycle.
CLR 4 (the runtime underlying the Microsoft .NET Framework 4) is the first release in which we’ve made a significant effort to address that feedback and to begin expanding the scenarios our diagnostic APIs support toward the production end of the spectrum.
In this article, I will take a look at some of the scenarios we understand to be particularly painful today, the approach we’re taking to solve them and the types of tools they enable. Specifically, I will explain how we’ve evolved the debugging API to support dump debugging for application-crash and -hang scenarios, and how we’ve made it easier to detect when hangs are caused by multi-threading issues.
I will also describe how adding the ability to attach profiling tools to an already-running application will further ease troubleshooting these same scenarios, and greatly reduce the amount of time it takes to diagnose problems caused by excessive memory consumption.
Finally, I will briefly explain how we made profiling tools easier to deploy by removing the dependency on the registry. Throughout, the focus is primarily on the types of new tools our work enables, but where applicable I have included references to additional resources that will help you understand how you can take advantage of our work through Visual Studio.
Dump Debugging
One popular feature we’re delivering with Visual Studio 2010 is managed dump debugging. Process dumps, typically referred to just as dumps, are commonly used in production debugging scenarios for both native and managed code. A dump is essentially a snapshot of a process’s state at a given point in time. Specifically, it’s the contents of the process’s virtual memory (or some subset thereof) dumped into a file.
Prior to Visual Studio 2010, debugging managed code in a dump meant using the specialized Windows Debugger extension sos.dll, rather than more familiar tools like Visual Studio (where you probably wrote and debugged your code during product development). The high-level experience we want you to have when diagnosing issues from a dump in Visual Studio is that of stopped-state live debugging: what you experience when you’re debugging code and stopped at a breakpoint.
The most common point in time to collect a dump is when there’s an unhandled exception in an application—a crash. You use the dump to figure out why the crash occurred, typically starting by looking at the call stack of the faulting thread. Other scenarios where dumps are used are application hangs and memory-usage issues.
For example, if your Web site has stopped processing requests, you might attach a debugger, gather a dump and restart the application. Offline analysis of the dump might show, for example, that all of your threads processing requests are waiting on a connection to the database, or perhaps you find a deadlock in your code. Memory-usage issues can manifest in various ways from an end user’s perspective: the application slows down because of excessive garbage collection; service is interrupted because the application ran out of virtual memory and needed to be restarted; and so forth.
Through CLR 2, the debugging APIs provided support for debugging running processes only, which made it difficult for tools to target the scenarios just described. Basically, the API was not designed with dump debugging scenarios in mind. The fact that the API uses a helper thread running in the target process to service debugger requests underscores this point.
For example, in CLR 2, when a managed debugger wants to walk a thread’s stack, it sends a request to the helper thread in the process being debugged. The CLR in that process services the request and returns the results to the debugger. Because a dump is just a file, there’s no helper thread to service requests in this case.
To provide a solution for debugging managed code in a dump file, we needed to build an API that did not require running code in the target to inspect managed-code state. Yet because debugger writers (primarily Visual Studio) already have a significant investment in the CLR debugging API to provide live debugging, we did not want to force the use of two different APIs.
Where we landed in CLR 4 was reimplementing a number of the debugger APIs (primarily those required for code and data inspection) to remove the use of the helper thread. As a result, the existing API no longer needs to care whether the target is a dump file or a live process, and debugger writers can use the same API for both live and dump debugging scenarios. For execution control during live debugging—setting breakpoints and stepping through code—the debugging API still uses a helper thread; over the long term, we intend to remove that dependency as well. Rick Byers (a former developer on the debugging services API) has a useful blog post describing this work in more detail at blogs.msdn.com/rmbyers/archive/2008/10/27/icordebug-re-architecture-in-clr-4-0.aspx.
You can now use ICorDebug to inspect managed code and data in a dump file: walk stacks, enumerate locals, get the exception type and so on. For crashes and hangs, there’s often enough context available from the thread stacks and ancillary data to find the cause of the issue.
While we know the memory diagnostics and other scenarios are also important, we simply did not have enough time in the CLR 4 schedule to build new APIs that give debuggers the ability to inspect the managed heap in the way this scenario requires. Moving forward, as we continue to expand the production diagnostics scenarios we support, I expect this is something we will add. Later in the article I discuss other work we’ve done to help address that scenario.
I’d also like to explicitly call out that this work supports both 32- and 64-bit targets and both managed-only and mixed-mode (native and managed) debugging. Visual Studio 2010 provides mixed-mode debugging for dumps containing managed code.
Monitor Lock Inspection
Multi-threaded programming can be difficult. Whether you are explicitly writing multi-threaded code or leveraging frameworks or libraries that are doing it for you, diagnosing issues in asynchronous and parallel code can be quite challenging. When you have a logical unit of work executing on a single thread, understanding causality is much more straightforward and can often be determined by simply looking at the thread’s call stack. But when that work is divided among multiple threads, tracing the flow becomes much harder. Why is the work not completing? Is some portion of it blocked on something?
With multiple cores becoming commonplace, developers are looking more and more to parallel programming as a means for performance improvement, rather than simply relying on chip speed advancements. Microsoft developers are included in this lot, and over the past few years we’ve been significantly focused on making it easier for developers to be successful in this area. From a diagnostics perspective, we’ve added a few simple-yet-helpful APIs that enable tools to help developers better cope with the complexities of multi-threaded code.
To the CLR debugger APIs we’ve added inspection APIs for monitor locks. Simply put, monitors provide a way for programs to synchronize access to a shared resource (some object in your .NET code) across multiple threads. So while one thread has the resource locked, another thread waits for it. When the thread owning the lock releases it, the first thread waiting may now acquire the resource.
In the .NET Framework, monitors are exposed directly through the System.Threading.Monitor class, but more commonly through the lock and SyncLock keywords in C# and Visual Basic, respectively. They’re also used in the implementation of synchronized methods, the Task Parallel Library (TPL) and other asynchronous programming models. The new debugger APIs let you determine what object, if any, a given thread is blocked on and what thread, if any, holds a lock on a given object. Leveraging these APIs, debuggers can help developers pinpoint deadlocks and understand when multiple threads contending for a resource (lock convoys) may be affecting an application’s performance.
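The monitor semantics these APIs surface can be seen in a small C# program (my own illustrative sketch, not CLR debugger API code) in which the main thread blocks trying to enter a monitor another thread owns:

```csharp
using System;
using System.Threading;

class MonitorBlockingDemo
{
    static readonly object Resource = new object();

    static void Main()
    {
        using var acquired = new ManualResetEventSlim();
        using var release  = new ManualResetEventSlim();

        var worker = new Thread(() =>
        {
            lock (Resource)          // The worker owns the monitor...
            {
                acquired.Set();
                release.Wait();      // ...until the main thread says to let go.
            }
        });
        worker.Start();

        acquired.Wait();
        // While the worker holds the lock, an attempt to enter times out.
        Console.WriteLine("While held: " + Monitor.TryEnter(Resource, 50));

        release.Set();
        worker.Join();
        // After the worker exits the lock, entry succeeds.
        bool entered = Monitor.TryEnter(Resource, 50);
        Console.WriteLine("After release: " + entered);
        if (entered) Monitor.Exit(Resource);
    }
}
```

A debugger built on the new inspection interfaces (for example, ICorDebugThread4::GetBlockingObjects) could report that, during the first TryEnter, the main thread is blocked on Resource and the worker thread owns its monitor.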
For an example of the type of tools this work enables, check out the parallel debugging features in Visual Studio 2010. Daniel Moth and Stephen Toub provided a great overview of these in the September 2009 issue of MSDN Magazine (msdn.microsoft.com/magazine/ee410778).
One of the things that excites us most about the dump debugging work is that it builds an abstracted view of the debug target, so new inspection functionality, such as the monitor-lock-inspection feature, provides value for both live and dump debugging scenarios. While I expect this feature to be extremely valuable for developers while they’re initially developing an application, it’s the dump debugging support that makes monitor-lock inspection a compelling addition to the production diagnostics features in CLR 4.
Tess Ferrandez, a Microsoft support engineer, has a Channel 9 video (channel9.msdn.com/posts/Glucose/Hanselminutes-on-9-Debugging-Crash-Dumps-with-Tess-Ferrandez-and-VS2010/) in which she simulates a lock-convoy scenario common to what she has found when troubleshooting customer applications. She then walks through how to use Visual Studio 2010 to diagnose the problem. It’s a great example of the types of scenarios these new features enable.
While we believe the tools these features enable will help decrease the amount of time it takes developers to resolve issues in production, we do not expect (nor want) dump debugging to be the only way to diagnose production issues.
When diagnosing excessive memory use, we usually start with a list of object instances grouped by type, with their count and aggregate size, and progress toward understanding object reference chains. Shuttling dump files containing this information between production and development machines, operations staff, support engineers and developers can be unwieldy and time-consuming. And as applications grow in size—this is especially prevalent with 64-bit applications—the dump files also grow and take longer to move around and process. It was with these high-level scenarios in mind that we undertook the profiling features described in the following sections.
The Profiling API
There are a number of different types of tools built on top of the CLR’s profiling API. Scenarios involving the profiling API generally focus on three functional categories: performance, memory and instrumentation.
Performance profilers (like the one that ships in some of the Visual Studio versions) focus on telling you where your code is spending time. Memory profilers focus on detailing your application’s memory consumption. Instrumenting profilers do, well, everything else.
Let me clarify that last statement a bit. One of the facilities the profiling API provides is the ability to insert intermediate language (IL) into managed code at run time. We call this code instrumentation. Customers use this functionality to build tools that deliver a wide range of scenarios from code coverage to fault injection to enterprise-class production monitoring of .NET Framework-based applications.
One of the benefits the profiling API has over the debugging API is that it is designed to be extremely lightweight. Both are event-driven APIs—for example, there are events in both for assembly load, thread create, exception thrown and so on—but with the profiling API you register only for the events you care about. Additionally, the profiling DLL is loaded inside the target process, ensuring fast access to runtime state.
In contrast, the debugging API, when a debugger is attached, reports every event out of process and suspends the runtime on each one. These are just a couple of the reasons the profiling API is an attractive option for building online diagnostics tools targeting production environments.
Profiler Attach and Detach
While several vendors are building always-on, production-application-monitoring tools via IL instrumentation, we do not have many tools leveraging the performance- and memory-monitoring facilities in the profiling API to provide reactive diagnostics support. The main impediment in this scenario has been the inability to attach CLR profiling API-based tools to an already-running process.
In versions preceding CLR 4, the CLR checked during startup to see whether a profiler was registered. If it found one, the CLR loaded the profiler and delivered callbacks as requested. The DLL was never unloaded. This is generally fine if the tool’s job is to build an end-to-end picture of application behavior, but it does not work for problems you didn’t know existed when the application started.
Perhaps the most painful example of this is the memory-usage-diagnostics scenario. Today, in this scenario we often find that the pattern of diagnostics is to gather multiple dumps, look at the differences in allocated types between each, build a timeline of growth, and then find in the application code where the suspicious types are being referenced. Perhaps the issue is a poorly implemented caching scheme, or maybe event handlers in one type are holding references to another type that’s otherwise out of scope. Customers and support engineers spend a lot of time diagnosing these types of issues.
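As a concrete example of the event-handler leak just mentioned (the types and names here are hypothetical, purely for illustration), an event subscription can keep an otherwise-unreachable object alive; it’s exactly this kind of lingering reference chain a memory profiler helps you find:

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical types for illustration only.
class Publisher
{
    public event EventHandler Updated;
    public void Raise() => Updated?.Invoke(this, EventArgs.Empty);
}

class Subscriber
{
    public void OnUpdated(object sender, EventArgs e) { }
}

class LeakDemo
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    static WeakReference Subscribe(Publisher pub)
    {
        var sub = new Subscriber();
        pub.Updated += sub.OnUpdated;   // pub's delegate list now references sub.
        return new WeakReference(sub);
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    static WeakReference MakeUnrooted() => new WeakReference(new Subscriber());

    static void Main()
    {
        var pub = new Publisher();
        WeakReference subscribed = Subscribe(pub);
        WeakReference unrooted = MakeUnrooted();

        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        // The subscribed instance stays alive solely through the publisher's
        // event; the unrooted instance is collected.
        Console.WriteLine("Subscribed alive: " + subscribed.IsAlive);
        Console.WriteLine("Unrooted alive: " + unrooted.IsAlive);
        GC.KeepAlive(pub);
    }
}
```

A heap view grouped by type would show the Subscriber instance still alive, with a reference chain leading back to the Publisher’s event delegate.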
For starters, as I briefly mentioned earlier, dumps of large processes are large themselves, and getting them to an expert for diagnosis can introduce long delays in the time to resolution. Further, there is the problem that you have to involve an expert, primarily due to the fact that the data is only exposed via a Windows Debugger extension. There’s no public API that allows tools to consume the data and build intuitive views on top of it or integrate it with other tools that could aid in analysis.
To ease the pain of this scenario (and some others), we added a new API that allows profilers to attach to a running process and leverage a subset of the existing profiling APIs. Those available after attaching to the process allow sampling (see “VS 2010: Attaching the Profiler to a Managed Application” at blogs.msdn.com/profiler/archive/2009/12/07/vs2010-attaching-the-profiler-to-a-managed-application.aspx) and memory diagnostics: walking the stack; mapping function addresses to symbolic names; most of the Garbage Collector (GC) callbacks; and object inspection.
In the scenario I just described, tools can leverage this functionality to allow customers to attach to an application experiencing slow response times or excessive memory growth, understand what’s currently executing, and know what types are alive on the managed heap and what’s keeping them alive. After gathering the information, you can detach the tool and the CLR will unload the profiler DLL. While the rest of the workflow will be similar—finding where these types are referenced or created in your code—we expect tools of this nature to have a sizeable impact on the mean-time-to-resolution for these issues. Dave Broman, developer on the CLR profiling API, discusses this and other profiling features in more depth on his blog (blogs.msdn.com/davbr).
I do, however, want to call out two limitations explicitly in this article. First, memory diagnostics on attach are limited to the non-concurrent (or blocking) GC modes, in which the GC suspends execution of all managed code while it performs a collection. While concurrent GC is the default, ASP.NET uses server-mode GC, which is non-concurrent, and this is the environment in which we see most of these issues. We do appreciate the usefulness of attach-profiling diagnostics for client applications and expect to deliver this in a future release. We simply prioritized the more common case for CLR 4.
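For reference, while ASP.NET selects server GC through its hosting environment, a standalone .NET Framework service or console application opts into a non-concurrent mode through its configuration file; a minimal sketch (the file name assumes a hypothetical executable called MyApp.exe):

```xml
<!-- MyApp.exe.config: opt into server GC, a non-concurrent (blocking) mode. -->
<configuration>
  <runtime>
    <gcServer enabled="true"/>
    <!-- Alternatively, <gcConcurrent enabled="false"/> forces
         non-concurrent workstation GC. -->
  </runtime>
</configuration>
```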
Second, you are unable to use the profiling API to instrument IL after attach. This functionality is especially important to our enterprise-monitoring tool vendors who want to be able to change their instrumentation dynamically in response to runtime conditions. We commonly refer to this feature as re-JIT. Today, you have one opportunity to change the IL body of a method: when it’s first being JIT-compiled. We expect delivering re-JIT to be a significant and important undertaking and are actively investigating both the customer value and technical implications as we consider delivering the work in a future release.
Registry-Free Activation for Profilers
Profiler attach and the supporting work of enabling specific profiling APIs after attach was the biggest piece of work we did in the CLR profiling API for CLR 4. But we were delighted to find another very small (in terms of engineering cost) feature that has a surprisingly large impact on customers building production-class tools.
Prior to CLR 4, for a tool to get its profiler DLL loaded in a managed application it had to create two key pieces of configuration information. The first was a pair of environment variables that told the CLR to enable profiling and what profiler implementation to load (its CLSID or ProgID) when the CLR starts up. Given that profiler DLLs are implemented as in-process COM servers (with the CLR being the client), the second piece of configuration data was the corresponding COM registration information stored in the registry. This basically told the runtime, by way of COM, where to find the DLL on disk.
During CLR 4 planning, while trying to understand how we could make it easier for vendors to build production-class tools, we had some interesting conversations with our support engineers and tools vendors. The feedback we received was that, when faced with an application failure in production, customers often care less about impact to the application (it’s already failing; they just want to get the diagnostic information and move on) and more about impact to the state of the machine. Many customers go to great lengths to document and police the configuration of their machines, and changes to that configuration can introduce risk.
So, for the CLR’s part of the solution, we wanted to enable xcopy-deployable diagnostics tools, both to minimize the risk of state changes and to reduce the time it takes to get the tools up and running on the machine.
In CLR 4 we removed the need to register the COM profiler DLL in the registry. We still require environment variables if you want to launch your application under the profiler, but in the attach scenarios detailed previously, there is no configuration required. A tool simply calls the new attach API, passes the path to the DLL and the CLR loads the profiler. Moving forward, we’re looking into ways to further simplify profiler configuration as customers still find the environment-variable solution challenging in some scenarios.
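For the launch scenario, the configuration boils down to a few environment variables set in the process’s environment before startup; with CLR 4, COR_PROFILER_PATH points the runtime directly at the DLL so no COM registration is needed (the CLSID and path below are placeholders for your profiler):

```cmd
:: Windows command prompt; values are placeholders, not a real profiler.
set COR_ENABLE_PROFILING=1
set COR_PROFILER={CLSID-of-your-profiler}
set COR_PROFILER_PATH=C:\Tools\MyProfiler.dll
MyApp.exe
```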
Combined, the two profiling features I just discussed enable a class of low-overhead tools that customers can, in the case of unexpected failures, quickly deploy to a production machine to gather performance or memory diagnostic information to help pinpoint the cause of the problem. Then, just as quickly, the user can return the machine to its original state.
Working on diagnostics features for the CLR is bittersweet. I see many examples of how much customers struggle with certain issues, and yet there’s always cool work we can do to help ease development and make customers happier. If that list of issues challenging our customers isn’t changing—that is, we’re not removing items from it—then we’re not succeeding.
In CLR 4 I think we built features that not only provide immediate value in addressing some top customer problems, but also lay a foundation upon which we can continue to build in the future. Moving forward, we will continue to invest in APIs and services that make it even easier to expose meaningful diagnostic information in all phases of an application’s life so you can focus more time on building new features for your customers and less time debugging existing ones.
If you’re interested in providing us feedback on diagnostics work we’re considering for our next release, we’ve posted a survey at surveymonkey.com/s/developer-productivity-survey. As we are now about to ship CLR 4, we’re focusing much of our time on planning for upcoming releases and data from the survey will help us prioritize our efforts. If you have a few minutes, we would love to hear from you.
Jon Langdon is a program manager on the CLR team where he focuses on diagnostics. Prior to joining the CLR team, he was a consultant with Microsoft Services, helping customers diagnose and fix problems with large-scale, enterprise applications.
Thanks to the following technical expert for reviewing this article: Rick Byers