Design

Inspect and Optimize Your Program's Memory Usage with the .NET Profiler API

Jay Hilyard

Code download available at: NetProfilerAPI.exe (1,492 KB)

This article assumes you're familiar with C#, C++, COM, and .NET


SUMMARY

Developers using .NET often make memory leak tracking a low priority because the common language runtime takes care of garbage collection. What few developers realize, however, is that their objects' lifespans, along with their size and what other objects have been instantiated, all affect how they are cleaned up. Depending on the particular circumstances, these combinations can negatively affect performance, especially over the lifetime of an application. This article presents a way for developers to see memory usage and understand garbage collection using the .NET Profiler API. Along the way, a sample application to demonstrate these principles is built.

Contents

Good Memory Usage
Overview of the Profiler API
Detecting Memory Usage with the Profiler API
Profiler Debugging Tips
Conclusion

Programs running under Windows® allocate memory in order to represent different types of resources that are needed. These allocations can be thought of as objects that encapsulate the memory and any other resource state that the program needs.

When an application runs correctly, the resources and memory that are used are freed for use by other programs in the system. There are cases, however, when something goes wrong in the application and either the resource state or the memory or both are not properly freed, which results in a resource or memory leak. These errors can be difficult to identify. The garbage collector (GC) ensures that the memory a program allocates to accomplish tasks is freed without the developer needing to worry about it.

The more you know about garbage collection, the better you can construct your program to work with it. Objects in .NET are allocated from a piece of memory known as the managed heap. The heap is described as managed because after you ask it for memory, the garbage collector takes care of cleanup. This may seem like a lot of overhead since the garbage collector has to keep track of every object allocated in the .NET common language runtime (CLR), but it actually works quite efficiently.

Objects can be small, containing just a few integers, or larger, holding a database connection and lots of state information. Objects can be self-contained or can contain or use other objects internally. The GC's job is to determine when objects should be collected so their memory can be reused. It considers the managed heap full when it attempts to allocate a new object and finds that there is no more memory available in the heap; at that point it marks the objects that can be deleted, removes them from the managed heap, and makes room for the new object.

The GC thinks about your objects in slightly different ways and takes those differences into account when deciding when to collect an object it considers no longer useful. One of the ways it does this is by maintaining a set of root references it uses to determine which objects are available for collection. A reference to an object is considered a root if it falls into one of the following categories: a global or static object pointer, a local variable or parameter object pointer on a thread's stack, or a CPU register containing a pointer to an object in the managed heap. An object reached through a root reference may or may not have child objects associated with it that would also survive the garbage collection. The GC starts with the root references and follows them to find the other objects that are reachable from the roots so that these objects are not collected either.

As you can see in Figure 1, there are four objects allocated in the managed heap: (S)mall, (L)arge, (F)inalized, and (R)eferenced. Assume that each object is identified by its dominant characteristic (for example, there are no small objects that are holding references or other combinations). When these objects are allocated in the heap, they will be placed immediately next to one another in memory. I also have a root reference at (G)lobal scope that is holding a reference to R.

Figure 1 Managed Heap

When the GC starts a garbage collection, it begins by assuming all objects are unnecessary until proven necessary. An object proves that it is necessary essentially by who it "knows" (references) or who knows it (holds a reference to it). For the GC, root references provide the starting point for who knows whom. The GC follows the references from the root objects down the object hierarchy to determine whether an object is reachable and could potentially be used by another object. If the object is proven to be reachable, it is not part of this garbage collection cycle. If the object cannot be reached through any reference, the GC marks the object collectable and it is discarded. The GC uses a "mark and compact" methodology, which means that once the GC determines which objects are garbage, another part of the GC removes the unreachable objects and compacts the remaining space in the heap to ensure that allocations continue to be very fast.

The GC views the objects involved in collection cycles in terms of generations. Every time an object survives a collection, it is promoted to the next generation, so the more objects referencing your object, or the greater the scope your object operates in, the longer it will live. The GC currently uses up to three generations, numbered 0 through 2. Generation 0 is typically populated with small, short-lived objects and is collected most often, so small, briefly used objects are cleaned up frequently. Generations 1 and 2 hold the longer-lived and more frequently accessed objects and are consequently collected less often. One of the fundamental assumptions of the GC is that most objects in your program are small and short-lived, and that you benefit from cleaning them up more frequently. This is important to understand because how you design your system has a big impact on how much memory you use and how long you hold it, and therefore on the size of your working set. The more memory you use, the more application performance will suffer.

Objects under 85,000 bytes are considered small and are allocated directly from the main part of the managed heap. Objects over 85,000 bytes are allocated from a special part of the managed heap referred to as the large object heap. There are two main differences in the way that the managed heap treats small and large objects. First, small objects are moved within the managed heap when it is compacted; large objects are not. Second, large objects are always considered part of generation 2, while small objects are typically considered part of generation 0. If you allocate many short-lived large objects, it will cause generation 2 to be collected more frequently. Since each subsequent generation (going from 0 to 2) is more expensive to collect, this will hurt the performance of your application.

One final aspect of garbage collection that I'd like to discuss is the concept of finalization. Finalization lets developers free the resources their objects use at the time the objects are collected by the GC. Objects need to implement a Finalize method for this to work. The GC will call the Finalize method when the object is being destroyed to allow the object to clean up its internal resources and state. In C# and managed C++, the Finalize method is actually disguised in the syntax of a destructor (~Object), with the important difference that the Finalize method is only called when the GC cleans up the object, not when the object goes out of scope as with a true C++ destructor. Adding a Finalize method to your object means that it will always be called by the GC, but be careful: a finalizable object always survives the garbage collection in which it first becomes unreachable, so all finalized objects live longer. Since you are trying to let the GC clean up as efficiently as possible, only use finalization when you have unmanaged resources to clean up or in special cases where the object is expensive to create (object pools).

Let's go back to the original example in Figure 1 which has a managed heap with four objects and a root reference. If a garbage collection happens at this point (which would be due to manual intervention of the developer since none of the garbage collection criteria for launching a collection are met here), the result is that the (S)mall object will be garbage collected.

The large object survives the garbage collection because large objects are assigned to generation 2. The finalized object is noted by the GC and the Finalize method will be called, but the object itself will remain until the next garbage collection (possibly longer in some scenarios). The object with the root reference G remains; since it is a root reference, it is reachable.

Now let's assume that the next garbage collection that occurs is for generations 0 through 2 (which can be done by calling the System.GC.Collect method with 2 as the parameter). The (L)arge object is collected during the generation 2 cleanup, and the (F)inalized object is collected during the generation 0 collection since its Finalize method was called and completed prior to the start of this collection. Only the object with the global reference is still present, and thus it remains for the life of the application.

Good Memory Usage

The GC takes care of memory leaks, but it doesn't prevent memory retention. As a developer, you have control over your object's lifetime. If you can reduce the working set of your application, performance will improve. If your application is designed in such a way that many objects remain alive for a long time, an argument could be made that you have a memory leak. Even though the memory is cleaned up in the end, you are still paying a performance penalty, so it is worth knowing how long your objects live.

The GC is a big help, but it only addresses one of the original types of leaks I discussed. Resource leaks are still an issue, but if you wrap your unmanaged resources in finalized classes, the GC can help ensure that you dispose of them properly. It is preferable to implement a Close or Dispose method on the object so that when your object is used, the resources can be cleaned up as early as possible, without waiting for the GC to clean them up (which could be a long time from when you stop using them). If you implement Finalize for unmanaged resource-using classes and are using the managed heap, you are reasonably safe from true leaks. Of course, this doesn't mean you should let your application's working set get chubby while processing merrily away.

Overview of the Profiler API

In order to demonstrate how much memory your application is using and how long your objects stick around, I have developed an application called MemoryUsage. MemoryUsage has two distinct parts. The first part is a C# application that launches the process to be monitored and sets an environment variable in the target process indicating that the CLR should load the .NET profiler. The second part is the C++-based .NET profiler, called MemProfiler, that the CLR loads based on the information in the environment variable. The .NET profiler is written using the Profiler API provided as part of the CLR, which allows the profiler to run as part of the process being monitored and to receive notifications as various events occur. In order to receive these notifications from the CLR, you provide a callback interface as specified in the Profiler API (ICorProfilerCallback), and the CLR invokes the methods on the callback interface when the various events occur (see Figure 2).

Figure 2 Getting Notifications

The main profiler callback methods of interest are the following: RuntimeSuspendStarted, RuntimeSuspendFinished, RuntimeResumeStarted, ObjectAllocated, ObjectsAllocatedByClass, MovedReferences, RootReferences, and ObjectReferences.

If you are unfamiliar with the Profiler API, you can access some more in-depth information in Profiler.doc, which is located in the \FrameworkSDK\Tool Developers Guide\docs folder under your Visual Studio® .NET installation.

There are a few things to consider when using a profiler, including thread safety and synchronization and the profiler's impact on performance. The Profiler API effectively lets your code run as part of the CLR, so you must be cognizant of synchronization issues because multiple threads will call into your profiler. The Profiler API specification Microsoft® provides states that callbacks are not serialized. It's up to the developer to protect the code appropriately by creating thread-safe data structures and by locking the profiler code wherever necessary to prevent multiple threads from accessing it in parallel.

I needed to synchronize access to the object tracking system as well as a few other items in the profiler callbacks I used in order to make the information come out correctly and, more importantly, to prevent threading bugs or deadlocks.

The performance impact of profiling can be significant. Running a .NET-based application using the profiling callbacks incurs roughly a 10x slowdown in application performance due to its intrusive nature. Because of the amount of information you process when looking at all of your options (detailed later in the article), you can have an even greater impact. The point of doing this sort of profiling is to figure out what is happening. The cost for making this discovery is that the application runs slower while you are looking at your options.

The .NET CLR will load exactly one profiler, specified through the environment variable COR_PROFILER; this means the profiler cannot be used on a process that already has another profiler loaded. Once the profiler is established for that instance of the CLR, you get an Initialize callback. Here is where you tell the CLR which notifications you want to receive by calling the ICorProfilerInfo::SetEventMask method with a DWORD containing masked values from the COR_PRF_MONITOR enumeration. The flags you should set in your profiler are shown in Figure 3, and a sketch of setting them follows the figure.

Figure 3 Flags

Flag Description
COR_PRF_MONITOR_SUSPENDS For GC notifications
COR_PRF_MONITOR_GC For all GC calls except ObjectAllocated
COR_PRF_ENABLE_OBJECT_ALLOCATED For the ObjectAllocated call
COR_PRF_MONITOR_OBJECT_ALLOCATED For the ObjectAllocated call
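
Here is a minimal sketch of what the Initialize callback might look like when it registers for the notifications in Figure 3. The class name CMemProfilerCallback and the member m_pProfilerInfo are assumptions for illustration; the actual sample may organize this differently.

#include <cor.h>
#include <corprof.h>   // ICorProfilerCallback, ICorProfilerInfo, COR_PRF_* flags

// Sketch only: CMemProfilerCallback and m_pProfilerInfo are assumed names for
// the profiler's callback class and its cached ICorProfilerInfo pointer.
HRESULT CMemProfilerCallback::Initialize(IUnknown *pICorProfilerInfoUnk)
{
    HRESULT hr = pICorProfilerInfoUnk->QueryInterface(IID_ICorProfilerInfo,
                                                      (void **)&m_pProfilerInfo);
    if (FAILED(hr))
        return hr;

    DWORD eventMask = COR_PRF_MONITOR_SUSPENDS          // runtime suspend/resume notifications
                    | COR_PRF_MONITOR_GC                // GC callbacks other than ObjectAllocated
                    | COR_PRF_ENABLE_OBJECT_ALLOCATED   // allow per-allocation notifications
                    | COR_PRF_MONITOR_OBJECT_ALLOCATED; // actually deliver ObjectAllocated

    return m_pProfilerInfo->SetEventMask(eventMask);
}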

The CLR passes the ICorProfilerInfo interface to the profiler in order to allow the profiler to query the CLR for further information. This is used in my profiler to set the event mask and retrieve the size of objects based on the object ID, the class name from a class ID, and the class ID from an object ID.

The profiler implements the ICorProfilerCallback interface, and the CLR calls its methods to report exactly what is happening in the runtime while a .NET-based application executes. I use this in my profiler to look at certain runtime events and garbage collection notifications.

In order to see what notifications the GC provides as it goes through a garbage collection, I need to pay attention to several of the runtime callbacks. The RuntimeSuspendStarted callback will pass in a COR_PRF_SUSPEND_REASON enumeration value when it is called:

HRESULT RuntimeSuspendStarted( COR_PRF_SUSPEND_REASON suspendReason )

The reason the runtime is suspending is stored in suspendReason.

One of the enumeration values that is of interest in this context is the COR_PRF_SUSPEND_FOR_GC value. When this value is encountered, it means the application is being suspended by the CLR prior to a garbage collection request. This request could be from managed code calling the System.GC.Collect method or it could be from the CLR itself when it is determined that the GC should run. This indicates that I am about to receive callbacks from the CLR regarding the garbage collection that is pending.

Once the RuntimeSuspendFinished callback is given to the profiler, the CLR will move on and execute the GC callbacks:

HRESULT RuntimeSuspendFinished()
HRESULT RuntimeResumeStarted()

All GC callbacks will take place between the time the RuntimeSuspendFinished and the RuntimeResumeStarted callbacks occur in the profiler.
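
As a rough sketch of how these callbacks might be used to bracket a collection (the flag and counter here are assumed names, and a real profiler would need to guard them with the synchronization discussed earlier):

// Sketch only: g_inGC and g_gcNumber are assumed globals for illustration.
static bool  g_inGC     = false;
static ULONG g_gcNumber = 0;

HRESULT CMemProfilerCallback::RuntimeSuspendStarted(COR_PRF_SUSPEND_REASON suspendReason)
{
    if (suspendReason == COR_PRF_SUSPEND_FOR_GC)
        g_inGC = true;      // GC callbacks will arrive after RuntimeSuspendFinished
    return S_OK;
}

HRESULT CMemProfilerCallback::RuntimeResumeStarted()
{
    if (g_inGC)
    {
        g_gcNumber++;       // the collection bracketed by these callbacks is complete
        g_inGC = false;
    }
    return S_OK;
}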

Now that you know when the fun starts and ends for the garbage collector, you can look at the callbacks related to memory usage and object lifetimes, such as ObjectAllocated:

HRESULT ObjectAllocated( ObjectID objectID, ClassID classID )

Here, objectID indicates the object identifier for the object being allocated, and classID represents the class identifier for the object being allocated.

ObjectAllocated is called every time the CLR allocates an object in the managed heap. C++ programmers can think of the ObjectID as a this pointer that moves (more on this later). When ObjectAllocated is called, the profiler is given an ObjectID and a ClassID. These two tokens allow you to identify an instance of a .NET type that has been allocated. Incidentally, if you are interested in finding out about the class of this object, the ICorProfilerInfo interface can be used from within the callback to determine the information about the type by calling the GetClassIDInfo method with the classId token. Be aware that this callback will be invoked many times for even trivial programs. Due to the number of times this is called, Microsoft wanted to make sure that the profiler writer actually wanted these notifications, so you have to specify the following flags to the SetEventMask call during profiler initialization:

COR_PRF_ENABLE_OBJECT_ALLOCATED
COR_PRF_MONITOR_OBJECT_ALLOCATED

If you do not specify both of these flags, you won't get any ObjectAllocated callbacks.
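
Assuming both flags are set, a minimal ObjectAllocated handler might look something like the following sketch. RecordAllocation is a hypothetical helper standing in for the sample's own tracking code.

// Hypothetical helper that stores the allocation in the tracking system.
void RecordAllocation(ObjectID objectId, ClassID classId, ULONG size);

HRESULT CMemProfilerCallback::ObjectAllocated(ObjectID objectID, ClassID classID)
{
    // Ask the runtime how large this instance is.
    ULONG size = 0;
    if (FAILED(m_pProfilerInfo->GetObjectSize(objectID, &size)))
        return S_OK;   // don't fail the runtime over a bookkeeping miss

    // GetClassIDInfo yields the module and metadata token for the type;
    // resolving a readable class name from these requires the metadata APIs.
    ModuleID  moduleId = 0;
    mdTypeDef typeDef  = mdTypeDefNil;
    m_pProfilerInfo->GetClassIDInfo(classID, &moduleId, &typeDef);

    RecordAllocation(objectID, classID, size);
    return S_OK;
}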

ObjectsAllocatedByClass provides enough information to determine which .NET types have had instances allocated and how many allocations per .NET type have taken place since the last garbage collection occurred:

HRESULT ObjectsAllocatedByClass( ULONG cClassCount, ClassID classIds[], LONG cObjects[] )

The variable cClassCount indicates the number of classes being reported as allocated since the last garbage collection, classIds[] represents the class identifiers for the classes, and cObjects[] indicates the count of the number of object instances allocated for the corresponding classId index in the classIds array.

This is a nice shorthand way to look at object creation, but one important difference between this and ObjectAllocated is that it does not track objects allocated in the large object heap.

In order to track objects that are allocated in the managed heap, you need to be able to follow them as they move. The MovedReferences callback helps you do this by providing the "bunches" of objects that have moved since the last garbage collection:

HRESULT MovedReferences( ULONG cMovedObjectIDRanges, ObjectID oldObjectIDRangeStart[], ObjectID newObjectIDRangeStart[], ULONG cObjectIDRangeLength[] )

The count of the number of elements in the array parameters is indicated by cMovedObjectIDRanges; it represents the number of ObjectID ranges that were moved in the managed heap. The array oldObjectIDRangeStart holds starting points for the ranges of original object identifiers before they were moved by the garbage collector, and newObjectIDRangeStart is an array of starting points for the ranges of new object identifiers after they have been moved by the garbage collector. The array cObjectIDRangeLength holds the sizes of the moved ranges; it corresponds to the oldObjectIDRangeStart and newObjectIDRangeStart arrays by array position.

The GC tends to move the objects that survive the collection in blocks. This being the case, MovedReferences provides you with the number of ranges of objects that moved as a result of the garbage collection, the old object IDs for the objects, the new object IDs for the objects after the move, and the size of each range. As you can see in Figure 4, you start with the first range and check whether the object ID you're interested in falls between that range's old starting ID and the old starting ID plus the range's length. If it does, the object has a new object ID, which you compute by subtracting the range's old starting ID from the object ID you were checking and adding the range's new starting ID. Repeat this procedure for each range to find the rest of the object IDs you are tracking (a sketch of this arithmetic follows the discussion of Figure 4).

Figure 4 IDs

As an example, take a look at Figure 4, which shows an initial managed heap containing objects 2 through 8. When the GC runs, it determines that objects 2, 4, and 6 are collectable. This results in the object ID change mappings from the old object IDs to the new object IDs.

Figure 4 lists a series of ranges that indicates what the original object IDs were and what the new object IDs will be after collection. In Figure 4, the first range covers one item with an original ID of 3 that has changed to a new ID of 1. Following this pattern, the next row is also for one item, with an original ID of 5 that now has an ID of 2. Finally, the last row indicates that two items were moved starting with the original ID of 7 (so 7 and 8 were moved) and starting with the new ID of 3 (so 7 became 3, and 8 became 4).
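
Here is that arithmetic expressed as a small standalone helper. This is a sketch only; the function name is an assumption, and the real profiler folds this logic into its object tracking system.

#include <corprof.h>

// Given the arrays handed to MovedReferences, map an object ID we are
// tracking to its new value. IDs that fall outside every moved range are
// returned unchanged (the object either did not move or was collected).
ObjectID MapMovedObjectID(ObjectID tracked,
                          ULONG cMovedObjectIDRanges,
                          ObjectID oldObjectIDRangeStart[],
                          ObjectID newObjectIDRangeStart[],
                          ULONG cObjectIDRangeLength[])
{
    for (ULONG i = 0; i < cMovedObjectIDRanges; i++)
    {
        // Is the tracked ID inside the i-th old range?
        if (tracked >= oldObjectIDRangeStart[i] &&
            tracked < oldObjectIDRangeStart[i] + cObjectIDRangeLength[i])
        {
            // Offset into the old range plus the new range start gives the new ID.
            return newObjectIDRangeStart[i] + (tracked - oldObjectIDRangeStart[i]);
        }
    }
    return tracked;
}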

The RootReferences callback will give the profiler the root references that the GC knows about during a garbage collection:

HRESULT RootReferences( ULONG cRoots, ObjectID objectIds[] )

The variable cRoots indicates the number of root references that are currently known by the garbage collector and objectIds indicates the object identifiers for the root references.

RootReferences can be called multiple times (depending upon the number of objects that the CLR considers to be "root" at garbage collection time) since this number can be greater than the internal storage the CLR uses to keep track of them. One interesting thing to note is that you can get NULL ObjectIDs back in the array of object IDs. This happens when you have declared an object reference on the stack.
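
A sketch of a RootReferences handler that accounts for those NULL entries might look like this; MarkAsRoot is a hypothetical helper standing in for whatever the tracking system does with a root.

// Hypothetical helper that flags a tracked instance as a root reference.
void MarkAsRoot(ObjectID objectId);

HRESULT CMemProfilerCallback::RootReferences(ULONG cRoots, ObjectID objectIds[])
{
    for (ULONG i = 0; i < cRoots; i++)
    {
        if (objectIds[i] == NULL)
            continue;               // a declared stack reference with no object behind it
        MarkAsRoot(objectIds[i]);
    }
    return S_OK;
}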

The ObjectReferences callback is called one time for every object remaining in the managed heap after the garbage collection has been completed:

HRESULT ObjectReferences( ObjectID objectId, ClassID classId, ULONG cObjectRefs, ObjectID objectRefIds[] )

The objectId indicates the object identifier for the object that is having its references reported. The class identifier for this object instance is classId. The number of references to other objects that this object instance holds is indicated by cObjectRefs, and objectRefIds[] represents the object identifiers for each one of the objects that are referenced by this object instance.

The callbacks can be short-circuited by returning an error HRESULT (such as E_FAIL), which stops the per-object callbacks until the next garbage collection. ObjectReferences provides you with the object IDs for all of the objects that are referenced by the object ID passed as the first parameter. When the two reference callbacks are used in tandem, the profiler can build an object graph representing the layout of the objects in memory; a sketch of an ObjectReferences handler follows.
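
Again, this is only a sketch: RecordReference is a hypothetical helper, and the real sample records these edges in its ObjectInstance tracking objects.

// Hypothetical helper that remembers that 'parent' holds a reference to 'child'.
void RecordReference(ObjectID parent, ObjectID child);

HRESULT CMemProfilerCallback::ObjectReferences(ObjectID objectId, ClassID classId,
                                               ULONG cObjectRefs, ObjectID objectRefIds[])
{
    // Returning E_FAIL here would tell the CLR to stop sending ObjectReferences
    // callbacks until the next garbage collection.
    for (ULONG i = 0; i < cObjectRefs; i++)
        RecordReference(objectId, objectRefIds[i]);
    return S_OK;
}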

References are significant because if your object has references to it, it stays in memory longer. In order to produce a streamlined working set, you should consider the objects you create in terms of who will hold references to them and for how long.

Detecting Memory Usage with the Profiler API

Now that you have an idea of what you can do with the Profiler API calls for the GC and CLR, let's walk through the MemoryUsage sample I have created so I can illustrate some of the items I spoke about earlier. I decided to use these callbacks to tell me the following things about managed memory usage under .NET:

  • Of each .NET type that is used, how many object instances are being allocated?
  • How big are the instances of each type?
  • What notifications does the GC provide as it goes through a garbage collection and what can you find out?
  • When does the GC collect the object instances?

In order to answer these questions, I have created a bit of code to walk you through the use of a profiler to gather information about garbage collection and memory usage. There are three main pieces to the sample code (available at the link at the top of this article): the MemoryUsage application, a C# shell that launches the target process; a C# console application called MemApp that serves as the target; and the C++-based .NET profiler, MemProfiler.

MemoryUsage simply takes the path to a .NET-based application that you want to analyze and launches the target process with the correct environment variables, allowing the CLR to load the .NET profiler (MemProfiler). MemProfiler collects information to a log while the target application (MemApp) runs. Once the target exits, the MemoryUsage application loads the log of memory information gleaned from the GC callbacks and the object tracking done in the profiler (see Figure 5; a sketch of the launch step follows the figure).

Figure 5 Generating a Log
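
The launch step itself is straightforward. The actual launcher in the sample is C#, but the same idea expressed in Win32 C++ looks roughly like the following sketch (the function name is an assumption): set the two profiling variables in our own environment and let the child process inherit them.

#include <windows.h>

// Sketch: launch a target executable so that the CLR inside it loads MemProfiler.
BOOL LaunchProfiledProcess(LPCTSTR targetExe)
{
    // The CLR checks these variables at startup in the target process.
    SetEnvironmentVariable(TEXT("COR_ENABLE_PROFILING"), TEXT("0x1"));
    SetEnvironmentVariable(TEXT("COR_PROFILER"),
        TEXT("{04059918-5D57-4068-9FB8-F1AB0B24B3BC}"));   // MemProfiler's CLSID

    STARTUPINFO si = { sizeof(si) };
    PROCESS_INFORMATION pi = { 0 };

    // Passing NULL for lpEnvironment makes the child inherit our environment,
    // including the two COR_* variables set above.
    BOOL ok = CreateProcess(targetExe, NULL, NULL, NULL, FALSE, 0,
                            NULL, NULL, &si, &pi);
    if (ok)
    {
        CloseHandle(pi.hThread);
        CloseHandle(pi.hProcess);
    }
    return ok;
}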

MemApp is a simple C# application that creates lots of objects and then intermittently forces garbage collection so that MemProfiler will be notified of the state of the application. The interesting code for MemApp is located in MemClass.cs. The class running the show is called MemClass; it contains the Main method and no other members. The C# Main function in MemApp allocates a bunch of objects in different scenarios in order to exercise the garbage collector in many different ways, as shown in Figure 6.

Figure 6 Putting the GC to Work

class MemClass
{
    static void Main(string[] args)
    {
        // This function calls GC.Collect a number of times to
        // generate activity in the profiler callbacks so that you
        // can see what is being used in memory by the CLR and how
        // the objects are treated over the life of the function.
        // You would not want to do this in your application
        // (don't try this at home as your performance will
        // suffer...) but it is done here for illustration purposes.

        // create a bunch of objects
        // (increase the numbers here if you want to see a lot of
        // activity in MovedReferences)
        for(int j=0;j<100;j++)
            new MemTarget(10);

        // create single object to be around after collection
        MemTarget gct = new MemTarget(1);

        // collect to see if we can force relocation in MovedReferences
        GC.Collect();

        // make a reference to the single object after the collect
        // so that the GC promotes at least one object to the next
        // generation
        System.Diagnostics.Debug.WriteLine("MemApp",gct.ToString());

        // make a few more objects to muddy the waters...
        MemTarget []tgts = new MemTarget[5];
        for(int i=0;i<5;i++)
            tgts[i] = new MemTarget(i);

        // get rid of some of the objects
        GC.Collect();

        // make a reference to the array after the collect so the
        // GC promotes some of the objects to the next generation
        System.Diagnostics.Debug.WriteLine("MemApp",tgts[2].ToString());

        // force a collection
        GC.Collect();

        // make an item that is sure to get collected next collection
        new System.ArithmeticException("Not enough math used!");

        // perform the last collection
        GC.Collect();
    }
}

The MemTarget class is designed to illustrate that if you are using object references inside other objects, you may have a lot of items causing memory retention. MemTarget illustrates this scenario by having an array member that holds references to MemInnerTarget instances. The MemInnerTarget instances are created in the constructor of MemTarget and the references are then assigned to the array, as you can see in Figure 7.

Figure 7 MemInnerTarget and MemTarget

public class MemInnerTarget
{
    // trivial data storage
    int m_y = 0;

    public void SetValue(int y)
    {
        m_y = y;
    }

    public int GetValue()
    {
        return m_y;
    }
}

public class MemTarget
{
    int m_x;

    // hold an array of references to objects
    MemInnerTarget[] m_InnerTargets;

    public MemTarget(int i)
    {
        // how many do they want?
        m_x=i;

        // make that many inner objects
        m_InnerTargets = new MemInnerTarget[i];
        for(int a=0;a<i;a++)
        {
            m_InnerTargets[a] = new MemInnerTarget();
            m_InnerTargets[a].SetValue(a);
        }
    }

    public override string ToString()
    {
        // provide the easy .NET way to display data
        string sz;
        sz = "Values: " + m_x + " ";
        for(int i=0;i<m_x;i++)
        {
            sz += "Inner[" + i + "]==" + m_InnerTargets[i].GetValue() + " ";
        }
        return sz;
    }
}

MemProfiler has two main source files where most of the action takes place: MemProfilerCallback.cpp and MemState.cpp. MemProfilerCallback is the implementation of the ICorProfilerCallback interface for the profiler. MemState.cpp holds the implementation of the object tracking system that I built to keep track of memory and object lifetimes in the application. There are two types of objects I used to hold the information that I encountered: ObjectInstance and ClassDetail.

ObjectInstance holds information about the actual instances that are allocated for each class type and ClassDetail holds information about the .NET types (classes) that are used. I stored these objects in a couple of ATL map classes called Objects and Classes, respectively. For those of you who have not used the map class in ATL, it is basically a collection with a lookup key for each entry. ObjectInstance keeps track of quite a few things for the instance I am recording. It holds the object ID and class ID for the object, a history of the objects that this instance referenced, whether this instance was a root reference, a history of the object IDs that at one time were the valid object ID for this object, the size of the object, and whether the object was collected.

ClassDetail holds the class name, the class ID, and the number of times that the class has had instances allocated. When a new object ID is encountered for the first time, I add a new ObjectInstance to the tracking system and when a class ID is encountered for the first time, I add a new ClassDetail. When I am notified of object ID changes via MovedReferences, all of the ObjectInstances are updated to the new object ID so that if the object ID is reused by the CLR, it will be appropriately tracked.
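
As a rough sketch of the kind of bookkeeping this implies (the field names here are assumptions; the actual sample keeps more state and stores these records in ATL maps):

#include <string>
#include <vector>
#include <corprof.h>

// Sketch of the per-instance record described above.
struct ObjectInstanceInfo
{
    ObjectID              objectId;        // current ID, updated on MovedReferences
    ClassID               classId;
    ULONG                 size;            // from ICorProfilerInfo::GetObjectSize
    bool                  isRootReference;
    bool                  collected;
    std::vector<ObjectID> references;      // objects this instance referenced
    std::vector<ObjectID> idHistory;       // every ID this instance has had
};

// Sketch of the per-class record described above.
struct ClassDetailInfo
{
    ClassID      classId;
    std::wstring name;                     // resolved through the metadata APIs
    ULONG        allocationCount;          // how many instances have been allocated
};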

Since the profiler is a COM DLL, you need to register the MemProfiler.dll from the MemoryUsage\bin directory using regsvr32 first. Once that is done, in order to monitor the MemApp application, fire up the MemoryUsage application from the MemoryUsage\bin directory and select File\Analyze from the menu. This will prompt you for the path to the application to run, so select the MemoryUsage\bin directory and pick MemApp.exe. This will launch MemApp.exe with the correct environment variables to pull in MemProfiler so that it can monitor MemApp's execution.

In order to look at the instances of objects that the program uses, I am going to use the ObjectAllocated callback. Once I know that I have allocated an object, I save the information about that object and its type as well as the size of the instance. I also save the information about this ClassID and the number of times it has been allocated. I retrieve the size of the instance through a call to ICorProfilerInfo::GetObjectSize, passing the ObjectID I obtained from the callback. This returns the size of the instance in bytes, so now I know that the CLR has used n bytes on my behalf. By keeping a list of all of these, you can see how it becomes relatively easy to determine how much memory the CLR is using from the managed heap. Now I know how many instances of each type are being allocated as well as how big the instances are.

Depending on what the program being monitored allocates, you may see only some of the garbage collection callbacks, but when you have a program using enough memory to create a large amount of garbage collection activity, you should see all of them. Either way, the callbacks will happen in the following order: ObjectAllocated (many times), ObjectsAllocatedByClass, MovedReferences, RootReferences, and ObjectReferences.

As the application executes, remember some things I noticed while building MemProfiler. If you are going to do any inspection using the object IDs in the ranges provided via MovedReferences (to get the name associated with the object by using the ICorProfilerInfo::GetClassFromObject and ICorProfilerInfo::GetClassIDInfo methods, for example), you need to use the old object IDs, since the MovedReferences callback occurs before the objects are actually moved. The GC figures out how all of these objects will be shuffled in the managed heap and calls you back before it actually moves them. Since the Profiler documentation indicates that error checking of items passed into ICorProfilerInfo is minimal, you will immediately see some spectacular crashes if you give a bad object or class ID to the ICorProfilerInfo interface, so be warned. MovedReferences also does not fire for programs that use little memory. If you run a Hello World type of program, it is unlikely to allocate enough objects, and hold onto them long enough, to require not only a garbage collection (which can be forced) but also the moving of blocks of object IDs that would produce the MovedReferences callback.

It is possible to get a class ID in the ObjectAllocated callback for a .NET type (class) that has not been loaded yet. This means that if you tried to use the class ID to retrieve information about it from ICorProfilerInfo, you would fail. If you see this situation, the ObjectAllocated notification is usually followed by an ICorProfilerCallback class-load notification for the class (when the corresponding assembly is loaded), after which the class ID is valid for use with ICorProfilerInfo.

ObjectsAllocatedByClass does not track allocations in the large object heap, but ObjectAllocated does. By comparing the notifications from the two, an enterprising soul should be able to figure out what is in the large object heap as opposed to the normal managed heap.

The table of information given to the profiler by MovedReferences not only indicates what object IDs have changed but also that the old IDs not represented in the new IDs section have been collected. This tells you which garbage collection (the one happening during the MovedReferences callback) the objects were collected in. How can you know that an object was collected if you don't get any MovedReferences callbacks? You can actually look at the object IDs coming into ObjectAllocated and if you get a duplicate object ID, check the class ID. If you find that the class ID is different, then you know that object instance was collected since the CLR will reuse collected object IDs for other items. If it had moved the object, it would have called you back and you would have already dealt with that. Even if the class ID is not different, the duplicate object ID indicates that the CLR recognizes this as something else, so you should too.
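
The following sketch shows that duplicate-ID check in isolation. The map and structure here are assumptions for illustration; the real sample keeps this state in its own tracking classes and must synchronize access to it.

#include <map>
#include <corprof.h>

struct TrackedObject { ClassID classId; bool collected; };
static std::map<ObjectID, TrackedObject> g_trackedObjects;   // assumed tracking table

// If an object ID we already track shows up again in ObjectAllocated, the CLR
// has reused the ID, so the previously tracked instance must have been collected.
void NoteAllocation(ObjectID objectId, ClassID classId)
{
    std::map<ObjectID, TrackedObject>::iterator it = g_trackedObjects.find(objectId);
    if (it != g_trackedObjects.end())
    {
        it->second.collected = true;    // old instance is gone, even if the class matches
        g_trackedObjects.erase(it);
    }

    TrackedObject t = { classId, false };
    g_trackedObjects[objectId] = t;     // start tracking the new instance
}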

Figure 8 Summary Report

The application should have finished executing by now, and you can see the summary report that the profiler generated (see Figure 8). The log file containing the summary report is, by default, located next to the MemoryUsage executable, allowing you to look at the results in a different editor. At the top of the report you can see the path to the log file, the reporting settings, and the events those settings called for. In the default case, the only flag set is LOG_MEMORY_USAGE_SUMMARY, which reports the objects that were allocated, the number of instances, and the number of different classes used in this application. By default, I also show the runtime callbacks to give a bit of context, so you can see the garbage collection cycles happen. The flags for reporting various types of information are listed in Figure 9 and are enabled through the Tools | Options menu in MemoryUsage before running the target application.

Figure 9 MemProfiler Flags

Flag MemoryUsage Options
LOG_OBJECT_ALLOCATED Include ObjectAllocated information
LOG_OBJECTS_ALLOCATED_BY_CLASS Include ObjectsAllocatedByClass information
LOG_OBJECT_REFERENCES Include ObjectReferences information
LOG_REFERENCED_OBJECTS Include referenced objects information
LOG_ROOT_REFERENCES Include RootReferences information
LOG_MOVED_REFERENCES Include MovedReferences information
LOG_MEMORY_USAGE_SUMMARY Include summary information for memory usage
LOG_MEMORY_USAGE_ALL Include all information for memory usage
LOG_OUTPUT_DEBUG_STRING Write log entries to debug window

The Options dialog in MemoryUsage allows you to control the type of information that will be reported. The default log file is named output.log and is located in the directory that MemoryUsage.exe is running from. You can change this on the Options dialog to wherever you prefer by using the Browse button to select a new file path.

The default information that the profiler tracks includes the summary information and some of the profiler flow notifications (runtime, garbage collections, and various summary statistics). If you want more detailed information, such as the actual instances that were allocated, sizes, who referenced whom, and more, just enable a few of the following different options.

The LOG_OBJECT_ALLOCATED flag enables the reporting of specific object instances that were reported via the ObjectAllocated callback. The report line in the log looks like this:

ObjectAllocated: System.OutOfMemoryException (0x00BA104C)

This says that you allocated an instance of the System.OutOfMemoryException class with an object ID of 0x00BA104C.

The LOG_OBJECTS_ALLOCATED_BY_CLASS flag enables the reporting of the block of information given to you by the CLR about all of the objects allocated since the last garbage collection occurred, broken down by class. The report section for this in the log looks like this:

ObjectsAllocatedByClass: 98 Classes
ClassID: 0x79B595C4 Number of Allocations: 1
(Many more of the ClassID lines)
Total Objects Allocated since last GC: 2053

This tells you how many of each class was allocated, the class ID for each class, how many classes were represented by this callback, and the total number of objects allocated.

The LOG_OBJECT_REFERENCES flag enables the reporting of the information that is made available in the ObjectReferences callback. The report line for the basic object and its references looks like this:

ObjectReferences: ObjectID 0x00BA110C ClassID 0x79B52404 Number of references 1 ClassName: System.SharedStatics

This line shows the name of the class of the object, its object and class IDs, and the number of references it holds.

When the LOG_REFERENCED_OBJECTS flag is enabled, the actual information about the object is reported in the ObjectReferences callback. One line will be used for every object referenced and will look like this:

ObjectReferences: ObjectID 0x00BA110C ClassID 0x79B52404 Number of references 1 ClassName: System.SharedStatics
    References: ObjectID 0x00BA2E08 ClassID 0x79B5FBF4 Class System.Security.PermissionTokenFactory

This flag will also enable the ObjectInstance report line output by LOG_MEMORY_USAGE_ALL, as shown here:

ObjectInstance: objectId 0x00BDA530 with classId 0x7B3764FC; Size: 36 bytes;
    Allocated prior to GC 1; Survived until GC 4;
    Referenced 3 objects and was a root reference [NOT COLLECTED]
    Referenced Object: objectId 0x00BB6274 count 4
    Referenced Object: objectId 0x00BDA554 count 4
    Referenced Object: objectId 0x00BDAC68 count 4

When the LOG_ROOT_REFERENCES flag is enabled, this illustrates the root references for a garbage collection cycle in the following manner:

Root Reference: ObjectID 0x00BA1AAC ClassID 0x02D4106A Class System.String[]

This details the object, class IDs, and class name of the root reference.

When the LOG_MOVED_REFERENCES flag is enabled, it shows the information received through the MovedReferences callback. Since MovedReferences only fires when the GC is actually compacting and reassigning object IDs, it is not seen in low-memory applications like MemApp by default. In order to generate the sample information shown in Figure 10, I changed the original allocation of MemTarget objects from 100 to 1000 in the first loop of Main in MemClass.cs in the MemApp target program. The MovedReferences section shows all of the ranges of IDs that changed along with a total count. It also shows each object that moved, its class name, and which one of the table entries caused the move.

Figure 10 Log Output

MovedReferences: Table provided by MovedReferences:
Starting Old Object ID Range: 0x00BA104C  Starting New Object ID Range: 0x00BA1040  Size of moved ID value range 452
Starting Old Object ID Range: 0x00BA1284  Starting New Object ID Range: 0x00BA1204  Size of moved ID value range 468
(More items...)
35888 references moved
Object Types moved:
Moved Object System.OutOfMemoryException from ObjectID 0x00BA104C (start 0x00BA104C with 0x000001C4 items)
    to ObjectID 0x00BA1040 (start 0x00BA1040 with 0x000001C4 items)
Moved Object System.StackOverflowException from ObjectID 0x00BA108C (start 0x00BA104C with 0x000001C4 items)
    to ObjectID 0x00BA1080 (start 0x00BA1040 with 0x000001C4 items)
(More items...)

The LOG_MEMORY_USAGE_SUMMARY flag is on by default and the output represents the basic events happening during execution. You see an Initialize callback followed by a series of runtime callbacks interspersed with the number I assigned to each garbage collection. The end summary reports the total number of objects allocated, how many bytes were used in the managed heap, the lifetime statistics about the objects, and how many different classes were used. Figure 11 shows the default summary log entries.

Figure 11 Log Entries

Initialize
RuntimeSuspendStarted: Garbage Collection
RuntimeSuspendFinished
***** GC 0 starting *****
RuntimeResumeStarted
***** GC 0 ending *****
RuntimeSuspendStarted: Garbage Collection
RuntimeSuspendFinished
***** GC 1 starting *****
RuntimeResumeStarted
***** GC 1 ending *****
RuntimeSuspendStarted: Garbage Collection
RuntimeSuspendFinished
***** GC 2 starting *****
RuntimeResumeStarted
***** GC 2 ending *****
RuntimeSuspendStarted: Garbage Collection
RuntimeSuspendFinished
***** GC 3 starting *****
RuntimeResumeStarted
***** GC 3 ending *****
RuntimeSuspendStarted: Garbage Collection
RuntimeSuspendFinished
***** GC 4 starting *****
RuntimeResumeStarted
***** GC 4 ending *****
RuntimeSuspendStarted: Shutdown of the CLR
RuntimeSuspendFinished
Reporting Memory Usage...
4211 objects allocated
Total Managed Heap Memory used in 4211 object instances: 318438 bytes
Objects Lifetime Statistics:
GC 0 : 2056 objects allocated before, 13 objects survived until
GC 1 : 1347 objects allocated before, 4 objects survived until
GC 2 : 771 objects allocated before, 20 objects survived until
GC 3 : 21 objects allocated before, 771 objects survived until
GC 4 : 3 objects allocated before, 1347 objects survived until
GC 5 : 13 objects allocated before, 2056 objects survived until
4211 instances of 169 classes used
Shutdown

Notice that objects are being allocated early in the program (before garbage collection 0 and 1 occur) and stick around until very late in the program (garbage collection 4 or 5). This indicates memory retention in my target application (as it should, since the code is designed to exhibit this behavior).

The LOG_MEMORY_USAGE_ALL flag gives you all the details about the objects that were allocated, how many instances of each class were used, and how big each instance was, along with the object instance detail. Detailed information available about each object instance, such as object lifetime, number of instances created, and sizes is shown here:

ObjectInstance: objectId 0x00BDA530 with classId 0x7B3764FC; Size: 36 bytes;
    Allocated prior to GC 1; Survived until GC 4;
    Referenced 3 objects and was a root reference [NOT COLLECTED]
Instance 0x00BA104C 64 bytes Class System.OutOfMemoryException
Instance 0x00BA108C 64 bytes Class System.StackOverflowException
Instance 0x00BA10CC 64 bytes Class System.ExecutionEngineException
Class: MemApp.MemTarget (0x0038512C) allocated 106 times
Class: MemApp.MemInnerTarget (0x003854A8) allocated 1011 times

The LOG_OUTPUT_DEBUG_STRING flag tells the profiler not only to write this information to the log file, but also to write it to the debug stream through an OutputDebugString call.

You can watch the information in the summary report being generated by running the MemProfiler.DLL under the debugger and making the target application MemApp.exe. This is a good way to get a feel for how things are working as the garbage collections occur.

Profiler Debugging Tips

The sample application I provided (MemoryUsage) does the work of setting up the proper environment for the profiler. All you have to do is run MemoryUsage and select your target application in order to see the summary output. If you'd like to get a better feel for how the profiler notifications flow by running the profiler under a debugger, I'll offer a few tips on how to do that. If you haven't actually tried to create or use a profiler with .NET, there are a few things I'd like to share that should relieve some frustration that can result from trying to debug a profiler under Visual Studio .NET.

The first item of business is to indicate to the CLR that you want to be a profiler. In order to do this, you need to set certain environment variables that the CLR is looking for. These variables indicate that your profiler should be used by the CLR.

The COR_ENABLE_PROFILING environment variable turns on the profiling capability in the CLR and should be set to 0x1 to indicate that you want to profile. Now that you have indicated your willingness to profile, you have to tell the CLR which COM-based profiler to pull in by setting the COR_PROFILER environment variable equal to the CLSID of the profiler.

Here is an example for enabling the environment for the MemProfiler sample I have provided:

SET COR_ENABLE_PROFILING=0x1
SET COR_PROFILER={04059918-5D57-4068-9FB8-F1AB0B24B3BC}

Now that you have the right environment, you should be able to debug, right? There is one other thing to consider here—where the CLR (the one that will be pulling your profiler in) lives. Since your .NET-based application is launched from Visual Studio .NET, it passes along the environment to the application being debugged. This means that if Visual Studio .NET doesn't have these environment variables set, it will not pass them to the .NET-based application you are trying to profile and will consequently leave you wondering why you never hit your initial breakpoint. Why go through this? Visual Studio .NET acts as a CLR host so if you just set the environment variables first, and then run devenv /useenv (to use the current environment settings), Visual Studio .NET will drag in your profiler to monitor what it's doing in its instance of the CLR. This makes it rather difficult to debug, so I add in the environment variables after I start up so that they are simply passed on to the process being debugged (MemApp.exe in this case).

In order to ensure you have the environment correct, you should use the following procedure:

  1. Open a Visual Studio .NET command prompt (from the Visual Studio .NET Tools submenu).
  2. Run the following command to start up Visual Studio .NET:
     devenv
  3. Use Task Manager to get the process ID of the devenv process.
  4. Run the GSET program (found in the bin directory in the sample code) with the following command line:
     Gset [devenv process id]
  5. GSET is a tool that a colleague of mine at the NuMega Lab (Yoshi Watanabe) wrote for modifying the environment of an already running process. When you run it you will see the current environment of the process (see Figure 12). You can use the New button to add the following environment variables to the development environment:
     COR_ENABLE_PROFILING 0x1
     COR_PROFILER {04059918-5D57-4068-9FB8-F1AB0B24B3BC}
  6. Load the MemUsage.sln solution and change the debugging target for MemProfiler to point at MemApp.exe in the \bin\Debug directory. Make sure that MemProfiler is the startup project.

This will set up the Visual Studio .NET IDE with the correct environment settings. Now you have a debugging environment with the profiling environment variables set properly so you can debug through the profiler.

Figure 12 Modifying a Process

At this point, I would suggest putting a breakpoint in the Initialize callback and running the application you want to profile. You should hit the breakpoint and be on your way to inspecting and implementing lots of cool profiling features.

Conclusion

In this article, I wanted to find out how many instances of each type were being used as well as their size, so I tracked the objects I saw in the ObjectAllocated callback and then used the ICorProfilerInfo::GetObjectSize call to get the instance size. By saving this information for each item that I received an ObjectAllocated callback for, I was able to display the information in my summary report at the end of the program.

By using a few of the runtime callbacks, specifically RuntimeSuspendStarted, RuntimeSuspendFinished, and RuntimeResumeStarted, I could see the ordering of the garbage collection notifications as it proceeded through the garbage collection. By looking at the garbage collector callbacks, I could then find statistics on the objects allocated for each .NET type, which objects were considered root references at a given time, which objects referenced other objects at a given time, and finally when the instance was collected by the garbage collector.

There is more memory involved in an application than just the managed heap, but being aware of how garbage collection works allows you to develop leaner, more efficient applications.

For related articles see:
Under the Hood: The .NET Profiling API and the DNProfiler Tool
Garbage Collection: Automatic Memory Management in the Microsoft .NET Framework
Garbage Collection-Part 2: Automatic Memory Management in the Microsoft .NET Framework

For background information see:
Applied Microsoft .NET Framework Programming by Jeffrey Richter (Microsoft Press, 2002)

Jay Hilyard is a software engineer in the BoundsChecker group with Compuware/NuMega Lab. When Jay is not busy spending time with his family or watching the Patriots, he can be reached at hilyard@attbi.com.