CLR Inside Out

Large Object Heap Uncovered

Maoni Stephens

Contents

The Large Object Heap and the GC
When a Large Object Gets Collected
LOH Performance Implications
Collecting Performance Data for the LOH
Using a Debugger

The CLR garbage collector (GC) divides objects into small and large categories. When an object is large, some attributes associated with it become more significant than if the object is small. For instance, compacting it—copying the memory elsewhere on the heap—is expensive. In this month's column I am going to look at the large object heap in depth. I will talk about what qualifies an object as a large object, how these large objects are collected, and what kind of performance implications large objects impose.

The Large Object Heap and the GC

In the Microsoft® .NET Framework 1.1 and 2.0, if an object is greater than or equal to 85,000 bytes it's considered a large object. This number was determined as a result of performance tuning. When an object allocation request comes in and meets that size threshold, it will be allocated on the large object heap. What does this mean exactly? To understand this, it may be beneficial to explain some fundamentals about the .NET garbage collector.

As many of you are aware, the .NET garbage collector is a generational collector. It has three generations: generation 0, generation 1, and generation 2. The reason for this is so that in a well-tuned application, you can expect most objects to die in generation 0. For example, in a server app, the allocations associated with each request should die after the request is finished. The in-flight allocation requests will make it into generation 1 and die there. Essentially, generation 1 acts as a buffer between young object areas and long-lived object areas.

From a generation point of view, large objects belong to generation 2 because they are collected only when there is a generation 2 collection. When a generation is collected, all younger generations are also collected. So for example, when a generation 1 garbage collection happens, both generation 1 and 0 are collected. And when a generation 2 garbage collection happens, the whole heap is collected. For this reason, a generation 2 garbage collection is also known as a full garbage collection. In this column I will use the term generation 2 garbage collection instead of full garbage collection, but they are interchangeable.

Generations are the logical view of the garbage collector heap. Physically, objects live on managed heap segments. A managed heap segment is a chunk of memory that the garbage collector reserves from the OS (via calling VirtualAlloc) on behalf of managed code. When the CLR is loaded, two initial heap segments are allocated—one for small objects and one for large objects, which I will refer to as the small object heap (SOH) and the large object heap (LOH), respectively.

Allocation requests are then satisfied by putting managed objects on one of these managed heap segments. If the object is less than 85,000 bytes, it will be put on a SOH segment; otherwise it'll be on a LOH segment. Segments are committed (in smaller chunks) as more and more objects are allocated onto them.

For the SOH, objects that survive a garbage collection are promoted to the next generation; so objects that survive a generation 0 collection will be considered generation 1 objects, and so on. Objects that survive the oldest generation, however, will still be considered in the oldest generation. In other words, survivors from generation 2 will be generation 2 objects; and survivors from LOH will be LOH objects (collected with generation 2). User code can only allocate in generation 0 (small objects) or LOH (large objects). Only the garbage collector can "allocate" objects in generation 1 (by promoting survivors from generation 0) and generation 2 (by promoting survivors from generations 1 and 2).
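The split between the two heaps can be observed from user code. The following is a minimal sketch, assuming the default 85,000-byte threshold and that no garbage collection happens to run between the allocation and the query; the class and method names are hypothetical. It relies on GC.GetGeneration reporting generation 2 for objects on the LOH.

```csharp
using System;

public static class LohThresholdDemo
{
    // Returns the generation the GC reports for a freshly allocated byte array.
    public static int GenerationOfNewArray(int length)
    {
        byte[] buffer = new byte[length];
        return GC.GetGeneration(buffer);
    }

    public static void Main()
    {
        // Well under 85,000 bytes: allocated on the SOH, normally in generation 0.
        Console.WriteLine("1,000-byte array:  generation " + GenerationOfNewArray(1000));

        // 85,000 bytes of data plus the array header: allocated on the LOH,
        // which the GC reports as generation 2 (GC.MaxGeneration).
        Console.WriteLine("85,000-byte array: generation " + GenerationOfNewArray(85000));
    }
}
```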

When a garbage collection is triggered, the garbage collector traces through the live objects and compacts them. For LOH, though, because compaction is expensive the CLR team chose to sweep them, making a free list out of dead objects that can be reused later to satisfy large object allocation requests. Adjacent dead objects are made into one free object.

An important thing to keep in mind is that even though today we don't compact LOH, we might in the future. So if you allocate large objects and want to make sure they don't move, you should still pin them.
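One way to pin a large buffer is with a pinned GCHandle. The sketch below is an illustration under assumptions rather than a prescribed pattern: the helper name WithPinnedBuffer is hypothetical, and the callback stands in for whatever native code needs a stable address.

```csharp
using System;
using System.Runtime.InteropServices;

public static class PinLargeBufferDemo
{
    // Pins the buffer, passes its stable address to the callback, then unpins it.
    public static void WithPinnedBuffer(byte[] buffer, Action<IntPtr> useAddress)
    {
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            // The address is guaranteed stable only while the handle is allocated.
            useAddress(handle.AddrOfPinnedObject());
        }
        finally
        {
            handle.Free(); // always release the pin, even if the callback throws
        }
    }

    public static void Main()
    {
        byte[] large = new byte[100000]; // large enough to land on the LOH
        WithPinnedBuffer(large, delegate(IntPtr address)
        {
            Marshal.WriteByte(address, 0, 42); // write through the raw address
        });
        Console.WriteLine(large[0]); // the write is visible through the array
    }
}
```

Keeping the pin inside a try/finally limits how long the object stays pinned, which matters more on the SOH, where pinned objects get in the way of compaction.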

Note that the figures shown here are only for illustration purposes. I used very few objects, just to show what happens on the heap. In reality there are many more objects there.

Figure 1 illustrates a scenario where I form generation 1 after the first generation 0 GC where Obj1 and Obj3 are dead; and I form generation 2 after the first generation 1 GC where Obj2 and Obj5 are dead.

Figure 1 SOH Allocations and Garbage Collections

Figure 2 illustrates that after a generation 2 garbage collection in which you saw that Obj1 and Obj2 were dead, I formed one free space out of the memory that used to be occupied by Obj1 and Obj2, which then was used to satisfy the allocation request for Obj4. The space after the last object Obj3 until the end of the segment can still be used to satisfy further allocation requests.

Figure 2 LOH Allocations and Garbage Collections

If I don't have enough free space to accommodate the large object allocation requests, I will first attempt to acquire more segments from the OS. If that fails, then I will trigger a generation 2 garbage collection in hope of freeing up some space.

During a generation 2 garbage collection, I take the opportunity to release segments that have no live objects on them back to the OS (by calling VirtualFree). Memory from the last live object to the end of the segment is decommitted. And the free spaces remain committed though they are reset, meaning the OS doesn't need to write data in them back to disk. Figure 3 illustrates a scenario in which I release a segment back to the OS (segment 2) and decommit more space on the remaining segments. If I need to use the decommitted space at the end of the segment to satisfy new large object allocation requests, I will commit the memory again.

Figure 3 Dead Segments Released on LOH during Garbage Collection

For an explanation on commit/decommit, please see the MSDN® documentation on VirtualAlloc at go.microsoft.com/fwlink/?LinkId=116041.

When a Large Object Gets Collected

To determine when a large object is collected, let's first talk about when a garbage collection happens in general. A garbage collection occurs if one of the following conditions happens:

Allocation Exceeds the Generation 0 or Large Object Threshold. Most GCs happen because of allocations on the managed heap (this is the most typical case).

System.GC.Collect Is Called. When someone calls GC.Collect on generation 2 (by passing either no arguments to GC.Collect or passing GC.MaxGeneration as an argument), the LOH will get collected right away along with the rest of the managed heap.

System Is in Low Memory Situation. This happens when I receive the high memory notification from the OS. If I think doing a generation 2 GC will be productive, I will trigger one.

The threshold is a property of each generation. When you allocate objects into a generation, you increase the amount of memory in that generation and get closer to the generation's threshold. And when the threshold is exceeded for a generation, a garbage collection is triggered on that generation. So when you allocate small or large objects, you consume generation 0's and LOH's respective thresholds. And when the garbage collector allocates into generations 1 and 2, it consumes their thresholds. These thresholds are dynamically tuned as the program runs.
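The explicit trigger is the easiest of the three to observe. Here is a minimal sketch, with hypothetical class and method names, that drops a temporary large object and then forces a generation 2 collection. Whether the weak reference reports the object as dead can depend on build settings and runtime version, so treat that line as indicative only.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class FullCollectionDemo
{
    // Allocates a large array in its own frame and returns only a weak reference,
    // so no strong reference to the array survives the call.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static WeakReference AllocateTemporaryLargeObject()
    {
        return new WeakReference(new byte[200000]);
    }

    // Forces a generation 2 collection and returns how many generation 2
    // collections were recorded while doing so.
    public static int ForceFullCollection()
    {
        int before = GC.CollectionCount(2);
        GC.Collect(); // no arguments: collects every generation and the LOH
        return GC.CollectionCount(2) - before;
    }

    public static void Main()
    {
        WeakReference weak = AllocateTemporaryLargeObject();
        int gen2Collections = ForceFullCollection();
        Console.WriteLine("Generation 2 collections during the call: " + gen2Collections);
        Console.WriteLine("Large object still alive: " + weak.IsAlive);
    }
}
```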

LOH Performance Implications

Let's look at allocation costs. The CLR makes the guarantee that the memory for every new object I give out is cleared. This means the allocation cost of a large object is completely dominated by memory clearing (unless it triggers a garbage collection). If it takes two cycles to clear 1 byte, it means it takes 170,000 cycles to clear the smallest large object. It's not uncommon for people to allocate large objects that are a bit too large. For a 16MB object on a 2GHz machine, it will take approximately 16ms to clear the memory. That's a rather large cost.
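The arithmetic behind those two figures can be written out as a small helper. This is only a back-of-the-envelope sketch: the two-cycles-per-byte rate is the illustrative figure used above, not a measured value, and the names are hypothetical.

```csharp
using System;

public static class ClearCostEstimate
{
    // Cycles needed to zero a block of memory at a given per-byte cost.
    public static long CyclesToClear(long bytes, long cyclesPerByte)
    {
        return bytes * cyclesPerByte;
    }

    // Time in milliseconds to zero the block on a CPU running at clockHz.
    public static double MillisecondsToClear(long bytes, long cyclesPerByte, double clockHz)
    {
        return CyclesToClear(bytes, cyclesPerByte) / clockHz * 1000.0;
    }

    public static void Main()
    {
        // Smallest large object: 85,000 bytes * 2 cycles per byte = 170,000 cycles.
        Console.WriteLine(CyclesToClear(85000, 2));

        // 16MB object on a 2GHz machine: 33,554,432 cycles / 2e9 Hz, roughly 16.8ms.
        long sixteenMegabytes = 16L * 1024 * 1024;
        Console.WriteLine(MillisecondsToClear(sixteenMegabytes, 2, 2e9));
    }
}
```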

Now let's look at collection costs. As mentioned earlier, LOH and generation 2 are collected together. If either one's threshold is exceeded, a generation 2 collection will be triggered. If a generation 2 collection was triggered because of the LOH, generation 2 itself won't necessarily get much smaller after the garbage collection. Now, if there's not much data in generation 2, this is not a problem. But if generation 2 is big, it could cause performance problems if many generation 2 garbage collections are triggered. If many large objects are allocated on a very temporary basis and you have a big SOH, you could be spending too much time running garbage collections; not to mention that the allocation cost can really add up if you keep allocating and letting go of really large objects.

Very large objects on the LOH are usually arrays (it's very rare to have an instance object that's really, really large). If the elements of an array are reference-rich, the cost is high. If the element doesn't contain any references, I wouldn't need to go through the array at all. For example, if you use an array to store nodes in a binary tree, one way to implement it is to refer to a node's right and left node by the actual nodes:

class Node
{
    Data d;
    Node left;
    Node right;
};

Node[] binary_tr = new Node [num_nodes];

If num_nodes is a large number, it means you will need to go through at least two references per element. An alternative approach is to store the index of the right and the left node:

class Node
{
    Data d;
    uint left_index;
    uint right_index;
};

This way, instead of referring to the left node's data as left.d, you would refer to it as binary_tr[left_index].d. And the garbage collector wouldn't need to look at any references for left and right node.
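A runnable version of the index-based layout might look like the sketch below. It makes assumptions that go beyond the snippets above: Node is declared as a struct so that the array stores the nodes inline (with a class, the array itself would still hold one reference per element), the payload is a plain int, and the indices are int with -1 meaning "no child" rather than uint.

```csharp
using System;

// Assumption: a struct with no reference fields, so Node[] contains no
// references for the garbage collector to trace.
public struct Node
{
    public int Data;       // placeholder payload (hypothetical)
    public int LeftIndex;  // index of the left child in the array, or -1
    public int RightIndex; // index of the right child in the array, or -1
}

public static class IndexedTreeDemo
{
    public const int None = -1;

    // Builds a three-node tree: node 0 is the root, nodes 1 and 2 are leaves.
    public static Node[] BuildSample()
    {
        Node[] tree = new Node[3];
        tree[0].Data = 10; tree[0].LeftIndex = 1;    tree[0].RightIndex = 2;
        tree[1].Data = 5;  tree[1].LeftIndex = None; tree[1].RightIndex = None;
        tree[2].Data = 20; tree[2].LeftIndex = None; tree[2].RightIndex = None;
        return tree;
    }

    // Sums the payload of every node reachable from the given index,
    // following child indices instead of child references.
    public static long Sum(Node[] tree, int index)
    {
        if (index == None) return 0;
        return tree[index].Data
             + Sum(tree, tree[index].LeftIndex)
             + Sum(tree, tree[index].RightIndex);
    }

    public static void Main()
    {
        Console.WriteLine(Sum(BuildSample(), 0)); // 10 + 5 + 20 = 35
    }
}
```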

Out of the three reasons for a collection, the first two are usually more dominant than the third. Therefore, it's best if you can allocate a pool of large objects and reuse them instead of allocating temporary ones. Yun Jin has a good example of such a buffer pool in his blog entry (go.microsoft.com/fwlink/?LinkId=115870). Of course, you would want to make the buffer size larger.
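To make the pooling idea concrete, here is a minimal sketch of a fixed-size buffer pool. It is not the implementation from the blog entry mentioned above; the type name, the locking strategy, and the decision not to clear returned buffers are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

public sealed class LargeBufferPool
{
    private readonly Stack<byte[]> free = new Stack<byte[]>();
    private readonly object gate = new object();
    private readonly int bufferSize;

    // Preallocates initialCount buffers so they are created once, up front.
    public LargeBufferPool(int bufferSize, int initialCount)
    {
        this.bufferSize = bufferSize;
        for (int i = 0; i < initialCount; i++)
        {
            free.Push(new byte[bufferSize]);
        }
    }

    // Hands out a pooled buffer, or allocates a new one if the pool is empty.
    // Contents are NOT cleared, so callers must not assume zeroed memory.
    public byte[] Rent()
    {
        lock (gate)
        {
            if (free.Count > 0) return free.Pop();
        }
        return new byte[bufferSize];
    }

    // Puts a buffer back so a later Rent call can reuse it.
    public void Return(byte[] buffer)
    {
        if (buffer == null || buffer.Length != bufferSize)
        {
            throw new ArgumentException("Buffer does not belong to this pool.");
        }
        lock (gate)
        {
            free.Push(buffer);
        }
    }
}

public static class LargeBufferPoolDemo
{
    public static void Main()
    {
        LargeBufferPool pool = new LargeBufferPool(1024 * 1024, 2);
        byte[] first = pool.Rent();
        pool.Return(first);
        byte[] second = pool.Rent();
        Console.WriteLine(ReferenceEquals(first, second)); // the same array is reused
    }
}
```

Skipping the clear on return avoids paying the memory-clearing cost again, at the price of callers having to overwrite or track the valid portion of each buffer.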

Collecting Performance Data for the LOH

There are a few ways to collect performance data that's relevant to the LOH. But before I explain them, let's talk about why you would want to do it.

Before you start collecting performance data for a specific area, hopefully you have already found evidence that you should be looking at this area or you have exhausted other areas that you know of and didn't find any problems that could explain a performance problem you are observing.

I would recommend reading a blog entry of mine for more explanation (see go.microsoft.com/fwlink/?LinkId=116467). There I talk about the fundamentals of memory and CPU. Additionally, the November 2006 installment of CLR Inside Out on investigating memory issues talks about steps involved in diagnosing performance problems in a managed process that may be related to managed heap (see msdn2.microsoft.com/magazine/cc163528).

.NET CLR Memory Performance counters are usually a good first step in investigating performance issues. The counters that are relevant to the LOH are the number of generation 2 collections and the large object heap size. The number of generation 2 collections displays the number of times generation 2 garbage collections have occurred since the process started. The counter is incremented at the end of a generation 2 garbage collection (also called a full garbage collection). This counter displays the last observed value.

Large object heap size is the current size in bytes, including the free space, of the Large Object Heap. This counter is updated at the end of a garbage collection, not at each allocation.

A common way to look at performance counters is via Performance Monitor (PerfMon.exe). Use "Add Counters" to add the interesting counter for processes that you care about, as you can see in Figure 4.

Figure 4 Adding Counters in Performance Monitor

You can save the performance counter data to a log file in PerfMon. Performance counters can also be queried programmatically. Many people collect them this way as part of their routine testing process. When you spot counters with values that are out of the ordinary, you can then use other means to get more details to help with the investigation.
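A programmatic query could be sketched as follows with System.Diagnostics.PerformanceCounter. The sketch assumes Windows with the .NET CLR Memory counters installed, and it assumes the counter instance name equals the process name; when several processes share a name the instances get a numeric suffix, which this sketch does not handle. The class name is hypothetical.

```csharp
using System;
using System.Diagnostics;

public static class LohCounters
{
    public const string Category = ".NET CLR Memory";
    public const string Gen2Collections = "# Gen 2 Collections";
    public const string LohSize = "Large Object Heap size";

    // Reads the current value of one counter for the given process instance.
    public static float Read(string counterName, string instanceName)
    {
        using (PerformanceCounter counter =
            new PerformanceCounter(Category, counterName, instanceName, true))
        {
            return counter.NextValue();
        }
    }

    public static void Main()
    {
        // Assumption: the instance name matches this process's name.
        string instance = Process.GetCurrentProcess().ProcessName;

        Console.WriteLine("Gen 2 collections: " + Read(Gen2Collections, instance));
        Console.WriteLine("LOH size in bytes: " + Read(LohSize, instance));
    }
}
```

Because these counters are updated at the end of a garbage collection, values sampled this way lag behind allocations made since the last collection.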

Using a Debugger

Before I begin, you should note that the debugging commands I'll mention in this section are applicable to the Windows® Debuggers. If you need to look at which objects are actually on the LOH, you can use the SoS debugger extension provided by the CLR, which is explained in the November 2006 installment I mentioned previously. An example output of analyzing the LOH is shown in Figure 5.

The bold sections of Figure 5 say that the LOH heap size is (16,754,224 + 16,699,288 + 16,284,504 =) 49,738,016 bytes. And between address 023e1000 and 033db630, there are 8,008,736 bytes occupied by System.Object[] objects; 6,663,696 occupied by System.Byte[] objects; and 2,081,792 occupied by free space.

Figure 5 LOH Output

0:003> .loadby sos mscorwks
0:003> !eeheap -gc
Number of GC Heaps: 1
generation 0 starts at 0x013e35ec
generation 1 starts at 0x013e1b6c
generation 2 starts at 0x013e1000
ephemeral segment allocation context: none
 segment    begin allocated     size
0018f2d0 790d5588  790f4b38 0x0001f5b0(128432)
013e0000 013e1000  013e35f8 0x000025f8(9720)
Large object heap starts at 0x023e1000
 segment    begin allocated     size
023e0000 023e1000  033db630 0x00ffa630(16754224)
033e0000 033e1000  043cdf98 0x00fecf98(16699288)
043e0000 043e1000  05368b58 0x00f87b58(16284504)
Total Size  0x2f90cc8(49876168)
------------------------------
GC Heap Size  0x2f90cc8(49876168)
0:003> !dumpheap -stat 023e1000  033db630
total 133 objects
Statistics:
      MT    Count    TotalSize Class Name
001521d0       66      2081792      Free
7912273c       63      6663696 System.Byte[]
7912254c        4      8008736 System.Object[]
Total 133 objects

Sometimes you'll see that the total size of the LOH is less than 85,000 bytes. Why is this? It's because the runtime itself actually uses LOH to allocate some objects that are smaller than a large object.

Since the LOH is not compacted, sometimes people suspect that the LOH is a source of fragmentation. You really need to first clarify what fragmentation means. There's fragmentation of the managed heap, which is indicated by the amount of free space between managed objects (in other words, what you see when you do !dumpheap -type Free in SoS); there's also fragmentation of the virtual memory (VM) address space, which is the memory marked as MEM_FREE and which you can see by using various debugger commands in windbg (see go.microsoft.com/fwlink/?LinkId=116470). Figure 6 shows the fragmentation in the virtual memory space (note the bold text in the figure).

Figure 6 VM Space Fragmentation

0:000> !address
    00000000 : 00000000 - 00010000
                    Type     00000000 
                    Protect  00000001 PAGE_NOACCESS
                    State    00010000 MEM_FREE
                    Usage    RegionUsageFree
    00010000 : 00010000 - 00002000
                    Type     00020000 MEM_PRIVATE
                    Protect  00000004 PAGE_READWRITE
                    State    00001000 MEM_COMMIT
                    Usage    RegionUsageEnvironmentBlock
    00012000 : 00012000 - 0000e000
                    Type     00000000 
                    Protect  00000001 PAGE_NOACCESS
                    State    00010000 MEM_FREE
                    Usage    RegionUsageFree
... [omitted]
-------------------- Usage SUMMARY --------------------------
    TotSize (      KB)   Pct(Tots) Pct(Busy)   Usage
     701000 (    7172) : 00.34%    20.69%    : RegionUsageIsVAD
   7de15000 ( 2062420) : 98.35%    00.00%    : RegionUsageFree
    1452000 (   20808) : 00.99%    60.02%    : RegionUsageImage
     300000 (    3072) : 00.15%    08.86%    : RegionUsageStack
       3000 (      12) : 00.00%    00.03%    : RegionUsageTeb
     381000 (    3588) : 00.17%    10.35%    : RegionUsageHeap
          0 (       0) : 00.00%    00.00%    : RegionUsagePageHeap
       1000 (       4) : 00.00%    00.01%    : RegionUsagePeb
       1000 (       4) : 00.00%    00.01%    : RegionUsageProcessParametrs
       2000 (       8) : 00.00%    00.02%    : RegionUsageEnvironmentBlock
       Tot: 7fff0000 (2097088 KB) Busy: 021db000 (34668 KB)

-------------------- Type SUMMARY --------------------------
    TotSize (      KB)   Pct(Tots)  Usage
   7de15000 ( 2062420) : 98.35%   : <free>
    1452000 (   20808) : 00.99%   : MEM_IMAGE
     69f000 (    6780) : 00.32%   : MEM_MAPPED
     6ea000 (    7080) : 00.34%   : MEM_PRIVATE

-------------------- State SUMMARY --------------------------
    TotSize (      KB)   Pct(Tots)  Usage
    1a58000 (   26976) : 01.29%   : MEM_COMMIT
   7de15000 ( 2062420) : 98.35%   : MEM_FREE
     783000 (    7692) : 00.37%   : MEM_RESERVE
Largest free region: Base 01432000 - Size 707ee000 (1843128 KB)

As mentioned earlier, the free space that makes up fragmentation on the managed heap is reused to satisfy allocation requests. It's more common to see virtual memory fragmentation caused by temporary large objects that force the garbage collector to frequently acquire new managed heap segments from the OS and release empty ones back to the OS.

To verify whether the LOH is causing VM fragmentation, you can set a breakpoint on VirtualAlloc and VirtualFree and see who calls them. For example, if I want to see who tried to allocate VM chunks from the OS that are larger than 8MB, I can set a breakpoint like this:

bp kernel32!virtualalloc "j (dwo(@esp+8)>800000) 'kb';'g'"

This breaks into the debugger and shows me the callstack if VirtualAlloc is called with an allocation size greater than 8MB (0x800000) and doesn't break into the debugger otherwise.

In CLR 2.0 we added a feature called VM Hoarding that may be applicable if you are in a situation where segments (including those for both the large and small object heaps) are frequently acquired and released. To specify VM Hoarding, you specify a startup flag called STARTUP_HOARD_GC_VM via the hosting API (see go.microsoft.com/fwlink/?LinkId=116471). When you specify this, instead of releasing empty segments back to the OS, the memory on these segments is simply decommitted and put on a standby list. The segments on the standby list will be used later to satisfy new segment requests. So next time I need a new segment, I'll use one from this standby list if I can find one that's big enough.

Note that this isn't done for the segments that are too large. This feature is also useful for applications that want to hold onto the segments that they've already acquired, like some server apps that seek to avoid fragmentation of the VM space as much as they can so that they don't cause out-of-memory errors. And they can do this since they are usually the dominating apps on the machine. I strongly recommend you carefully test your application when you use this feature and make sure it has a fairly stable memory usage.

Large objects are expensive. The allocation cost is high because the CLR needs to clear the memory for a newly allocated large object to satisfy the CLR guarantee that memory for all newly allocated objects is cleared. The LOH is collected with the rest of the heap, so carefully analyze how that impacts your app's performance. The recommendation is to reuse large objects if possible to avoid fragmentation on the managed heap and the VM space.

Finally, as of right now, the LOH is not compacted as a part of collection, but that is an implementation detail that should not be relied on. So to make sure something is not moved by the GC, always pin it. Now take your newfound LOH knowledge and go take control of the heap.

Send your questions and comments to clrinout@microsoft.com.

Maoni Stephens is a Senior Developer working on the garbage collector in the CLR team at Microsoft. Before joining CLR, Maoni spent several years in the Operating System group at Microsoft.