Objects and Values, Part 2: Memory Management
In the June 2002 installment of Basic Instincts I began a discussion of objects and values. This month I'll build on that column, so I will assume you've read the June installment and that you know the fundamental differences between value types and reference types. These differences contribute to values and objects exhibiting very different behaviors. If you read that earlier column, you also know about the restrictions involved with designing a user-defined value type (such as an enumeration or a structure) that make it different from designing a class.
It's important to remember that the biggest difference between values and objects has to do with how the common language runtime (CLR) manages their memory. You know that choosing between a value type parameter and a reference type parameter affects the behavior of both the ByVal and ByRef keywords. You also know how to add cloning support to a class so that your objects can provide clients with the ability to make copies on demand. In this month's column I am going to continue by discussing the type conversion operation known as boxing. After that, I am going to discuss how the CLR manages the lifetime of objects. As you will see, lifetime management has a profound effect on the way you write code for your classes.
I'm sure you have seen that values behave one way at run time while objects behave another way. However, there are certain situations in which the CLR determines that it must promote a value to an object in order to give it object-like capabilities. When the CLR converts a value into an object, it is known as boxing.
Recall that System.Object serves as the ultimate base type in the programming model of the CLR. All types (except for interfaces) inherit either directly or indirectly from the Object type. This is true of both value and reference types. Due to the inheritance relationship that every creatable type has with the Object type, any value and any object reference can be assigned to a variable declared using the Object type.
It's important for you to understand that the Object type is a reference type. That means that a variable declared using the Object type can be in one of two states. It can either hold a reference to an object on the heap or it can contain a value of Nothing. So what do you think happens when you take a stack-based value and assign it to an Object variable?
'*** create value on the stack
Dim var1 As Integer = 714
'*** assign value to Object variable
Dim var2 As Object = var1
When you assign a value type instance to an Object variable, boxing occurs. The CLR silently copies the memory for the value into a wrapper object on the heap, as shown in Figure 1. A reference to this heap-based object is returned to the variable var2. In this scenario, the only way the CLR could let a reference type variable point to a value is to promote the value to an object at run time.
Figure 1 Heap-based Object
Note that programmers do not write code to explicitly perform a boxing operation. Instead, the Visual Basic® .NET compiler determines when boxing operations must occur and it adds the appropriate instructions into the intermediate language (IL) that is compiled into your assemblies. From the programmer's perspective, boxing happens transparently behind the scenes.
This is just one example of boxing. There are several other scenarios in which boxing occurs. As you have just seen, Visual Basic performs boxing whenever you assign a value to a variable, field, or parameter that was declared using the Object type. Boxing also occurs when you assign a value to an interface-based reference, and it can occur when you call a method on a value that requires dynamic binding.
Let's look at a common scenario in which you might not expect boxing to occur. What happens when you create a stack-based value and call an overridable method that it inherits from System.Object, such as ToString? Values themselves don't support dynamic binding. If the value type does not supply its own override of the method, as is the case for a structure that simply inherits ToString, the CLR must promote the value to an object in order to invoke the inherited implementation. Types that do override the method, such as Integer and Double with ToString, can be called directly on the value without boxing.
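To make the interface case concrete, here is a minimal sketch. The module and function names are hypothetical and are not part of the original column. It relies on the fact that Integer implements IComparable and that every boxing operation creates a separate heap-based object.

```vbnet
Imports System

Module InterfaceBoxingDemo
    '*** returns True when two boxing operations on the same value
    '*** produce two distinct heap-based objects
    Function EachBoxIsSeparate() As Boolean
        Dim number As Integer = 714
        Dim asObject As Object = number           '*** boxing
        Dim asInterface As IComparable = number   '*** boxing again
        Return Not Object.ReferenceEquals(asObject, asInterface)
    End Function

    Sub Main()
        Console.WriteLine(EachBoxIsSeparate())    '*** prints True
    End Sub
End Module
```

Neither reference points at the original stack-based value. Each one points at its own copy on the heap.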
If you are curious as to whether boxing is occurring in a particular situation, you can always use the IL disassembler, ILDASM.EXE, to inspect the IL of your method implementations within your assemblies. By examining the IL, you can see when the Visual Basic .NET compiler has added a boxing instruction into your code.
The next question you might be asking is whether boxing should concern you when it occurs. The answer is sometimes yes, sometimes no. Boxing slows things down because the CLR has to create and manage a heap-based object. In some cases the performance hit will not be significant. However, there are other times when the performance hit will be something you want to avoid.
For example, imagine that you want to store 10,000 integer values in memory. You can use an Integer array or a System.Object array. However, if you use an array based on the Object type to store Integer values, every Integer value must be boxed. Obviously, forcing the CLR to create and manage 10,000 heap-based objects unnecessarily is quite costly.
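The two storage choices can be sketched as follows. The module and function names are hypothetical, and the code makes no attempt to measure timing. It only shows where the boxing and unboxing operations take place.

```vbnet
Imports System

Module BoxingArrayDemo
    '*** stores the values directly in an Integer array (no boxing)
    Function SumIntegers(ByVal count As Integer) As Long
        Dim values(count - 1) As Integer
        Dim i As Integer
        For i = 0 To count - 1
            values(i) = i
        Next
        Dim total As Long = 0
        For i = 0 To count - 1
            total += values(i)
        Next
        Return total
    End Function

    '*** stores the values in an Object array (one heap object per element)
    Function SumBoxed(ByVal count As Integer) As Long
        Dim values(count - 1) As Object
        Dim i As Integer
        For i = 0 To count - 1
            values(i) = i                              '*** boxing
        Next
        Dim total As Long = 0
        For i = 0 To count - 1
            total += DirectCast(values(i), Integer)    '*** unboxing
        Next
        Return total
    End Function

    Sub Main()
        Console.WriteLine(SumIntegers(10000))
        Console.WriteLine(SumBoxed(10000))
    End Sub
End Module
```

Both functions produce the same total, but the second one forces the CLR to allocate a wrapper object for every element and to copy the data back out on every read.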
Apart from performance issues, you need to be aware of boxing because you must occasionally unbox a boxed value. That is, you must explicitly cast a boxed object to copy its memory into another value. For example, examine the following code:
Dim var1 As Integer = 714
Dim var2 As Object = var1 '*** boxing
Dim var3 As Integer = DirectCast(var2, Integer) '*** unboxing
In this example, there are three different places in memory that contain the data for the number 714. When var2 is assigned to var3, the memory from the heap-based wrapper object is copied into a stack-based value. You must perform this assignment with an explicit conversion when strict type-checking is enabled.
The last thing I want to mention about boxing is that it copies data in ways that often catch programmers off guard. You have to be careful when you have two copies of the same data. It's possible to change one of the copies and think that you have changed the other. It's up to you to know when a boxing operation has created a copy and to keep things straight.
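Here is a small sketch of the kind of surprise this can cause. The module and function names are hypothetical. The code changes the original value after it has been boxed and then reads the boxed copy.

```vbnet
Imports System

Module BoxingCopyDemo
    '*** returns the boxed value after the original variable has been changed
    Function BoxedValueAfterChange() As Integer
        Dim original As Integer = 714
        Dim boxed As Object = original    '*** boxing copies 714 to the heap
        original = 10                     '*** changes only the stack-based value
        Return DirectCast(boxed, Integer) '*** unboxing reads the heap-based copy
    End Function

    Sub Main()
        Console.WriteLine(BoxedValueAfterChange())   '*** prints 714
    End Sub
End Module
```

The assignment to original has no effect on the boxed copy, because the two occupy different places in memory.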
Managing Object Lifetimes
The programming model of the CLR is based on the premise that the system is responsible for allocating and managing the memory for types, values, and objects. This means that programmers using the Microsoft® .NET Framework are not responsible for allocating and managing memory. In fact, when you are writing managed code that is to run under normal conditions (safe mode), it is illegal to use pointers in order to directly allocate or access memory.
The memory management scheme of the CLR presents challenges for programmers who are transitioning from unmanaged languages such as C and C++. These programmers have been conditioned to use a hands-on style of memory management. They are accustomed to writing code to allocate and access memory through the use of pointers. They are also accustomed to explicitly releasing memory once they have finished using it. Obviously, moving to a platform where programmers are neither responsible for nor allowed to manage memory directly in their applications will require a significant mind shift on their part.
While the CLR's memory management scheme is a drastic departure from those used by C and C++ programmers, it is similar to the memory management scheme of Visual Basic. Starting with Visual Basic 1.0, the responsibility of memory management shifted away from programmers and was given to the system. Having the system manage memory behind the scenes makes it that much easier to write code, and it's one less thing that a programmer has to worry about.
If you program in any version of Visual Basic, the underlying system is responsible for managing memory for you. However, it's important to note that the mechanisms for managing memory employed by the CLR are different from the mechanisms used by previous versions of Visual Basic. The CLR manages the lifetime of objects through the use of a garbage collector. Earlier versions of Visual Basic are based on COM and, consequently, they manage object lifetimes through reference counting.
Let's look at a simple example of a method that creates one instance using a value type and a second instance using a reference type. This will illustrate some of the more important issues of memory management:
Sub Method1()
    Dim var1 As Integer
    var1 = 10
    Dim var2 As New Dog()
    var2.Bark()
End Sub
There are two local variables in this example. When Method1 completes, its stack frame is cleaned up and the memory for each variable is reclaimed. However, you should see that this example involves the allocation of memory on the heap as well as memory on the stack. After the stack-based memory for the variables var1 and var2 has been reclaimed, there is still a Dog object on the heap. This raises two important questions. First, who is responsible for reclaiming the memory for this object? Second, when will the memory for this object be reclaimed? The answer to these two questions depends on the version of Visual Basic that you're using.
Let's begin the discussion of object lifetime management by giving you a quick refresher on how things worked with earlier versions of Visual Basic. As I mentioned, in older versions such as Visual Basic 6.0, the lifetime of objects is managed through reference counting. Every object is expected to keep track of how many clients have active references to it. Objects are also expected to release their memory when their reference count drops to zero.
In addition, since earlier versions of Visual Basic rely on reference counting to manage object lifetimes, it has become an acknowledged best practice to explicitly release objects as soon as your program has finished using them. This is often accomplished by setting a local object reference equal to Nothing:
Sub Method1()
    '*** create object
    Dim dog1 As New Dog
    '*** use object
    dog1.Bark
    '*** release object
    Set dog1 = Nothing
End Sub
When you write client-side code like this in Visual Basic 6.0, the underlying runtime issues a Release call to the Dog object. The Dog object responds by decrementing its reference count and determining that the last reference has just been dropped. Next, the object executes any user-defined cleanup code and then removes itself from memory.
If you had written the Dog class in Visual Basic 6.0 and you had also added an implementation of the Class_Terminate method, you would be guaranteed that this method will be called at a predictable time. In the example you've just seen, the line of code that sets the dog1 variable to Nothing will block until the cleanup code in the Class_Terminate method has finished executing.
Now let's take a look at what's different in Visual Basic .NET. The biggest difference is that the CLR employs a garbage collector to reclaim the memory for objects. Let me provide a high-level explanation of how the garbage collector performs its job.
The garbage collector employs an algorithm to determine whether objects on the heap are reachable by the app. For an object to be reachable, the app must possess one or more references that make that object accessible. For example, if the garbage collector sees that there is a shared field or a local variable that currently references an object, the object is considered reachable. Any objects referenced by this object are also considered reachable.
However, if the garbage collector determines that an application does not possess any references that would make it possible to access an object, that object is considered to be unreachable. The garbage collector assumes that the memory occupied by unreachable objects is available to be reclaimed.
The garbage collector is typically triggered when an application goes beyond a certain memory usage threshold. When it's time for the garbage collector to determine which memory it can reclaim, it freezes program execution and builds a large object graph containing all the reachable objects in the application. When the CLR creates a new object, it assumes that it can reuse any memory that is not associated with a reachable object.
During a collection, the garbage collector may also decide to compact the heap by copying the memory for reachable objects into memory associated with unreachable objects. In other words, the garbage collector may move all the reachable objects to the beginning of the heap. When the garbage collector moves an object, it's also responsible for ensuring that all the application's references to that object are updated accordingly.
A memory management scheme that uses garbage collection is quite different from one that uses reference counting. First, a garbage collection scheme can offer better performance. This is especially true for a server-side application in which objects are being continuously created and released within the scope of a single client request. Reference counting is significantly more expensive due to the extra interaction that's required between an object and its clients. Clients are continually making calls to the object to increment and decrement its reference count.
The second big benefit to garbage collection is that it can easily manage sets of objects that contain circular references. This is not the case with a memory management scheme based on reference counting. Reference counting schemes are vulnerable to memory leaks when a pair of objects acquires references to one another. If neither object releases the other, the two objects will never see their reference counts reach zero. Even after the program has finished using these objects, they will remain in memory until the program ends. This is an example of a classic memory leak in COM.
When designing classes that are to be used in an environment that uses reference counting, you have two choices: either avoid circular references or use well-known programming techniques that are explicitly designed to break down any object graphs that contain circular references. In an environment that uses garbage collection, circular references do not create the same problem. If two objects are unreachable, it doesn't matter whether they hold references to each other or not. The garbage collector can free the memory of both objects because it can determine that the program is no longer using them.
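The following sketch illustrates the point about circular references. The Node class, the module, and the function names are hypothetical. The WeakReference class and the NoInlining attribute are my additions for observing the result and are not techniques taken from this column. The code also forces a collection, which is something you would do only in a test.

```vbnet
Imports System
Imports System.Runtime.CompilerServices

Class Node
    Public Partner As Node
End Class

Module CycleDemo
    '*** builds two objects that reference each other and
    '*** hands back only a weak reference to one of them
    <MethodImpl(MethodImplOptions.NoInlining)> _
    Function CreateCycle() As WeakReference
        Dim first As New Node()
        Dim second As New Node()
        first.Partner = second
        second.Partner = first
        Return New WeakReference(first)
    End Function

    Sub Main()
        Dim tracker As WeakReference = CreateCycle()
        '*** the application no longer holds a strong reference to either Node
        GC.Collect()
        GC.WaitForPendingFinalizers()
        Console.WriteLine("Cycle still alive: {0}", tracker.IsAlive)
    End Sub
End Module
```

On a typical run the program reports that the cycle is no longer alive, although the exact timing of a collection is up to the CLR. Under reference counting, the same pair of objects would keep each other in memory until the program ended.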
The details of how the garbage collector of the CLR performs its work are fascinating and they are also fairly complicated. If you are interested in learning more about the garbage collector's inner workings, you can read about them in Parts 1 and 2 of Jeffrey Richter's article "Garbage Collection: Automatic Memory Management in the Microsoft .NET Framework" in the November and December 2000 issues of MSDN® Magazine. Jeffrey provides excellent coverage of what goes on behind the scenes and describes how Microsoft engineers designed the garbage collector to provide very high levels of performance.
I don't think it's critical that you understand the implementation details of the garbage collector. While it may be interesting, it's definitely not a requirement for writing efficient code with Visual Basic .NET. It is critical, however, that you understand the best practices you should follow when writing code for a garbage-collected environment. This is the topic that I will discuss next.
Earlier, I explained two important differences between the use of reference counting and the use of garbage collection. First, you saw how a garbage collector can increase your application's performance. Second, you saw how a garbage collector provides a far more elegant solution for dealing with circular references. Unfortunately, the third major difference between garbage collection and reference counting is not an area in which garbage collection can provide an advantage.
The third big difference has to do with how you write the code to clean up after your objects. Imagine that you've written the Dog class in Visual Basic .NET. As you know, the lifetime of every Dog object will be managed by the CLR through garbage collection. Now imagine that someone has written the following client-side code to create and use instances from the class:
Sub Method1()
    '*** create two objects
    Dim dog1 As New Dog()
    Dim dog2 As New Dog()
    '*** use the objects
    dog1.Bark()
    dog2.Bark()
    '*** let go of the first object
    dog1 = Nothing
End Sub
The important question you must ask yourself is, when are these two Dog objects destroyed? The answer is that you really don't know. The Dog object referenced by the dog1 variable is unreachable once the variable is set to a value of Nothing. The Dog object referenced by the dog2 variable is unreachable once the method ends and the dog2 variable goes out of scope. However, it's important to see that there is an indeterminate length of time between when objects become unreachable and when they are actually destroyed by the garbage collector. This interval of time makes writing cleanup code in Visual Basic .NET more complicated than it has been in previous versions.
If you were creating the Dog class using Visual Basic 6.0, you would simply add your cleanup code to the Class_Terminate method. You would also be able to make the assumption that your cleanup code would be called in a timely fashion when the client released its last reference to the object. Since Visual Basic .NET does not offer a Class_Terminate method, you must find another place to write your cleanup code.
The CLR provides support for object finalization. This means a managed object can request that the CLR send it a notification before its memory is reclaimed by the garbage collector. To request that your objects receive finalization notifications, include a Finalize method in your class definition. Here's an example of what such a class might look like:
Public Class Dog
    Protected Overrides Sub Finalize()
        Try
            '*** add your cleanup code here
        Finally
            MyBase.Finalize()
        End Try
    End Sub
    '*** other Dog class members omitted for clarity
End Class
Note that in addition to your cleanup code, an implementation of Finalize should include a call to the Finalize implementation of its base class. If your cleanup code could experience a run-time exception, it should be placed in a Try statement that calls MyBase.Finalize in a Finally block.
Once you add a Finalize method, the CLR knows that objects created from your class require a finalization notification. When a client creates an object, the CLR adds a reference to an internal list of objects that require finalization. Therefore, objects that require finalization take longer to create. In addition, objects that don't require finalization have other performance advantages over objects that do require it.
Now let's discuss what happens at the end of the lifetime of an object that requires finalization. When the garbage collector determines that an object requiring finalization is no longer reachable by the application, it takes a reference to this object and adds it to a second internal list. While the first internal list is for objects that require finalization, the second list is for objects that are ready for finalization. Note that objects in this second internal list are reachable by the CLR even though they are no longer reachable by the application. This ensures that objects requiring finalization are not collected before their Finalize methods have been called.
The CLR uses a dedicated background thread to monitor the internal list of objects that are ready for finalization. This background finalization thread calls the Finalize method of objects as they are added to the list. If several objects are added to this list at once, the background thread moves through the list in a serialized fashion, executing Finalize methods one after another. Once the background finalization thread has called an object's Finalize method, the object is treated just like any other unreachable object and the CLR can reclaim its memory.
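You might be wondering what happens if an object's Finalize method throws an exception. In the version of the CLR discussed here, the finalization thread simply swallows such exceptions and moves on to the next object that's ready. You should not rely on this behavior, however. Starting with version 2.0 of the .NET Framework, an unhandled exception in a Finalize method terminates the process, so cleanup code that can fail should catch its own exceptions.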
Now that you have a basic understanding of the mechanics behind the Finalize method, let's reexamine things from a design perspective. While you might be tempted to use the Finalize method in the same manner in which you have used Class_Terminate in the past, that would be a mistake. There are several design issues that make writing code in the Finalize method very different from writing code in the Class_Terminate method.
Why is Finalize different from Class_Terminate? First and foremost, you don't get any guarantees about when Finalize is going to be called. Therefore, Finalize is not a good place to release time-sensitive resources such as a database connection or a file handle.
A second issue is that you are given no guarantees as to what order the CLR will call Finalize methods when a set of objects are ready for finalization. Let's look at an example to illustrate why this is important. What if you have designed the Dog class to contain a field based on the Bone class and both of these classes contain a Finalize method? You might be tempted to write code in the Finalize method of the Dog class to access the Bone object. However, code like this will not work reliably. There will likely be problems if the Bone object's Finalize method runs before the Dog object's Finalize method.
Since you cannot determine which Finalize method will run first, the Dog object should not attempt to access the Bone object in its Finalize method. As a rule of thumb, when you write a Finalize method you should never attempt to access any other object that could also be in the process of undergoing finalization.
Another issue is that the Finalize method is executed on a different thread from the one that created and used the object. While this doesn't create a problem for most designs, you must exercise caution when using any programming techniques that create thread dependencies, such as the use of thread local storage. You might also be responsible for writing synchronization code if the finalization thread accesses the same data at the same time as other application threads.
Also remember that an object with a poorly written Finalize method can block the background finalization thread indefinitely. If an object's Finalize method encounters a synchronization lock or enters an infinite loop, the finalization thread will not be able to call the Finalize method of other objects. Therefore, you should note that a poorly written Finalize method can stop the garbage collector from doing its job.
One more issue that could affect your design is that the garbage collector calls Finalize on an object that experienced an exception during construction. Think about this for a minute. What happens when a client calls New on a class with a Finalize method and the constructor throws an exception? The exception is thrown back to the client, but the client never gets a reference to the partially constructed object. You might not expect the garbage collector to call the Finalize method for the object, but it does. This is done to give a partially constructed object a chance to release any resources it might have acquired.
It is interesting to note that the CLR allows you to trigger a garbage collection. Programmatic access to the garbage collector is provided through shared public members of the System.GC class. The following code illustrates how to trigger the garbage collector to synchronously execute the Finalize method for objects that are no longer reachable by the application:
'*** create object
Dim spot As New Dog()
'*** use object
spot.Bark()
'*** release reference to object
spot = Nothing
'*** force GC to call object's Finalize method
System.GC.Collect()
System.GC.WaitForPendingFinalizers()
Why would you ever trigger a garbage collection programmatically? This technique can be handy when you are testing and debugging your Finalize method. However, explicitly triggering a garbage collection is usually an unacceptable practice in a production application. This is especially true for server-side applications that experience concurrent calls from multiple client applications. It would be a very big mistake to trigger a garbage collection every time a client application submits a request.
As I mentioned before, the garbage collector of the CLR has been designed to automatically trigger collections when the application's memory usage exceeds predefined thresholds. The garbage collector is able to optimize different applications in different ways depending on how it sees an application using memory. It is naive to think that you know more than the garbage collector about when it should do its work. Triggering a garbage collection programmatically in a production application is almost guaranteed to have the opposite effect of what you're hoping for.
Now that you have seen object finalization, let's revisit the discussion of class design. Now you know that the Finalize method isn't the best place to write the cleanup code for your objects. That's because Finalize doesn't get called when you really want it to. Therefore, you should not rely on code in the Finalize method to clean up resources that need to be released in a time-critical fashion.
Let's take a step back and ask a fundamental question. Does every object really need to execute cleanup code at the end of its lifetime? It really depends on what kinds of resources the object is holding. If an object holds nothing other than memory, you have no concerns. You don't need any cleanup code because it's the garbage collector's responsibility to free the memory for you. You might find that the majority of your classes don't require any custom cleanup code at all.
The only time you really need custom cleanup code that executes at the end of an object's lifetime is when the object holds a time-sensitive resource that is not managed by the CLR. Examples of such resources are database connections and file handles. There is a common .NET programming convention used by class authors who need to execute custom cleanup code at the end of an object's lifetime: exposing a public method named Dispose.
When you add a public Dispose method it is also recommended that you implement the IDisposable interface to advertise the fact that the object contains a Dispose method. Here's an example of a class that follows this convention:
Public Class Dog : Implements IDisposable
    Public Sub Dispose() Implements IDisposable.Dispose
        '*** your cleanup code goes here
    End Sub
    '*** other Dog class members omitted for clarity
End Class
This convention for executing cleanup code in a timely fashion requires participation from the client as well as the object. It's the client's responsibility to call the Dispose method of a disposable object once the program is finished using it:
'*** create object
Dim spot As New Dog()
'*** use object
spot.Bark()
'*** call Dispose on object when done
spot.Dispose()
As you can see, this convention allows an object to execute its cleanup code in a timely and predictable manner. Note that a class author can implement the Dispose method without having to worry about all the issues surrounding the Finalize method. When you are implementing Dispose, you can assume that the client will call it in a timely fashion. Furthermore, when the Dispose method is executing, the object and all the objects it references are still reachable. This means that an object can make calls to objects that it references in the Dispose method. As you recall, this is not the case with the Finalize method.
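One way a client can live up to this responsibility, even when something goes wrong while it is using the object, is to place the call to Dispose in a Finally block. The sketch below is an assumption on my part rather than a technique prescribed by this column. The IsDisposed field, the module, and the simulated failure are all hypothetical and exist only to make the behavior observable.

```vbnet
Imports System

Class Dog : Implements IDisposable
    Public IsDisposed As Boolean = False   '*** hypothetical flag for the demo

    Public Sub Bark()
        Console.WriteLine("Woof")
    End Sub

    Public Sub Dispose() Implements IDisposable.Dispose
        IsDisposed = True
        '*** your cleanup code goes here
    End Sub
End Class

Module DisposeClientDemo
    '*** returns True if Dispose ran even though the client code failed
    Function DisposedAfterFailure() As Boolean
        Dim spot As New Dog()
        Try
            Try
                spot.Bark()
                Throw New InvalidOperationException("simulated failure")
            Finally
                '*** runs whether or not an exception was thrown
                spot.Dispose()
            End Try
        Catch ex As InvalidOperationException
            '*** the simulated failure is ignored in this demo
        End Try
        Return spot.IsDisposed
    End Function

    Sub Main()
        Console.WriteLine(DisposedAfterFailure())   '*** prints Woof, then True
    End Sub
End Module
```

The Finally block guarantees that the cleanup code runs at a predictable point, which is exactly what the Finalize method cannot promise.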
It's good to keep in mind that a disposable object puts additional requirements on what's expected from client-side code. A client has to know to call Dispose at the right time. Furthermore, a client must never make any additional calls to an object after calling Dispose. As you can imagine, this becomes even more difficult to manage in designs where multiple clients are given access to the same disposable object.
Once an object has been disposed, it should be considered off-limits. By convention, a disposable object should throw an exception if any of its methods are called after Dispose is called. There is a built-in exception type called ObjectDisposedException in the System namespace that was added to the Framework Class Library for this exact purpose. The class definition in Figure 2 shows one design in which a method checks to make sure it hasn't been disposed before servicing a method call.
Class Dog : Implements IDisposable
    Private ObjectDisposed As Boolean = False

    Public Sub Dispose() Implements System.IDisposable.Dispose
        Me.ObjectDisposed = True
        '*** add cleanup code here
    End Sub

    '*** user-defined method
    Sub Bark()
        If ObjectDisposed Then
            Dim ObjectName As String = "Dog"
            Dim Message As String = "Object has been disposed"
            Throw New ObjectDisposedException(ObjectName, Message)
        End If
        '*** perform operation
    End Sub
End Class
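From the client's side, a call made after Dispose surfaces as an ObjectDisposedException. The sketch below repeats a trimmed version of the Dog class so that it can stand on its own. The module and function names are hypothetical.

```vbnet
Imports System

Class Dog : Implements IDisposable
    Private ObjectDisposed As Boolean = False

    Public Sub Dispose() Implements IDisposable.Dispose
        Me.ObjectDisposed = True
    End Sub

    Public Sub Bark()
        If ObjectDisposed Then
            Throw New ObjectDisposedException("Dog", "Object has been disposed")
        End If
        Console.WriteLine("Woof")
    End Sub
End Class

Module DisposedCallDemo
    '*** returns True if calling Bark after Dispose raised ObjectDisposedException
    Function BarkAfterDisposeFails() As Boolean
        Dim spot As New Dog()
        spot.Bark()        '*** succeeds: the object has not been disposed yet
        spot.Dispose()
        Try
            spot.Bark()    '*** the object is now off-limits
        Catch ex As ObjectDisposedException
            Return True
        End Try
        Return False
    End Function

    Sub Main()
        Console.WriteLine(BarkAfterDisposeFails())   '*** prints Woof, then True
    End Sub
End Module
```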
At this point you have seen how both the Finalize and Dispose methods work. Now, would you ever want to include both methods in the same class definition? You might if you're worried about what will happen to a disposable object when the client forgets to call Dispose. This is a philosophical design question.
Should you add a Finalize method to deal with situations in which the client-side code did not live up to its responsibilities? Adding a Finalize method in addition to a Dispose method gives you an additional layer of safety. However, there are a few important trade-offs involving performance degradation and design complexity that you should consider.
A Finalize method always has an impact on performance because the CLR must update the two internal finalization lists as each object goes through its lifecycle. It updates the first list during object construction, which results in slowing down calls to the New operator. Objects that require finalization also increase the workload of the background finalization thread, which then takes processing cycles away from the threads that are running your application code.
The performance hit for object finalization is most noticeable in server-side applications where objects are often created and released within the scope of a single client request. When you are writing code for this type of server-side application, consider avoiding objects that require finalization. However, for other kinds of applications, the overhead of finalization is often far less noticeable. For example, if you are designing a class for a long-lived object that will run in a desktop application, the overhead of finalization will not be much of a concern at all.
Apart from performance-related issues, there is a second problem with creating a class that contains a Finalize method in addition to a Dispose method: it complicates the design of the class. The Dispose method contains the cleanup code that you hope will run. Finalize contains the backup cleanup code that you want to run in cases where the client forgot to call Dispose.
While both Dispose and Finalize are written to contain cleanup code, they usually must be written differently. That's because code you write for a Finalize method has several restrictions that are not an issue when writing a Dispose method. These restrictions were discussed earlier. The restriction that commonly requires your attention is that code in the Dispose method can make calls to other objects while the code in Finalize typically cannot.
If you create a class that contains both a Dispose method and a Finalize method, you must also ensure that the same cleanup code doesn't run more than once. In other words, you want to structure things so the cleanup code in Finalize only runs when the Dispose method was not called. This can be done by adding a call to a method named SuppressFinalize in your Dispose method. SuppressFinalize is a shared method in the System.GC class that removes an object from the internal list of objects requiring finalization. Figure 3 shows a simple example of a class that is designed to execute either the Dispose method or the Finalize method.
Class Dog : Implements IDisposable
    Public Sub Dispose() _
      Implements System.IDisposable.Dispose
        '*** disable finalization notification
        System.GC.SuppressFinalize(Me)
        '*** add cleanup without finalization restrictions
    End Sub

    Protected Overrides Sub Finalize()
        Try
            '*** add cleanup with finalization restrictions
        Finally
            MyBase.Finalize()
        End Try
    End Sub
End Class
As you can see, adding support for both the Dispose method and the Finalize method in the same class requires extra attention. As I mentioned, it doesn't always make sense to add both methods. The most common scenario for doing so is when you are designing classes for long-lived objects in desktop applications.
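One common way to keep the two cleanup paths from drifting apart is to route both methods through a single helper that is told which path called it. This is a widely used convention rather than the design shown in this column. The Cleanup method, its calledFromDispose parameter, the alreadyCleanedUp flag, and the CleanupCount counter are hypothetical names introduced for this sketch.

```vbnet
Imports System

Class Dog : Implements IDisposable
    Private alreadyCleanedUp As Boolean = False
    Public Shared CleanupCount As Integer = 0   '*** hypothetical counter for the demo

    '*** hypothetical helper shared by Dispose and Finalize
    Private Sub Cleanup(ByVal calledFromDispose As Boolean)
        If alreadyCleanedUp Then Return
        alreadyCleanedUp = True
        If calledFromDispose Then
            '*** other objects are still reachable here, so it is
            '*** safe to call them (no finalization restrictions)
        End If
        '*** release resources held directly by this object here
        CleanupCount += 1
    End Sub

    Public Sub Dispose() Implements IDisposable.Dispose
        Cleanup(True)
        '*** cleanup is done, so finalization is no longer needed
        GC.SuppressFinalize(Me)
    End Sub

    Protected Overrides Sub Finalize()
        Try
            Cleanup(False)
        Finally
            MyBase.Finalize()
        End Try
    End Sub
End Class

Module DisposeAndFinalizeDemo
    '*** returns how many times cleanup ran for one Dog that was disposed twice
    Function CleanupRunsForDoubleDispose() As Integer
        Dim before As Integer = Dog.CleanupCount
        Dim spot As New Dog()
        spot.Dispose()
        spot.Dispose()
        Return Dog.CleanupCount - before
    End Function

    Sub Main()
        Console.WriteLine(CleanupRunsForDoubleDispose())   '*** prints 1
    End Sub
End Module
```

The flag guards against a client that calls Dispose more than once, and the call to SuppressFinalize keeps the Finalize method from repeating work that Dispose has already done.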
You should now have a good understanding of when to use objects and when to use values. In this two-part series, you have seen that values are faster to create and that they do not require the attention of the CLR garbage collector. Values have less overhead and are therefore preferable from a performance perspective.
While objects require more resources, they provide greater capabilities. An object can live independently outside the scope of a method call while a value cannot. A value can only live outside the scope of a method call when it is embedded inside an object. Objects also provide dynamic binding, which is essential to effective use of polymorphism. Finally, objects provide you with more opportunity for code reuse. This is due to the fact that reference types can be designed to support inheritance while value types cannot.
Send questions and comments for Ted to Instinct@microsoft.com.
Ted Pattison is an instructor and researcher at DevelopMentor (http://www.develop.com), where he co-manages the Visual Basic curriculum. He is the author of Programming Distributed Applications with COM and Microsoft Visual Basic 6.0 (Microsoft Press, 2000). Parts of this article have been adapted from Ted's upcoming book, Building Applications and Components with Visual Basic .NET, to be published by Addison-Wesley in 2003.