Performance Trade-offs of the Windows 2000 Component Execution Environment
Let's start off with some good news. Despite some of the scary scenarios I've addressed in this column, COM always works. In the next few pages I will be presenting an overwhelming number of factors that affect the performance of your object model. However, even if you are completely ignorant of these factors, COM will still happily glue your objects together and allow method calls to be dispatched.
Most of the factors examined here are performance optimizations you can utilize if you choose to, but none of them are critical to correct operation. While some of the choices you make may have a negative impact on performance, this is preferable to abandoning correct operation to wring out the last ounce of performance. For example, all user interface code must execute on single-threaded apartment (STA) threads. Period. Additionally, all STA threads must avoid making blocking system calls without periodically allowing the message pump to run (unless deadlock is one of the selling points of your software).
Now for some bad news: proxies are still expensive. Actually, let me rephrase that. Proxies are still expensive when compared to not having a proxy. The classic Brockschmidtian COM, in which all same-process access was via direct C++-style object references, yielded the highest possible performance since only a handful of machine instructions were needed to invoke a method. However, when used in this fashion CoCreateInstance was really nothing more than a fancy wrapper around LoadLibrary. While there was nothing wrong with this free-love view of COM, the model has matured. COM developers are looking for more services from the platform, and the number of times that CoCreateInstance returns a raw reference (as opposed to a proxy) is diminishing. With the integration of context into COM, CoCreateInstance is increasingly acting like an ultra-high-performance CreateProcess rather than a fancy wrapper around DLLs.
Despite this new role for CoCreateInstance, making a method call across a proxy is considerably more expensive than making a method call against a raw object reference. Part of the reason for this is the marshaling of parameters from one stack frame to another. Another significant reason is that proxies must switch execution contexts both before and after invoking the method. The first switch ensures that the method runs under the expected runtime conditions. The second switch restores the runtime environment of the caller.
Avoiding Proxy-based Method Invocations
There are two common techniques you can use to avoid the cost of proxy-based method invocation. The first technique is simple: do it as seldom as possible. If you anticipate having a proxy, it is better to design your interfaces to get as much work done in a single method invocation, amortizing the COM context switching cost over multiple aggregate suboperations. You can ensure that an interface is never used with a proxy by adding the [local] attribute to the IDL definition of the interface (or to one or more methods). If you find yourself defining a remotable interface that makes extensive use of [propput] or [propget], you are probably going to experience a nontrivial amount of context switching overhead.
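To make the amortization argument concrete, here is a minimal sketch in portable C++ (no real COM involved). The names CrossingCounter, Customer, ChattyProxy, and ChunkyProxy are hypothetical stand-ins invented for illustration, and the model simply assumes that every call forwarded through a proxy costs one context crossing.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for proxy bookkeeping: each forwarded call counts as
// one context crossing (a switch in before the call, a switch back after).
struct CrossingCounter {
    int crossings = 0;
};

// The "object" behind the proxy. Plain C++ data; this is not a COM object.
struct Customer {
    std::string name;
    std::string city;
    int creditLimit = 0;
};

// Chatty style: one proxied call per property, like a run of [propput] methods.
class ChattyProxy {
public:
    ChattyProxy(Customer& target, CrossingCounter& counter)
        : target_(target), counter_(counter) {}
    void putName(const std::string& v) { ++counter_.crossings; target_.name = v; }
    void putCity(const std::string& v) { ++counter_.crossings; target_.city = v; }
    void putCreditLimit(int v)         { ++counter_.crossings; target_.creditLimit = v; }
private:
    Customer& target_;
    CrossingCounter& counter_;
};

// Chunky style: a single proxied call carries all three values.
class ChunkyProxy {
public:
    ChunkyProxy(Customer& target, CrossingCounter& counter)
        : target_(target), counter_(counter) {}
    void update(const std::string& name, const std::string& city, int creditLimit) {
        ++counter_.crossings;
        target_.name = name;
        target_.city = city;
        target_.creditLimit = creditLimit;
    }
private:
    Customer& target_;
    CrossingCounter& counter_;
};

// Perform the same logical update in each style and report the crossings paid.
inline int crossingsForChattyUpdate() {
    Customer c;
    CrossingCounter n;
    ChattyProxy p(c, n);
    p.putName("Ada");
    p.putCity("London");
    p.putCreditLimit(500);
    assert(c.creditLimit == 500);  // the update really happened
    return n.crossings;
}

inline int crossingsForChunkyUpdate() {
    Customer c;
    CrossingCounter n;
    ChunkyProxy p(c, n);
    p.update("Ada", "London", 500);
    assert(c.creditLimit == 500);  // same end state as the chatty version
    return n.crossings;
}
```

Both helpers leave the Customer in the same state; the difference is that the chatty version pays for a crossing per property while the chunky version pays once.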
The second technique is to avoid having a proxy between your caller and your object. There are two ways to ensure this in COM, neither of which is attractive. The most draconian way is to implement the IMarshal interface and return a SEVERITY_FAILURE HRESULT from your GetUnmarshalClass method. This will prevent COM from ever creating a proxy to your object. It will also prevent your object from ever being passed as a parameter to another proxy, which is rather harsh.
A less drastic technique to avoid proxies is to aggregate the freethreaded marshaler (FTM). By aggregating the FTM, you are telling COM that you never want a proxy when accessed in the same process, in essence turning back the clock to the Brockschmidtian era. However, the FTM has several downsides. For one, your component cannot be configured in the COM+ catalog. This makes sense, since your objects will span all contexts in the process, rendering most class attributes meaningless. The second downside is that you must now employ the global interface table (GIT) for any object references that you hold as data members.
The GIT represents a minor programmatic inconvenience, but depending on the scenario can actually result in higher aggregate context switching costs than a non-FTM-based solution. Consider the simple object model shown in Figure 1. Here, object A creates object B, which then creates objects C and D. If object B aggregates the FTM, then it must use the GIT to hold references to C and D. If C and D are known to also aggregate the FTM, then the GIT can safely be bypassed. If, however, C and D's FTM status is unknown, then B has no choice but to use the GIT. In the best-case scenario, C and D actually do aggregate the FTM, in which case the only additional cost is the GIT overhead: a table lookup and an AddRef/Release pair (cycles) plus a process-wide lock (contention).
In a less favorable scenario, C and D do not aggregate the FTM, but instead use standard marshaling. In this case, it is unlikely that objects C and D reside in the same context as B's caller. This means that when A calls B, triggering calls to C and D, there will be two context switches, as shown in Figure 2. If this were the common-case scenario for B, then it would have been wiser for B to not aggregate the FTM, thus ensuring that all of B's methods would occur in a single context. This would include the CoCreateInstance calls for C and D, which in at least some cases would cause C and D to be colocated with B in a single context. Yes, a context switch would occur when A calls B, but if C and D are colocated with B, the two calls to C and D would be direct calls with no proxy involved (see Figure 3).
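The bookkeeping behind these scenarios can be sketched with a toy model. This is an illustration under stated assumptions, not how COM is implemented: a context is reduced to an integer id, Obj and both functions are hypothetical names, and an object that aggregates the FTM is assumed to execute in whatever context its caller is running in.

```cpp
#include <cassert>

// Hypothetical model of an object: where it was created, and whether it
// aggregates the FTM (and therefore never sits behind a same-process proxy).
struct Obj {
    int homeContext;     // id of the context the object was created in
    bool aggregatesFtm;  // true: methods run in the caller's context
};

// Context in which a method of `target` executes when called from `callerContext`.
inline int executionContext(const Obj& target, int callerContext) {
    return target.aggregatesFtm ? callerContext : target.homeContext;
}

// Context switches caused by one call from A to B, where B in turn calls
// C and D once each. A switch is counted whenever a call changes context.
inline int switchesPerCall(int aContext, const Obj& b, const Obj& c, const Obj& d) {
    assert(aContext >= 0);
    int switches = 0;
    const int bContext = executionContext(b, aContext);
    if (bContext != aContext) ++switches;                    // A -> B
    if (executionContext(c, bContext) != bContext) ++switches;  // B -> C
    if (executionContext(d, bContext) != bContext) ++switches;  // B -> D
    return switches;
}
```

With A in context 1 and everything else created in context 2, an FTM-aggregating B with standard-marshaled C and D gives `switchesPerCall(1, Obj{2, true}, Obj{2, false}, Obj{2, false})`, which evaluates to 2. Dropping the FTM from B gives 1, and the best case where B, C, and D all aggregate the FTM gives 0.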
Freethreaded Marshaler Alternatives
While the FTM can be a beneficial tool for certain circumstances, it is not for everyone. Assuming that you are not going to aggregate the FTM, but still want to optimize method invocation costs, you must choose a combination of ThreadingModel and COM+ attributes that balances desired service levels with performance. The first choice you must make is whether to install a class into the COM+ catalog, making it a configured component. Configured components have a set of attributes that describe their expected context. Nonconfigured components (components that are installed simply using regsvr32.exe) have no such attributes and are compatible with virtually any type of context.
Depending on the target class's attribute values (or their absence), CoCreateInstance will either colocate the new object in the context of the creator or force it to exist in a distinct context. Of course, CoCreateInstance takes the creator's context into account, which means that knowing the scenarios in which your code will be used is a requirement for achieving optimal performance. As you'll see later on, knowing the context type of your creator is necessary but not sufficient for making an optimal choice regarding attribute values.
If your goal is to avoid a proxy between you and your creator, then you want to ensure that new instances of your classes are always colocated with their creator, thus causing CoCreateInstance to return a raw object reference instead of a proxy. One way to do this is to write a nonconfigured class and mark it ThreadingModel=Both. By making your component nonconfigured, you are saying that your code is compatible with any context in the process (modulo ThreadingModel incompatibilities). In general, nonconfigured classes do not trigger a new context at CoCreateInstance-time. In contrast, by calling CoCreateInstance on a configured class you will generally cause a new context to be created for the new object.
Both of these rules have exceptions. In particular, the only way an instance of a nonconfigured class will not be colocated with its creator is if the ThreadingModel attribute of the class is incompatible with the apartment of the creator. The way to ensure compatibility is to mark the class ThreadingModel=Both, which indicates that the class is compatible with all known apartment types (as well as any future apartment types that may appear in Windows 2007). So if your goal is to ensure that your creator never gets a proxy, mark your class ThreadingModel=Both and do not install it into the COM+ catalog.
This does not mean that all configured classes get a new context at CoCreateInstance time, although that is certainly the norm. Figure 4 shows which attribute values allow a new object to be colocated with its creator. Note that classes marked JustInTimeActivation=True will never be colocated with their creator. This is an important observation for two reasons. First, this attribute value is a requirement for all transactional classes, which means that creating a transactional object always creates a new context. Second, JustInTimeActivation=True is the default value assigned to a class when it is installed into the COM+ catalog for MTS compatibility (see Figure 5). This means that if you accept your default attribute values, new instances of your configured class will never be colocated with their creator.
As Figure 4 shows, both the class's attribute values and the creator's context affect whether a new context is required. One way to ensure that a new context is never created is to mark your class MustRunInClientContext=True by selecting the last checkbox shown in Figure 5. This attribute value tells CoCreateInstance to fail the call if colocation is not possible. If the target class's attributes do not allow colocation in the creator's context, CoCreateInstance will simply return the distinguished HRESULT CO_E_ATTEMPT_TO_CREATE_OUTSIDE_CLIENT_CONTEXT.
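The colocation rules described so far can be summarized as a small decision function. This is a sketch of the rules as stated in the text, not of COM's actual activation logic. ClassSettings, Activation, and activate are hypothetical names, and the per-attribute detail of Figure 4 is collapsed into a single flag, otherAttributesAllowColocation, because that table is not reproduced here.

```cpp
#include <cassert>

// Possible outcomes of creating an object, as seen by the creator.
enum class Activation {
    Colocated,        // creator gets a raw reference
    DistinctContext,  // creator gets a proxy
    Failed            // stands in for CO_E_ATTEMPT_TO_CREATE_OUTSIDE_CLIENT_CONTEXT
};

// Hypothetical summary of the settings the text discusses.
struct ClassSettings {
    bool configured;                      // installed in the COM+ catalog
    bool threadingModelCompatible;        // ThreadingModel fits the creator's apartment
    bool jitActivation;                   // JustInTimeActivation=True
    bool mustRunInClientContext;          // MustRunInClientContext=True
    bool otherAttributesAllowColocation;  // placeholder for the Figure 4 table
};

inline Activation activate(const ClassSettings& s) {
    // An incompatible ThreadingModel rules out colocation for any class.
    bool canColocate = s.threadingModelCompatible;

    // Catalog attributes only exist for configured classes.
    if (s.configured) {
        canColocate = canColocate
                   && !s.jitActivation  // JIT activation never colocates
                   && s.otherAttributesAllowColocation;
    }

    if (canColocate) return Activation::Colocated;

    // MustRunInClientContext turns "would need a new context" into a failure.
    if (s.configured && s.mustRunInClientContext) return Activation::Failed;

    return Activation::DistinctContext;
}
```

For example, a nonconfigured ThreadingModel=Both class corresponds to `activate(ClassSettings{false, true, false, false, false})` and colocates, while a configured class left with its default JustInTimeActivation=True setting does not.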
The discussion so far has focused on how to optimize method invocation between you and your creator. I've made the simplifying assumption that your creator is also going to be your caller. This assumption is not all that absurd, since the majority of COM objects tend to be used only by their creator and are not shared with others. However, many developers build object models that rely on sharing object references among several objects. In this sort of object model it is entirely possible that an object's creator may not be the primary caller of the object. If an object is expected to have shared access, then colocation with its creator is not necessarily the proper goal. Rather, configuring the class for the most efficient access from any context is often a more reasonable goal as it acknowledges that proxies are often an unavoidable fact of life.
Recall that for objects that are private to their creator, ThreadingModel=Both was the optimal setting because it allowed colocation for any apartment type. In fact, for private-access objects there is no need to aggregate the FTM since references to the object will never be marshaled outside of the creator's context. For shared-access objects, the correct ThreadingModel is not so obvious since colocation is not the primary goal. Proxies will be a way of life for a shared-access object, so you'll want to put the object into a context that requires a minimal amount of overhead to enter.
The performance cost of entering and leaving a context is influenced by many factors: the type of interface marshaler used, the argument types, and the delta between the caller's context and the target object's context. Let's take a look at the last of these factors first.
Many COM+ services are triggered by context switches. Activity-based locking, role-based security, just-in-time (JIT) activation, and transaction management all happen primarily at context boundary crossings. Obviously, the fewer of these services you enable, the cheaper it will be to enter your context. Of course, if you actually need these services, then the context switching cost is hopefully overshadowed by the benefits you'll get from COM+.
Perhaps the most subtle configuration setting for a shared-access object is the ThreadingModel. For a variety of reasons, context switches that cross apartment boundaries are often more expensive than same-apartment context switches. This is especially true for methods that pass object references as parameters. Additionally, even in the face of cross-apartment access, it is cheaper to enter a context using the caller's thread than to switch threads due to apartment constraints.
When deciding on a ThreadingModel, it is important to determine whether the class will be used to create private-access objects or shared-access objects. For private-access objects, ThreadingModel=Both is optimal because it ensures that the object will always share the apartment (and potentially the context) of its creator. Even if the creator gets a proxy (due to other attribute values on the target class), the proxy will only need to cross a same-apartment context boundary, which in many circumstances will be cheaper than cross-apartment context switching (especially when object references are passed as method parameters).
For shared-access objects, ThreadingModel=Both is probably a bad idea. The reason for this is as follows. Consider the case where object A creates object B, which is an instance of a ThreadingModel=Both class. This means that A and B will share an apartment (and potentially a context). If A passes references to B to objects in other apartments, calls from those other objects may need to switch threads prior to entering the context of A and B. If A (and, by association, B) resides in an STA, then a thread switch will always be required for these cross-apartment calls (see Figure 6).
The best way to avoid this situation is to mark your class ThreadingModel=Neutral. Had B's class been marked as such, object A would have received a cross-apartment proxy, but the target apartment would have been the thread-neutral apartment (TNA), which by definition never requires a thread switch to enter (see Figure 7). Again, assuming that B is a shared-access object, the fact that the creator (object A) always uses a proxy will be offset by the fact that no proxy to object B will ever need to switch threads.
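The difference between the two settings for a shared-access object can be expressed as a rule of thumb in code. The sketch below is a simplified model with hypothetical names (Apartment, needsThreadSwitch, switchesForSharedAccess). It assumes that the TNA can be entered by any thread, and that entering an STA or the MTA requires a thread switch whenever the calling thread does not already belong to that apartment.

```cpp
#include <cassert>

enum class Apartment { STA, MTA, TNA };

// Does a call into `target` have to hop to another thread?
// callerThreadIsMember: the calling thread already belongs to the target apartment.
inline bool needsThreadSwitch(Apartment target, bool callerThreadIsMember) {
    if (target == Apartment::TNA) {
        return false;  // the TNA never requires a thread switch to enter
    }
    return !callerThreadIsMember;  // STA and MTA only accept their own threads
}

// Thread switches paid when `calls` calls arrive from threads that live
// outside the target object's apartment (the shared-access case).
inline int switchesForSharedAccess(Apartment target, int calls) {
    assert(calls >= 0);
    int total = 0;
    for (int i = 0; i < calls; ++i) {
        if (needsThreadSwitch(target, /*callerThreadIsMember=*/false)) {
            ++total;
        }
    }
    return total;
}
```

Under these assumptions, ten outside calls to an object living in an STA cost ten thread switches, while the same ten calls to an object in the TNA cost none. The calls still go through a proxy in both cases; only the thread hop disappears.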
The Called Objects
So far I have focused on two factors that affect context switching overhead: the object's creator and the object's caller. Let's look at a third factor: what other objects the object in question will be calling. If you are developing an object that makes many calls to other subordinate objects, then the cost of entering your object's context can easily be overshadowed by the cost of accessing your subordinates that may reside in different contexts. In this scenario, you need to consider the configuration of the subordinate classes: who creates the subordinate objects (you or someone else) as well as their access patterns (private or shared). In general, these considerations are simply a role-reversal of what I've discussed so far. Earlier, I considered the case where you were developing the target class/callee. Those same factors still apply when you are the creator/caller, only now you are in the opposite role.
One of the more interesting (and common) scenarios is an object model based on private-access objects. Remember that in the private-access case, the goal is to optimize access for the creator, hopefully by colocation. If your methods will make many calls to your subordinate objects, it is more important to colocate with the subordinates than with your creator. For this reason, ThreadingModel=Both may not be an appropriate setting because your creator's ThreadingModel may be incompatible with the ThreadingModel of your subordinates. If this is the case, you are much better off cloning the ThreadingModel of your subordinates in order to minimize thread switching overhead.
Consider the object models shown in Figures 8 and 9. If object B belonged to a class marked ThreadingModel=Both, then it would reside in the MTA with object A, as shown in Figure 8.
Because object C belongs to a ThreadingModel=Apartment class, each call from B to C will require a thread switch. If each call from A to B triggers n calls from B to C, then each call from A to B will cause n thread switches. If object B instead belonged to a class marked ThreadingModel=Apartment (the ThreadingModel of its subordinate object), then each call from A to B would trigger exactly one thread switch because calls from B to C would occur on the same thread (see Figure 9).
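The arithmetic in this comparison is easy to check with a few lines of code. The following sketch is a toy model with hypothetical names; it assumes that A lives in the MTA, that C lives in an STA, and that a ThreadingModel=Apartment B ends up on the same STA thread as C, as the text describes for Figure 9.

```cpp
#include <cassert>

enum class Apartment { MTA, STA };
enum class ThreadingModel { Both, Apartment };

// Where B lands when it is created by A, which lives in the MTA.
inline Apartment placeB(ThreadingModel bModel) {
    return bModel == ThreadingModel::Both ? Apartment::MTA   // joins its creator
                                          : Apartment::STA;  // joins C's thread (assumed)
}

// Thread switches caused by one call from A to B, when that call makes
// n calls from B to C. One switch is counted per cross-apartment call.
inline int threadSwitchesPerCall(ThreadingModel bModel, int n) {
    assert(n >= 0);
    const Apartment a = Apartment::MTA;
    const Apartment b = placeB(bModel);
    const Apartment c = Apartment::STA;

    int switches = 0;
    if (a != b) switches += 1;  // A -> B crosses apartments once
    if (b != c) switches += n;  // every B -> C call crosses apartments
    return switches;
}
```

For n = 5, the ThreadingModel=Both version of B costs five thread switches per call from A, while the ThreadingModel=Apartment version costs one regardless of n.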
Another interesting interaction between ThreadingModel, COM+ attributes, and thread allocation is the interaction between the Synchronization attribute and ThreadingModel=Both or ThreadingModel=Apartment. Under ThreadingModel=Both and ThreadingModel=Apartment, it is assumed to be safe to create instances of the class on an STA thread. However, if the COM+ Synchronization attribute is set to Required, instances of ThreadingModel=Apartment classes that are created from user-created threads will always reside on a COM-managed activity thread. In contrast, instances of ThreadingModel=Both classes that are created from user-created threads will always reside on the creator's thread (and apartment).
Selecting a Marshaler
One additional item that can affect the cost of proxies is your choice of marshaler. Using explicit proxy/stub DLLs is often more expensive than using the TLB marshaler implied by the [oleautomation] and [dual] attributes.
In certain scenarios, COM is able to bypass the classic COM remoting architecture and optimize cross-context method invocation. At the time of this writing, this optimization could only be used if the registered marshaler was the type library marshaler.
I know as I write this that despite my best efforts many readers are now reaching for a towel to sop up the liquefied brain-matter that has oozed out of their ears. Threading was confusing enough. Adding apartments made it worse, and this column adds insult to injury by stating that no combination of apartments or threads is appropriate for every situation. While invocation performance is influenced by who creates you, who calls you, and who you call, invocation cost is often negligible when compared to the costs of making database calls or performing other network I/O. That said, I present the following 10 guidelines to atone for the complexity presented here.
Don Box is a cofounder of DevelopMentor, a COM think tank that educates the software industry in COM, MTS, and ATL. Don wrote Essential COM, and coauthored the follow-up Effective COM (Addison-Wesley, 1998). Reach Don at http://www.develop.com/dbox.
From the March 2000 issue of MSDN Magazine.