
Marshaling Your Data: Efficient Data Transfer Techniques Using COM and Windows 2000

Richard Grimes
This article assumes you're familiar with COM, IDL, C++, and Visual Basic.
Code for this article: Grimes0900.exe (108KB)
SUMMARY The way you choose to transfer data is vitally important in a distributed application. Windows 2000 provides several new features that allow you to transfer data more efficiently. Lightweight handlers allow you to write smart proxies that can cache results and perform buffered reads and writes, minimizing the number of network calls. Windows 2000 also allows you to use pipe interfaces to transfer large amounts of data efficiently through a read-ahead facility.
      This article illustrates several ways to improve data transfer in Windows 2000 using these new features. It also reports the results of transfer time tests and provides recommendations for transferred buffer sizes.

Regardless of the purpose your distributed application serves, there is one requirement it will almost certainly have: efficient data transfer. In this article I'll look at methods to pass large amounts of data over a network using COM and Windows® 2000 and the role of marshaling in this process. I will also discuss the issue of data buffer sizes and explain some strategies that will allow you to optimize the transferred buffer sizes. I will concentrate on COM because it is the plumbing that holds many Windows-based components together. In addition, I will describe the many facilities for transferring data that are provided in Windows 2000.

Marshaling Data

      Before I discuss the ways in which Windows 2000 helps you transfer data, I'll describe how data moves from one machine to another. COM has its roots in Microsoft® Remote Procedure Calls (RPC). Indeed DCOM itself is essentially "object RPC" and it is often referred to as ORPC (in my opinion, this is a more precise term than DCOM). Because of its origins, COM has inherited RPC IDL as a way of describing interfaces. IDL is compiled with the Microsoft IDL compiler (MIDL), which does three things. First, it produces a description of interfaces (for example, type libraries). Second, it produces language bindings to allow you to use the interfaces in the language of your choice (well, just C and C++; all other languages have to use type libraries). And finally, it produces C code that can be compiled to produce proxy-stub DLLs to marshal the interfaces.
      These proxy-stub DLLs intercept calls to interfaces and contain code to allow the method calls to be made across context boundaries. The architecture is shown in Figure 1.
Figure 1 COM Marshaling Architecture

      Conceptually, the client code has direct access to the component. But under the covers, two separate objects (the interface proxy and stub) are loaded automatically by COM in the client and component's contexts. If the contexts are in different processes or on different machines, then COM provides a channel object that transfers proxy and stub-initialized buffers over RPC. This channel object is implemented over RPC, but conceptually it is accessible in both the importing and exporting contexts. The proxy packages method parameters into a buffer obtained from the channel, whereas the stub obtains this buffer and uses it to construct the stack frame to call the component.
      COM loads the interface proxy and stubs when the original interface pointer of the component is first marshaled out of the component's context into the client's context. In standard marshaling, COM passes the interface pointer as a parameter to CoMarshalInterface. This function takes the context-specific interface pointer and converts it into a context-neutral blob of bytes that describes the precise location of the component and the interface that's being marshaled. This blob of data is unmarshaled in the client context, which converts this context-neutral blob into a context-specific interface proxy object, which is aggregated into the proxy manager (that also provides the proxy object's identity).
      The significant thing about this architecture is that the proxy object looks exactly like the original object. The interface stubs know the interface intimately and behave just like an in-context client. The component and its client do not know about marshaling or how it is implemented. This is possible because the proxy and stub objects intercept the call to the component. Standard COM marshaling merely intercepts method calls and transmits them across contexts, but other interception code could also optimize calls across the network. I will describe how you can write this kind of interception code and explain some code that Microsoft has already provided for this purpose.

Specifying Data to Transfer Using IDL

      The simplest way to generate interface proxy and stub objects is to describe the interfaces in IDL and use MIDL to generate the code for you. For an introduction to IDL arrays, I recommend "Understanding Interface Definition Language: A Developer's Survival Guide," in the August 1998 issue of MSJ. For a more complete description read ActiveX®/COM Q&A in the November 1996 issue of MSJ.
      IDL is used to describe the amount of data transferred during a call and the direction in which it is transferred (from the client to the component or vice versa). The direction is indicated by the [in] and [out] attributes, whereas the capacity (the maximum size of the array) is indicated by [size_is()] or its equivalent, [max_is()]. The actual number of data items is indicated by [length_is()]. The [size_is()] attribute tells the proxy how much data will be transferred to the stub, and the proxy uses this information to determine how large a buffer it should request from the channel and how many bytes to copy into this buffer. Sometimes the array that will be transferred may not be completely filled, so [length_is()] (or the equivalent, [last_is()]) can be used as an optimization to reduce the number of unnecessary bytes transferred from the client to the component.
      Here are some examples of how to use these attributes:

HRESULT PassLongs([in] ULONG ulNum,
                      [in, size_is(ulNum)] LONG* pArrIn);
HRESULT GetLongs([in] ULONG ulNum, 
                      [out, size_is(ulNum)] LONG* pArrOut);
HRESULT GetLongsAlloc([out] ULONG* pNum, 
                      [out, size_is(, *pNum)] LONG** ppArr);
When passing data from the client to a component, the client always allocates storage and is responsible for the deallocation of that storage. In the previous examples, the ulNum parameter is most likely an auto variable in the client code, and pArrIn is a pointer to the first element in an array of at least ulNum LONGs, which may be allocated on the stack or heap. Since the [size_is()] attribute is used, it means that the marshaler will only transfer ulNum items.
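The saving that [length_is()] offers over [size_is()] alone can be sketched with some simple arithmetic. The following portable C++ is only a rough model: the function names are hypothetical, and it counts element payload only, ignoring the marshaling headers, conformance counts, and alignment padding that a real proxy adds.

```cpp
#include <cassert>
#include <cstddef>

// Simplified model (an assumption for illustration): counts only element
// payload and ignores marshaling headers, counts, and alignment padding.

// Bytes of element data sent when only [size_is(capacity)] is specified:
// the whole array is transferred, filled or not.
inline std::size_t PayloadWithSizeIs(std::size_t capacity, std::size_t elemSize) {
    return capacity * elemSize;
}

// Bytes of element data sent when [length_is(used)] is also specified:
// only the elements actually in use are transferred.
inline std::size_t PayloadWithLengthIs(std::size_t capacity, std::size_t used,
                                       std::size_t elemSize) {
    assert(used <= capacity);  // the used count can never exceed the capacity
    return used * elemSize;
}
```

Under this model, a 1,000-element array of 4-byte LONGs that holds only 10 valid items costs 4,000 payload bytes with [size_is()] alone and 40 bytes when [length_is()] is added.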
      When passing data from the component to the client, the client passes a pointer to storage where the data will be copied by the marshaler. So GetLongs can be called like this:

ULONG ulNum = 10;
LONG l[10];
hr = pArr->GetLongs(ulNum, l);
The component code may look like this:

STDMETHODIMP CArrays::GetLongs(ULONG ulNum, LONG *pArr)
{
   for (ULONG x = 0; x < ulNum; x++) pArr[x] = x;
   return S_OK;
}
As you can see, the component code assumes that the data storage is accessible through the pArr pointer. The component-side marshaler will allocate sufficient storage because the [size_is()] attribute tells it the required size.
      As I mentioned earlier, it is the client's responsibility to deallocate the storage. In this case, no extra code is needed because I have used auto variables on the stack. This technique assumes that the client knows how many items are available.
      What happens when the number of items cannot be determined by the client before requesting the data from the component? Take a look at the example shown earlier that contains GetLongsAlloc. Here, the component returns the size of the returned array via the pNum parameter. However, because this size is determined by the component method, the marshaler will not have enough information to allocate the storage before the method is called. Therefore, the component must allocate this memory. It does this by using a memory allocator that the marshaling layer knows about, CoTaskMemAlloc.

STDMETHODIMP CArrays::GetLongsAlloc(ULONG *pNum, LONG **ppArr)
{
   *pNum = 10;
   *ppArr = reinterpret_cast<LONG*>(CoTaskMemAlloc
                                    (*pNum * sizeof(LONG)));
   for (ULONG x = 0; x < *pNum; x++) (*ppArr)[x] = x;
   return S_OK;
}
      The memory is not deallocated by the component, which at first may look like a memory leak if the component and client are on different machines. This is not the case. After the marshaling code on the component side has transferred the data to the RPC, it will make the call to CoTaskMemFree to release the component-side buffer. On the client side, the marshaler will see that *pNum items have been sent and will make another call to CoTaskMemAlloc for the client-side copy of this array and copy the items into it. The client can then access these items, but it must deallocate the array with a call to CoTaskMemFree:

ULONG ulNum;
LONG* pl;
hr = pArr->GetLongsAlloc(&ulNum, &pl);
for (ULONG ul = 0; ul < ulNum; ul++) printf("%ld\n", pl[ul]);
// the client owns the returned array and must release it
CoTaskMemFree(pl);
      The number of items and a pointer to the array are returned to the client from GetLongsAlloc. This is why the address of pl is passed to the method, and it is the reason why the IDL has the strange notation of

[out, size_is(, *pNum)] LONG** ppArr
The comma in [size_is()] leaves the size of the top-level pointer ppArr unspecified and indicates that *pNum is the size of the array pointed to by *ppArr.
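The ownership rule for callee-allocated [out] arrays (the component allocates, the client frees) is easy to get wrong on error paths, so one option is to wrap the returned pointer in a smart pointer straight away. The sketch below is portable C++ rather than real COM code: TaskAlloc and TaskFree are hypothetical stand-ins for CoTaskMemAlloc and CoTaskMemFree, and g_liveBlocks exists only to show that the buffer is released exactly once.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>

// Stand-ins for CoTaskMemAlloc/CoTaskMemFree so the sketch builds anywhere.
// g_liveBlocks is a hypothetical counter of buffers not yet released.
static int g_liveBlocks = 0;
inline void* TaskAlloc(std::size_t cb) {
    void* p = std::malloc(cb);
    if (p) ++g_liveBlocks;
    return p;
}
inline void TaskFree(void* p) {
    if (p) { --g_liveBlocks; std::free(p); }
}

// Deleter that hands the buffer back to the allocator it came from.
struct TaskMemDeleter {
    void operator()(void* p) const { TaskFree(p); }
};
using TaskMemLongs = std::unique_ptr<long[], TaskMemDeleter>;

// Plays the role of GetLongsAlloc: the callee allocates, the caller owns.
inline bool GetLongsAlloc(unsigned long* pNum, long** ppArr) {
    assert(pNum && ppArr);
    *pNum = 10;
    *ppArr = static_cast<long*>(TaskAlloc(*pNum * sizeof(long)));
    if (!*ppArr) { *pNum = 0; return false; }
    for (unsigned long x = 0; x < *pNum; ++x) (*ppArr)[x] = static_cast<long>(x);
    return true;
}

// Caller: wrap the raw pointer immediately so every exit path frees it.
inline long SumLongs() {
    unsigned long n = 0;
    long* raw = nullptr;
    if (!GetLongsAlloc(&n, &raw)) return -1;
    TaskMemLongs owned(raw);
    long sum = 0;
    for (unsigned long i = 0; i < n; ++i) sum += owned[i];
    return sum;  // owned's destructor calls TaskFree here
}
```

In real client code the deleter would call CoTaskMemFree; the structure of the caller stays the same.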
      If you use any of the array attributes that I have mentioned, you must produce a proxy-stub DLL by compiling and linking the C files produced by MIDL. The ATL AppWizard produces a make file to do this. You must make sure that your server does not register its component's interfaces as type library marshaled with the automation marshaler because automation does not recognize the array attributes.

Data Transfer with Type Library Marshaling

      What if your clients use type library marshaling? You have two options. You can use either a BSTR or a SAFEARRAY to transfer the data. A BSTR is a length-prefixed buffer of OLECHAR (each one 16 bits), but you can ask COM to create an array of 8-bit bytes instead by calling SysAllocStringByteLen:

// pass NULL for the first parameter to
// get an uninitialized buffer
BSTR bstr = SysAllocStringByteLen(NULL, 10);
LPBYTE pv = reinterpret_cast<LPBYTE>(bstr);
for (UINT i = 0; i < 10; i++) pv[i] = i * i;
      MIDL will generate marshaling code for BSTRs based on the fact that they are length prefixed. To see this in action, add a BSTR to an interface method and look at the project_p.c marshaling file generated by MIDL. You will find that the BSTR is user-marshaled using the functions BSTR_UserSize, BSTR_UserMarshal, BSTR_UserUnmarshal, and BSTR_UserFree, which are exported from OLEAUT32.dll.
      These marshaling routines use the BSTR prefix to determine how many bytes to transmit. They do not interpret the data as a string, so the data may be binary data with embedded nulls. If the data is in a BSTR, it would seem natural to use this when writing an application in Visual Basic®. Although this is possible, Visual Basic does a lot of work for you with BSTRs, and you have to undo some of this work to get access to its data.
      For example, if you have this method:

HRESULT GetDataInBSTR([out, retval] BSTR* pBstr);
You can access the binary data in the BSTR using Visual Basic:

Dim obj As New DataTransferObject
Dim s As String
Dim a() As Byte
' get the BSTR
s = obj.GetDataInBSTR() 
' convert it to a Byte array
a = s                    
' now do something with the data
For x = LBound(a) To UBound(a)
   Debug.Print a(x)
Next x
      It takes about as much code to do the same thing in C++ with ATL, which is usually not the case for COM code.

CComPtr<IMyData> pObj;
CComBSTR bstr;
// get the BSTR
pObj->GetDataInBSTR(&bstr);
// get the number of bytes in the BSTR
UINT ui = SysStringByteLen(bstr.m_str);
LPBYTE pv = reinterpret_cast<LPBYTE>(bstr.m_str);
// do something with them
for (UINT idx = 0; idx < ui; idx++)
   printf("array[%u]=%d\n", idx, pv[idx]);
      Another problem with putting binary data in a BSTR is that most wrapper classes assume that the data is a Unicode string. I explicitly call SysStringByteLen to get the number of bytes in the BSTR because CComBSTR::Length will return the number of Unicode characters in the BSTR.
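The difference between the byte length and the character length follows from the BSTR layout: a 4-byte prefix holding the byte count, then the data, then a 16-bit terminator. The following portable C++ models that layout to show why the two lengths differ. LengthPrefixedBuffer is a hypothetical class for illustration only; it is not the real BSTR allocator and should not be passed to any BSTR API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Portable model of a BSTR's memory layout (an illustration, not the real
// allocator): a 4-byte byte-count prefix, the data, then a 2-byte terminator.
class LengthPrefixedBuffer {
public:
    explicit LengthPrefixedBuffer(std::uint32_t byteLen)
        : storage_(sizeof(std::uint32_t) + byteLen + sizeof(char16_t), 0) {
        std::memcpy(storage_.data(), &byteLen, sizeof byteLen);
    }

    // Access one data byte; the data starts just past the prefix.
    unsigned char& At(std::uint32_t i) {
        assert(i < ByteLen());
        return storage_[sizeof(std::uint32_t) + i];
    }

    // Counterpart of SysStringByteLen: read the prefix as a byte count.
    std::uint32_t ByteLen() const {
        std::uint32_t n;
        std::memcpy(&n, storage_.data(), sizeof n);
        return n;
    }

    // Counterpart of a character-based length: the same prefix expressed in
    // 16-bit characters, so an odd trailing byte is not counted.
    std::uint32_t CharLen() const {
        return static_cast<std::uint32_t>(ByteLen() / sizeof(char16_t));
    }

private:
    std::vector<unsigned char> storage_;
};
```

A 10-byte buffer reports a character length of 5 in this model, which is why code that treats binary data as a string sees only half the expected count.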
      Another way to pass data is through a Visual Basic SAFEARRAY (for details about SAFEARRAYs see the OLE Q&A column in the June 1996 issue of MSJ). SAFEARRAYs are self-describing; they contain a description of the type of the items in the array, as well as the number of dimensions and the size of each dimension. The combination of these pieces of information allows the marshaler to know exactly how many bytes should be transmitted. An added benefit of this technique is that if the SAFEARRAY contains VARIANTs, then the data will be readable by scripting clients. However, you will have to justify the overhead of 16 bytes for each VARIANT item to hold a single BYTE of data.

Transferring Data with Stream Objects

      The final method of transferring data that I want to mention is the use of stream objects. IStream pointers can be marshaled by type library marshaling, and can be accessed by C++ clients. However, they are not directly accessible through Visual Basic code. (Persistable objects in Visual Basic do support IPersistStream and IPersistStreamInit, but do not give direct access to IStream.) The IStream interface effectively gives access to an unstructured buffer of bytes. The code that writes data to the stream and the code that reads the data must understand the format of the data put in the stream, as shown in Figure 2.
      The advantage of using a stream to transfer data is that all machines running Win32®-based operating systems will have stream marshaling code. However, as Figure 2 shows, you do not get direct access to the data in the stream through the IStream interface. If the stream holds many data items, this will result in many calls to the stream to access the data.
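The cost of item-by-item stream access can be sketched by counting calls. In the portable C++ below, CountingStream is a hypothetical stand-in for a marshaled stream (it is not IStream), and each Read is treated as one round trip to the object that holds the data.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Minimal stand-in for a marshaled stream: every Read is counted as one
// round trip. CountingStream is hypothetical, not IStream.
class CountingStream {
public:
    explicit CountingStream(std::vector<long> items) {
        bytes_.resize(items.size() * sizeof(long));
        if (!items.empty()) std::memcpy(bytes_.data(), items.data(), bytes_.size());
    }
    // Copies up to cb bytes into dest and returns the number copied.
    std::size_t Read(void* dest, std::size_t cb) {
        assert(dest != nullptr || cb == 0);
        ++calls_;
        std::size_t n = std::min(cb, bytes_.size() - pos_);
        if (n > 0) std::memcpy(dest, bytes_.data() + pos_, n);
        pos_ += n;
        return n;
    }
    int Calls() const { return calls_; }

private:
    std::vector<unsigned char> bytes_;
    std::size_t pos_ = 0;
    int calls_ = 0;
};

// One Read per item: the call count grows with the item count.
inline int ReadOneAtATime(std::size_t count) {
    CountingStream s(std::vector<long>(count, 7));
    long item = 0;
    for (std::size_t i = 0; i < count; ++i) s.Read(&item, sizeof item);
    return s.Calls();
}

// One Read for the whole block: a single call regardless of item count.
inline int ReadInOneBlock(std::size_t count) {
    CountingStream s(std::vector<long>(count, 7));
    std::vector<long> items(count);
    s.Read(items.data(), count * sizeof(long));
    return s.Calls();
}
```

When the reader knows the format, pulling the data out in large blocks and parsing it locally keeps the number of calls independent of the number of items.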

Improving Data Transfer Performance

      Now that I have explained the various ways to transfer data, let's take a more detailed look at performance issues. Distributed applications are great from the programmer's perspective because they allow you to utilize the data and component functionality available on many machines across the network. Windows DNA provides the platform and tools to access these distributed components. However, from a performance perspective distribution really stinks. A call across a machine boundary can take four orders of magnitude longer than an in-context call (see the ActiveX/COM Q&A column in the May 1997 issue of MSJ for more details). For best performance you should keep the number of network calls to a minimum, and avoid them completely whenever possible.
      It is not always possible to avoid network calls, and in some cases you may be making network calls when you don't even know it. This can happen with distributed transactions in Microsoft Transaction Server (MTS). MTS allows you to create a transaction on one machine and enlist resource managers on other machines into the same transaction. This is possible because the context object for an MTS component holds information about the component's transaction requirements (which are persisted in the MTS catalog) and details about any existing transaction that the component is using. When such an MTS component uses a resource manager through an inproc resource dispenser, MTS checks the context object and if a transaction exists, MTS tells the resource dispenser to enlist the resource manager in the transaction. If your transactional MTS component accesses another MTS component that has the Required transaction attribute, then the transaction will be exported to the new component.
      MTS works over normal DCOM, so there are separate packets of data passed over the network to make the component activation requests and method calls as well as the Microsoft Distributed Transaction Coordinator messages to maintain the transaction. As a result, you can often make a significant improvement to your MTS application's performance by keeping transactions local and avoiding distributed transactions altogether.
      One of the unsung improvements in COM+ is that it streamlines the use of distributed transactions by hijacking the DCOM packets that are used to access remote COM+ components. This makes COM+ a far better platform for applications that require distributed transactions. However, because COM+ uses DCOM packets to transfer the transaction ID and MTS doesn't, the two do not interoperate. As a result, you cannot use MTS-based components and COM+ components in the same transaction.
      Even with this optimization, if a resource manager is involved with a transaction created on and coordinated by another machine, there will always be extra network calls to perform the two-phase commit. Therefore, it makes sense to keep your transactions local whenever possible.
      If you must access a component on a remote machine, first determine whether the transaction must be created on the local machine and passed to the remote component. If not, remove transaction support from your local COM+ component.
      Resource managers are typically data sources like SQL Server™. In general, components should be as close as possible to the data that they will use, so usually the middle tier is on the same machine as the data source it uses. If this is not possible, consider using a stored procedure to manipulate the data in the data source. This way the transaction can be created in the stored procedure and kept local to the machine that uses it.

Buffer Sizes and Cross-machine Calls

      Keeping the number of network calls small is important, but keeping the buffer size as large as possible is equally important. This is partly just common sense. The RPC and DCOM header information in a DCOM packet accounts for about 250 bytes or so, and if you increase the size of the buffers passed in each network call, you can ensure that most of the DCOM packet will consist of data rather than the protocol's overhead. Of course, if your buffers are large, it will almost certainly mean that you have aggregated data into one call that would otherwise have to be sent in multiple network calls.
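The effect of the fixed header on small buffers can be put into numbers with a short calculation. This sketch takes the approximate 250-byte figure quoted above as a constant and assumes exactly one header per call; it ignores packet fragmentation and any lower-level protocol overhead, so treat the results as a rough guide only.

```cpp
#include <cassert>
#include <cstddef>

// Approximate per-call protocol overhead quoted in the text ("about 250 bytes").
constexpr std::size_t kHeaderBytes = 250;

// Fraction of the bytes on the wire that are payload for one call.
inline double PayloadFraction(std::size_t payloadBytes) {
    return static_cast<double>(payloadBytes) /
           static_cast<double>(payloadBytes + kHeaderBytes);
}

// Total bytes sent when totalBytes of data is split into calls that each
// carry at most chunkBytes of payload.
inline std::size_t BytesOnWire(std::size_t totalBytes, std::size_t chunkBytes) {
    assert(chunkBytes > 0);
    std::size_t calls = (totalBytes + chunkBytes - 1) / chunkBytes;  // round up
    return totalBytes + calls * kHeaderBytes;
}
```

Under these assumptions a 250-byte buffer is only half payload, whereas an 8KB buffer is about 97 percent payload. Sending 8KB in 64-byte pieces puts 40,192 bytes on the wire compared with 8,442 bytes for a single call.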
Figure 3 Transmission Time versus Size

      In Figure 3 I have plotted the results of my tests to show how the transmission time of a data buffer varies with the size of the buffer. I've used various common methods of passing the data, which are described in Figure 4. The measurements were taken for transferring the data between two machines running Windows 2000 on a quiet network. I was careful to include the time taken to clean up any buffers used by the client when the data was released. The absolute values are not significant as you'll find different values for your network and machine, but what is important are the trends. As you can see, the lines nearly converge after the buffer size reaches 8KB. In other words, beyond that point the efficiency of the data transmission is the same regardless of the size of buffer. Below this value, the data transmission efficiency is significantly reduced as the buffer size decreases.
      The other striking discovery is that except when transferring data via a stream object (which consistently takes longer than the other methods), the transfer rates are effectively the same. This indicates to me that Windows 2000 must be using similar, if not the same, marshaling code to transfer BSTRs, SAFEARRAYs, and conformant arrays. This is good news for programmers partial to Visual Basic. It means that they don't have to be left out of the marshaling game just because automation marshaling does not allow them to pass data using a conformant array. Now they too can pass large buffers efficiently between machines.

Marshaling Objects

      So how should you transfer data between processes in a distributed application? As I've mentioned, the most important issue is to design your interfaces to pass large data buffers in a few network calls rather than making a large number of calls passing small amounts of data.
      The type of property access you're used to in Visual Basic is possibly the worst thing to do in a distributed application.

Dim day As New Day
day.Day = 8
day.Month = 9
day.Year = 2000
Debug.Print day.DayName
If the Day object resides in another context, then each call to the object will involve marshaling. In this example, four calls are made to the object. They could easily be reduced to one call by replacing the properties with a simple method:

Dim day As New Day
Debug.Print day.GetDayName(8, 9, 2000)
      Accessing the object in this way is familiar to MTS and COM+ developers. This type of component is often called stateless because the state of the component is passed in the method parameters. MTS and COM+ transactional components are accessed this way to keep the transaction isolated; the component is activated just to execute GetDayName.
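The difference between the two calling styles can be sketched by counting marshaled calls. In the portable C++ below, RemoteDay is a hypothetical stand-in for an out-of-context Day object, and every public member function that touches it counts as one round trip. The day-of-week calculation uses Sakamoto's method and is included only to make the example complete.

```cpp
#include <cassert>
#include <string>

// RemoteDay is a hypothetical stand-in for an out-of-context Day object.
// Each call that sets or reads data counts as one marshaled call.
class RemoteDay {
public:
    // Chatty, property-style interface.
    void SetDay(int d)   { ++calls_; day_ = d; }
    void SetMonth(int m) { ++calls_; month_ = m; }
    void SetYear(int y)  { ++calls_; year_ = y; }
    std::string DayName() { ++calls_; return NameOf(year_, month_, day_); }

    // Stateless alternative: all inputs travel in one call.
    std::string GetDayName(int d, int m, int y) { ++calls_; return NameOf(y, m, d); }

    int Calls() const { return calls_; }

private:
    // Sakamoto's day-of-week method for the Gregorian calendar (0 = Sunday).
    static std::string NameOf(int y, int m, int d) {
        assert(m >= 1 && m <= 12);
        static const int t[] = {0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4};
        static const char* names[] = {"Sunday", "Monday", "Tuesday", "Wednesday",
                                      "Thursday", "Friday", "Saturday"};
        if (m < 3) y -= 1;
        return names[(y + y / 4 - y / 100 + y / 400 + t[m - 1] + d) % 7];
    }
    int day_ = 1, month_ = 1, year_ = 2000, calls_ = 0;
};

// Property-style access: three sets and one get.
inline int ChattyCalls() {
    RemoteDay r;
    r.SetDay(8); r.SetMonth(9); r.SetYear(2000);
    r.DayName();
    return r.Calls();
}

// Method-style access: everything in one call.
inline int SingleCall() {
    RemoteDay r;
    r.GetDayName(8, 9, 2000);
    return r.Calls();
}
```

Both styles produce the same answer, but the property-style version pays for four round trips where the method-style version pays for one.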
      Wherever possible you should pass data by value and not by reference. Objects are wonderful because they make code easier to read. A drawback of COM components (as far as distributed data transfer is concerned) is that they are always passed by reference. Thus, when you create a component on a remote machine, it will always live on that particular machine and all access to it will be via a marshaled interface pointer. Thus method calls to the component will always involve a network call.
      When you design your object model, you should avoid passing data using components. For example, the following Visual Basic code is a bad idea:

Dim person As New Person 
' if this is in-context then property access is OK
person.ForeName = "Richard"
person.SurName = "Grimes"

Dim customers As CustomerList
Set customers = CreateObject("CustomerSvr.CustomerList", _
customers.Add person 
In this case I am assuming that the object named person is created in-context so that I can call it using property access. This object is then passed to a remote object: customers. The code is readable and logical. You are adding a new person to a list of customers, so you create a new instance of the Person class and add it to an instance of the CustomerList class. However, this code is very bad for a distributed application because the person object is not passed to the customers object directly, but by reference. This means that the customers object must make network calls to get the data from the person object. In this simple example it would have been far better if the CustomerList class had a method that could be passed the customer's name rather than using an additional object.
      Of course, real-world code is rarely as simple as this. Passing objects has true advantages, especially if the object has many data members. Have you ever called a method with 10 parameters and gotten back E_INVALIDARG, then spent tons of time trying to find out exactly which parameter was invalid and why? This situation can be avoided if you pass the data as properties to an in-context object. Then the object can perform validation as each property is changed, which allows the object to return a meaningful error code if the property is invalid. To get the benefits of passing data via an object, but without the inefficiency of cross-context access, implement the object so it is marshaled by value.

Marshaling by Value

      Marshaling by value has been discussed before in MSJ, but I will give a brief overview because I want to talk about marshaling in more depth later on. (For a good starting point see House of COM in the March 1999 issue of MSJ.) If a component wants to have a say in the marshaling mechanism, it should implement IMarshal. When COM creates a component, it will always query for this interface. If the component does not implement IMarshal, it means that it is happy with standard marshaling. If the component implements IMarshal, then COM will call its methods to get the CLSID of the proxy object used in the client context, as well as to obtain the blob of data that contains information that will be passed to the proxy to allow it to connect to the object.
      In marshal-by-value, a component indicates that it should always be accessed in-context. This is achieved by persuading COM to create a clone of the component in the client context. To do this the component must be able to serialize its state and initialize a copy of itself from this serialized state. When COM marshals the component's interface, it asks for the CLSID of the proxy object. The component can then return its own CLSID to force COM to create an uninitialized version of the component in the client context. When COM asks for the component to provide marshaling information with a call to IMarshal::MarshalInterface, the component should serialize its state to the marshaled packet. COM then passes this packet to the proxy object (the uninitialized instance of the component in the client context), which can then extract the component state information and use this to initialize the clone. The marshal-by-value mechanism basically freeze-dries the object, copies it to the client context, and then rehydrates the component there. The connection to the out-of-context object is no longer needed because the proxy is an in-context version and all COM calls are serviced by it.
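The freeze-dry and rehydrate steps can be sketched without any COM plumbing. In the portable C++ below, Person is a hypothetical component whose state is two strings. Serialize plays the role that IMarshal::MarshalInterface has on the component side, and Deserialize plays the role that IMarshal::UnmarshalInterface has in the uninitialized clone. The blob format (a 4-byte length followed by the characters) is an assumption chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Person is a hypothetical component whose whole state is two strings.
class Person {
public:
    Person() = default;
    Person(std::string fore, std::string sur)
        : fore_(std::move(fore)), sur_(std::move(sur)) {}

    // Component side: write the state into the marshal packet.
    std::vector<unsigned char> Serialize() const {
        std::vector<unsigned char> blob;
        Append(blob, fore_);
        Append(blob, sur_);
        return blob;
    }
    // Client side: initialize an empty clone from the packet.
    bool Deserialize(const std::vector<unsigned char>& blob) {
        std::size_t pos = 0;
        return Extract(blob, pos, fore_) && Extract(blob, pos, sur_) &&
               pos == blob.size();
    }
    const std::string& ForeName() const { return fore_; }
    const std::string& SurName() const { return sur_; }

private:
    static void Append(std::vector<unsigned char>& blob, const std::string& s) {
        std::uint32_t n = static_cast<std::uint32_t>(s.size());
        unsigned char len[sizeof n];
        std::memcpy(len, &n, sizeof n);
        blob.insert(blob.end(), len, len + sizeof n);
        blob.insert(blob.end(), s.begin(), s.end());
    }
    static bool Extract(const std::vector<unsigned char>& blob, std::size_t& pos,
                        std::string& out) {
        std::uint32_t n;
        if (blob.size() - pos < sizeof n) return false;  // truncated length
        std::memcpy(&n, blob.data() + pos, sizeof n);
        pos += sizeof n;
        if (blob.size() - pos < n) return false;         // truncated data
        out.assign(reinterpret_cast<const char*>(blob.data()) + pos, n);
        pos += n;
        return true;
    }
    std::string fore_, sur_;
};

// What COM arranges in effect: create an empty instance in the client
// context and initialize it from the serialized state.
inline Person CloneByValue(const Person& original) {
    Person clone;
    bool ok = clone.Deserialize(original.Serialize());
    assert(ok);
    (void)ok;
    return clone;
}
```

After the clone is initialized, every property read is a local call; nothing in the clone refers back to the original.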
      Marshal-by-value is used more often than you may realize. ActiveX Data Objects disconnected recordsets are one well-known example of marshal-by-value. Standard error objects (created through CreateErrorInfo and accessed through GetErrorInfo) are also marshaled by value so that when your client code accesses the error object to get information about the error, the call will not involve marshaling. Note, however, that the extended error objects used by OLE DB are not marshaled by value. Instead, they generate the error description at the time the client calls IErrorInfo::GetDescription using an additional object, called a lookup object, that runs in the context of the object that generated the error. This requires a marshaled call.
      You should note that marshal-by-value components impose one restriction. Because the connection to the out-of-context component is lost, the proxy cannot write values back to that component, so the copy that the client receives is effectively read-only.

Handler Marshaling

      Handler marshaling is described in the COM specification as being the middle ground between standard and custom marshaling. That is, the developer hooks into the standard marshaling mechanism to provide extra code, but essentially keeps the architecture intact.
      Handler marshaling is not new. It first appeared as part of OLE 2, where it was used for embedded objects in compound documents. One of the problems of OLE 2 was that when you had more than one OLE server loaded, the whole system would grind to a halt because of the amount of memory consumed. Inproc handlers alleviated this problem because they could implement some of the object's interface methods (for example, rendering) that could be performed by inproc code. If the client requested an action that the handler could not perform, then the handler could load the server to get it to do the work.
      One form of handler marshaling can be implemented on versions of Windows before Windows 2000. The component can implement IMarshal to indicate that a custom proxy object, called a handler, should be used in place of the standard marshaling object. When COM asks the component for a marshal packet by calling IMarshal::MarshalInterface, it obtains a standard marshal packet by calling CoGetStandardMarshal. This means that the object's interfaces will be marshaled using standard marshaling, so the developer does not have to worry about writing interprocess communication code. The main reason for the component to implement IMarshal is so that it can use GetUnmarshalClass to return the CLSID of the handler object. However, the component and handler can take advantage of the fact that IMarshal is being used and can append extra initialization data to the marshal packet.
      Since the component's interfaces use standard marshaling, the handler can have access to the out-of-context object, but it can also handle some of the object's interface methods locally. Therefore, an enumerator could implement the Next method to return values from a cache, and replenish this cache using calls to the actual object requesting a large number of items. However, if the sole purpose of implementing IMarshal is to indicate the CLSID of the custom proxy that will be used, it may be unnecessary to implement all of the methods of IMarshal in the object.
      COM provides an alternative in which the object need not implement IMarshal. Instead, it implements an interface called IStdMarshalInfo as shown in the following code, where a single method called GetClassForHandler is the equivalent of GetUnmarshalClass.

[   local, object,
    uuid(00000018-0000-0000-C000-000000000046) ]
interface IStdMarshalInfo : IUnknown
{
    HRESULT GetClassForHandler([in] DWORD dwDestContext,
        [in, unique] void *pvDestContext, [out] CLSID *pClsid);
}
COM will look for this CLSID under the CLSID registry key, where it expects to find an InProcHandler32 key with the path to the server that implements the handler.
      Handler marshaling in Windows 2000 allows you to hook into the marshaling process on the client side. You can use this to restrict the number of calls to the component by allowing the handler to judge whether a marshaled call is necessary. The handler should implement the interfaces of the component that it allows the client to call. If the client queries for an interface that is not implemented by the handler, then the call will fail.
Figure 5 Handler Marshaling Architecture

      Figure 5 shows the client-side architecture. As you can see, the handler is aggregated by a client-side identity object that implements IUnknown. The handler can choose to implement an interface in its entirety or it may decide to delegate the client call to the actual object. In the latter case, the handler should obtain a pointer to the proxy manager and use that pointer to get access to the object's interfaces. To do this, the handler calls:

HRESULT CoGetStdMarshalEx(IUnknown* pUnkOuter,DWORD dwSMEXFlags,
                          IUnknown** ppUnkInner);
      The first parameter is the controlling IUnknown of the handler, that is, the identity object. The second parameter is a flag that is used to specify whether the proxy manager or server-side standard marshaler is required; a handler passes a value of SMEXF_HANDLER. If the call is successful, a pointer to the proxy manager is returned in the final parameter. The handler can then query this pointer for the interface that it requires, and it will be returned a pointer to a standard interface proxy. Since this is a hook into standard marshaling, the interfaces can be custom or dual interfaces.
       Figure 6 shows a handler for an interface that gives access to an array of strings which are the names of the files in a folder. This code comes from the FileEnum example that can be downloaded from the link at the top of this article.
Figure 7 Objects Used in FileEnum

Figure 7 shows the objects used in this example. The client context is implemented with the following code:

interface IFiles2 : IDispatch
{
    HRESULT GetNextFile([out, retval] BSTR* pData);
};
while the server context corresponds to this snippet:

interface IFiles : IUnknown
{
    HRESULT GetNextFiles([in] ULONG count,
        [out, size_is(count), length_is(*pFetched)]
        BSTR* pData, [out] ULONG* pFetched);
};
Notice that the handler and the component implement two different interfaces. The handler implements IFiles2, which has a single method called GetNextFile. This returns the next file name in the list of file names that the component maintains for a specified folder. The component implements the IFiles interface which has been optimized for the network, and allows many file names to be obtained through the GetNextFiles method. IFiles is marshaled with a proxy-stub DLL because it uses [size_is()] and [length_is()]. IFiles2 is accessed in-context, therefore it is not marshaled.
      IFiles2::GetNextFile works by maintaining a cache locally, and when this cache is empty it calls through to the Files object to get BUF_SIZE number of items. One irritating feature of this scheme is that the handler is created in the client context but isn't initialized. As a result, once the client has activated the handler in its context, an out-of-context call must still be made on the first client access.
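      The caching idea can be sketched in portable C++ without any COM plumbing. This is only an illustration, not the code from Figure 6: RemoteFiles and FilesHandler are hypothetical stand-ins for the component and the handler, the roundTrips counter simply models cross-context calls, and the BUF_SIZE value of 4 is an arbitrary choice for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the remote Files component (IFiles).
// Each call to GetNextFiles models one cross-context round trip.
class RemoteFiles {
public:
    explicit RemoteFiles(std::vector<std::string> names)
        : names_(std::move(names)) {}

    // Returns up to `count` names, in the spirit of IFiles::GetNextFiles.
    std::vector<std::string> GetNextFiles(std::size_t count) {
        assert(count > 0);
        ++roundTrips;
        std::vector<std::string> batch;
        while (batch.size() < count && next_ < names_.size())
            batch.push_back(names_[next_++]);
        return batch;
    }

    int roundTrips = 0;  // number of simulated network calls

private:
    std::vector<std::string> names_;
    std::size_t next_ = 0;
};

// Hypothetical stand-in for the handler (IFiles2): one name per call,
// served from a local cache that is refilled in batches.
class FilesHandler {
public:
    static constexpr std::size_t BUF_SIZE = 4;  // illustrative value only

    explicit FilesHandler(RemoteFiles& remote) : remote_(remote) {}

    // Returns false once the list of names is exhausted.
    bool GetNextFile(std::string& name) {
        if (cache_.empty() && !exhausted_) {
            std::vector<std::string> batch = remote_.GetNextFiles(BUF_SIZE);
            if (batch.size() < BUF_SIZE) exhausted_ = true;  // short read
            for (std::string& n : batch) cache_.push_back(std::move(n));
        }
        if (cache_.empty()) return false;
        name = std::move(cache_.front());
        cache_.pop_front();
        return true;
    }

private:
    RemoteFiles& remote_;
    std::deque<std::string> cache_;
    bool exhausted_ = false;
};
```

      With ten names and a batch size of four, a client that calls GetNextFile until it fails triggers three simulated round trips rather than ten.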
      A more efficient scheme would be to pass some initialization values to the handler. Handler marshaling in Windows 2000 allows you to do this, but both the object and the handler must implement IMarshal. The object must provide an implementation of all methods except IMarshal::UnmarshalInterface because this is the only method that the handler must implement. The object can use IMarshal::MarshalInterface to get access to the marshal packet and insert its own data, similar to marshaling by value, with the size of this data specified when COM calls IMarshal::GetMarshalSizeMax. But how does the object get access to the marshal packet? Again, this requires a call to CoGetStdMarshalEx:

CComPtr<IMarshal> m_pMarshal;
CComPtr<IUnknown> m_pUnk;

HRESULT FinalConstruct()
{
   HRESULT hr;
   // Aggregate the server-side standard marshaler.
   hr = CoGetStdMarshalEx(GetUnknown(), SMEXF_SERVER, &m_pUnk);
   if (FAILED(hr)) return hr;
   hr = m_pUnk->QueryInterface(&m_pMarshal);
   // Compensate for the extra reference on this object.
   if (SUCCEEDED(hr)) Release();
   return hr;
}
This code passes the object's IUnknown interface as the controlling unknown to CoGetStdMarshalEx and passes SMEXF_SERVER as the dwSMEXFlags parameter. The standard marshaler object will AddRef this pointer. Since this represents an excessive reference, the calling code calls Release to take this into account. Next, the code queries for IMarshal. Notice that both the IMarshal and IUnknown pointers have to be cached. If you release the IUnknown pointer at the end of FinalConstruct, the IMarshal interface will become invalid.
      After this, the IMarshal pointer can be used to implement IMarshal on the object, as shown in Figure 8. Here I assume that the data you want to marshal to the handler is in a buffer called ExtraData, which is DATA_SIZE bytes long. Notice that GetUnmarshalClass is implemented by the standard marshaler. This means that whatever interface marshaler the standard marshaler thinks is used for marshaling will be used for cross-context calls. Your interfaces can be marshaled in any way, including type library marshaling, so your clients can be scripting clients.
      On the client side, at the minimum your code should implement just IMarshal::UnmarshalInterface, as shown in Figure 9. The other methods will not be called unless an attempt is made to marshal the proxy pointer to another context. (To handle this situation, just delegate these methods to the standard marshaler. Since the handler has already been specified, the standard marshaler will load it in the new context.)
      The aggregated standard marshaler (returned from CoGetStdMarshalEx) is available only on Windows 2000, so the handler will not run on any other operating system. However, if your object implements IStdMarshalInfo, the information about the handler will be passed back to the client machine even if it is not running Windows 2000, but it will result in a failure code. Since you cannot turn off handler marshaling, both your clients and servers have to run on Windows 2000.

Passing Data with Pipes

      Imagine the case where you have megabytes or even gigabytes of data to transfer. Your data packets will be much larger than 8KB so you won't have to worry about inefficient calls to the network, but there are other issues to keep in mind. Consider making a call and processing the results. First, the client calls the component and asks for data to be returned. The component will have to obtain that data from somewhere and copy it into the buffer that RPC transfers. RPC transfers the data across the network and copies it into a buffer in the client context. Once it has been copied into the buffer, the client can access the data. During this time, the client thread will be blocked.
      At this point the client thread can process the data, but remember it's a huge amount of data, so this will take a long time. During this processing time the component is effectively idle, at least as far as the client is concerned. Clearly, it takes a long time to generate the data and transfer it, while the client waits around.
      COM pipes were developed to reduce this waiting time. The idea behind them is that the data buffer to be transferred should be split into chunks and transferred one after the other down the pipe. Instead of the client waiting a long time to get the entire buffer, it just waits a shorter time for the smaller chunk to arrive. Once the client gets the buffer, it can start to process it. COM now requests another chunk of data to be sent from the component even though the client hasn't yet requested it. This process of requesting a chunk of data while another is being processed is called read-ahead.
      If you get the balance right, the time taken to process a chunk will be the same as the time taken to generate and transfer another chunk. This means the client will get immediate access to the next chunk of data without any waiting time. Of course, this balance is not easy to attain, but the savings can be significant.
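      To get a feel for the potential saving, here is a back-of-the-envelope cost model. It is a simplification under stated assumptions: every chunk takes the same time to generate and transfer (transfer) and the same time to process (process), and the extra end-of-data call is ignored. The function names are hypothetical and are not part of any COM API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Total time when each chunk is fetched and then processed, with no overlap.
double SerialTime(std::size_t chunks, double transfer, double process) {
    assert(transfer >= 0.0 && process >= 0.0);
    return chunks * (transfer + process);
}

// Total time with read-ahead: the first chunk must arrive before any work
// starts, each later step costs the slower of the two activities, and the
// last chunk is processed with nothing left to overlap.
double ReadAheadTime(std::size_t chunks, double transfer, double process) {
    assert(transfer >= 0.0 && process >= 0.0);
    if (chunks == 0) return 0.0;
    return transfer + (chunks - 1) * std::max(transfer, process) + process;
}
```

      For ten chunks with transfer and processing times of 2 units each, the model gives 40 units without read-ahead and 22 units with it. When the two times are unbalanced, the slower activity dominates and the benefit shrinks.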
      Pipes are not a new technology; Microsoft RPC has supported them for a while. The difference is that in RPC you had to define the data that would be transferred via the pipe, and because RPC is not object based, you had to deal with context handles. The Windows 2000 Platform SDK defines three pipe interfaces: IPipeByte, IPipeLong, and IPipeDouble (see Figure 10). Every machine running Windows 2000 has the marshalers for each. These interfaces differ only in the type of data that they transfer.
      Each pipe interface has two methods: Push and Pull, which means that COM pipes are bidirectional. Once one executable has a pipe from another, it can both receive (pull) and transmit (push) data. Indeed, it can do both at the same time! Notice that these interfaces are declared with the async_iid attribute so you can call them synchronously or asynchronously (non-blocking). I will come back to this issue in a moment.
      When you use pipes, the first decision you have to make is which part of your application will implement the pipe code, the client or the component. Consider these two methods:

HRESULT ProvidePipe([in] IPipeByte* pPipe);
HRESULT GetPipe([out] IPipeByte** ppByte);
The first method is designed to be called by a client that has an implementation of the IPipeByte interface. It creates an instance of this and passes it to the component, which can then initiate the calls to pull or push data. With the second method, the component gives access to an implementation of the pipe in the server context, in which case it is the client, not the component, that initiates the pull or push operation.
      Pulling and pushing data is very straightforward. Your code does not have to be concerned with the read-ahead feature because this is carried out by the pipe marshaler provided by Windows 2000. However, there is one issue you need to address: how does COM know that there is enough data available to perform read-ahead? A pull operation means that the pulling code must repeatedly call IPipeXXX::Pull, and while it is processing each buffer COM will call the component to get the next buffer. Clearly COM must be told when the data is exhausted so that it does not perform any more read-ahead. To do this, the pipe implementation must return 0 in the pcReturned parameter. As a result, there will always be one more network call than is necessary when using Pull.
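      The termination rule can be illustrated with a portable model of a pull loop. This is a sketch only: TextPipe is a hypothetical class whose Pull method merely mimics the shape of IPipeByte::Pull using standard C++ types, and the pullCalls counter stands in for network calls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory stand-in for a pipe implementation.
class TextPipe {
public:
    explicit TextPipe(std::string data) : data_(std::move(data)) {}

    // Copies up to cRequest bytes into buf. A value of 0 in *pcReturned
    // tells the caller that the data is exhausted.
    void Pull(unsigned char* buf, unsigned long cRequest,
              unsigned long* pcReturned) {
        assert(buf != nullptr && pcReturned != nullptr);
        ++pullCalls;
        std::size_t left = data_.size() - pos_;
        std::size_t n = left < cRequest ? left : cRequest;
        std::memcpy(buf, data_.data() + pos_, n);
        pos_ += n;
        *pcReturned = static_cast<unsigned long>(n);
    }

    int pullCalls = 0;  // number of simulated calls to the pipe

private:
    std::string data_;
    std::size_t pos_ = 0;
};

// Pulls repeatedly until the pipe reports zero bytes returned.
std::string PullAll(TextPipe& pipe, unsigned long bufSize) {
    assert(bufSize > 0);
    std::vector<unsigned char> buf(bufSize);
    std::string out;
    unsigned long got = 0;
    do {
        pipe.Pull(buf.data(), bufSize, &got);
        out.append(buf.begin(), buf.begin() + got);
    } while (got != 0);
    return out;
}
```

      Pulling 10 bytes through a 4-byte buffer takes four calls in this model: three that return data and a final one that returns zero bytes.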
       Figure 11 shows a simple pipe implementation for transferring text files via a pipe. This component just implements the Pull method, which is called repeatedly until it indicates that zero bytes were returned. The pipe will return this value when there is no more data in the file. This pipe can be returned by the GetFileData method shown in Figure 12.
      The Push method is just as simple. Push is called repeatedly until it has no more data to send. However, so that COM knows that the data is exhausted and that it cannot perform read-ahead, the data pusher must send zero bytes, as shown in Figure 13.
      The actual data transfer is carried out by the pipe, so it is important to make sure that the pipe is notified when it is no longer needed. If you forget to call Push and pass zero bytes, or implement the Pull so that it returns zero bytes when the transfer is complete, the pipe will still be active and COM will still have a reference on it. You will see this if you try to shut down the apartment that has the pipe proxy: the call to CoUninitialize will hang until COM times out the call (if the call goes across machine boundaries).
      What about the size of the buffer transferred via each call to the pipe? You have two criteria to take into account. The first is the efficiency of the network. You should perform basic timing tests on the target network under typical conditions to see what packet size is most efficient. (My network, as shown in the results given in Figure 3, is efficient for 8KB data packets or larger.) The other criterion is the processing that is performed on the data by the code receiving the data. The ideal case is when the processing of each buffer takes the same amount of time as it takes to generate and transfer the buffer. In that case when one buffer has been processed, COM will have received the next buffer and it will be available for processing.
      The only way you can determine the best buffer size is to test the code on the target network. Figure 14 shows a simple class and code to perform such testing, but remember Heisenberg's uncertainty principle: the measurement on a system will affect the system (in this case, the timings will include the time used to take the timings). But from this class you can get an idea of the maximum time that the data processing should take. You should test the data transfer for various sizes of buffers.
      The next step in the test is to use static data in the client (in other words, do not transfer any data) to test how long the data processing takes. Again, run this test using various sizes of buffers. Finally, compare the two sets of figures and choose the buffer size that best matches the processing time to data transfer time. The download for this article includes a project that allows you to read and write file contents over pipes.
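      The final comparison can be automated once you have both sets of timings. The following is a minimal sketch, not the class from Figure 14: the Sample structure and BestBufferSize function are hypothetical names, and the sketch assumes that "best match" means the smallest absolute difference between transfer time and processing time. You supply the measured values yourself.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One row of measurements for a given buffer size (hypothetical layout).
struct Sample {
    std::size_t bufferBytes;  // size of the buffer that was tested
    double transferMs;        // measured time to generate and transfer it
    double processMs;         // measured time to process it on the client
};

// Returns the buffer size whose processing time is closest to its
// transfer time. On a tie, the earliest sample wins.
std::size_t BestBufferSize(const std::vector<Sample>& samples) {
    assert(!samples.empty());
    const Sample* best = &samples.front();
    for (const Sample& s : samples) {
        double gap = std::fabs(s.transferMs - s.processMs);
        double bestGap = std::fabs(best->transferMs - best->processMs);
        if (gap < bestGap) best = &s;
    }
    return best->bufferBytes;
}
```

      A relative difference, or a rule that also rejects buffers below the efficient packet size of your network, may suit your application better than the absolute difference used here.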

Asynchronous Pipes

      What about the asynchronous versions of the pipe interfaces? Although pipe read-ahead allows you to synchronize the data transfer and processing, it's possible that the client thread will be blocked while a data buffer is being transferred. To get around this, you can call the pipe using the non-blocking version of the pipe interface. The pipe implementor can utilize the non-blocking mechanism to implement the pipe using a custom thread pool rather than the RPC thread pool to run the pipe code. This allows the pipe implementor to manage threads more efficiently.
      When pulling data via the non-blocking versions of the pipe interfaces, the caller thread initiates the transfer by calling the Begin_Pull method, indicating how many items are required. It can then perform some other processing and return later to obtain the data (and the number of items returned) by calling the Finish_Pull method. During this time COM will have received the data and cached it ready for collection. When Finish_Pull is called, COM will perform the read-ahead to get the next buffer. Take a look at the non-blocking version of this method:

HRESULT Begin_Pull([in] ULONG cRequest);
HRESULT Finish_Pull([out, size_is(*pcReturned)] BYTE* buf, 
                  [out] ULONG* pcReturned);
      This is pseudo-IDL because the actual methods are generated by MIDL. On initial inspection it appears odd because when you call Finish_Pull you pass the buffer you want filled, after the actual transfer has been performed. Presumably COM will have read the data into some private buffer, and when you call Finish_Pull, it copies the data from this buffer into your buffer.
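      The presumed behavior can be modeled in a few lines of portable C++. This is purely a sketch of the split-phase idea: SplitPhasePuller is a hypothetical class, not the MIDL-generated interface, and it stages the data immediately in Begin_Pull to stay single-threaded, whereas COM would perform the transfer in the background.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of a split-phase (begin/finish) pull.
class SplitPhasePuller {
public:
    explicit SplitPhasePuller(std::string data) : data_(std::move(data)) {}

    // Phase 1: record the request and stage the data in a private buffer.
    // Returns false if a call is already outstanding.
    bool Begin_Pull(std::size_t cRequest) {
        if (pending_) return false;
        std::size_t n = std::min(cRequest, data_.size() - pos_);
        staged_.assign(data_.begin() + pos_, data_.begin() + pos_ + n);
        pos_ += n;
        pending_ = true;
        return true;
    }

    // Phase 2: copy the staged data into the caller's buffer.
    // Returns false if no call was started with Begin_Pull.
    bool Finish_Pull(std::vector<char>& buf, std::size_t* pcReturned) {
        assert(pcReturned != nullptr);
        if (!pending_) return false;
        buf.assign(staged_.begin(), staged_.end());
        *pcReturned = staged_.size();
        staged_.clear();
        pending_ = false;
        return true;
    }

private:
    std::string data_;
    std::size_t pos_ = 0;
    std::vector<char> staged_;  // private buffer filled between the phases
    bool pending_ = false;
};
```

      The caller is free to do other work between the two calls, and the buffer it supplies to Finish_Pull is only touched when the staged data is copied out.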
      The non-blocking version of Push lets you send data to another process without blocking the current thread, which is especially useful if the amount of data is large. Since there are no [out] parameters on Push, you should call Finish_Push to allow COM to clean up any resources it may have used and to determine if the push was successful. (The return value from Begin_Push only indicates that the method call was accepted by COM.)
      To use pipes you must have the headers and libraries from the most recent Platform SDK, and you must define _WIN32_WINNT to have a value of 0x500 in your stdafx.h.


      Data transfer over COM requires careful thought about the best method for moving the data across the wire. In general, you should make as few network calls as possible. When you do make a call, you should make the transmitted data buffers as large as you can, and always avoid the kind of property access used in Visual Basic.
      To further facilitate data transfer, COM gives you several tools. First, to avoid the problems with method calls that have many parameters, you can pass the data in an object as long as the object is marshaled by value. This allows you to combine the benefits of validation that an object provides, with the efficiency of network calls when data is passed by value. Next, Windows 2000 allows you to create lightweight client handlers that can make smart decisions about whether to make calls to an out-of-context component. Such a handler can cache results and make buffered reads and writes.
      Finally, Windows 2000 provides pipe interfaces that allow you to transfer large amounts of data over the network efficiently. This works by splitting up your data into sizable chunks, allowing COM to handle the transfer of these chunks over the pipe.

Richard Grimes is the author of several books on COM and ATL for WROX Press, including Professional ATL COM Programming and Professional Visual C++ 6 MTS Programming. Richard can be contacted at

From the September 2000 issue of MSDN Magazine.
