BizTalk Messaging: Building BizTalk Server Custom Parsers and Serializers

Joseph Fultz
Microsoft Corporation

February 2001

Summary: BizTalk Server 2000 includes interfaces that allow developers to add proprietary parsers and serializers for specialized incoming and outgoing file formats. This article discusses and defines custom BizTalk Server 2000 parsers and serializers, the purposes they serve, and the interfaces required to implement them. It also walks you through a simple parser and serializer pair implementation. (50 printed pages)

Download BTSParserSerialCode.exe.

Contents

Introduction
Prerequisites
Setting Up the Test Harness
Implementing a BizTalk Custom Parser
Implementing a BizTalk Server Serializer
Overall Summary
Appendix: Code Snippets

Introduction

Microsoft® BizTalk™ Server 2000 is one of Microsoft's .NET Servers. It is a powerful tool for handling the trading of documents between companies, organizations, and applications. It has state-of-the-art features for handling transactions, workflow, document transformation, and delivery—all of which are based on industry standards such as HTTP, HTTPS, XML, and XSL.

While the out-of-the-box features and tools of BizTalk Server will meet the needs of most users, there will be times when custom work is needed to handle specialized incoming and outgoing file formats. For this reason, BizTalk Server 2000 has been created with interfaces to allow developers to add their own proprietary parsers and serializers. This document explains what custom BizTalk Server 2000 parsers and serializers are, what purposes they serve, and which interfaces are required to implement them, and then walks through an implementation of a simple parser and serializer pair.

This white paper is targeted at those who have at least a working familiarity with BizTalk Server 2000, COM, and Microsoft Visual C++®. Throughout the paper, terminology associated with BizTalk Server 2000—channels, ports, receipts, interchanges, and so on—will be used freely. If you need clarification on any of the topics covered, there are articles online at http://www.microsoft.com/biztalk, or you can review the BizTalk Server documentation.

BizTalk Server 2000 includes parsers for flat file formats, XML, and EDI. Combined with document specifications defined in BizTalk Editor, they are powerful and flexible and will usually meet your needs. However, there are several reasons why you might need a custom parser and serializer pair: receiving a document that contains binary data that must be submitted to BizTalk Server 2000, implementing custom receipt correlation, accepting interchanges (files) that contain multiple documents, or handling custom document types that do not lend themselves to XML-Data Reduced (XDR) representations, such as RosettaNet objects (http://www.rosettanet.org). The interchange with multiple documents is a very tangible example. Imagine that you want partners to be able to submit a single file that represents a batch of documents, a very common practice in the health care industry for processing billing. Each document in the file can be a valid XML instance of the document specification, but the included XML parser cannot parse the file as a whole because it has multiple root nodes, so it is neither a valid instance of the document nor well-formed XML. For example, a valid instance might have the following format:

<docroot>
<header>
<field1/>
<field2/>
</header>
<body>
<field3/>
<field4/>
</body>
</docroot>

However, the partners from whom you receive interchanges that represent a batch would send a file containing:

<docroot>
<header>
<field1/>
<field2/>
</header>
<body>
<field3/>
<field4/>
</body>
</docroot>

<docroot>
<header>
<field1/>
<field2/>
</header>
<body>
<field3/>
<field4/>
</body>
</docroot>

…n

While you could simply implement a custom BizTalk Server preprocessor, the most you could do with the preprocessor is wrap the documents in a parent XML structure or reorder the file completely. However, doing either one of these will break the concept of groups and multiple documents per group by making the input interchange appear to be one document. Thus, you want to implement a custom parser to handle the parsing of this document. In this case, it would be as simple as breaking up the stream into its separate documents and passing on the data. The parsers that ship with BizTalk Server recognize groups and multiple documents per group.
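
To make "breaking up the stream into its separate documents" concrete, here is a minimal, portable C++ sketch that works on an in-memory string rather than on a COM stream. The function name SplitBatch is hypothetical and is not part of the BizTalk SDK or the sample code. The sketch assumes that the root element is never nested inside itself and that its closing tag contains no whitespace.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not part of the BizTalk SDK): splits a batch
// interchange into its individual documents by scanning for the opening
// and closing tags of the root element.
std::vector<std::string> SplitBatch(const std::string& interchange,
                                    const std::string& rootName)
{
    assert(!rootName.empty());   // an empty name would match every tag
    const std::string openTag = "<" + rootName;
    const std::string closeTag = "</" + rootName + ">";

    std::vector<std::string> documents;
    std::string::size_type pos = 0;
    while (true)
    {
        // Find where the next document starts and where it ends.
        const std::string::size_type start = interchange.find(openTag, pos);
        if (start == std::string::npos) break;
        const std::string::size_type end = interchange.find(closeTag, start);
        if (end == std::string::npos) break;   // truncated document: stop

        const std::string::size_type stop = end + closeTag.size();
        documents.push_back(interchange.substr(start, stop - start));
        pos = stop;   // continue after the document just extracted
    }
    return documents;
}
```

Calling SplitBatch(batch, "docroot") on the two-document file shown above would yield two strings, each a complete docroot instance that could be handed on individually.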

This document will review interface and method implementations as they relate to BizTalk Server parsers and serializers. While addressing these topics, this document will explore other BizTalk Server 2000 topics, as they are relevant to the sections on the component and interface implementations. The sample implementation will be based upon passing a simple equation into BizTalk Server 2000, parsing that equation, solving the equation, and serializing the response in a format similar to the incoming document. The example that we are using was chosen because it enables you to easily review and compare the inbound interchange contents pre- and post-parser and serializer execution. A simple data example is chosen, because complex XML and file structures are not the focus of this document.

The inbound and outbound documents will be persisted commerce dictionaries. Commerce dictionaries are objects originally introduced into the platform within Microsoft Site Server and are very similar to the dictionary objects provided by the scripting library (Scripting.Dictionary). We will use the dictionary objects to store, in memory, name-value pairs defined by the user (developer). Commerce dictionaries differ from their scripting library counterparts in several ways, one of which is that commerce dictionaries are free threaded. Additionally, commerce dictionaries support persistence of the entire dictionary object into XML through the IPersistXML interface. It is this interface that we will use to create the object (dictionary) representation of our incoming dictionary-persisted XML and to persist the resulting data dictionary into an XML format.

For the sample implementation, the flow of the document will be as follows:

  1. Receive XML persisted dictionary.
  2. BizTalk Server passes data to custom parser.
  3. Parser recreates dictionary object.
  4. Parser creates XML instance from dictionary.
  5. Equation solved in the channel by a BizTalk Server map.
  6. Serializer creates a dictionary object from the outbound XML.
  7. Serializer creates outbound XML by calling the dictionary's IPersistXML interface.
  8. BizTalk Server saves final instance of outbound document.

Note   The code samples used do not have exception handling, are not optimized, and are not factored. Thus, they are not intended for production systems.

Prerequisites

To implement the examples set forth in this document, you need to have BizTalk Server 2000 and all of its prerequisites installed. Additionally, you must have Microsoft Visual Studio® 6 (SP4) installed.

Some files that are needed specifically for BizTalk Server 2000 development are as follows:

  • bts_sdk_guids.h
  • btsparsercomps.h
  • btsserializercomps.h
  • commerce.h

All of these files can be found in the /SDK/Include folder where BizTalk Server 2000 is installed. For example, on my computer the path is C:\Program Files\Microsoft BizTalk Server\SDK\Include\.

Additionally, I made use of a couple of files, computil.h and computil.cpp, from Microsoft Commerce Server 2000 (they were formerly shipped with Site Server 3.0 Commerce Edition). While it is not necessary to use these files, they contain some functions for setting and retrieving dictionary elements that you would probably write yourself to minimize having to repeat the same few lines of code every time that you want to set or retrieve values from the dictionary object with which you are working.

For the purpose of this example, I have also created a couple of functions for use in the sample code and placed them into misc.h and misc.cpp files. You can find those functions in the appendix. Now that the prerequisites are out of the way, we can set up the test harness, which will help us to validate that everything is working as it should before we introduce custom code.

Setting Up the Test Harness

At this point I am sure that you are ready to create the custom parser and serializer, but first we need to put in place the BizTalk Server artifacts that are needed to test our example parser and serializer. To test this, we need to set up some type of harness for the application. This will include a channel, a receive function, a port, a document definition, and a map.

Once BizTalk Server has been set up with our test harness, we will validate the harness by passing in a compliant XML equation document, solve it through the map inside the channel, and write the outbound document. Having validated that BizTalk Messaging is set up appropriately to handle the equation XML instance, we can move forward to implementing the parser.

So, before we even begin with the code, let us set up BizTalk Messaging Services so that we can test our parser and serializer. Of the items that we need to set up, the first thing is to decide on and define the document that we will be using in our sample.

Creating the Document Specification

I am using a standard equation document, variations of which I use when testing various XML-based application prototypes. The following schema defines the document that we are going to use:

<?xml version="1.0"?>
<!-- Generated by using BizTalk Editor on Tue, Dec 05 2000 01:32:56 PM -->
<!-- Microsoft Corporation (c) 2000 (http://www.microsoft.com) -->
<Schema name="Equation" b:BizTalkServerEditorTool_Version="1.0" b:root_reference="Equation" b:standard="XML"
      xmlns="urn:schemas-microsoft-com:xml-data" xmlns:b="urn:schemas-microsoft-com:BizTalkServer"
      xmlns:d="urn:schemas-microsoft-com:datatypes">
<b:SelectionFields/>

<ElementType name="Result" content="textOnly" model="closed">
<b:RecordInfo/>
</ElementType><ElementType name="Operator" content="textOnly" model="closed">
<b:RecordInfo/>
</ElementType><ElementType name="Operand2" content="textOnly" model="closed">
<b:RecordInfo/>
</ElementType><ElementType name="Operand1" content="textOnly" model="closed">
<b:RecordInfo/>
</ElementType><ElementType name="Equation" content="eltOnly" model="closed">
<b:RecordInfo/>
<element type="Operand1" maxOccurs="1" minOccurs="0"/>
<element type="Operand2" maxOccurs="1" minOccurs="0"/>
<element type="Operator" maxOccurs="1" minOccurs="0"/>
<element type="Result" maxOccurs="1" minOccurs="0"/>
</ElementType></Schema>

An instance of the above schema would look like this:

<Equation>
   <Operand1>43</Operand1>
   <Operand2>26</Operand2>
   <Operator>*</Operator>
   <Result></Result>
</Equation>

There is purposefully nothing complex about this data or its representation so that we can focus on the details of implementing parsers and serializers. Once you have created the specification as a file, save it to your repository and create a document definition using BizTalk Messaging Manager. Name the definition SimpleEquationXML. We will need to map the inbound equation to an outbound equation and, through the map, solve the equation.

Creating the Map

Open BizTalk Mapper and use SimpleEquationXML as both your source and destination document. Map Operand1, Operand2, and Operator one-to-one.

Figure 1. BizTalk Mapper

Next you will need to use the Scripting functoid to generate the result. Place the Scripting functoid onto the mapping surface and connect Operand1, Operand2, and Operator into it in the given order. Open the Scripting functoid and ensure that the parameters are in the correct order. Click the Script tab and place the following script in it:

Function SolveEquation(oper1, oper2, operator)
   Dim result

   oper1 = cdbl(oper1)
   oper2 = cdbl(oper2)

   Select Case cstr(operator)
      Case "*"
         result = oper1 * oper2
      Case "+"
         result = oper1 + oper2
      Case "-"
         result = oper1 - oper2
      Case "/"
         if oper2 = 0 then
            result = 0
         else
            result = oper1/oper2
         end if
   end select

  SolveEquation = result
End Function

Save and test your map. Save the map as map_SolveSimpleEquation. Be sure to put it into the repository. Now that the document and the map are defined, BizTalk Messaging must be configured to receive and process the instances of the document.

Setting Up for Document Processing

You need to create a place for incoming and outgoing documents. I used C:\input and C:\output on my computer. Now let's set up the port. Open BizTalk Messaging Manager and create a new port to an organization. I called mine port_TestParser. For this test I used a couple of organizations that I have defined for testing on my computer. Set the primary transport to File with a location of C:\output\solved_%tracking_id%_.xml. Accept the defaults for everything else.

When prompted, create a new channel named cha_TestParser, set the source organization to open, use our previously defined document definition (SimpleEquationXML) as both the incoming and outgoing document, use the previously defined map as the document map, and take the defaults for everything else.

Lastly, set up a receive function in the BizTalk Server Administration snap-in. Create a new File receive function named recv_SimpleEquation and point it to your receive location for types of *.xml. On the advanced settings, set the channel to cha_TestParser.

At this point you should be able to create a sample XML instance from the editor. Modify it and run the document through the channel by using the File receive function. You can use this instance of the document if you want:

<Equation>
   <Operand1>43</Operand1> 
   <Operand2>26</Operand2> 
   <Operator>*</Operator> 
   <Result></Result> 
</Equation>

The output file should look like this:

<Equation>
   <Operand1>43</Operand1> 
   <Operand2>26</Operand2> 
   <Operator>*</Operator> 
   <Result>1118</Result> 
</Equation>

At this point we have not done anything with the custom parser and serializer. Instead, we have set up and validated our test harness, thus ensuring that our configuration will produce expected results before we introduce the parser and serializer into the environment. It is a good practice when working with BizTalk Server to ensure that you can successfully pass through the channel first. Subsequently, it is good to ensure that you have channel pass-through and appropriate mapping of the document types—which is what we just accomplished through our test. Make sure that you are able to generate this type of output through your BizTalk Server configuration before you continue.

Now that the pieces of BizTalk Messaging are set up, we will move to implementing the parser and serializer pair to work with XML generated from persisting a dictionary object. The following XML is an instance of that dictionary XML. You will want to create an XML file with it on your system for your use.

<DICTIONARY xmlns:dt="uuid:304FB305-29A4-11d3-B0D4-00C04F8ED7A2" version="1.0">
<DICTITEM key="Operation"><VALUE dt:dt="string" xml-space="preserve">*</VALUE></DICTITEM>
<DICTITEM key="Operand1"><VALUE dt:dt="string" xml-space="preserve">1</VALUE></DICTITEM>
<DICTITEM key="Operand2"><VALUE dt:dt="string" xml-space="preserve">4</VALUE></DICTITEM>
</DICTIONARY>

Save the file for later. Once the code for the parser is implemented and the parser is ready to run within BizTalk Server, we will need this file to test the parser.
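
To make the structure of this persisted dictionary concrete, the following portable C++ sketch pulls the key and value out of each DICTITEM element by plain string scanning. It is an illustration only: the function name ReadDictionaryXml is hypothetical, it assumes exactly the layout shown above (one VALUE per DICTITEM, a double-quoted key attribute, no escaped characters), and it is not a substitute for the dictionary's IPersistXML interface or for a real XML parser.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper: extracts name-value pairs from a dictionary that
// was persisted to XML in the layout shown above.
std::map<std::string, std::string> ReadDictionaryXml(const std::string& xml)
{
    const std::string keyMarker = "<DICTITEM key=\"";
    const std::string valueMarker = "<VALUE";
    const std::string valueEnd = "</VALUE>";

    std::map<std::string, std::string> items;
    std::string::size_type pos = 0;
    while ((pos = xml.find(keyMarker, pos)) != std::string::npos)
    {
        // The key runs from the end of the marker to the closing quote.
        const std::string::size_type keyStart = pos + keyMarker.size();
        const std::string::size_type keyStop = xml.find('"', keyStart);
        if (keyStop == std::string::npos) break;

        // The value is the text between the end of the <VALUE ...> start
        // tag and the </VALUE> end tag.
        const std::string::size_type tag = xml.find(valueMarker, keyStop);
        if (tag == std::string::npos) break;
        const std::string::size_type valueStart = xml.find('>', tag);
        if (valueStart == std::string::npos) break;
        const std::string::size_type valueStop = xml.find(valueEnd, valueStart);
        if (valueStop == std::string::npos) break;

        // The search began at the '>' character, so the end tag must lie
        // strictly after it.
        assert(valueStop > valueStart);

        items[xml.substr(keyStart, keyStop - keyStart)] =
            xml.substr(valueStart + 1, valueStop - valueStart - 1);
        pos = valueStop + valueEnd.size();
    }
    return items;
}
```

Applied to the instance above, the resulting map would hold the keys Operation, Operand1, and Operand2 with the values *, 1, and 4.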

Implementing a BizTalk Custom Parser

We will implement a parser that expects a dictionary object persisted to XML. The parser will create a dictionary object from the XML, pull out the expected name-value pairs, build the expected SimpleEquationXML document, and place it into the transport dictionary for BizTalk Server to pass on to subsequent parsers and then into the channel. This will allow BizTalk Messaging to produce the equation XML with a mapped result at the other end of the channel.
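
The "build the expected SimpleEquationXML document" step can be sketched in portable C++ as follows. This is a hypothetical stand-in for the sample's CreateXMLFromDictionary, using a std::map in place of the commerce dictionary; the names NameValueMap, ValueOrEmpty, and BuildEquationXml are assumptions made for illustration. Note that the persisted dictionary stores the operator under the key Operation, whereas the document specification names the element Operator.

```cpp
#include <cassert>
#include <map>
#include <string>

typedef std::map<std::string, std::string> NameValueMap;

// Returns the value stored under 'key', or an empty string if it is absent.
std::string ValueOrEmpty(const NameValueMap& items, const std::string& key)
{
    assert(!key.empty());   // callers pass a literal dictionary key
    NameValueMap::const_iterator it = items.find(key);
    return (it == items.end()) ? std::string() : it->second;
}

// Hypothetical stand-in for CreateXMLFromDictionary: builds an Equation
// instance from the name-value pairs. Values are copied verbatim; a
// production version would escape XML special characters.
std::string BuildEquationXml(const NameValueMap& items)
{
    std::string xml = "<Equation>";
    xml += "<Operand1>" + ValueOrEmpty(items, "Operand1") + "</Operand1>";
    xml += "<Operand2>" + ValueOrEmpty(items, "Operand2") + "</Operand2>";
    // Dictionary key "Operation" maps to the schema element "Operator".
    xml += "<Operator>" + ValueOrEmpty(items, "Operation") + "</Operator>";
    xml += "<Result></Result>";   // left empty; the map fills it in later
    xml += "</Equation>";
    return xml;
}
```

The Result element is deliberately left empty, because in our flow the equation is solved by the map inside the channel, not by the parser.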

Within a channel, BizTalk Messaging can be configured to track inbound and outbound documents in both native and XML format. These tracked documents can then be viewed through the BizTalk Document Tracking tool. The native format is the original document format, whereas the XML format represents the format that BizTalk Server places the document into internally. The XML format will match the document specification used, whereas the native format will not necessarily be the same. Thus in our example, the native format will be the document as it exists on disk, before our parser does anything with it. The XML format will be the output from our parser. We could just as well have used a binary large object (BLOB) of some type, but using the XML persisted dictionary gives us a very visible example. It enables us to visually inspect the native and XML representations within BizTalk Document Tracking and to easily read and compare the pre- and post-parser documents. Next let us take a look at what the parser interface looks like and how it works.

The Parser Interface

A custom parser in BizTalk Server is a COM component that implements the IBizTalkParserComponent interface. A custom parser is responsible for transforming data from an incoming document into a representation that is used and expected within BizTalk Server. In our example, the parser will be responsible for transforming a commerce dictionary object persisted to XML into the SimpleEquationXML document that BizTalk Messaging is expecting. The IBizTalkParserComponent interface comprises seven methods. The following table shows the method names and describes their function.

| Method | Description |
| --- | --- |
| GetGroupDetails | Gets details of the group for the Document Tracking database. This method is called only if there are groups in the interchange. |
| GetGroupSize | Gets the size of the group after all documents in the group are parsed. This method is called only if there are groups in the interchange. |
| GetInterchangeDetails | Gets information about the organization identifiers of the source and destination BizTalkOrganization objects. This method also performs a slew of other tasks, such as finding dynamic delimiters and code pages. It also fills the selection fields. |
| GetNativeDocumentOffsets | Identifies offsets from the beginning of the stream for final details about the group in the Document Tracking database for final logging. |
| GetNextDocument | Examines the data in a document and determines when to get the next document if this is not the last document. |
| GroupsExist | Determines if the interchange contains groups. |
| ProbeInterchangeFormat | Identifies the format of the interchange. |

For a simple interchange that contains a single document, the calling order for the methods is:

  1. ProbeInterchangeFormat
  2. GetInterchangeDetails
  3. GroupsExist
  4. GetNextDocument
  5. GetNativeDocumentOffsets

The number of times the methods for handling document groups and retrieving the next document information are called is dependent on the number of groups and number of documents that exist in a given interchange. The parser method calls have been designed to facilitate handling multiple documents and multiple groups of documents in any given interchange. The parser is responsible for informing BizTalk Server as to the existence of groups. If groups exist and are expected in an interchange, the parser must be able to handle them.

A more complex interchange might contain groups and multiple documents per group. BizTalk Server handles the interchange by calling the parser interfaces in the following manner:

BizTalk Server calls ProbeInterchangeFormat, in which the parser gathers information about the interchange, checks the document to determine if it will handle it, potentially grabs a pointer to the data in the interchange, and informs BizTalk Server of the document format if it recognizes it.

GetInterchangeDetails is called and the parser grabs any needed information from the dictionary passed to the call.

GroupsExist is next in the call order, and the parser checks for the existence of multiple groups and subsequently informs BizTalk Server that this interchange has groups, thus causing BizTalk Server to call some additional methods.

GetGroupDetails is the next method called and is where the call pattern diverges from the simple example. Within this call the parser returns information to BizTalk Server with regard to the group identity of the document or documents that the parser is about to process.

GetNextDocument is called until the parser reaches the last document in the group. Then the parser indicates that this is the last document as it normally would, but because the parser already determined that the group exists, this indication is of the last document in the group, but not necessarily in the interchange.

GetNativeDocumentOffsets operates as it usually would by indicating to BizTalk Server from where in the stream to grab the original data for this document.

GetGroupSize is called and the parser informs BizTalk Server of the number of documents in the group and whether this is the last group. If this is not the last group, BizTalk Server subsequently calls GetGroupDetails, thus starting another round of document parsing.

Figure 2 shows the flow and decision tree for parser method calls described here.

Figure 2. Logical flow of parser method calls
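
The call order described above can also be expressed as a small simulation. The following portable C++ sketch is only my reading of the sequence in the text, not BizTalk Server's actual implementation. MockParser and RunInterchange are hypothetical names, the method signatures are drastically simplified (the real IBizTalkParserComponent methods take COM types and more parameters), and calling GetNativeDocumentOffsets once after every GetNextDocument is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a parser. Each entry of
// docsPerGroup is the number of documents in one group (at least one).
class MockParser
{
public:
    MockParser(const std::vector<int>& docsPerGroup, bool hasGroups)
        : docsPerGroup_(docsPerGroup), hasGroups_(hasGroups), group_(0), doc_(0)
    {
        assert(!docsPerGroup_.empty());
    }

    bool ProbeInterchangeFormat()   { Log("ProbeInterchangeFormat"); return true; }
    void GetInterchangeDetails()    { Log("GetInterchangeDetails"); }
    bool GroupsExist()              { Log("GroupsExist"); return hasGroups_; }
    void GetGroupDetails()          { Log("GetGroupDetails"); doc_ = 0; }
    // Returns true when the document just parsed is the last one in the
    // current group (or in the interchange when there are no groups).
    bool GetNextDocument()          { Log("GetNextDocument"); ++doc_; return doc_ >= docsPerGroup_[group_]; }
    void GetNativeDocumentOffsets() { Log("GetNativeDocumentOffsets"); }
    // Returns true when the group just finished is the last group.
    bool GetGroupSize()             { Log("GetGroupSize"); ++group_; return group_ >= docsPerGroup_.size(); }

    std::vector<std::string> calls;   // records the order of method calls

private:
    void Log(const char* name) { calls.push_back(name); }
    std::vector<int> docsPerGroup_;
    bool hasGroups_;
    std::size_t group_;
    int doc_;
};

// Hypothetical driver that plays the role of the BizTalk Server engine.
void RunInterchange(MockParser& parser)
{
    if (!parser.ProbeInterchangeFormat()) return;   // parser declined the data
    parser.GetInterchangeDetails();

    if (!parser.GroupsExist())
    {
        bool lastDocument = false;
        while (!lastDocument)
        {
            lastDocument = parser.GetNextDocument();
            parser.GetNativeDocumentOffsets();
        }
        return;
    }

    bool lastGroup = false;
    while (!lastGroup)
    {
        parser.GetGroupDetails();
        bool lastDocument = false;
        while (!lastDocument)
        {
            lastDocument = parser.GetNextDocument();
            parser.GetNativeDocumentOffsets();
        }
        lastGroup = parser.GetGroupSize();
    }
}
```

For an interchange without groups that holds one document, the recorded calls match the five-step list given earlier. With groups, GetGroupDetails and GetGroupSize bracket each run of GetNextDocument and GetNativeDocumentOffsets calls.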

Between the text and the diagram you should have a pretty clear picture of how the methods are called from BizTalk Server. Since we have now covered what the methods are, described briefly their function, and reviewed the call order, let's review the code implemented in the sample.

Base Code

In creating our example components, we will keep it as simple as possible to focus on the purpose of learning how and why to implement the interface methods. Hence, there will not be any complex error handling or reporting and the code will not be optimized and factored. Follow these steps to create the skeleton code:

  1. Create a new project through the ATL COM App Wizard named FirstParser.
  2. Inside FirstParser, create a new ATL Simple Object named Parser1. This is the component in which we will implement the IBizTalkParserComponent interface.
  3. Include the following files inside Parser1.h:
    1. bts_sdk_guids.h
    2. BTSParserComps.h. This file contains the definition for our interface, IBizTalkParserComponent.
    3. Computil.h. This file contains the method PutDictValue, which we will use to add data to dictionary objects. While this is not necessary, it relieves you from writing similar code to accomplish the same task.
  4. Include the functions from the appendix. I created separate files, but you can add the functions to the project however you see fit. These functions are not necessary for BizTalk Server development, but rather were developed specifically for this example.
  5. Compile the project to make sure that everything is sitting in place.
  6. Add the interface and method for IBizTalkParserComponent.
  7. Inside Parser1.h, add "public IBizTalkParserComponent" to the class inheritance list.
  8. Add the following method prototypes to the class declaration:
    //Methods for IBizTalkParserComponent
    HRESULT STDMETHODCALLTYPE ProbeInterchangeFormat(IStream __RPC_FAR *pData,
       BOOL FromFile, BSTR EnvName, IStream __RPC_FAR *pReceiptData,
       BSTR __RPC_FAR *Format);
    
    HRESULT STDMETHODCALLTYPE GetInterchangeDetails(IDictionary __RPC_FAR *Dict);
    
    HRESULT STDMETHODCALLTYPE GroupsExist(BOOL __RPC_FAR *GrpsExist);
    
    HRESULT STDMETHODCALLTYPE GetGroupDetails(IDictionary __RPC_FAR *Dict);
    
    HRESULT STDMETHODCALLTYPE GetGroupSize(long __RPC_FAR *GroupSize,
       BOOL __RPC_FAR *LastGroup);
    
    HRESULT STDMETHODCALLTYPE GetNextDocument(IDictionary __RPC_FAR *Dict,
       BSTR DocName, BOOL __RPC_FAR *DocIsValid, BOOL __RPC_FAR *LastDocument,
       enum GeneratedReceiptLevel __RPC_FAR *ReceiptGenerated,
       BOOL __RPC_FAR *DocIsReceipt,
       BSTR __RPC_FAR *CorrelationCompProgID);
    
    HRESULT STDMETHODCALLTYPE GetNativeDocumentOffsets(BOOL __RPC_FAR *SizeFromXMLDoc,
       LARGE_INTEGER __RPC_FAR *StartOffset, long __RPC_FAR *DocLength);
    
    
  9. Add COM_INTERFACE_ENTRY(IBizTalkParserComponent) to COM_MAP so that it will be returned from a QueryInterface request.
  10. Add IMPLEMENTED_CATEGORY(CATID_BIZTALK_PARSER) to the CATEGORY_MAP. This is what identifies your component as a parser to BizTalk Server, so don't forget it.
  11. While you are in this file, go ahead and add the following private members to the class; we will use them later:
       IStream*   m_pData;
       char*        m_pCharData;
       BOOL      m_FromFile;
       long         m_TotalNumDocs;
       long         m_LastDocEndIndex;
       long         m_LastDocLength;
       long         m_CurrentDocument;
       long         m_LastPositionInString;
    
       BOOL      IsValidDocType(IStream* pStream);
       BOOL     CreateXMLFromDictionary(IDictionary* pDataDict, BSTR* OutputXML);
       BOOL     CheckStreamForMultipleDocs();
    
    
  12. In the class constructor we will initialize all the member variables:
    CParser1()
    {
       m_TotalNumDocs = 0;
       m_LastDocEndIndex = 0;
       m_LastDocLength = 0;
       m_CurrentDocument = 0;
       m_LastPositionInString = 0;
       m_FromFile = FALSE;
       m_pData = NULL;
       m_pCharData = NULL;
    }
    
    
  13. Add an override for the FinalRelease method so that we can let go of our IStream interface:
    void FinalRelease()
    {
         if(m_pData != NULL){m_pData->Release();}
    }
    
    
  14. At this point you should have the class declaration complete. Now open the CPP file and add empty method implementations for each of the methods defined by IBizTalkParserComponent.
  15. Compile. At this point you should have an empty implementation of a BizTalk Server parser.

Just for your own edification during a debug session, in each of the parser method calls add a line similar to this:

ATLTRACE("In CParser1::[method name]\n");

Additionally, you might want to add "hr = TraceDictionaryValues(Dict);" to the GetInterchangeDetails, GetNextDocument, and GetGroupDetails methods just to give you a look at what is in the dictionary as it is passed in to the method.

If you are re-implementing this sample code, I suggest that you copy and paste the function implementations from the sample source. I will refer to code snippets within this document, but to get the entire code base you will need to refer to the sample source, because this document serves to explain the interfaces, not to deliver the code.

With the base code and base understanding in place, the following sections will go through each method on the parser interface in detail. The methods will be covered in the order of expected execution.

Implementing the parser interfaces

In implementing the parser, we will focus on implementing the parser to handle a single instance or multiple instances of the document inside the interchange and to assume static routing. In a more complex scenario there could exist dynamically routed and self-routed documents. For the sake of clarity, we are keeping the example complex enough to be relevant, but simple enough to be easily understood. During the coverage of the interface methods, this document will briefly review the items that would be used and how they would be used to address tracking, dynamic routing, and groups inside the interchange. Refer to the parser implementation in the sample code as you work your way through this implementation. While parts of it will be in this document, it will be clearer if you relate the contents of this document to the sample source code.

ProbeInterchangeFormat

This method is the first method called by the BizTalk Server parser engine. In this method we gather several pieces of information and inform BizTalk Server of whether this is a format that is recognized by our parser.

First the parser must extract enough information from the stream to ascertain whether it has a document with which it can work. This check can be as simple or as complex as you need it to be: checking for a version number, validating a checksum that covers only part of the interchange, verifying a digital signature, or handling a special encryption algorithm. In other words, it is up to you, or the semantics of the document exchange agreement, to define the mechanism by which you verify that this is a document that the parser should operate on. In our case, we will simply check that the GUID of the Commerce.Dictionary object exists within the first 64 bytes of the stream. To do this, the sample code implements a function, IsValidDocType:

BOOL CParser1::IsValidDocType(IStream* pStream)
{
    HRESULT     hr;
    ULONG       bytesread;
    BOOL        retval =  FALSE;

    BYTE*       pbData = new BYTE[TEST_DOC_NUM_BYTES + 1];

We read the first 64 bytes from the stream to check for the GUID:

    hr = pStream->Read ((void*)pbData, TEST_DOC_NUM_BYTES, &bytesread);

If the read succeeded, we null-terminate the buffer at the number of bytes actually read and check it for our GUID:

    //test for guid
    if (SUCCEEDED(hr))
    {
         //set the null
         BYTE* pFinalChar = pbData + bytesread;
         *pFinalChar = 0;

        //do the test
        char* lpszTestReturn = NULL;

        lpszTestReturn = strstr((char*)pbData, "uuid:304FB305-29A4-11d3-B0D4-00C04F8ED7A2");

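Based on the result of the string search, we return TRUE or FALSE to indicate whether this appears to be an interchange that we want. Note that strstr returns NULL when the string is not found, so we test the pointer itself rather than dereferencing it:

        retval = (lpszTestReturn == NULL) ? (FALSE) : (TRUE);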

    }
    
    delete[] pbData;   //allocated with new[], so release with delete[]
    return retval;
}
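
For readers who want to experiment with this probe logic outside of COM, here is a portable C++ sketch that uses std::istream in place of IStream. The function name LooksLikePersistedDictionary is hypothetical; the 64-byte window and the signature string are taken from the text above. The sketch also rewinds the stream after probing, which mirrors the Seek call that the parser makes before it declines an interchange.

```cpp
#include <cassert>
#include <cstring>
#include <istream>
#include <sstream>   // std::istringstream, handy for exercising the function

// Portable sketch of the probe: reads up to 64 bytes, looks for the
// dictionary signature, and leaves the stream positioned at the start.
bool LooksLikePersistedDictionary(std::istream& in)
{
    const std::streamsize kProbeBytes = 64;
    char buffer[kProbeBytes + 1];

    // Read up to 64 bytes; gcount() reports how many were actually read.
    in.read(buffer, kProbeBytes);
    const std::streamsize got = in.gcount();
    assert(got >= 0 && got <= kProbeBytes);
    buffer[got] = '\0';

    // Rewind so that the next reader sees the stream from the beginning.
    in.clear();   // a short read sets eofbit and failbit
    in.seekg(0, std::ios::beg);

    // strstr returns NULL when the signature is absent; test the pointer
    // itself, never the character it points to.
    return std::strstr(buffer, "uuid:304FB305-29A4-11d3-B0D4-00C04F8ED7A2") != NULL;
}
```

With the dictionary XML shown earlier, the signature ends at byte 63 of the stream, so it only just fits inside the 64-byte window. A document with a longer prolog or additional leading attributes would need a larger window.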

Within the parser we implement some code to check the document to see if it is the type that the parser can handle. If it is not, we must return a value to BizTalk Server telling it that we didn't fail, but we don't want this data. The code looks something like this:

if(!IsValidDocType(pData))
{
   hr = pData->Seek (dLibMove, STREAM_SEEK_SET, NULL);
   return S_FALSE;
}

The return code S_FALSE lets BizTalk Server know that we didn't do anything with this interchange. For a complete list of possible BizTalk Server return codes, look in the BizTalk Server online documentation under the topic BizTalk Server 2000 Error Messages.

Note that we only validated that the interchange appears to have something in it that we handle. We could investigate the stream a little more to also determine if it appears to be the type of dictionary we want, that is, a dictionary representing a simple equation. If we did so, it would be to determine whether we think that the stream contains the type of dictionary that we expect, not to validate any documents. The GetNextDocument method allows the parser to inform BizTalk Server as to the validity of a given document, so document validation is done there.

Once the parser has verified that this is an interchange it wants, it needs to save a reference to the IStream* that is passed into the method. Within this method call is the only place where BizTalk Server will pass the parser a pointer to the data stream. If the parser doesn't get a reference to it now, there will not be another chance. We store the reference in the m_pData member variable (IStream* m_pData) that we added to the class declaration earlier:

m_pData = pData;
pData->AddRef ();

Determine the number of documents within the interchange by counting the number of end tags in the interchange:

//get the number of documents in this interchange
CheckStreamForMultipleDocs();
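
Counting end tags can be sketched as follows. This is an illustration under stated assumptions, not the sample's CheckStreamForMultipleDocs: CountDocuments is a hypothetical name, the data is held in a std::string, and every occurrence of the end tag is assumed to close exactly one document.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts non-overlapping occurrences of endTag in data. Each occurrence
// is taken to close one document. Hypothetical helper for illustration.
std::size_t CountDocuments(const std::string& data, const std::string& endTag)
{
    assert(!endTag.empty());
    std::size_t count = 0;
    std::size_t pos = data.find(endTag);
    while (pos != std::string::npos)
    {
        ++count;
        // Resume the search just past the tag that was found.
        pos = data.find(endTag, pos + endTag.size());
    }
    return count;
}
```

For the two-document sample interchange used in this article, a call with the end tag "</DICTIONARY>" would report two documents.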

Set the document format variable. BizTalk Server will pass this information on to subsequent parsers.

bstrFormat = *Format;
bstrFormat = "Custom XML";
*Format = bstrFormat.Detach ();

Don't forget to include AddRef or you will potentially find yourself with a NULL or invalid IStream pointer when the subsequent calls to your parser are made. In the sample code, I have added a couple of other pieces of code for tracing some information to the output window, but you only need to check the message type, save a reference to the stream, and set the message format.

Additionally, in this method we retrieve all the data from the stream and place it in a member variable of the parser class. This simplifies the parsing, as the code will only deal with the character pointer.

Near the top of the function, before we actually checked the interchange, we ensured the stream pointer was at the beginning and retrieved some information on the stream with the following code:

//check the data in the stream to see if it is what we want
STATSTG        stats;
LARGE_INTEGER  dLibMove;
CComBSTR       BSTRData;
ULONG          bytesread;
BYTE*          pbData;

hr = pData->Stat (&stats, 0);
m_FromFile = FromFile;
//make sure that the stream is at the beginning.
//It should be, but being defensive isn't all bad.
if (SUCCEEDED(hr))
{
   dLibMove.QuadPart = 0;
   hr = pData->Seek (dLibMove, STREAM_SEEK_SET, NULL);
}

We allocate memory for a BYTE pointer by using the cbSize member of the stats variable (type STATSTG), which contains the size of the stream in bytes. We add one byte for a null terminator so that we can treat the buffer as a null-terminated string.

//get all of the data out of the stream
pbData = new BYTE[stats.cbSize.LowPart + 1];

Now we read all the data and then write a null terminator into the byte that follows the last byte read.

hr = pData->Read ((void*)pbData, stats.cbSize.LowPart , &bytesread);

//add null terminator

BYTE* pFinalChar = pbData + bytesread;
*pFinalChar = 0;
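
The same allocate, read, and terminate pattern can be tried outside of COM. In this sketch std::istream stands in for IStream, and ReadNullTerminated is a hypothetical name. As in the code above, the terminator is placed after the bytes that were actually read, which may be fewer than the size that was requested.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <vector>

// Reads up to 'size' bytes from 'in' into a buffer that is one byte larger
// than requested, then writes the terminator directly after the bytes that
// were actually read. std::istream stands in for IStream here.
std::vector<char> ReadNullTerminated(std::istream& in, std::size_t size)
{
    std::vector<char> buffer(size + 1, '\0');
    in.read(buffer.data(), static_cast<std::streamsize>(size));
    const std::size_t bytesRead = static_cast<std::size_t>(in.gcount());
    assert(bytesRead <= size);
    buffer[bytesRead] = '\0';      // terminate at the real end of the data
    buffer.resize(bytesRead + 1);  // keep only the data plus terminator
    return buffer;
}
```

The function can be exercised with a std::istringstream that holds a small test interchange.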

Also, notice that I have written code to determine if the stream was originally from a file. If it is from a file, it is likely single-byte characters. If it is not, it is likely a BSTR.

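//save off the data to a member variable for use later
if(m_FromFile)
{
   m_pCharData = (char*)pbData;
}
else
{
   //W2A allocates its result on the stack, so copy the converted
   //string to the heap before keeping a pointer to it in a member.
   USES_CONVERSION;
   LPSTR pszConverted = W2A((LPWSTR)pbData);
   m_pCharData = new char[strlen(pszConverted) + 1];
   strcpy(m_pCharData, pszConverted);
}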

Since I am saving off the data for this example, I am converting to a char* before saving it. Later on when we look at GetNextDocument we will discuss this further. However, for now leave the stream pointer at its end location as we move on to covering GetInterchangeDetails.
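
Because the converted characters are kept for later calls, they need storage that outlives the conversion itself. The sketch below shows one way to get an owning single-byte copy in portable C++. NarrowAscii is a hypothetical name, and the conversion is deliberately simplified: it keeps 7-bit ASCII and replaces everything else with a question mark, whereas the W2A macro converts through the system ANSI code page.

```cpp
#include <cassert>
#include <string>

// Converts a wide string to a single-byte string that owns its storage.
// Characters outside 7-bit ASCII become '?'. This is a simplification:
// the W2A macro converts through the system ANSI code page instead.
std::string NarrowAscii(const std::wstring& wide)
{
    std::string narrow;
    narrow.reserve(wide.size());
    for (wchar_t ch : wide)
    {
        const bool isAscii = static_cast<unsigned long>(ch) < 0x80UL;
        narrow.push_back(isAscii ? static_cast<char>(ch) : '?');
    }
    // One output character is produced for every input character.
    assert(narrow.size() == wide.size());
    return narrow;
}
```

Returning a std::string by value means the caller owns the result, so nothing depends on a temporary buffer once the function has returned.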

GetInterchangeDetails

In this method call the parser will retrieve information with regard to source and destination organization identifiers. Additionally, if an envelope were being used, that information would be available to the parser through this method call. In the sample code, you will see that there is only a method call to write the dictionary name-value pairs out to the debug window.

HRESULT STDMETHODCALLTYPE CParser1::GetInterchangeDetails( 
     IDictionary __RPC_FAR *Dict)
{
   ATLTRACE("In CParser1::GetInterchangeDetails\n");
   //write the minimal fields from the dictionary to the output device
   //I am using some helper functions borrowed from Commerce Server.
   HRESULT   hr;

   hr = TraceDictionaryValues(Dict);
   return hr;
}

For most XML implementations you will likely not do anything within this function. The dictionary will probably contain the following fields, which can be viewed in the debug window from the TraceDictionaryValues call.

Name                Description
Tracking_ID         GUID (string) for tracking purposes
Document_Name       String name of document type
submission_id       GUID (string) to uniquely identify submission
s_Interchange_Spec  String for identifying interchange specification (envelope)

Additionally, if you were working with an EDI type you would possibly have these name-value pairs in the dictionary.

Type     Value Name           Description
EDIFACT  interchange_release  Message type release number
         interchange_id       Interchange control reference
         Dest_ID_Value        Recipient identification
         Dest_ID_Type         Identification qualifier
         Out_Dest_ID_App      Identification application
         Src_ID_Value         Sender identification
         Src_ID_Type          Identification qualifier
         In_Src_ID_App        Identification application
X12      interchange_id       Interchange control reference
         Dest_ID_Value        Recipient identification
         Dest_ID_Type         Identification qualifier
         Src_ID_Value         Sender identification
         Src_ID_Type          Identification qualifier

Several of the values presented in these tables are used for self-routing documents, for example, the Dest_ID_ and Src_ID_ prefixed value names. In the sample code we did not do anything with envelopes or the source and destination identifiers. Thus, this method has no effect on our sample implementation. The only thing that we do in the sample implementation is display any values within the incoming dictionary in the debug window.

Now that the parser has retrieved information with regard to the interchange and has grabbed a pointer to the data stream, BizTalk Server is interested in knowing whether the interchange contains groups.

GroupsExist

In any given interchange, you might have multiple documents that are batched for processing. For example, you might have a document specification that represents a remittance. However, that remittance might not contain the concept of a batch within its specification. Additionally, not only might more than one document exist, but more than one group of documents might exist. Hence, you might have data within a single file that is logically similar to the following:

<interchange header> [header info] </interchange header>
<groupheader>  [lots of group information] </groupheader>
<document>     [very valuable doc info   ] </document>
<document>     [very valuable doc info   ] </document>
<document>     [very valuable doc info   ] </document>
<grouptrailer> [group trailer info       ] </grouptrailer>
<groupheader>  [lots of group information] </groupheader>
<document>     [very valuable doc info   ] </document>
<document>     [very valuable doc info   ] </document>
<document>     [very valuable doc info   ] </document>
<grouptrailer> [trailer info             ] </grouptrailer>
<interchange trailer> [interchange trailer info] </interchange trailer>

Notice that since there are multiple root nodes, it would be impossible to represent this as XML without wrapping the entire thing within a root node. This is because the XML specification dictates that an XML document must contain one and only one root node. However, this is the manner in which many batched EDI transactions are communicated. BizTalk Server ships EDI parsers and serializers that handle these types of complex interchanges automatically.

The GroupsExist method is where you tell the BizTalk Server parsing engine whether the interchange that you are dealing with contains multiple groups. This method is targeted at EDI-type transactions, where you might have not only multiple documents within the interchange but also multiple groups. This is because EDI has historically been handled by batching all transactions and sending them at set intervals. For example, a reseller might batch all purchase orders (POs) from multiple distributors into a single interchange and send that to the vendor, in which case the vendor will potentially have multiple POs from multiple distributors. If you were starting from scratch and creating an XML-based PO for this purpose, you might take this into consideration and create a group PO superstructure, thus accommodating multiple POs from multiple organizations and removing the complexity of a multidocument interchange.

If you tell BizTalk Server that you have multiple groups within an interchange, BizTalk Server will subsequently make a call into your component to GetGroupDetails. Furthermore, within GetNextDocument if the parser informs BizTalk Server that it has processed the last document and does not indicate within GetGroupSize that it has reached the last group, BizTalk Server will then call GetGroupDetails again and start the processing for the next set of documents by calling GetNextDocument. In our implementation, we will not process a separate group, but we will, for demonstration purposes, process an interchange with multiple documents. Thus our multi-document interchange will contain two instances of XML persisted dictionary objects:

<DICTIONARY xmlns:dt="uuid:304FB305-29A4-11d3-B0D4-00C04F8ED7A2" version="1.0">
<DICTITEM key="Operation"><VALUE dt:dt="string" xml-space="preserve">*</VALUE></DICTITEM>
<DICTITEM key="Operand1"><VALUE dt:dt="string" xml-space="preserve">1</VALUE></DICTITEM>
<DICTITEM key="Operand2"><VALUE dt:dt="string" xml-space="preserve">4</VALUE></DICTITEM>
</DICTIONARY>

<DICTIONARY xmlns:dt="uuid:304FB305-29A4-11d3-B0D4-00C04F8ED7A2" version="1.0">
<DICTITEM key="Operation"><VALUE dt:dt="string" xml-space="preserve">-</VALUE></DICTITEM>
<DICTITEM key="Operand1"><VALUE dt:dt="string" xml-space="preserve">5</VALUE></DICTITEM>
<DICTITEM key="Operand2"><VALUE dt:dt="string" xml-space="preserve">17</VALUE></DICTITEM>
</DICTIONARY>

GetGroupDetails

Provided that the parser identified groups in the call to GroupsExist, the GetGroupDetails method will be called before processing each group. Information about the document group is set in this method. Some of the fields that might exist or that you might place into the dictionary pointer that is passed to this method are shown in the following table.

Type     Value Name
General  Tracking_ID
         Document_Name
         syntax
         submission_id
EDIFACT  application_receiver_code
         application_sender_code
         functional_identifier
         standards_version
         standards_release
         group_id
X12      application_receiver_code
         application_sender_code
         functional_identifier
         standards_version
         group_id

Since we are not working with groups in this example, this implementation only contains code to dump the dictionary object to the output device. This serves heuristically to show what, if anything, is being passed into the method call through the dictionary object.

HRESULT STDMETHODCALLTYPE CParser1::GetGroupDetails( IDictionary __RPC_FAR *Dict)
{
   HRESULT hr;
   ATLTRACE("In CParser1::GetGroupDetails\n");
   hr = TraceDictionaryValues(Dict);
   return S_OK;
} 

Subsequent to the GetGroupDetails call is the GetNextDocument call, which is where most of the parsing work takes place.

GetNextDocument

The GetNextDocument method is where the majority of a parser's work takes place. It is within this method that your parser transforms the input document into the XML format that is expected by the channel. It is also within this method that you would discover the route of a self-routing document, by examining the elements that the document specification identifies as routing tags. Hence, you would create the XML instance, discover the source and destination values contained within the self-routing document, and set the SrcQual, SrcID, DestQual, and DestID name-value pairs of the transport dictionary within the method call. Our sample implementation does not use a self-routing document and therefore does not set those fields.

However, we do parse the document, create the expected XML document, and set the WORKING_DATA field of the transport dictionary to our resulting XML. Furthermore, if within the interchange more than one document exists, the call to this method would return LastDocument as false and parse the subsequent documents, repeating the same steps until there are no more documents to process within the interchange.

In the ProbeInterchangeFormat method call, the sample code stores a char* within the class that contains the data of the stream. I will work with it to parse through the interchange and determine document offsets. Keep in mind that this sample code is using single-byte character strings. In the following code, we will also see that there are member variables of the class being set to track the end position of the current document and the length of the current document. In the GetNativeDocumentOffsets call that is made subsequent to this call, the parser engine will be looking for the offset from the beginning of the stream and the document length in order to track the inbound document.

//parse out the document
long  CurrentDocLength;
char* pUnprocessedSubString;
char* pEndTag;
char* pCurrentDocument;
//get a pointer to the character after the end tag

Get a pointer to the part of the character pointer that we haven't processed. We do this by tracking our offset in bytes during processing and adding the number of bytes to the pointer variable that contains our data.

pUnprocessedSubString = m_pCharData + m_LastDocEndIndex;
pEndTag = strstr(pUnprocessedSubString, END_DOC_IDENTIFIER);

Get a character pointer to the character after the last character of the end tag of a single instance of the dictionary document.

pEndTag += strlen(END_DOC_IDENTIFIER);

Determine the current document's length by subtracting the address of the first character of the unprocessed string from the address of the end character.

//determine the length of the document
CurrentDocLength = pEndTag - pUnprocessedSubString + 1;

Next we allocate and initialize memory for our document bytes based on the length that we just determined.

//copy document from string
pCurrentDocument = new char[CurrentDocLength+1];
memset ((void*)pCurrentDocument, 0, CurrentDocLength + 1);

We will now need to copy the string of the dictionary XML into a variable so that we can pass it to the RehydrateDictionary call and get back our source dictionary.

strncpy(pCurrentDocument, pUnprocessedSubString, CurrentDocLength);

As we parse the data, we want to keep track of where we are in the data stream so that we can inform BizTalk Server where to find the native document in the IStream* and so that we can know from where to continue parsing in subsequent calls to GetNextDocument.

//set the position place holders
//these are noted with Last, but represent current doc in this fn call
//they represent the last doc processed in the GetNativeDocOffsets call
m_LastDocLength = CurrentDocLength;   
m_LastDocEndIndex= m_LastDocEndIndex + m_LastDocLength;
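
The bookkeeping in the preceding steps can be condensed into one small routine. This is a sketch under stated assumptions rather than the sample's code: DocumentSpan and NextDocument are hypothetical names, the data is held in a std::string, and a document's length is taken to run exactly up to the last character of its end tag. Unlike the snippets above, it also handles the case where no end tag is found.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Position and size of one document inside the interchange buffer.
struct DocumentSpan
{
    std::size_t offset;  // index of the document's first character
    std::size_t length;  // number of characters, end tag included
};

// Finds the next document that starts at 'startIndex' and ends with
// 'endTag'. On success fills 'span', advances 'startIndex' past the
// document, and returns true; returns false when no end tag remains.
bool NextDocument(const std::string& data, const std::string& endTag,
                  std::size_t& startIndex, DocumentSpan& span)
{
    assert(!endTag.empty());
    if (startIndex >= data.size())
        return false;
    const std::size_t tagPos = data.find(endTag, startIndex);
    if (tagPos == std::string::npos)
        return false;  // nothing left that looks like a document
    const std::size_t endIndex = tagPos + endTag.size();
    span.offset = startIndex;
    span.length = endIndex - startIndex;
    startIndex = endIndex;  // the next call continues from here
    return true;
}
```

With this layout, any separator between two documents (a line break, for example) is counted as the leading part of the document that follows it. The offset and length in the span are the same two values that the parser later hands back for document tracking.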

Now that we have the Commerce.Dictionary XML from the stream, we will reconstruct the dictionary object through its LoadXML method. The IPersistXML::LoadXML method accepts XML previously created by the IPersistXML::SaveXML method. The dictionary object implements the IPersistXML interface. The LoadXML method is called within the RehydrateDictionary function in order to turn the source XML into a Commerce.Dictionary object.

if (SUCCEEDED(hr))
{
   hr = pDataDict.CoCreateInstance (L"Commerce.Dictionary");
   if(SUCCEEDED(hr))
   {
      BYTE* pFinalChar = pbData + bytesread;
      *pFinalChar = 0;
      BSTRData = (char*)pbData;
      hr = RehydrateDictionary (BSTRData, pDataDict);
   }
}

At this point we have parsed a single document from our input stream (interchange) and we have converted to a Commerce.Dictionary object. The next step is to create, from the data in the dictionary, an instance of an XML document, the SimpleEquationXML document in our case, that BizTalk Messaging is expecting in the channel. We should have a dictionary object from which we can extract expected name-value pairs and create the XML instance. Once we have created the proper XML document, we have to set the WORKING_DATA field of the dictionary passed to the method call. Parsed data is passed back to BizTalk Server from the parser through the WORKING_DATA field of the working dictionary passed to the GetNextDocument method.

if (SUCCEEDED(hr))
{
   //create the output xml instance
   *DocIsValid = CreateXMLFromDictionary (pDataDict, &bstrData);
   //Attempting to set the WORKING_DATA field of the transport dictionary
   hr = PutDictValue(Dict, L"WORKING_DATA", bstrData);
} 

Note that in the code block the DocIsValid flag is set based on whether we could successfully create the expected XML document. If you attempted to parse the document and create the XML representation that is expected in the channel, but were unable to do so because the native document does not contain the correct data or correct fields, you would set this variable to FALSE. Be sure to set LastDocument to TRUE once the final document has been processed, or you might send BizTalk Server spiraling into an infinite loop of calls to GetNextDocument.

*LastDocument = (m_CurrentDocument == m_TotalNumDocs) ? (TRUE):(FALSE);

The following table can be used as a truth table as to what the implications are with regard to the various output combinations for the function's HRESULT and the DocIsValid flag.

HRESULT  DocIsValid  Meaning
S_OK     TRUE        Document accepted, continue with next document
S_OK     FALSE       Document rejected, continue with next document
S_FALSE  TRUE        Errors occurred, accept document and continue
S_FALSE  FALSE       Errors occurred, reject document and continue
E_FAIL   Any         Catastrophic failure, dump data and terminate
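
The truth table above can also be written as a small lookup function, which is handy when logging what a parser reported for a document. This is an illustrative sketch: ParseResult and DescribeOutcome are hypothetical names, and the enumerators merely stand in for the three HRESULT outcomes in the table.

```cpp
#include <cassert>
#include <string>

// Stand-ins for the three HRESULT outcomes in the table above. These are
// hypothetical names for illustration, not COM constants.
enum class ParseResult { Ok, Warning, Failure };

// Describes how the parsing engine is expected to treat a document,
// following the truth table in this section.
std::string DescribeOutcome(ParseResult result, bool docIsValid)
{
    switch (result)
    {
    case ParseResult::Ok:
        return docIsValid ? "accept document, continue"
                          : "reject document, continue";
    case ParseResult::Warning:
        return docIsValid ? "errors occurred, accept document, continue"
                          : "errors occurred, reject document, continue";
    case ParseResult::Failure:
        return "catastrophic failure, terminate";
    }
    assert(false && "unreachable: all enumerators are handled");
    return "";
}
```

Note that the validity flag only matters for the first two outcomes; a failure ends processing regardless of its value.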

With regard to copying the data to a char* in the ProbeInterchangeFormat method: normally you would move through the stream as you parse. In the sample code, however, all the data is saved to a char* and all the parsing takes place against the char*. Notice also that the IStream* was left at its end position in ProbeInterchangeFormat. Two things are of significant consequence here:

  • If you do not move the IStream* from the beginning byte, BizTalk Server will fail the parsing. For example, if I read all the data into a char* and then reset the IStream* to the beginning, even though I might parse and return a valid document from the char*, BizTalk Server will recognize that the IStream* is at the beginning and fail the parsing session, reporting an error that nothing was read from the stream.

  • If the IStream* is not at the end and you set LastDocument to TRUE, BizTalk Server will disregard the LastDocument flag and call GetNextDocument again.

After each document is returned from GetNextDocument, BizTalk Server calls the GetNativeDocumentOffsets method of the parser. It is within this call that the parser provides BizTalk Server with the information needed to facilitate document tracking.

GetNativeDocumentOffsets

In this method, the parser informs BizTalk Server about how to extract the information related to the last parsed document from the stream. Let's say that for the purposes of auditing we want to ensure that we log all incoming files and store them in their inbound format. To do this, BizTalk Server will call this function. Inside this function you must set the SizeFromXMLDoc parameter to FALSE. If we set this to TRUE, BizTalk Server will try to ascertain from what point and how far from that point to read from the stream to log the native data by sizing the outbound XML instance that your parser creates. Since our incoming format does not directly match the size of the resulting XML instance, we must set this to FALSE and also manually set the StartOffset and DocLength parameters within the code.

The StartOffset parameter is the variable that BizTalk Server uses to determine from where in the stream to start reading to get the native document. The DocLength variable is how far BizTalk Server should read from the StartOffset parameter. BizTalk Server will use this information to read from the stream and log the document in the BizTalk Server tracking repository.

To accomplish this, we can track our processing by adding member variables to our class:

long   m_LastDocEndIndex;
long   m_LastDocLength;

During the GetNextDocument call, we will set these. Thus, the code for this implementation of GetNativeDocumentOffsets looks like this:

HRESULT STDMETHODCALLTYPE CParser1::GetNativeDocumentOffsets(
      BOOL __RPC_FAR *SizeFromXMLDoc,
      LARGE_INTEGER __RPC_FAR *StartOffset,
      long __RPC_FAR *DocLength)
{
   ATLTRACE("In CParser1::GetNativeDocumentOffsets\n");
   //Tell BizTalk Server that we will give it the doc offset and length
   *SizeFromXMLDoc = FALSE;
   StartOffset->QuadPart =  m_LastDocEndIndex - m_LastDocLength;
   *DocLength = m_LastDocLength;
   return S_OK;
}

Using the sample dictionary XML from earlier in this document, I created a file that contained two instances of that XML, so that one file contains two XML documents. I ran this file through the sample parser, and the single interchange became two output documents. The point of interest here is the document tracking. Because of the code that was implemented in this method, and because I chose in the channel setup to track the inbound document in both native format and XML format, I was able to view the data pre-Parser1 and post-Parser1. By running BizTalk Document Tracking from the Start menu and querying for the documents by document type and organization, I can view the documents in the interchange.

Figure 3. BizTalk Document Tracking main page

By running a query for the test source organization and the document definition that was used (in this case the definition is Equation, with a logical name associated with the document specification SimpleEquation), BizTalk Document Tracking produces a screen with results as shown in the following illustration.

Figure 4. Document Tracking Query Results page

You can clearly see the two documents as the two line items with GUIDs in Figure 4. These are the two documents that the parser produced from the two-document interchange. If either document is selected, BizTalk Document Tracking displays both the native input data and the resulting XML from the Parser1 implementation. Note that this is the data from before the document passes through the channel. You can see what they look like in the following illustration.

Figure 5. Document data views

By tracking both documents, you can ensure that you will pass audit. Furthermore, if indeed there ever were a problem, a comparison could be made between the incoming document and the post-parser document.

Something to consider is document tracking with regard to binary document submission, for example, a Microsoft Excel document. BizTalk Server cannot effectively track the document, since the document cannot be broken apart. This means that, for tracking purposes, the parser would either respond to BizTalk Server in such a way as to not track binary objects or documents (BLOBs) within BizTalk Document Tracking, or indicate to BizTalk Server that it should track the entire binary stream for each parsed-out document. Once again, a Microsoft Excel spreadsheet might have several sheets, each of which is a distinct document. For the purposes of tracking, it might be better in practice to store binary documents in a place of your own design and keep a set of data to relate the document to the Tracking_ID and submission_id that you get through the parser interfaces.

GetGroupSize

Take solace in knowing that this is the last method that we must address with regard to parsers. This method is required when processing interchanges that include groups. We did not implement a group-processing parser in the sample; had we done so, we would have needed to fully implement this method. If you refer to Figure 2, you can see that GetGroupSize is called directly after the call to GetNativeDocumentOffsets. However, it is called only if the parser indicated that the interchange it is parsing contains groups. In our example it returns E_UNEXPECTED: this method should never be called for our sample, so if it is called, the call is indeed unexpected.

HRESULT STDMETHODCALLTYPE CParser1::GetGroupSize(
      long __RPC_FAR *GroupSize,
      BOOL __RPC_FAR *LastGroup)
{ 
   ATLTRACE("In CParser1::GetGroupSize\n");
   return E_UNEXPECTED;
}

Fortunately, even if we had implemented groups within our parser, there is not much happening in this function call. If the parser had indicated that groups exist, this call would be made after the parser returned a TRUE for the LastDocument flag of GetNextDocument. At that point, our custom parser would return the size of the group in the GroupSize parameter and indicate whether it had just finished processing the last group in the LastGroup parameter. If the last group is processed, the parser will exit. If not, GetGroupDetails will be called subsequent to this method.
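
To make the call order concrete, the following sketch simulates the sequence of parser method calls as this section describes it. It is a simulation for illustration, not BizTalk Server code: TraceParserCalls is a hypothetical name, and the sketch assumes that GetNativeDocumentOffsets follows every GetNextDocument call and that GetGroupSize follows the last document of each group.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns the method names in the order this section describes the parsing
// engine calling them. docsPerGroup holds the number of documents in each
// group; without groups it holds a single entry for the whole interchange.
// This is a simulation for illustration, not BizTalk Server code.
std::vector<std::string> TraceParserCalls(bool groupsExist,
                                          const std::vector<int>& docsPerGroup)
{
    assert(groupsExist || docsPerGroup.size() == 1);
    std::vector<std::string> calls = {
        "ProbeInterchangeFormat", "GetInterchangeDetails", "GroupsExist"};
    for (std::size_t group = 0; group < docsPerGroup.size(); ++group)
    {
        if (groupsExist)
            calls.push_back("GetGroupDetails");
        for (int doc = 0; doc < docsPerGroup[group]; ++doc)
        {
            calls.push_back("GetNextDocument");
            calls.push_back("GetNativeDocumentOffsets");
        }
        // Called only for grouped interchanges, after the last document
        // of the group has been reported.
        if (groupsExist)
            calls.push_back("GetGroupSize");
    }
    return calls;
}
```

For the sample parser, which reports no groups and handles a two-document interchange, the trace contains neither GetGroupDetails nor GetGroupSize.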

At this point all the methods on the parser interface have been covered, so you should be able to run the example parser with this information and be able to clearly ascertain how the parser works.

Running the Parser

Now that all the interfaces are complete, you can build and register the parser. Once the parser is successfully registered, run BizTalk Server Administration and choose the properties for the BizTalk Server root node.

Figure 6. BizTalk Server Administration main page

Once the Properties dialog box appears, click the Parsers tab and click Refresh. Parser1 should show up in the list.

Figure 7. BizTalk Server Group Properties dialog box

Use the arrows on the right side of the dialog box to move Parser1 to the top of the list so that it will be called first. Click OK. The parser is ready to be used by BizTalk Server.

You should be able to run through the test harness we initially set up by dropping an instance of the dictionary XML into the File receive function. If you were to drop a SimpleEquationXML instance and a dictionary XML instance into the receive function, providing that the same values are used in both documents, they should both produce the same results from passing through the channel. This is because the parser converts the dictionary XML into the SimpleEquationXML document type before passing it to BizTalk Server.

One more thing to note on running the parser: If you wanted to debug your parser as BizTalk Server called the methods, you would simply need to attach to the MSCIS.EXE process. I find that the easiest way to work when debugging is to stop the BizTalk Server service and set up the debugger to use MSCIS.EXE as its debugging process. Thus, the BizTalk Server service will start running when you start debugging and stop when you are finished.

Implementing the Parser Summary

At this point the FirstParser component exposes an object named Parser1. This object is able to receive Commerce.Dictionary objects that contain certain fields and that have been persisted to XML through their IPersistXML interface. Moreover, the document should run through a channel and, on the other end, BizTalk Server should produce XML that represents the equation XML that we set up internally and that contains the result of the equation inside the XML. Furthermore, the parser can handle multiple instances of the dictionary equation XML within a single interchange. While this is a fairly simple example, we have explored how to build a BizTalk Server parser component. That is, we parsed the incoming data stream into a format that BizTalk Server is expecting within the channel. We also looked at how we tell BizTalk Server where the native document sits within the input stream so that it can save a copy of the data to the Tracking database. Although there is no code for it in the example, we also outlined what the parser would need to do if the document were self-routing and what it means for an interchange to contain groups.

That parser represents only half of the equation. Most likely if there is a very specialized format coming in, there will be a requirement for the same format coming out. It is on the outbound side that the custom serializer does its work.

Implementing a BizTalk Server Serializer

Congratulations! If you are going through this document top-down and reviewing the code, you are halfway done, and the example serializer is simpler to implement than the example parser. While the job of the custom parser is to parse the incoming data and convert it into a format that is understood by BizTalk Server, the job of the serializer is to take an outgoing document and write it to a particular format from the internal XML representation. Examples of this might be persisting outgoing documents to Microsoft Word, Microsoft Excel, EBCDIC, and Packed Decimal, or anything else that you can imagine. For the example code, the serializer takes the outgoing XML document, pulls out the data from the fields, places the data as name-value pairs into a dictionary, retrieves the XML representation of that dictionary object through the IPersistXML interface, and finally places the data to be stored into the outgoing stream. Let's take a look at what the serializer interface methods are and how they work.
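
The output side of that pipeline can be sketched without COM. The code below hand-builds text in the layout of the sample dictionary interchange shown earlier in this article. It is only a stand-in: in the sample the XML comes from the dictionary object's IPersistXML interface, and EscapeXml and BuildDictionaryXml are hypothetical helper names.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Escapes the characters that cannot appear literally in XML text or
// in a double-quoted attribute value.
std::string EscapeXml(const std::string& text)
{
    std::string out;
    for (char ch : text)
    {
        switch (ch)
        {
        case '&':  out += "&amp;";  break;
        case '<':  out += "&lt;";   break;
        case '>':  out += "&gt;";   break;
        case '"':  out += "&quot;"; break;
        default:   out += ch;       break;
        }
    }
    return out;
}

// Writes name-value pairs in the layout of the sample dictionary
// interchange shown earlier. Hand-built stand-in for the XML that the
// dictionary object's IPersistXML interface produces in the sample.
std::string BuildDictionaryXml(
    const std::vector<std::pair<std::string, std::string>>& items)
{
    std::string xml =
        "<DICTIONARY xmlns:dt=\"uuid:304FB305-29A4-11d3-B0D4-00C04F8ED7A2\""
        " version=\"1.0\">\n";
    for (const auto& item : items)
    {
        assert(!item.first.empty());
        xml += "<DICTITEM key=\"" + EscapeXml(item.first) + "\">"
               "<VALUE dt:dt=\"string\" xml-space=\"preserve\">" +
               EscapeXml(item.second) + "</VALUE></DICTITEM>\n";
    }
    xml += "</DICTIONARY>";
    return xml;
}
```

Passing the pairs Operation, Operand1, and Operand2 yields text in the same shape as one document of the sample interchange, which is what the serializer would then write to the outgoing stream.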

Serializer Interfaces

To implement a custom BizTalk Server serializer, we must implement a single interface consisting of five methods.

Method              Description
AddDocument         Adds an XML document for storage by the serializer component.
GetDocInfo          Gets details of the document.
GetGroupInfo        Gets details of the group, such as size and offset, for the Document Tracking database.
GetInterchangeInfo  Gets information about the interchange created.
Init                Outputs the document instance to the serializer component and indicates where it should be sent.

For a simple interchange that contains a single document, the calling order for the methods will be:

  1. Init
  2. AddDocument
  3. GetInterchangeInfo
  4. GetDocInfo

GetGroupInfo will be called only if a group exists in the document. In the example implementation we will not be dealing with groups, so GetGroupInfo will not be called. The number of times the methods for handling document groups and adding documents are called depends on the number of groups and the number of documents that exist in a given interchange. For further information on the call order of the methods for the serializer, see the BizTalk Server 2000 documentation and look for the topic "Sequence for Calling Methods of the IBizTalkSerializerComponent Interface."

The calls to the serializer currently seem to pass only one document at a time, so we will not have to parse the data before preparing it for serialization. If we were to implement a more complex interchange containing groups, the serializer execution would take the following steps:

  1. Init is called so that the serializer can retrieve some information and get a pointer to the output data stream.

  2. Next AddDocument is called to allow the serializer to parse, translate, and transform the outbound data into the format that it wants and add it to the outbound stream.

  3. AddDocument is called repeatedly until the last document has been processed. At that point, GetInterchangeInfo is called and the serializer returns an interchange identifier and the number of groups that exist.

  4. If groups exist in the outbound interchange, the next call is to GetGroupInfo. In GetGroupInfo the serializer will tell BizTalk Server how many documents are in the group, the offset into the stream to find the group, and the length, in bytes, of all documents in the group.

  5. GetDocInfo is then called for each document in each group and has a similar responsibility to the parser method GetNativeDocumentOffsets. Once this method has been called for each document in a group, GetGroupInfo is called and then GetDocInfo is called again.

The following diagram represents the call path for the serializer functions as described above.

Figure 8. Logical flow of serializer method calls
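
As a rough illustration of the flow in Figure 8, the sequence can be modeled in a few lines of portable C++. This is only a sketch and not BizTalk Server code: SerializerCallOrder and its groupSizes parameter are hypothetical names, and the model assumes that an interchange without groups carries a single document, as in the simple case listed earlier.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of the serializer call sequence; not BizTalk Server code.
// groupSizes[i] is the number of documents in group i. An empty vector models
// the simple interchange: one document and no groups.
std::vector<std::string> SerializerCallOrder(const std::vector<int>& groupSizes)
{
    std::vector<std::string> calls;
    calls.push_back("Init");

    // Every document is added to the stream before any info is requested.
    int totalDocs = 0;
    for (int n : groupSizes) { totalDocs += n; }
    if (groupSizes.empty()) { totalDocs = 1; }
    for (int i = 0; i < totalDocs; ++i) { calls.push_back("AddDocument"); }

    calls.push_back("GetInterchangeInfo");

    if (groupSizes.empty()) {
        // No groups: GetGroupInfo is skipped entirely.
        calls.push_back("GetDocInfo");
        return calls;
    }

    // With groups: GetGroupInfo, then GetDocInfo once per document in it.
    for (int n : groupSizes) {
        calls.push_back("GetGroupInfo");
        for (int i = 0; i < n; ++i) { calls.push_back("GetDocInfo"); }
    }
    return calls;
}
```

Calling SerializerCallOrder with an empty vector yields the four-step order listed at the start of this section. Passing group sizes such as 2 and 1 shows all three AddDocument calls completing before the first GetGroupInfo call.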

For this release of BizTalk Server, the number of documents passed to the serializer at a time seems to always be one. This is not all bad because it simplifies the work that we have to do. Note that the execution path has the serializer processing all documents and placing them into the outgoing stream before BizTalk Server starts requesting information on the groups and documents in the interchange. Having reviewed the execution flow, let's take a look at the code.

Base Code

In FirstParser, create a new ATL simple object named Serializer1. This will give us two objects within this library: FirstParser.Parser1 and FirstParser.Serializer1.

Include the following files in Serializer1.h:

  • bts_sdk_guids.h
  • BTSSerializerComps.h. This file contains the definition for the IBizTalkSerializerComponent interface.
  • Computil.h. This file contains the GetDictValue and PutDictValue functions, which we will use to read data from and add data to dictionary objects. While this is not necessary, it relieves you from writing similar code to accomplish the same task.
  • Include the functions from the appendix. I included the misc.h and misc.cpp files from the parser implementation.

Compile the project to make sure that everything compiles. If it compiles, it must work.

Add the interface and method for the IBizTalkSerializerComponent interface.

In Serializer1.h, add "public IBizTalkSerializerComponent" to the class inheritance list. This enables us to implement the methods for the serializer on our component and, once finished with a few following details, be able to return an IBizTalkSerializerComponent pointer from our component when requested by QueryInterface.

Add the following method prototypes to the class declaration:

//Serializer method prototypes
HRESULT STDMETHODCALLTYPE CSerializer1::Init(BSTR srcQual, BSTR srcID,BSTR destQual, 
   BSTR destID,  long EnvID, IDictionary __RPC_FAR *pDelimiters, 
   IStream __RPC_FAR *OutputStream, long NumDocs, long PortID);

HRESULT STDMETHODCALLTYPE CSerializer1::AddDocument(long DocHandle, 
   IDictionary __RPC_FAR *Transport, BSTR TrackID, long ChannelID);

HRESULT STDMETHODCALLTYPE CSerializer1::GetInterchangeInfo(BSTR __RPC_FAR *InterchangeID, 
   long __RPC_FAR *lNumGroups);

HRESULT STDMETHODCALLTYPE CSerializer1::GetGroupInfo(long __RPC_FAR *NumDocs, 
   LARGE_INTEGER __RPC_FAR *GrpStartOffset, long __RPC_FAR *GrpLen);

HRESULT STDMETHODCALLTYPE CSerializer1::GetDocInfo(long __RPC_FAR *DocHandle,
   BOOL __RPC_FAR *SizeFromXMLDoc, LARGE_INTEGER __RPC_FAR *DocStartOffset,
   long __RPC_FAR *DocLen);

Add COM_INTERFACE_ENTRY(IBizTalkSerializerComponent) to COM_MAP so that it will be returned if requested through QueryInterface.

Add IMPLEMENTED_CATEGORY(CATID_BIZTALK_SERIALIZER) to CATEGORY_MAP. This is what will identify your component as a serializer to BizTalk Server. So don't forget it.

While you are in this file, go ahead and add the following private members to the class; we will use them later.

IStream*   m_DataStream;
long         m_CurrentStreamPos;
long         m_LastDocLength;
long         m_LastDocHandle;

HRESULT CreateDictionaryFromEquationXML(IDictionary* OutputDict, BSTR DictXML);

In the constructor, initialize the member variables:

CSerializer1()
{
   m_CurrentStreamPos = 0;
   m_LastDocLength = 0;
   m_LastDocHandle = 0;
   m_DataStream = 0;
}

Add an override for the FinalRelease method so that we can release our IStream interface:

void FinalRelease()
{
   if(m_DataStream != NULL){m_DataStream->Release();}
}

At this point you should have the class declaration complete. Now open the CPP file and add empty method implementations for each of the methods defined by IBizTalkSerializerComponent.

Compile. At this point you should have an empty implementation of a BizTalk Server serializer.

Just for your own edification during a debug session, in each of the serializer method calls add a line similar to this:

ATLTRACE("In CSerializer1::[method name]\n");

Additionally, you might want to add "hr = TraceDictionaryValues(Dict);" to the methods that receive from or write to a dictionary.

You will want to reference the MSXML libraries. You could do this by including the proper header files in your project. In the sample, we used the #import directive to import msxml3.dll:

#import "msxml3.dll" named_guids raw_interfaces_only

If you are reimplementing this sample, I suggest copying the function implementations from the sample source. I will refer to code snippets within the document, but to get the entire code base you will need the sample source, because this document serves to explain the interfaces, not to deliver the code.

Implementing the Serializer Interfaces

You should now have a complete parser and a skeleton serializer. In the serializer we will focus on where to get the document and what to do with it by reviewing the serializer sample method implementations.

Init

This method is the first method called on the serializer component, and it is where we grab some information that we need and a pointer to our outgoing data stream. The example code does not do a lot of work here. The source and destination information is passed into this method for use. The number of documents is passed in and currently will always be 1. However, just in case something changes, we will add code to ensure that the number of documents is indeed 1. We will also ensure that the stream pointer is not null.

if ( !OutputStream || NumDocs != 1) {return E_INVALIDARG;}

We will need to save a reference to the stream that gets passed into the method.

//grab the stream pointer, you won't get another chance
//AddRef returns the new reference count (a ULONG), not an HRESULT
OutputStream->AddRef ();
m_DataStream = OutputStream;

Additionally, as a proactive measure, we will make sure that the stream is at the beginning.

//make sure the stream is at the beginning. It should be, but this can't hurt
Move.QuadPart = 0;
hr = m_DataStream->Seek (Move, STREAM_SEEK_SET, NULL);

If we were expecting to get passed delimiters into the serializer, they would be found within the dictionary passed into this method. Once the call to Init is completed, the BizTalk Server parsing engine will continue by calling AddDocument.
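
The reference counting in Init can be sketched without any COM headers. In the following minimal model, FakeStream and SerializerSketch are hypothetical stand-ins for IStream and the serializer class. Only the argument check and the AddRef/Release bookkeeping from the article are represented, and Init returns a bool where the real method returns an HRESULT.

```cpp
#include <cassert>

// Hypothetical stand-in for a COM stream; only the reference count and the
// seek position are modeled.
struct FakeStream {
    unsigned long refs = 1;   // the caller holds the initial reference
    long position = 0;
    unsigned long AddRef()  { return ++refs; }
    unsigned long Release() { return --refs; }
    void SeekToStart()      { position = 0; }
};

// Hypothetical serializer skeleton showing only the Init bookkeeping.
class SerializerSketch {
public:
    // Returns false for the cases the article rejects with E_INVALIDARG.
    bool Init(FakeStream* output, long numDocs) {
        if (output == nullptr || numDocs != 1) { return false; }
        output->AddRef();                  // take our own reference first
        if (m_stream != nullptr) {         // drop a reference from an earlier Init
            m_stream->Release();
        }
        m_stream = output;
        m_stream->SeekToStart();           // defensive rewind, as in the article
        return true;
    }

    // Mirrors FinalRelease: give back the reference taken in Init.
    void FinalRelease() {
        if (m_stream != nullptr) {
            m_stream->Release();
            m_stream = nullptr;
        }
    }

private:
    FakeStream* m_stream = nullptr;
};
```

The point of the sketch is the pairing: one AddRef when the pointer is stored and one Release when the component goes away, so the count seen by the caller returns to its starting value.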

AddDocument

AddDocument is the primary place in the serializer where work is performed. This is where you will implement all the code within the serializer to move the data from the format within the stream to the expected output format.

As a parameter to this method, we receive a pointer to a Commerce.Dictionary object, which contains the document that we have to convert to the output format. For our example this is an elementary process. First, we grab the data from the dictionary object.

hr = EquationDict.CoCreateInstance (L"Commerce.Dictionary");
if(SUCCEEDED(hr)){hr = GetDictValue(Transport, L"working_data", &IncomingXML);}
//just ensuring proper type
if(SUCCEEDED(hr)){hr = IncomingXML.ChangeType (VT_BSTR);}

The next step is to convert the XML into a dictionary instance so that we can generate the dictionary XML. The following method call is responsible for doing this.

if(SUCCEEDED(hr)){hr = CreateDictionaryFromEquationXML(EquationDict, IncomingXML.bstrVal);}

CreateDictionaryFromEquationXML is implemented as follows:

HRESULT CSerializer1::CreateDictionaryFromEquationXML(IDictionary* OutputDict, BSTR DictXML)
{
HRESULT   hr;
VARIANT_BOOL    IsLoaded;
CComBSTR          NodeValue;
CComPtr<MSXML2::IXMLDOMDocument2>   pXMLDoc;
CComPtr<MSXML2::IXMLDOMNodeList>       pXMLNodeList;
CComPtr<MSXML2::IXMLDOMNode>            pXMLNode;

Create an XMLDom instance and load it with the SimpleEquation XML passed into the AddDocument method.

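hr = pXMLDoc.CoCreateInstance (MSXML2::CLSID_DOMDocument30);
if(SUCCEEDED(hr)){hr = pXMLDoc->loadXML(DictXML, &IsLoaded);}
//stop here if the DOM could not be created or the XML did not load
if(FAILED(hr)){return hr;}
if(IsLoaded != VARIANT_TRUE){return E_FAIL;}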

The next several sections of code retrieve the specific values from the XML document that the serializer needs to create the outgoing dictionary object.

NodeValue="";
hr = pXMLDoc->getElementsByTagName(L"Operand1", &pXMLNodeList);
if(SUCCEEDED(hr)){hr = pXMLNodeList->get_item(0, &pXMLNode);}
if(SUCCEEDED(hr)){hr = pXMLNode->get_text(&NodeValue);}
hr = PutDictValue(OutputDict, L"Operand1", NodeValue);
pXMLNodeList.Release();
pXMLNode.Release();

NodeValue="";
hr = pXMLDoc->getElementsByTagName(L"Operand2", &pXMLNodeList);
if(SUCCEEDED(hr)){hr = pXMLNodeList->get_item(0, &pXMLNode);}
if(SUCCEEDED(hr)){hr = pXMLNode->get_text(&NodeValue);}
hr = PutDictValue(OutputDict, L"Operand2", NodeValue);
pXMLNodeList.Release();
pXMLNode.Release();

NodeValue="";
hr = pXMLDoc->getElementsByTagName(L"Operator", &pXMLNodeList);
if(SUCCEEDED(hr)){hr = pXMLNodeList->get_item(0, &pXMLNode);}
if(SUCCEEDED(hr)){hr = pXMLNode->get_text(&NodeValue);}
hr = PutDictValue(OutputDict, L"Operation", NodeValue);
pXMLNodeList.Release();
pXMLNode.Release();

NodeValue="";
hr = pXMLDoc->getElementsByTagName(L"Result", &pXMLNodeList);
if(SUCCEEDED(hr)){hr = pXMLNodeList->get_item(0, &pXMLNode);}
if(SUCCEEDED(hr)){hr = pXMLNode->get_text(&NodeValue);}
hr = PutDictValue(OutputDict, L"result", NodeValue);

return hr;
}

This code is pretty simple. You could explore different mechanisms (for example, XPath) to accomplish the same task that was accomplished using getElementsByTagName, get_item, and get_text. At the end of this method call we return OutputDict with all the expected name-value pairs.
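
Whichever lookup mechanism is used, the four blocks above differ only in the element name and the dictionary key, so the mapping could be driven by a small table. The following sketch shows that idea in portable C++ and is not the sample's implementation. ElementText and EquationToDictionary are hypothetical names, a std::map stands in for the Commerce.Dictionary object, and a naive string search stands in for the XML DOM.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns the text between <tag> and </tag>, or an empty
// string when the element is missing. A real serializer would use the XML DOM.
std::string ElementText(const std::string& xml, const std::string& tag)
{
    const std::string open = "<" + tag + ">";
    const std::string close = "</" + tag + ">";
    const std::string::size_type start = xml.find(open);
    if (start == std::string::npos) { return ""; }
    const std::string::size_type from = start + open.size();
    const std::string::size_type end = xml.find(close, from);
    if (end == std::string::npos) { return ""; }
    return xml.substr(from, end - from);
}

// Element name in the equation XML -> key written to the dictionary,
// matching the four blocks in CreateDictionaryFromEquationXML.
std::map<std::string, std::string> EquationToDictionary(const std::string& xml)
{
    const std::vector<std::pair<std::string, std::string>> fields = {
        {"Operand1", "Operand1"},
        {"Operand2", "Operand2"},
        {"Operator", "Operation"},
        {"Result",   "result"},
    };
    std::map<std::string, std::string> dict;
    for (const auto& field : fields) {
        dict[field.second] = ElementText(xml, field.first);
    }
    return dict;
}
```

Keeping the names in one table makes the asymmetric pairs (Operator to Operation, Result to result) visible at a glance and leaves a single place to change if the document format grows.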

Once we have the dictionary, we use the IPersistXML interface of the dictionary to generate the outgoing XML. This is done within the call to DehydrateDictionary.

if(SUCCEEDED(hr)){hr = DehydrateDictionary(&OutgoingXML, EquationDict);}

The code for DehydrateDictionary can be found in the appendix of this document.

Once we have the XML in a BSTR, we need to place it into the stream. The stream expects double-byte characters. If you place ANSI characters into the stream, you will find that the persisted document will come out as garbage. Additionally, we will have to tell BizTalk Server how large this document is inside the stream. We will take the easy way out by multiplying the string length by two. In a real implementation you would want to convert the BSTR to a byte pointer and set the size from that. However, for our example, this will be sufficient.

m_LastDocLength = OutgoingXML.Length () * 2;
hr = m_DataStream->Write((void*)OutgoingXML.m_str, 
      m_LastDocLength, &byteswritten);
m_CurrentStreamPos +=byteswritten;

Note that the length is assigned to a member variable because it will be needed again later in the GetDocInfo call. We took the easy road here, because we know two things:

  • BizTalk Server currently is only passing one document at a time.
  • In our Init implementation we checked to make sure that it had only one document coming into the serializer.

We also have to save the DocHandle to a member variable, because BizTalk Server will also be expecting it in the GetDocInfo call. This would be a little more complex if we were expecting multiple documents, because we would need to save off the document handle for each document and track each document's offset inside the stream.
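
To make the multiple-document case concrete, here is a minimal sketch of the bookkeeping it would need, written in portable C++ rather than against the BizTalk Server interfaces. OutputTracker and DocRecord are hypothetical names, a byte vector stands in for the IStream, and std::u16string stands in for the BSTR. The byte count is derived from the size of the character type instead of a hard-coded factor of two.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record of where one document lives in the output stream.
struct DocRecord {
    long handle;          // handle received in AddDocument
    std::int64_t offset;  // byte offset of the document's first byte
    long length;          // document length in bytes
};

// Hypothetical in-memory stand-in for the output stream plus bookkeeping.
class OutputTracker {
public:
    // Appends the text as raw 16-bit code units and records its location.
    void AddDocument(long handle, const std::u16string& text) {
        const std::size_t byteCount = text.size() * sizeof(char16_t);
        const unsigned char* bytes =
            reinterpret_cast<const unsigned char*>(text.data());

        DocRecord rec;
        rec.handle = handle;
        rec.offset = static_cast<std::int64_t>(m_bytes.size());
        rec.length = static_cast<long>(byteCount);

        m_bytes.insert(m_bytes.end(), bytes, bytes + byteCount);
        m_docs.push_back(rec);
    }

    const std::vector<DocRecord>& Docs() const { return m_docs; }
    std::size_t StreamSize() const { return m_bytes.size(); }

private:
    std::vector<unsigned char> m_bytes;
    std::vector<DocRecord> m_docs;
};
```

With a record per document, a later GetDocInfo call could answer from the stored handle, offset, and length instead of subtracting the last length from the current stream position.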

GetInterchangeInfo

In GetInterchangeInfo we will tell BizTalk Server the number of groups and return an InterchangeID. For our sample implementation we will set the number of groups to 0 and return a GUID for the InterchangeID.

HRESULT     hr;
*lNumGroups = 0;
// Use a GUID tracking ID
UUID           tracking_id;
CComBSTR   StringUUID;
WCHAR*      pwszUUID;

Next we will create a GUID for the InterchangeID.

hr=HRESULT_FROM_WIN32(UuidCreate(&tracking_id));
hr=HRESULT_FROM_WIN32(UuidToStringW(&tracking_id, &pwszUUID));
StringUUID = pwszUUID;
*InterchangeID = StringUUID.Detach (); 
hr = HRESULT_FROM_WIN32(RpcStringFreeW(&pwszUUID));

GetDocInfo

Similar to the parser method GetNativeDocumentOffsets, in this method we will tell BizTalk Server the handle to the document that it gave to us in AddDocument and the size and location of the document inside the stream.

*DocHandle = m_LastDocHandle;
*SizeFromXMLDoc = FALSE;
DocStartOffset->QuadPart = m_CurrentStreamPos - m_LastDocLength;
*DocLen = m_LastDocLength;

This information will be used by BizTalk Document Tracking to track the document as it is being serialized. If we were to take a look at document tracking, we should be able to see the native and the XML formats of the documents. Running a two-document interchange through the serializer and parser pair resulted in BizTalk Document Tracking having an incoming interchange with two documents and two separate outgoing interchanges. The following illustration shows the native and XML format windows for one of the documents.

Figure 9. BizTalk Document Tracking showing documents for the parser and serializer

The window on the left shows the document as it was submitted to BizTalk Server. The window on the right shows the document after it passed through the parser, the mapping in the channel, and finally through the serializer back to the dictionary XML format. If we were to click the View XML format option, we would be shown the equation XML that BizTalk Server was using internally.

GetGroupInfo

Once again, we are not doing anything with groups in this sample. We simply return E_NOTIMPL.

HRESULT hr;
hr = E_NOTIMPL;
return hr;

However, were this method to be implemented, it would return to BizTalk Server the number of documents in the group, the offset into the stream for the start of the group, and the number of bytes (length) of the group.
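
As a sketch of what such an implementation would compute, the three values can be derived from per-document bookkeeping. The names GroupDoc, GroupInfo, and SummarizeGroup are hypothetical, and the code assumes that the documents of a group were written to the stream back to back and in order. It is plain C++, not the COM method itself.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-document bookkeeping kept while documents are added.
struct GroupDoc {
    std::int64_t offset;  // byte offset of the document in the stream
    long length;          // document length in bytes
};

// The three values the text says GetGroupInfo would report.
struct GroupInfo {
    long numDocs;
    std::int64_t startOffset;
    long length;
};

// Assumes the group's documents were written contiguously, in order,
// and that the group contains at least one document.
GroupInfo SummarizeGroup(const std::vector<GroupDoc>& docs)
{
    assert(!docs.empty());
    GroupInfo info;
    info.numDocs = static_cast<long>(docs.size());
    info.startOffset = docs.front().offset;
    const std::int64_t end = docs.back().offset + docs.back().length;
    info.length = static_cast<long>(end - info.startOffset);
    return info;
}
```

If a group also carried its own header or trailer bytes, the start offset and length would have to include them. The sketch leaves that out.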

Modifying the Initial Test Harness for the Serializer

Once you have all the needed code in the serializer, or even if you want to do it with just the stub code, compile the project and then register the resulting component. This will allow us to set up the test scenario.

Setting up the test for the serializer is exactly the same as the parser test harness setup except for a couple of minor changes. You can either follow those directions again, augmented by the following changes, or modify the existing ports and channels that were used for the parser test.

You will need to create a document definition for the dictionary XML. I called mine DictionaryXMLSchema. Up until now we have been using the dictionary XML without a schema. However, for our serializer we need an envelope that represents the document's persisted form, and that envelope uses the document definition. The schema looks like this:

<?xml version="1.0" ?> 
<!--  Generated by using BizTalk Editor on Wed, Dec 20 2000 10:59:42 AM  --> 
<!--  Microsoft Corporation (c) 2000 (http://www.microsoft.com)   --> 
<Schema name="DICTIONARY" b:BizTalkServerEditorTool_Version="1.0"
      b:root_reference="DICTIONARY" b:standard="XML"
      xmlns="urn:schemas-microsoft-com:xml-data"
      xmlns:b="urn:schemas-microsoft-com:BizTalkServer"
      xmlns:d="urn:schemas-microsoft-com:datatypes">
  <b:SelectionFields /> 
<ElementType name="VALUE" content="textOnly" model="open">
<b:RecordInfo /> 
<AttributeType name="xml-space">
 <b:FieldInfo /> 
</AttributeType>
<AttributeType name="dt_dt">
  <b:FieldInfo /> 
</AttributeType>
<attribute type="dt_dt" /> 
<attribute type="xml-space" /> 
</ElementType>
<ElementType name="DICTITEM" content="eltOnly" model="open">
<b:RecordInfo /> 
<AttributeType name="key">
  <b:FieldInfo /> 
</AttributeType>
<attribute type="key" /> 
 <element type="VALUE" maxOccurs="*" minOccurs="0" /> 
</ElementType>
<ElementType name="DICTIONARY" content="eltOnly" model="open">
<b:RecordInfo /> 
<AttributeType name="version">
<b:FieldInfo /> 
 </AttributeType>
 <attribute type="version" /> 
 <element type="DICTITEM" maxOccurs="*" minOccurs="0" /> 
</ElementType>
</Schema>

Once you have created the document definition, the next step is to create an envelope of CUSTOM type that uses this document definition. I called mine DictionaryXML.

Figure 10. Creating an envelope

Once the envelope is created, you create your associated port and channel. I created a separate port and channel for clarity, but you could reuse the ones from the parser test and modify the envelope being used.

In the envelope page we must specify the envelope that we just created. Notice the Delimiters button. If you put delimiters in here, they will show up in the dictionary passed into the serializer's Init method.

Figure 11. Setting the port properties to include the envelope

The next change is creating the channel. At the end of the channel setup, in the Advanced Configuration dialog box, you will need to click the Advanced button so that we can point to our serializer. Once you have clicked it and receive the tabbed dialog box, click the Envelope tab. From the drop-down list, select the serializer.

Figure 12. Setting the channel properties to use the custom serializer

Once that is selected, you should be able to modify the receive function from earlier to point to the new channel that is set up with the serializer, run a document through the parser and serializer, and test to see if it produces the correct output. Of course, I would suggest running debug first to step through the code.

Running the Serializer

Once messaging has been configured or reconfigured to use the custom serializer (and envelope), you can run the same example files that were used for the parser. If you set up a separate set of messaging artifacts for testing the serializer, you will need to either create a new File receive function or change the existing one to use the new channel.

Execution and debugging should work the same as before. The major difference is that now once the document has passed through the channel, the end result should be a dictionary XML document and not an equation XML document.

Implementing the Serializer Summary

After implementing the parser, the serializer should have seemed simple. This is due to the fact that we were dealing with a single document at a time, whereas in the parser we had multiple documents in the interchange and a little more complication. We looked at how to retrieve the data and format it for sending it back into the stream. We also looked at how to tell BizTalk Server to track our outgoing document so that we could use BizTalk Document Tracking not only to look at our inbound documents in native and XML formats, but also to look at our outbound documents. Finally, we looked at how to set up BizTalk Server to use the serializer that we implemented.

Overall Summary

While the BizTalk Server interfaces for integrating your own parsing and serializing may at first seem daunting, the real challenge is knowing what BizTalk Server expects from each method call. Beyond that, the work boils down to parsing a character string.

While not covered in this document in detail, we briefly touched on what one would have to change to implement the handling of groups within a single interchange. Additionally, we referred to the fact that if we want to implement our own custom correlation, we would implement the parser to expect receipts. Once the parser found that the incoming document was indeed a receipt, the parser would parse the receipt and pass back to BizTalk Server a ProgID for a custom correlator to handle processing the receipt.

In reality, this example was not complex, but I hope that it served its purpose of laying out the interfaces and letting you review the incoming and outgoing documents in a way that makes everything clear. The source download that accompanies this document includes the BizTalk Server 2000 exports for the channels, ports, envelopes, and document definitions used in this example.

Appendix: Code Snippets

The following two miscellaneous functions handle calls to the IPersistXML interface of a Commerce.Dictionary object.

HRESULT STDMETHODCALLTYPE RehydrateDictionary(BSTR DictXML, CComPtr<IDictionary> pDict)
{
   HRESULT hr;
   CComPtr<IPersistXML> pPersistXML;
   hr = pDict.QueryInterface(&pPersistXML);
   if(SUCCEEDED(hr)){ hr = pPersistXML->LoadXML( NULL, DictXML);}
   
   return hr;
}

HRESULT DehydrateDictionary (BSTR* outXML, CComPtr<IDictionary> pDict)
{
HRESULT hr;
CComPtr<IPersistXML>   pPersistXML;

hr = pDict.QueryInterface(&pPersistXML);
//only call SaveXML if the dictionary exposes IPersistXML
if(SUCCEEDED(hr)){hr = pPersistXML->SaveXML(NULL, outXML);}
return hr;
}

The following function is a debugging aid for inspecting the contents of Commerce.Dictionary objects.

HRESULT STDMETHODCALLTYPE TraceDictionaryValues(IDictionary *pDict)
{
HRESULT                 hr;
CComPtr<IEnumVARIANT>   pEnumVar;
CComPtr<IUnknown>       pEnum;
CComVariant             *ItemsInDict = NULL;
CComVariant             ValuesInDict;
CComBSTR                TraceMSG;
ULONG                   NumFetched = 0;
long                    ElemsInDict = 0;
ULONG                   idxElem;

TraceMSG = "\n";
hr = pDict->get__NewEnum (&pEnum);
if(SUCCEEDED(hr)){hr = pDict->get_Count(&ElemsInDict);}
if(SUCCEEDED(hr)){hr = pEnum.QueryInterface (&pEnumVar);}
//allocate the key array only after the count is known to be valid
if(SUCCEEDED(hr)){ItemsInDict = new CComVariant[ElemsInDict];}
if(SUCCEEDED(hr)){hr = pEnumVar->Next(ElemsInDict, ItemsInDict, &NumFetched);}
if(SUCCEEDED(hr))
{
      //walk only the keys that were actually fetched
      for (idxElem=0;idxElem<NumFetched;idxElem++)
      {
         if (ItemsInDict[idxElem].vt ==  VT_BSTR)
         {
             ValuesInDict.Clear();
             hr = GetDictValue(pDict, ItemsInDict[idxElem].bstrVal , &ValuesInDict);
             //trace only the values that are strings
             if(SUCCEEDED(hr) && ValuesInDict.vt == VT_BSTR)
             {
                 TraceMSG += ItemsInDict[idxElem].bstrVal;
                 TraceMSG += " = ";
                 TraceMSG += ValuesInDict.bstrVal;
                 TraceMSG += "\n";
             }
         }
      }  //end for loop
}
if(SUCCEEDED(hr)){ATLTRACE(TraceMSG);}
//free the key array; the original listing leaked it
delete [] ItemsInDict;
return hr;
}

This is a preliminary document and may be changed substantially prior to final commercial release. This document is provided for informational purposes only and Microsoft makes no warranties, either express or implied, in this document. Information in this document is subject to change without notice. The entire risk of the use or the results of the use of this document remains with the user. The example companies, organizations, products, people and events depicted herein are fictitious. No association with any real company, organization, product, person or event is intended or should be inferred. Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.

Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.

© 2001 Microsoft Corporation. All rights reserved.

Microsoft, BizTalk, Visual C++, and Visual Studio are either registered trademarks or trademarks of Microsoft Corporation in the U.S.A. and/or other countries.

The names of actual companies and products mentioned herein may be the trademarks of their respective owners.
