Using the XML Diff and Patch Tool in Your Applications

 

Neetu Rajpal
Microsoft Corporation

August 29, 2002

Download Xmldiffpatch.exe.

Note The example companies, organizations, products, domain names, e-mail addresses, logos, people, places, data, and events depicted herein are fictitious. No association with any real company, organization, product, domain name, e-mail address, logo, person, places, data, or events is intended or should be inferred.

Summary: Explains how to use the XML Diff and Patch tool, which compares two XML files and produces an XML output of the differences, by utilizing a typical scenario that readers can apply to their own applications. (7 printed pages)

XML Diff and Patch Tool Use Scenario

Fabrikam Car Repair is a standard car repair shop. Customers drop off cars that need repairs, and Fabrikam Car Repair evaluates the condition of each car and calls the customer back with the estimated repair costs. When deciding what to charge the customer for replacing a particular part, they take the cost of the part (A), add a fixed surcharge (B), and then add labor (C). So, A+B+C equals the price that the customer is quoted for repairs.

The price of the parts changes based on the costs from the car manufacturer. In the past, Fabrikam employees manually calculated the cost of the repairs every time a new price list was published, but then they realized that they could write a computer program to calculate the cost. They also found the added advantage of automatically generating the monthly sales reports using this data by transforming the data using XSLT.

Fabrikam decided to store the price list in XML and write XSLT to transform the data to generate the customer's receipt, and also to generate the monthly sales reports. This increased productivity because only one data file and two XSLT files were necessary to dynamically create sales reports and repair quotes, effectively eliminating the need to manually generate these materials.

However, every time the manufacturer updates the price list, the programmers at Fabrikam Car Repair have to open their XML files, manually identify all the changes, and then synchronize the files. Identifying the differences between two XML files is not an easy task. Let's consider a few differences that are significant when you are comparing two text files, but might not be when you are comparing two XML files.

Code comparison 1

The two XML files below have two different namespace prefixes, but the namespace Uniform Resource Identifier (URI) values are the same, and the local names of the elements are the same.

XML file A

<a:Car xmlns:a="http://www.ford.com">
   <a:Name> Taurus </a:Name>
</a:Car>

XML file B

<b:Car xmlns:b="http://www.ford.com">
   <b:Name> Taurus </b:Name>
</b:Car>
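
To make this concrete, here is a minimal sketch that uses only System.Xml to check whether two root elements share the same namespace URI and local name, whatever prefix each file uses. The class and method names (PrefixCheck, SameExpandedName) are hypothetical and are not part of the XML Diff and Patch tool.

```csharp
using System;
using System.Xml;

public static class PrefixCheck
{
    // Returns true when both root elements have the same namespace URI and
    // local name, regardless of the prefix each document happens to use.
    public static bool SameExpandedName(string xmlA, string xmlB)
    {
        XmlDocument a = new XmlDocument();
        XmlDocument b = new XmlDocument();
        a.LoadXml(xmlA);
        b.LoadXml(xmlB);

        XmlElement rootA = a.DocumentElement;
        XmlElement rootB = b.DocumentElement;

        // The prefix (rootA.Prefix) is deliberately not compared.
        return rootA.NamespaceURI == rootB.NamespaceURI
            && rootA.LocalName == rootB.LocalName;
    }
}
```

Called with the root elements of file A and file B above, this check treats a:Car and b:Car as the same element, whereas a plain text comparison would report every line as changed.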

 

Code comparison 2

These two XML files have the same elements, but the order of children in the elements is different.

XML file A

<Car>
   <Name> Taurus </Name>
   <Color> White </Color>
</Car>

XML file B

<Car>
   <Color> White </Color>
   <Name> Taurus </Name>
</Car>

 

Code comparison 3

These two XML files have the same elements and they are in the same order, but their attributes are in a different order.

XML file A

<Car
   name="Taurus"
   color="White"
/>

XML file B

<Car
   color="White"
   name="Taurus"
/>
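
XML does not treat attribute order as significant, so an XML-aware comparison looks attributes up by name rather than by position. The following sketch illustrates the idea with System.Xml alone. AttributeCheck and SameRootAttributes are hypothetical names chosen for this example, and the sketch compares only the attributes of the root element.

```csharp
using System;
using System.Xml;

public static class AttributeCheck
{
    // Returns true when both root elements carry the same attributes with the
    // same values, independent of the order in which they were written.
    public static bool SameRootAttributes(string xmlA, string xmlB)
    {
        XmlDocument a = new XmlDocument();
        XmlDocument b = new XmlDocument();
        a.LoadXml(xmlA);
        b.LoadXml(xmlB);

        XmlAttributeCollection attrsA = a.DocumentElement.Attributes;
        XmlAttributeCollection attrsB = b.DocumentElement.Attributes;
        if (attrsA.Count != attrsB.Count) return false;

        foreach (XmlAttribute attr in attrsA)
        {
            // Look the attribute up by local name and namespace URI, not by index.
            XmlAttribute other = attrsB[attr.LocalName, attr.NamespaceURI];
            if (other == null || other.Value != attr.Value) return false;
        }
        return true;
    }
}
```

For file A and file B above, this returns true even though the two start tags differ as text.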

 

Code comparison 4

The two XML files below have the same elements, namespaces, namespace prefixes, and child and element order, but there are extra spaces between the elements.

XML file A

<Car>
   <Name> Taurus </Name>
</Car>

XML file B

<Car><Name>Taurus</Name></Car>
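
Whether such whitespace matters depends on the application. As a rough sketch, assuming that leading and trailing whitespace in text values carries no meaning, the code below loads each file with XmlDocument (which drops whitespace-only nodes between elements by default) and trims the remaining text nodes before comparing. WhitespaceCheck and Normalize are hypothetical names used only for illustration.

```csharp
using System;
using System.Xml;

public static class WhitespaceCheck
{
    // Returns the document serialized without insignificant whitespace and
    // with leading/trailing whitespace trimmed from every text node.
    public static string Normalize(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.PreserveWhitespace = false; // the default: whitespace-only nodes are dropped
        doc.LoadXml(xml);
        TrimText(doc.DocumentElement);
        return doc.OuterXml;
    }

    // Walks the tree and trims each text node in place.
    private static void TrimText(XmlNode node)
    {
        foreach (XmlNode child in node.ChildNodes)
        {
            if (child is XmlText)
                child.Value = child.Value.Trim();
            else
                TrimText(child);
        }
    }
}
```

With this normalization, file A and file B above produce the same string. An application in which the spaces around Taurus are meaningful would skip the trimming step.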

All of the variations mentioned above have different levels of significance attached to them based on the particular XML application. Such a problem requires a solution that has a native understanding of XML rules and does much more than a simple text comparison. To that end, the Microsoft Webdata team created the XML Diff and Patch tool.

Let's apply the Diff and Patch tool API to solve the Fabrikam Car Repair problem. A portion of the XML file (pricelist.xml) is listed below. This is the file that the XSLT is applied to, which means that this is the file that the programmer has to manually update when the manufacturer provides a new price list.

<PartPriceInfo xmlns:ns1="http://www.Subaru.com">
   <ns1:Subaru model="Legacy">
      <ns1:Muffler> 400 </ns1:Muffler>
      <ns1:Bumper> 100 </ns1:Bumper>
      <ns1:Floormat> 50 </ns1:Floormat>
      <ns1:WindShieldWipers> 20 </ns1:WindShieldWipers>
   </ns1:Subaru>
   <ns1:Subaru model="Outback">
      <ns1:Muffler> 500 </ns1:Muffler>
      <ns1:Bumper> 150 </ns1:Bumper>
      <ns1:Floormat> 75 </ns1:Floormat>
      <ns1:WindShieldWipers> 20 </ns1:WindShieldWipers>
   </ns1:Subaru>
</PartPriceInfo>

Here's the new pricelist submitted by the manufacturer (newpricelist.xml):

<PartPriceInfo xmlns:ns2="http://www.Subaru.com">
   <ns2:Subaru model="Outback">
      <ns2:Muffler> 600 </ns2:Muffler>
      <ns2:Bumper> 150 </ns2:Bumper>
      <ns2:Floormat> 75 </ns2:Floormat>
      <ns2:WindShieldWipers> 25 </ns2:WindShieldWipers>
   </ns2:Subaru>
   <ns2:Subaru model="Legacy">
      <ns2:Muffler> 400 </ns2:Muffler>
      <ns2:Bumper> 100 </ns2:Bumper>
      <ns2:Floormat> 50 </ns2:Floormat>
      <ns2:WindShieldWipers> 20 </ns2:WindShieldWipers>
   </ns2:Subaru>
   <ns2:Subaru model="Impreza">
      <ns2:Muffler> 450 </ns2:Muffler>
      <ns2:Bumper> 120 </ns2:Bumper>
      <ns2:Floormat> 65 </ns2:Floormat>
      <ns2:WindShieldWipers> 20 </ns2:WindShieldWipers>
   </ns2:Subaru>
</PartPriceInfo>

Here is the list of differences between the two files:

  • The order of the children under PartPriceInfo is different.
  • PartPriceInfo has an extra child.
  • <Subaru model="Outback"> has different values for <Muffler> and for <WindShieldWipers>.
  • The prefix for the namespace http://www.Subaru.com is different.

The programmer is only interested in the differences listed in the second and third bullets. If these files were bigger (which they likely would be in such a scenario), it would be pretty tedious to figure out the differences through visual inspection of the files.

Here is the code that the programmer can use to find the differences between the files using the XML Diff and Patch tool:

public void GenerateDiffGram(string originalFile, string newFile, 
                                    XmlWriter diffgramWriter)
{
   // Ignore child order, namespaces, and prefixes when comparing.
   XmlDiff xmldiff = new XmlDiff(XmlDiffOptions.IgnoreChildOrder | 
                                    XmlDiffOptions.IgnoreNamespaces | 
                                    XmlDiffOptions.IgnorePrefixes);
   // false: the inputs are complete documents, not fragments.
   bool bIdentical = xmldiff.Compare(originalFile, newFile, false, diffgramWriter);
   diffgramWriter.Close();
}

XmlDiffOptions.IgnoreChildOrder says that the order in which the children appear under an element is not important to this application. XmlDiffOptions.IgnoreNamespaces specifies that only differences in the local names are important, and XmlDiffOptions.IgnorePrefixes means that the different prefixes for the element names should not be recognized as differences. In our case, the originalFile is pricelist.xml and newFile is newpricelist.xml.
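
The following sketch shows one way the comparison might be driven from file paths, including the creation of the XmlWriter that receives the diffgram. It assumes that the XML Diff and Patch assembly is referenced and that its classes live in the Microsoft.XmlDiffPatch namespace. PriceListDiff, WriteDiffgram, and the diffgram file name are hypothetical.

```csharp
using System.Text;
using System.Xml;
using Microsoft.XmlDiffPatch; // assumption: the XML Diff and Patch assembly is referenced

public static class PriceListDiff
{
    // Compares two XML files and writes the diffgram to diffgramFile.
    // Returns true when the files are identical under the chosen options.
    public static bool WriteDiffgram(string originalFile, string newFile,
                                     string diffgramFile)
    {
        XmlTextWriter diffgramWriter =
            new XmlTextWriter(diffgramFile, Encoding.Unicode);
        diffgramWriter.Formatting = Formatting.Indented;
        try
        {
            XmlDiff xmldiff = new XmlDiff(XmlDiffOptions.IgnoreChildOrder |
                                          XmlDiffOptions.IgnoreNamespaces |
                                          XmlDiffOptions.IgnorePrefixes);
            // false: the inputs are complete documents, not fragments.
            return xmldiff.Compare(originalFile, newFile, false, diffgramWriter);
        }
        finally
        {
            // Close the writer even if the comparison throws.
            diffgramWriter.Close();
        }
    }
}
```

A call such as PriceListDiff.WriteDiffgram("pricelist.xml", "newpricelist.xml", "pricediff.xml") would then write the diffgram to pricediff.xml, a file name chosen only for this example.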

The third parameter, false, indicates that the two inputs being compared are complete XML documents and not fragments of an XML document. A complete XML document has to conform to all of the top-level rules of XML (exactly one root element, for example), whereas an XML fragment can be any sequence of XML constructs. System.Xml.XmlTextReader can parse XML fragments as well as full documents. The Compare() method returns true if the two files are identical, and false otherwise.

The last argument, diffgramWriter, is where the output of the comparison is written. The output generated is an XML document that records the differences between the two files. Here is what it looks like in this scenario:

<?xml version="1.0" encoding="utf-16" ?> 
<xd:xmldiff version="1.0" srcDocHash="2079810781567709607" 
options="IgnoreChildOrder IgnoreNamespaces IgnorePrefixes" 
xmlns:xd="http://schemas.microsoft.com/xmltools/2002/xmldiff">
   <xd:node match="1">
      <xd:add type="1" name="Subaru" ns="http://www.Subaru.com" prefix="ns2">
         <xd:add type="2" name="model">Impreza</xd:add> 
            <xd:add>
               <ns2:Muffler xmlns:ns2="http://www.Subaru.com">450</ns2:Muffler> 
               <ns2:Bumper xmlns:ns2="http://www.Subaru.com">120</ns2:Bumper> 
               <ns2:Floormat xmlns:ns2="http://www.Subaru.com">65</ns2:Floormat> 
            </xd:add>
         <xd:add match="/1/2/4" opid="1" /> 
      </xd:add>
      <xd:node match="2">
         <xd:node match="1">
            <xd:change match="1">600</xd:change> 
         </xd:node>
         <xd:add>
            <ns2:WindShieldWipers xmlns:ns2="http://www.Subaru.com">25</ns2:WindShieldWipers>
         </xd:add>
         <xd:remove match="4" opid="1" /> 
      </xd:node>
   </xd:node>
   <xd:descriptor opid="1" type="move" /> 
</xd:xmldiff>   
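
A diffgram like the one above is itself XML, so it can be inspected with ordinary System.Xml code. As a small sketch, the class below counts the add, remove, and change operations in a diffgram. It reads the diffgram namespace from the root element rather than hard-coding it. DiffgramSummary and CountOperations are hypothetical names and are not part of the tool.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class DiffgramSummary
{
    // Counts the add, remove, and change elements that belong to the
    // diffgram's own namespace (taken from the root element).
    public static Dictionary<string, int> CountOperations(string diffgramXml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(diffgramXml);
        string diffNamespace = doc.DocumentElement.NamespaceURI;

        Dictionary<string, int> counts = new Dictionary<string, int>();
        foreach (XmlNode node in doc.SelectNodes("//*"))
        {
            // Skip payload elements such as ns2:Muffler that are not operations.
            if (node.NamespaceURI != diffNamespace) continue;

            string name = node.LocalName;
            if (name == "add" || name == "remove" || name == "change")
            {
                int current;
                counts.TryGetValue(name, out current); // current is 0 when absent
                counts[name] = current + 1;
            }
        }
        return counts;
    }
}
```

A summary of this kind tells the programmer only how many operations were recorded. Applying those operations is the job of the patch step.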

This file records all the changes, additions, and deletions that turn the original XML file into the final XML file. In its raw form, this file is of little direct use to our Fabrikam Car Repair programmer. However, it contains everything that the XmlPatch class in the XML Diff and Patch tool needs to patch the original file and create the new, changed file. Here is the code to do it:

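public void PatchUp(string originalFile, string diffGramFile, string outputFile)
{
   // Load the original document that the diffgram will be applied to.
   XmlDocument sourceDoc = new XmlDocument(new NameTable());
   sourceDoc.Load(originalFile);
   XmlTextReader diffgramReader = new XmlTextReader(diffGramFile);

   // Apply the diffgram to the in-memory document.
   XmlPatch xmlpatch = new XmlPatch();
   xmlpatch.Patch(sourceDoc, diffgramReader);
   diffgramReader.Close();

   // Save the patched document.
   XmlTextWriter output = new XmlTextWriter(outputFile, Encoding.Unicode);
   sourceDoc.Save(output);
   output.Close();
}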

The XmlPatch.Patch method takes the XmlDocument that represents the original file and applies the diffgram to it to generate the patched-up file. Here is what the patched-up file (patchedPriceList.xml) looks like:

<?xml version="1.0" encoding="utf-16" ?> 
<PartPriceInfo xmlns:ns1="http://www.Subaru.com">
   <ns2:Subaru model="Impreza" xmlns:ns2="http://www.Subaru.com">
      <ns2:Muffler xmlns:ns2="http://www.Subaru.com">450</ns2:Muffler> 
      <ns2:Bumper xmlns:ns2="http://www.Subaru.com">120</ns2:Bumper> 
      <ns2:Floormat xmlns:ns2="http://www.Subaru.com">65</ns2:Floormat> 
      <ns1:WindShieldWipers>20</ns1:WindShieldWipers> 
   </ns2:Subaru>
   <ns1:Subaru model="Legacy">
      <ns1:Muffler>400</ns1:Muffler> 
      <ns1:Bumper>100</ns1:Bumper> 
      <ns1:Floormat>50</ns1:Floormat> 
      <ns1:WindShieldWipers>20</ns1:WindShieldWipers> 
   </ns1:Subaru>
   <ns1:Subaru model="Outback">
      <ns1:Muffler>600</ns1:Muffler> 
      <ns2:WindShieldWipers xmlns:ns2="http://www.Subaru.com">25</ns2:WindShieldWipers> 
      <ns1:Bumper>150</ns1:Bumper> 
      <ns1:Floormat>75</ns1:Floormat> 
   </ns1:Subaru>
</PartPriceInfo>

patchedPriceList.xml has a different child order than newpricelist.xml only because we specified XmlDiffOptions.IgnoreChildOrder in our original comparison. Had we not specified XmlDiffOptions.IgnoreChildOrder, the child order would have been maintained and the diffgram would look a bit different.

Conclusion

By using the XmlDiff class, the programmer can determine whether two files are in fact different based on the conditions that are important to the application. The programmer can ignore changes that are only superficial (for example, different prefixes for the same namespace). XmlPatch then provides the ability to update the original XML by applying only the changes that matter.

Questions and comments regarding this article can be posted at https://www.gotdotnet.com/community/messageboard/MessageBoard.aspx?id=207.