Compare and Patch XML Documents

The XML Diff and Patch tool synchronizes data on different servers, allowing for easy transfer.

RELATED: "Displaying XML in .NET" and "Displaying XML in ASP.NET." 

Comparing XML documents and identifying their differences can be difficult, particularly when you write this type of functionality from scratch. Fortunately, Microsoft has released version 1.0 of its XML Diff and Patch tool. This handy .NET assembly allows XML developers to compare XML documents, generate a differences document (referred to as an XDL DiffGram), and patch original documents to synchronize them with others (see Figure 1). This functionality could be used to keep cached XML data or configuration files on different servers in sync. Instead of sending the entire modified document across the network, you could send a DiffGram document - likely much smaller - and use it to "patch" the appropriate data or files.

In this article, I'll provide an overview of the functionality found in the XML Diff and Patch tool and demonstrate how you can use it to generate DiffGram documents.

 

Get to Know the XML Diff Language

Before examining how to create .NET applications that can compare XML documents, let's take a quick look at the XML Diff Language (XDL) used by the XML Diff and Patch tool to write DiffGrams. This language defines several elements that indicate when nodes should be added, removed, or changed. It also defines path descriptors that identify where modifications should occur in the original XML document. Figure 2 shows an example XDL DiffGram document.

[Listing: XDL DiffGram that changes the customerID attribute to 3 and adds a customer node for Jane Doe]

Figure 2. The XML Diff and Patch tool relies on XDL DiffGrams to define patches that should be applied to XML documents. This example defines a change in an attribute named customerID and adds a new customer child node into the "patched" XML document.

Although I won't go into much detail about path descriptors because the tool generates them automatically for you, the help documentation contains this definition: "The XML Diff Language (XDL) uses path descriptors to identify the nodes in the source XML document. Path descriptors work on a DOM data model and use the node position as the core identifier for the nodes. XDL does not use XPath because the XPath data model differs from DOM." All path descriptors refer to the original source XML tree before changes are applied. For example, if the first node of the source tree becomes the third node in the changed tree, the path descriptor for this node is still "1" because the node is first in the source document (the source document functions as the base).
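To make position-based addressing more concrete, here is a minimal sketch that walks a DOM tree using a simplified descriptor such as "/1/2". It is only an illustration: the PathDescriptorSketch class and its Resolve method are hypothetical, they are not part of the XML Diff and Patch tool, and the sketch assumes that every DOM child node counts as one position. The tool's own descriptor syntax is richer, so consult its help documentation for the real rules.

```csharp
using System;
using System.Xml;

// Hypothetical helper for illustration only; not part of Microsoft.XmlDiffPatch.
public static class PathDescriptorSketch
{
    // Resolves a simplified absolute descriptor such as "/1/2".
    // Each number is a 1-based position among the current node's DOM children.
    public static XmlNode Resolve(XmlNode root, string descriptor)
    {
        XmlNode current = root;
        string[] steps = descriptor.Split(
            new char[] { '/' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string step in steps)
        {
            int position = int.Parse(step);
            if (position < 1 || position > current.ChildNodes.Count)
            {
                throw new ArgumentException(
                    "No child node at position " + position);
            }
            current = current.ChildNodes[position - 1];
        }
        return current;
    }

    public static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<customers><customer id=\"1\"/>" +
                    "<customer id=\"2\"/></customers>");

        // "/1/2": first child of the document, then that node's second child
        XmlNode node = Resolve(doc, "/1/2");
        Console.WriteLine(node.Attributes["id"].Value);
    }
}
```

Because the lookup depends only on positions in the source tree, the same descriptor keeps pointing at the same source node no matter where that node ends up in the modified document.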

Aside from path descriptors, XDL DiffGrams also can contain add, remove, and change elements. The add element is used when nodes that do not exist in the source XML document appear in the modified XML document. The following fragment, for example, specifies that a new customer node, which did not exist in the source XML document, was found after the first node. Note that the path descriptor is shown in the match attribute:

[Listing: add element fragment that inserts a customer node for Jane Doe after the first node]

The remove element identifies any nodes removed from the source XML document. The next example demonstrates how you can use it to remove the second node from an XML document:

[Listing: remove element fragment that removes the second node]

Finally, the change element identifies any data value changes between the source and modified XML documents:

[Listing: change element fragment that sets the first node's customerID attribute to 1]

The change element demonstrated in the preceding example changes the first node's customerID attribute to a value of 1.

Although you do not necessarily need to know the XDL DiffGram format to use the XML Diff and Patch tool, a basic understanding of what the tool generates can help you better understand what's going on behind the scenes. This can be helpful, particularly in situations where problems could arise. (The tool's help documentation contains many more details about the DiffGram format if you need more information.)

Now that you've seen what an XDL DiffGram looks like, let's examine how to create one using the classes that ship with the tool.

 

Find Differences Between Documents

To compare XML documents and find differences, you must first reference the Microsoft.XmlDiffPatch namespace within your application to access the classes it contains. The XmlDiff class contains the necessary properties and methods to perform a comparison between XML documents. Additionally, the class's properties allow you to control which parts of XML documents are to be compared. Different types of nodes - such as namespaces, white space, processing instructions, and more - may be ignored, if so desired.
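As a rough sketch of how these settings can be supplied, the snippet below passes a combination of XmlDiffOptions flags to the XmlDiff constructor instead of setting each Boolean property separately. The DiffOptionsSketch class and CreateDiff method are hypothetical names used only for this illustration, and the snippet assumes the Microsoft.XmlDiffPatch assembly is referenced by the project.

```csharp
using Microsoft.XmlDiffPatch;

// Hypothetical wrapper for illustration only.
public static class DiffOptionsSketch
{
    public static XmlDiff CreateDiff()
    {
        // Combine the flags for the node types that should be skipped
        // during comparison: comments, processing instructions, and
        // white space.
        XmlDiffOptions options = XmlDiffOptions.IgnoreComments
                               | XmlDiffOptions.IgnorePI
                               | XmlDiffOptions.IgnoreWhitespace;

        return new XmlDiff(options);
    }
}
```

Setting the individual properties (IgnoreComments, IgnorePI, IgnoreWhitespace) on an existing XmlDiff object should have the same effect; the flags form is simply more compact when several options are needed at once.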

After determining which nodes will and will not be compared, you can set the Algorithm property of the XmlDiff class to a value of Auto, Fast, or Precise. Figure 3 replicates the table from the XML Diff and Patch tool documentation that describes these enumeration values.

| Member Name | Description |
| --- | --- |
| Auto | Default. Chooses the comparison algorithm for you depending on the size and assumed number of changes in the compared documents. |
| Fast | Compares the two XML documents by traversing the XML tree and comparing it node-by-node. This algorithm is fast but might produce less precise results. For example, it might detect an add-and-remove operation on a node instead of a move operation. |
| Precise | Based on an algorithm for finding editing distance between trees, also known as the Zhang-Shasha algorithm. This algorithm gives very precise results, but it might be slow on large XML documents with many changes. |

Figure 3. The XmlDiffAlgorithm enumeration contains the three differing members shown in this table. For larger XML documents, performance will be improved by using the Fast value; for smaller documents, you can find more precise differences by using the Precise value.

After the Algorithm property is set to a valid enumeration member, the Compare method can be called; it contains several different overloaded versions:

public Boolean Compare(String, String, Boolean)

public Boolean Compare(String, String, Boolean, XmlWriter)

public Boolean Compare(XmlNode, XmlNode)

public Boolean Compare(XmlNode, XmlNode, XmlWriter)

public Boolean Compare(XmlReader, XmlReader)

public Boolean Compare(XmlReader, XmlReader, XmlWriter)

Note that the first two overloads accept the paths to the XML documents being compared. You also can pass XML documents that have been loaded into an XmlNode object (an XmlDocument, for example) and even compare fragments against each other.
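The XmlNode-based overload can be sketched as follows. This is a minimal example under stated assumptions: the NodeCompareSketch class and CompareDocuments method are hypothetical names, both inputs are whole documents held in strings, and the Microsoft.XmlDiffPatch assembly is referenced.

```csharp
using System.IO;
using System.Xml;
using Microsoft.XmlDiffPatch;

// Hypothetical wrapper for illustration only.
public static class NodeCompareSketch
{
    // Returns true when the two documents are identical.
    // Whatever Compare writes to the XmlWriter is returned through diffGram.
    public static bool CompareDocuments(string originalXml,
        string modifiedXml, out string diffGram)
    {
        // Load both documents into the DOM so they can be passed as XmlNode
        XmlDocument original = new XmlDocument();
        original.LoadXml(originalXml);
        XmlDocument modified = new XmlDocument();
        modified.LoadXml(modifiedXml);

        StringWriter sw = new StringWriter();
        XmlTextWriter writer = new XmlTextWriter(sw);
        writer.Formatting = Formatting.Indented;

        XmlDiff diff = new XmlDiff();
        bool identical = diff.Compare(original, modified, writer);

        // Flush before reading the text that was written
        writer.Flush();
        diffGram = sw.ToString();
        writer.Close();
        return identical;
    }
}
```

This form is convenient when the documents are already loaded in memory for other reasons, because no extra readers need to be created.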

The example shown later in Figure 6 utilizes the last overload in the preceding list. This version accepts two XmlTextReader objects that contain the original and modified XML documents, as well as the XmlWriter that the difference document (DiffGram) eventually will be written to. To illustrate how the DiffGram document is generated from the differences between two web.config files, I have provided original and modified web.config documents, shown in Figures 4 and 5, respectively.

Figure 4. The original web.config document contains the standard elements and attributes found in a typical web.config file.

Figure 5. The modified web.config document contains an appSettings node as well as an associated child node (marked in bold) not found in the original web.config document shown in Figure 4.

Figure 6 shows a method named GenerateDifferences that acts as a wrapper around the XmlDiff class and its Compare method. GenerateDifferences accepts two input parameters representing the original and modified XML documents. It returns a Boolean value that informs the calling program whether differences exist or not. If a DiffGram document is generated, it is returned by the third parameter (named diffDoc), which is passed by reference from the calling program using the C# ref keyword.

using System.IO;
using System.Xml;
using Microsoft.XmlDiffPatch;

public bool GenerateDifferences(string doc1, string doc2,
                ref string diffDoc) {
  //Create XmlDiff object for document comparison
  XmlDiff diff = new XmlDiff();

  //Set comparisons that should be ignored
  diff.IgnoreComments = true;
  diff.IgnorePI = true;
  diff.IgnoreWhitespace = true;

  //Choose most precise algorithm
  //For large documents look at using
  //the "Fast" algorithm
  diff.Algorithm = XmlDiffAlgorithm.Precise;

  //Compare documents and generate XDL diff document
  StringWriter sw = new StringWriter();
  XmlTextWriter writer = new XmlTextWriter(sw);
  writer.Formatting = Formatting.Indented;

  XmlTextReader originalReader =
     new XmlTextReader(new StringReader(doc1));
  XmlTextReader modifiedReader =
     new XmlTextReader(new StringReader(doc2));
  bool status = diff.Compare(originalReader,
                 modifiedReader,writer);

  //Flush pending output before reading the DiffGram
  writer.Flush();

  //Output difference document (ref parameter)
  diffDoc = sw.ToString();

  //Close writer and readers
  writer.Close();
  originalReader.Close();
  modifiedReader.Close();

  //Return status (true when the documents are identical)
  return status;
}

Figure 6. The XmlDiff class allows XML documents to be compared by calling its Compare method. The GenerateDifferences method shown in this example permits two strings containing XML data to be compared to one another.

The code begins by creating an XmlDiff object and setting several of its properties. In this example, white space, comments, and processing instructions are ignored when the XML documents are compared. Note that the Algorithm property is set to Precise. (For details on the Precise enumeration value, refer back to Figure 3.)

After creating the XmlDiff object, the code creates the XmlTextWriter object that will write out the DiffGram document. The Formatting property is set to Formatting.Indented so the DiffGram is indented nicely and thus easily legible. Two XmlTextReader objects are then created and loaded with the XML string data passed into the GenerateDifferences method. By using the StringReader class, XML data in the form of a String can be loaded directly into an XmlTextReader.

After the XmlTextReader objects are created, the XmlDiff object's Compare method is called and the two readers and writer are passed in as arguments. Compare returns a Boolean true when the documents under comparison match up, and a false when they have differences. In cases where the documents differ, the XmlTextWriter writes the DiffGram document to a StringWriter, which, in turn, is assigned to the "ref" parameter named diffDoc. Although this example accepts strings as input, you easily can modify it to handle file paths if desired.
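A file-based variant might look like the following sketch, which uses the Compare(String, String, Boolean, XmlWriter) overload listed earlier. The FileDiffSketch class, the DiffFiles method, and its parameter names are hypothetical; the Boolean argument is passed as false on the assumption that both files are complete documents rather than fragments.

```csharp
using System.Text;
using System.Xml;
using Microsoft.XmlDiffPatch;

// Hypothetical wrapper for illustration only.
public static class FileDiffSketch
{
    // Compares two XML files and writes the DiffGram to diffGramPath.
    // Returns true when the files are identical.
    public static bool DiffFiles(string originalPath,
        string modifiedPath, string diffGramPath)
    {
        XmlDiff diff = new XmlDiff();
        diff.IgnoreWhitespace = true;

        // Write the DiffGram straight to disk instead of to a string
        XmlTextWriter writer =
            new XmlTextWriter(diffGramPath, Encoding.UTF8);
        writer.Formatting = Formatting.Indented;

        try
        {
            // false: treat both inputs as whole documents, not fragments
            return diff.Compare(originalPath, modifiedPath, false, writer);
        }
        finally
        {
            // Always release the output file, even if Compare throws
            writer.Close();
        }
    }
}
```

Writing the DiffGram to a file fits the synchronization scenario described at the start of the article, where only the differences are shipped to another server.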

 

Patch XML Documents

Once an XDL DiffGram document is generated by calling the XmlDiff class's Compare method, you can use the DiffGram to "patch" original documents and sync them with modified documents. You do this by using another class in the Microsoft.XmlDiffPatch namespace named XmlPatch. XmlPatch contains a method, named Patch, which handles the modification of the source XML document. The method has several different overloaded versions as shown below:

public void Patch(XmlDocument, XmlReader)

public void Patch(XmlNode, XmlReader)

public void Patch(String, Stream, XmlReader)

public void Patch(XmlReader, Stream, XmlReader)

In cases where the original document must be modified directly, you can use one of the first two versions of the Patch method. The DiffGram always is loaded into an XmlReader object before being passed as an argument. You also can pass the document generated from running the patch operation to a stream in cases where the document needs to be moved or saved in a different location. This is available in the final two overloaded versions illustrated previously. Figure 7 demonstrates how to utilize the first overloaded version to patch a document.

 

public string PatchXML(string original,string patchXml) {
  StringWriter sw = new StringWriter();
  XmlTextWriter writer = new XmlTextWriter(sw);
  writer.Formatting = Formatting.Indented;

  //Load the original document into the DOM
  XmlDocument originalDoc = new XmlDocument();
  originalDoc.LoadXml(original);

  //Create XmlPatch object to perform patch operation
  XmlPatch patch = new XmlPatch();
  XmlTextReader reader =
     new XmlTextReader(new StringReader(patchXml));

  //Perform patch operation
  patch.Patch(originalDoc, reader);
  reader.Close();

  //Serialize the patched document to a string
  originalDoc.Save(writer);
  writer.Flush();
  return sw.ToString();
}

Figure 7. The XmlPatch class can read an XDL DiffGram document and use it to patch an existing XML document. By shipping DiffGram documents around, documents can be kept in sync without being forced to ship a mass of data over the network, particularly in cases involving large XML documents.
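For comparison, the stream-based Patch(String, Stream, XmlReader) overload could be used roughly as shown below when the patched document should be saved to a different location. The FilePatchSketch class, the PatchFile method, and its parameter names are hypothetical, and the sketch assumes the output path differs from the source path.

```csharp
using System.IO;
using System.Xml;
using Microsoft.XmlDiffPatch;

// Hypothetical wrapper for illustration only.
public static class FilePatchSketch
{
    // Applies the DiffGram stored in diffGramPath to the file at
    // originalPath and writes the patched document to patchedPath.
    public static void PatchFile(string originalPath,
        string diffGramPath, string patchedPath)
    {
        XmlTextReader diffGramReader = new XmlTextReader(diffGramPath);
        FileStream output = new FileStream(
            patchedPath, FileMode.Create, FileAccess.Write);

        try
        {
            XmlPatch patch = new XmlPatch();
            patch.Patch(originalPath, output, diffGramReader);
        }
        finally
        {
            // Release both files whether or not the patch succeeded
            output.Close();
            diffGramReader.Close();
        }
    }
}
```

Unlike the XmlDocument overload, this version leaves the original file untouched, which can be useful if you want to keep the unpatched copy as a backup.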

The PatchXML method in Figure 7 wraps the functionality exposed by the XmlPatch class and accepts both the original XML document and the DiffGram document as strings. First, the original document is loaded into an XmlDocument while the DiffGram is loaded into an XmlTextReader. Both objects are then passed to the Patch method, which applies the modifications to the original document. Figure 8 shows a portion of the output generated by running this article's sample application.


Figure 8. The XML Diff and Patch sample application allows original and modified XML documents to be compared. This figure shows the output generated when differences between the documents are found as well as how you can use these differences to "patch" an XML document.
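The whole cycle of comparing, patching, and verifying can be sketched in one place. This is not the sample application's code: the RoundTripSketch class and its method names are hypothetical, and the sketch assumes that white space is ignored in both comparisons so that the patched DOM and the modified text can be compared fairly.

```csharp
using System.IO;
using System.Xml;
using Microsoft.XmlDiffPatch;

// Hypothetical helper for illustration only.
public static class RoundTripSketch
{
    private static XmlTextReader ReaderFor(string xml)
    {
        return new XmlTextReader(new StringReader(xml));
    }

    // Diffs original against modified, patches the original with the
    // resulting DiffGram, and reports whether the patched document now
    // matches the modified one.
    public static bool PatchedMatchesModified(string originalXml,
        string modifiedXml)
    {
        // Step 1: generate the DiffGram
        XmlDiff diff = new XmlDiff();
        diff.IgnoreWhitespace = true;
        StringWriter diffText = new StringWriter();
        XmlTextWriter diffWriter = new XmlTextWriter(diffText);
        diff.Compare(ReaderFor(originalXml), ReaderFor(modifiedXml),
                     diffWriter);
        diffWriter.Flush();

        // Step 2: apply the DiffGram to the original document
        XmlDocument patched = new XmlDocument();
        patched.LoadXml(originalXml);
        new XmlPatch().Patch(patched, ReaderFor(diffText.ToString()));

        // Step 3: verify with a fresh comparison; true means identical
        XmlDiff verify = new XmlDiff();
        verify.IgnoreWhitespace = true;
        return verify.Compare(ReaderFor(patched.OuterXml),
                              ReaderFor(modifiedXml));
    }
}
```

A check like this can be a cheap safeguard in a synchronization job: if the final comparison returns false, the job can fall back to sending the full document.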

The process of comparing and modifying XML documents has been greatly simplified with Microsoft's XML Diff and Patch tool. In this article you learned how to write wrapper methods around the comparison and patching functionality built into XML Diff and Patch and how it uses DiffGrams to track changes. By leveraging the XML Diff and Patch tool you easily can sync different XML documents containing cached data, configuration data, or data used for another purpose.

The code used in this article is available for download.

To see a live example of the code shown in this article, visit the XML for ASP.NET Developers Web site: http://www.xmlforasp.net/codeSection.aspx?csID=84.
