Validating XML Data in .NET

When you develop .NET applications, you typically use the XmlTextReader class to parse XML documents. Although the .NET Framework provides full XML Document Object Model (DOM) support, XML readers and writers, which fill the role that Simple API for XML (SAX) parsers play on other platforms, present an excellent balance of execution speed, memory footprint, productivity, and ease of use for XML developers. But the XmlTextReader class doesn't validate data. To validate data, you need to use the XmlValidatingReader class.

Let's consider how data validation works in .NET applications. In the .NET Framework, the XML DOM classes are implemented on top of readers. To gain speed while parsing, the basic XML readers don't validate data. But in general, validating data against a Document Type Definition (DTD), an XML-Data Reduced (XDR) schema, or an XML Schema Definition (XSD) schema is useful, particularly during the development cycle. After you determine the behavior of the application and all its constituent modules, you can optimize performance by dropping the code that validates data, because you don't need to keep validating the format of data that survived the test cycle. Dropping validation code is a common practice that isn't much different from removing debug information when you're ready to ship any Windows application. But when you develop XML applications, some cases make data validation necessary every time a piece of code runs.

You need data validation, for example, when you process incoming data whose origin isn't certified or whose content layout isn't completely predictable (i.e., cases in which the text that users type at runtime determines the XML data layout).

To validate XML data in .NET, you must use the XmlValidatingReader class, which, like XmlTextReader, derives from the XmlReader class and works in much the same way. Unlike XmlTextReader, XmlValidatingReader lets you set the required validation type and the action to take in case of errors. When you call the XmlValidatingReader's Read method, the method not only jumps from one node to the next but also checks the structure of the node and the node's attributes against the specified validation type.

You usually create an instance of the XmlValidatingReader class from a valid instance of an XmlReader-based class:

XmlTextReader r = new XmlTextReader(fileName);
XmlValidatingReader vr = new XmlValidatingReader(r);

You cannot create a validating reader directly from a filename. You can, however, create one that validates only an XML fragment, supplied as a string or a stream, that includes certain types of nodes in a certain parsing context.
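The fragment case can be sketched as follows. This is a minimal illustration, not a definitive implementation: the class name FragmentValidationSketch, the method name, and the small "note" DTD are all made up for the example. It assumes the DTD rules are passed through an XmlParserContext, because a fragment carries no DOCTYPE of its own. Later framework versions mark XmlValidatingReader as obsolete, so the sketch suppresses that compiler warning.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;
using System.Xml.Schema;

#pragma warning disable 618 // XmlValidatingReader is marked obsolete in later framework versions

public static class FragmentValidationSketch
{
    // Reads an XML fragment to the end and returns the validation messages raised.
    public static List<string> ValidateFragment(string xmlFragment)
    {
        List<string> messages = new List<string>();

        // Hypothetical DTD rules for the example, supplied as an internal subset.
        string subset =
            "<!ELEMENT note (to, body)>" +
            "<!ELEMENT to (#PCDATA)>" +
            "<!ELEMENT body (#PCDATA)>";

        // The parsing context names the document type and carries the DTD subset.
        NameTable nt = new NameTable();
        XmlNamespaceManager nsMgr = new XmlNamespaceManager(nt);
        XmlParserContext context = new XmlParserContext(
            nt, nsMgr, "note", null, null, subset, "", "", XmlSpace.None);

        // XmlNodeType.Element says the string holds element content.
        XmlValidatingReader vr =
            new XmlValidatingReader(xmlFragment, XmlNodeType.Element, context);
        vr.ValidationType = ValidationType.DTD;
        vr.ValidationEventHandler += (sender, e) => messages.Add(e.Message);

        try
        {
            while (vr.Read()) { }
        }
        finally
        {
            vr.Close();
        }
        return messages;
    }
}
```

A call such as FragmentValidationSketch.ValidateFragment("<note><to>Ann</to><body>Hi</body></note>") should come back with an empty list, because the fragment satisfies the sample rules.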

After you've obtained a running instance of XmlValidatingReader, you use it just as you would an XmlTextReader. The following code shows how you can configure the object to automatically detect the validation type required and enforce the type's rules.

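vr.ValidationType = ValidationType.Auto;
// OnValidationError is a hypothetical name for your handler method
vr.ValidationEventHandler += new ValidationEventHandler(OnValidationError);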

The XmlValidatingReader class supports four validation types: Auto, XDR, Schema, and DTD. The ValidationType enumeration collects these values. You can also use a fifth option: None, meaning that you don't want the class to validate the data.

By setting the ValidationType property on the reader to Auto, you ask the class to apply an algorithm that determines which kind of validation the source document calls for. The class first checks for a DTD defined in a <!DOCTYPE> declaration. The class loads and processes any DTD it finds. If it doesn't find a DTD, the class looks for an XML Schema Definition (XSD) schemaLocation attribute. If the class can't find an XSD declaration, it checks for an XDR x-schema attribute and attempts to locate the resource. Last, the class checks for inline schemas embedded with the <schema> tag. If you specify a precise validation type instead, the process is more direct, but it leaves no margin for dynamically changing data.
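As a rough illustration of the Auto setting, the sketch below feeds the reader an in-memory document that carries its own DTD in a DOCTYPE declaration and counts the validation events raised. The class name AutoValidationSketch, the method name, and the sample "note" DTD are assumptions made for the example. Later framework versions mark both XmlValidatingReader and ValidationType.Auto as obsolete, hence the pragma.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

#pragma warning disable 618 // XmlValidatingReader and ValidationType.Auto are marked obsolete in later framework versions

public static class AutoValidationSketch
{
    // Reads the document to the end and returns how many validation events were raised.
    public static int CountValidationEvents(string xml)
    {
        int count = 0;

        XmlTextReader r = new XmlTextReader(new StringReader(xml));
        XmlValidatingReader vr = new XmlValidatingReader(r);

        // Let the reader decide which kind of validation the document calls for.
        vr.ValidationType = ValidationType.Auto;

        // With a handler attached, errors are reported here instead of being thrown.
        vr.ValidationEventHandler += (sender, e) => count++;

        try
        {
            while (vr.Read()) { }
        }
        finally
        {
            vr.Close();
        }
        return count;
    }
}
```

For example, with a DTD that requires a note element to contain to followed by body, a document such as <note><to>Ann</to></note> should produce at least one event, whereas a document that also contains the body element should produce none.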

The validating reader is, first and foremost, a reader that jumps from one node to the next. The class validates only the single node that's visited each time you call Read. In case of errors, the validation process can be either blocking or nonblocking. The process is blocking if you don't register an event handler through the ValidationEventHandler event. If you don't register a handler, the first validation error raises an XmlException whose handling you determine. The exception blocks the process unless your handling code resumes it.
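A minimal sketch of the blocking case follows. No handler is registered, so the first validation error surfaces as an exception from Read. The class and method names are hypothetical. The sketch catches both XmlException and XmlSchemaException, on the assumption that the exact type that surfaces can depend on the framework version and on the kind of error.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

#pragma warning disable 618 // XmlValidatingReader is marked obsolete in later framework versions

public static class BlockingValidationSketch
{
    // Returns true if the whole document reads without an error.
    // On failure, 'error' holds the message of the exception that stopped the read.
    public static bool TryValidate(string xml, out string error)
    {
        error = "";
        XmlValidatingReader vr =
            new XmlValidatingReader(new XmlTextReader(new StringReader(xml)));
        vr.ValidationType = ValidationType.DTD;

        // No ValidationEventHandler is attached, so validation is blocking.
        try
        {
            while (vr.Read()) { }
            return true;
        }
        catch (XmlSchemaException ex)
        {
            error = ex.Message;
            return false;
        }
        catch (XmlException ex)
        {
            // Also covers documents that aren't well formed.
            error = ex.Message;
            return false;
        }
        finally
        {
            vr.Close();
        }
    }
}
```

The design trade-off is that you learn about only the first problem in the document; everything after the failing node goes unread.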

If you provide a ValidationEventHandler handler, the class calls it for each validation error instead of raising an exception. In that case, the process is nonblocking: it doesn't stop until it reaches the end of the file.

Because the validating reader is simply a reader, you can't validate the whole document first and then, if validation succeeds, proceed with normal processing in the same pass. But you can use separate readers for validating and reading. If you want to use a shared reader, be ready to write slightly more sophisticated code to manage normal reading and possible validation errors.
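The separate-readers approach might look like the sketch below: one pass with a validating reader decides whether the document is acceptable, and a second pass with a plain XmlTextReader does the real work (here, simply collecting element names). All names are hypothetical, and the document is assumed to fit in memory as a string so that it can be read twice.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

#pragma warning disable 618 // XmlValidatingReader and ValidationType.Auto are marked obsolete in later framework versions

public static class TwoPassSketch
{
    // First pass: read to the end and report whether any validation error occurred.
    static bool IsValid(string xml)
    {
        bool ok = true;
        XmlValidatingReader vr =
            new XmlValidatingReader(new XmlTextReader(new StringReader(xml)));
        vr.ValidationType = ValidationType.Auto;
        vr.ValidationEventHandler += (sender, e) =>
        {
            // Warnings are ignored; only errors invalidate the document.
            if (e.Severity == XmlSeverityType.Error) ok = false;
        };
        try
        {
            while (vr.Read()) { }
        }
        finally
        {
            vr.Close();
        }
        return ok;
    }

    // Second pass: runs only if validation succeeded.
    public static bool TryReadElementNames(string xml, out List<string> names)
    {
        names = new List<string>();
        if (!IsValid(xml)) return false;

        XmlTextReader r = new XmlTextReader(new StringReader(xml));
        try
        {
            while (r.Read())
            {
                if (r.NodeType == XmlNodeType.Element) names.Add(r.Name);
            }
        }
        finally
        {
            r.Close();
        }
        return true;
    }
}
```

The cost is that the document is parsed twice, which is the price of keeping the processing code free of validation concerns.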
