Validate XML Data Feeds With XML Schemas

Write a reusable XML validation class to validate XML documents against DTDs as well as XDR and XSD schemas.

XtremeData

LANGUAGES: C#

TECHNOLOGIES: XML Schemas | Windows Services

 


 

By Dan Wahlin

 

Validating XML is important any time the structure or data of an XML document must follow a predefined format for an application to use it. You can validate XML documents with several different formats, including Document Type Definitions (DTDs), XML-Data Reduced (XDR) schemas, and W3C XML Schema (XSD) schemas. Although the .NET Framework supports each of these formats, XSD schemas arguably provide the greatest power and flexibility in validating XML. I'll assume you have a general feel for how DTDs and schemas are created and why they are used.

 

In this article, I'll show you how to write a reusable XML-validation class you can use to validate XML documents against DTDs as well as XDR and XSD schemas. Once I've gone over the details of the class (named XmlValidator), I'll show you how to plug it into the XmlImportService Windows service (detailed in my previous article, "Improve Data Exchange") to validate "dropped" XML documents. Because XmlValidator is a regular .NET class, you can use it in any .NET application that requires XML document-validation capabilities.

 

I'll start with a brief overview of the XmlImportService Windows service's functionality in case you haven't read my previous article. XmlImportService watches a given directory on the file system for XML document drops. This task is accomplished using the FileSystemWatcher class in the System.IO namespace. Once an XML file is detected in the watch folder, it is validated using the XmlValidator class I'll describe in this article. If the document passes validation, it is parsed using the XmlTextReader, and SQL INSERT statements are generated to add rows to the Northwind database's Customers table. Because XmlImportService runs as a Windows service, you can configure it to start automatically even if no user is logged into the system. Figure 1 provides an overview of the functionality the XmlImportService application provides.

 


Figure 1. The XmlImportService watches for XML documents in a "drop" directory, then validates and parses them. The XmlValidator component discussed in this article will be plugged into the XmlImportService application to allow for XML validation.
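The watch-folder mechanism described above can be sketched with FileSystemWatcher as follows. This is a minimal illustration under stated assumptions, not XmlImportService's actual code: the DropFolderWatcherSketch class, the CreateWatcher method, and the decision to handle both Created and Changed events are hypothetical choices made for the example.

```csharp
using System;
using System.IO;

// Hypothetical helper (not taken from XmlImportService): watches a drop
// folder and calls onChanged whenever an .xml file is created or written to.
public static class DropFolderWatcherSketch
{
    public static FileSystemWatcher CreateWatcher(string folder,
        FileSystemEventHandler onChanged)
    {
        // Only raise events for files matching *.xml in this folder
        var watcher = new FileSystemWatcher(folder, "*.xml");
        // Report new files (FileName) and writes to existing ones (LastWrite)
        watcher.NotifyFilter = NotifyFilters.FileName | NotifyFilters.LastWrite;
        watcher.Created += onChanged;
        watcher.Changed += onChanged;
        // No events are raised until EnableRaisingEvents is set to true
        watcher.EnableRaisingEvents = true;
        return watcher;
    }
}
```

A single file copy can raise more than one Changed event, so a handler built on this sketch should tolerate being called several times for the same file.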

 

Create the XmlValidator Class

XML validation within the .NET Framework is performed by the XmlValidatingReader class in the System.Xml namespace. You can hook up this class to DTDs as well as to different types of schemas to validate XML documents. When errors occur during validation, XmlValidatingReader raises an event named ValidationEventHandler. You attach your own event handler to that event through a delegate that is also named ValidationEventHandler, and the handler receives the details of each error.

 

The ValidationEventHandler delegate has this signature:

 

public delegate void ValidationEventHandler(
    object sender, ValidationEventArgs e);

 

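The ValidationEventArgs parameter has Exception (an XmlSchemaException that includes line and position information), Message, and Severity (XmlSeverityType.Error or XmlSeverityType.Warning) properties you can use to access detailed error information.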
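To show these three properties in action, here is a small sketch. Note that it swaps in the XmlReaderSettings API instead of XmlValidatingReader, which was marked obsolete in .NET Framework 2.0; the handler receives the same ValidationEventArgs type either way. The ValidationArgsSketch class, the Collect method, and the message format are illustrative assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class ValidationArgsSketch
{
    // Returns one formatted line per ValidationEventArgs raised while
    // the xml string is read and validated against the xsd string.
    public static List<string> Collect(string xml, string xsd)
    {
        var messages = new List<string>();
        var settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        using (XmlReader xsdReader = XmlReader.Create(new StringReader(xsd)))
        {
            settings.Schemas.Add(null, xsdReader);
        }
        settings.ValidationEventHandler += (sender, e) =>
        {
            // Severity is Error or Warning; Exception carries line info
            messages.Add(e.Severity + " at line " + e.Exception.LineNumber
                + ": " + e.Message);
        };
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { } // errors surface through the handler
        }
        return messages;
    }
}
```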

 

XmlValidatingReader derives from the abstract XmlReader class and therefore works much like the XmlTextReader class you can use to parse XML documents in a forward-only manner. Throughout this article, you'll see how the custom XmlValidator class uses the XmlValidatingReader's Schemas and ValidationType properties, its Read method, its ValidationEventHandler event, and two of its overloaded constructors. You can find a complete listing of the XmlValidatingReader's properties and methods in the .NET Framework SDK documentation.

 

Figure 2 contains the shell for the custom XmlValidator class. Notice that it contains several different fields used to track data during the validation process; it also contains methods named ValidateXml and ValidationCallBack.

 

public class XmlValidator {
    bool _valid;
    string _logFile;
    string _validationErrors = String.Empty;
    XmlTextReader xmlReader = null;
    XmlValidatingReader vReader = null;

    public ValidationStatus ValidateXml(object xml,
        XmlSchemaCollection schemaCol, string[] dtdInfo,
        string logFile) {
        //Implementation shown in Figure 4
    }

    private void ValidationCallBack(object sender,
      ValidationEventArgs args) {
        //Implementation shown in Figure 3
    }
}

Figure 2. The XmlValidator class wraps an XmlValidatingReader, which does the actual work of validating XML documents against DTDs or schemas. I'll discuss the code that goes within the different methods later in this article.

 

Consumers of the XmlValidator class will call the ValidateXml method to validate XML documents against DTDs or schemas. This method accepts an object representing either the path to the XML document or the XML document loaded into a StringReader object; an XmlSchemaCollection object containing one or more schemas used to validate the XML document; a string array containing any needed DTD information; and the path to the error-log file. ValidateXml returns a struct named ValidationStatus, which contains the status of the validation operation:

 

public struct ValidationStatus {
    public bool Status;
    public string ErrorMessages;
}

 

The ValidationStatus struct contains two public fields: Status and ErrorMessages.

 

Any validation errors that occur while the XML document is validated cause the ValidationCallBack method to be called. If an error-log file path is passed into ValidateXml, ValidationCallBack writes the error details to that file; otherwise, it stores them in the _validationErrors field, which is returned to the caller through ValidationStatus.ErrorMessages. Figure 3 shows the complete code for this method.

 

private void ValidationCallBack(object sender,
    ValidationEventArgs args) {
    _valid = false;
    DateTime today = DateTime.Now;
    StreamWriter writer = null;
    try {
        if (_logFile != null) {
            //Append the error details to the log file
            writer = new StreamWriter(_logFile,true,
                Encoding.ASCII);
            writer.WriteLine("Validation error in XML: ");
            writer.WriteLine();
            writer.WriteLine(args.Message + " " +
                today.ToString());
            writer.WriteLine();
            if (xmlReader.LineNumber > 0) {
                writer.WriteLine("Line: " +
                    xmlReader.LineNumber +
                    " Position: " + xmlReader.LinePosition);
            }
            writer.WriteLine();
            writer.Flush();
        } else {
            //No log file: accumulate (+=) every error so none
            //is lost when several occur in one document
            _validationErrors += args.Message + " Line: " +
                xmlReader.LineNumber + " Column: " +
                xmlReader.LinePosition + "\n\n";
        }
    }
    catch {} //Ignore logging failures; _valid is already false
    finally {
        if (writer != null) {
            writer.Close();
        }
    }
}

Figure 3. ValidationCallBack is responsible for accessing error information passed by the XmlValidatingReader as it validates XML documents. If an error-log file is supplied, the method will write the error details to the file using a StreamWriter object so the errors can be viewed later. Although this example uses a StreamWriter, you also could use the XmlTextWriter when an XML error file needs to be generated.
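Writing the errors as XML, as the caption suggests, could look roughly like the following sketch. The &lt;error&gt; element and its line, position, and time attributes are a hypothetical format chosen for illustration; the article does not define one.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Xml;

// Hypothetical format: <error line=".." position=".." time="..">message</error>
public static class XmlErrorLogSketch
{
    public static string FormatError(string message, int line,
        int position, DateTime when)
    {
        var sw = new StringWriter(CultureInfo.InvariantCulture);
        using (var writer = new XmlTextWriter(sw))
        {
            writer.WriteStartElement("error");
            writer.WriteAttributeString("line",
                line.ToString(CultureInfo.InvariantCulture));
            writer.WriteAttributeString("position",
                position.ToString(CultureInfo.InvariantCulture));
            // Round-trip ("o") format keeps the timestamp culture-neutral
            writer.WriteAttributeString("time", when.ToString("o"));
            // WriteString escapes characters such as < and & in the message
            writer.WriteString(message);
            writer.WriteEndElement();
            writer.Flush();
        }
        return sw.ToString();
    }
}
```

Because each call produces a standalone element, appending several of them to one file yields a fragment with multiple top-level elements. Wrap them in a root element, or read the file with ConformanceLevel.Fragment, if it must be parsed later.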

 

Now that I've explained the supporting code behind ValidateXml, let's take a look at the inner workings of the method. Figure 4 shows the complete code for ValidateXml.

 

public ValidationStatus ValidateXml(object xml,
    XmlSchemaCollection schemaCol, string[] dtdInfo,
    string logFile) {
    _logFile = logFile;
    _valid = true;
    _validationErrors = String.Empty; //Clear errors from prior calls
    xmlReader = null;
    vReader = null;

    try {
        if (xml is StringReader) xmlReader =
            new XmlTextReader((StringReader)xml);
        if (xml is string) xmlReader =
            new XmlTextReader((string)xml);
        if (xmlReader == null) throw new ArgumentException(
            "xml must be a file path or a StringReader");

        //DTD info can be passed into the ValidateXml()
        //method to validate XML dynamically against
        //a DTD even when the DOCTYPE keyword is not
        //added directly into the XML document.

        //The dtdInfo array should contain the name of
        //the root tag (docTypeName) in position 0 and
        //the path to the DTD in position 1 of the array:
        //string[] dtdInfo = {"customers",dtdPath};
        if (dtdInfo != null && dtdInfo.Length >= 2) {
            XmlParserContext context =
                new XmlParserContext(null,null,
                dtdInfo[0],"",dtdInfo[1],"",
                dtdInfo[1],"",XmlSpace.Default);
            xmlReader.MoveToContent();
            vReader =
                new XmlValidatingReader(
                    xmlReader.ReadOuterXml(),
                    XmlNodeType.Element,context);
            vReader.ValidationType = ValidationType.DTD;
        } else {
            vReader = new XmlValidatingReader(xmlReader);
            vReader.ValidationType = ValidationType.Auto;
            if (schemaCol != null) {
                vReader.Schemas.Add(schemaCol);
            }
        }

        vReader.ValidationEventHandler +=
            new ValidationEventHandler(this.ValidationCallBack);
        //Parse through XML; errors go to ValidationCallBack
        while (vReader.Read()){}
    } catch (Exception exp) {
        //Malformed XML or a missing file throws instead of
        //raising a validation event, so record the message
        _valid = false;
        _validationErrors += exp.Message + "\n\n";
    } finally {  //Close our readers
        if (xmlReader != null) xmlReader.Close();
        if (vReader != null) vReader.Close();
    }
    ValidationStatus status = new ValidationStatus();
    status.Status = _valid;
    status.ErrorMessages = _validationErrors;
    return status;
}

Figure 4. The ValidateXml method accepts several parameters that determine how to locate the XML document as well as the DTD or schema documents used in the validation process. First, the XML document is loaded into an XmlTextReader, which is passed to the XmlValidatingReader's constructor. The ValidationCallBack method is then attached to the XmlValidatingReader's ValidationEventHandler event through a ValidationEventHandler delegate, and the Read method is called repeatedly until the entire document has been read.

 

In Figure 4, you'll see that when the ValidateXml method is first called, it examines the type of the object passed in the xml parameter. Using the is keyword, it determines whether the caller passed the path to the XML document (as a string) or a StringReader loaded with the document:

 

if (xml is StringReader) xmlReader =
    new XmlTextReader((StringReader)xml);
if (xml is string) xmlReader =
    new XmlTextReader((string)xml);

 

After the XmlTextReader is created, the type of document used to validate the XML is determined. Although you can pass DTD info to the ValidateXml method, I'll focus on validating against XSD schemas in this article. If you're interested in using the XmlValidator class to validate XML documents against DTDs dynamically, you'll find that the code shown in Figure 4 contains a few comments explaining how to use the dtdInfo parameter.
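When the XML document already carries its own DOCTYPE declaration, DTD validation needs no dtdInfo array at all. The sketch below shows that case using the newer XmlReaderSettings API rather than the XmlValidator code; the DtdValidationSketch class and the customers/customer DTD used in the test are invented for the example. DtdProcessing must be set to Parse, because XmlReaderSettings prohibits DTDs by default.

```csharp
using System;
using System.IO;
using System.Xml;

public static class DtdValidationSketch
{
    // Returns the number of DTD validation errors reported for xml,
    // which is expected to contain its own DOCTYPE declaration.
    public static int CountDtdErrors(string xml)
    {
        int errors = 0;
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse, // default is Prohibit
            ValidationType = ValidationType.DTD
        };
        // Count each error instead of stopping at the first one
        settings.ValidationEventHandler += (sender, e) => errors++;
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { }
        }
        return errors;
    }
}
```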

 

Assuming that validation will be performed using an XSD schema, the code creates a new XmlValidatingReader instance, sets its ValidationType property to ValidationType.Auto (which allows validation against XDR or XSD schemas), and adds the XmlSchemaCollection object passed into ValidateXml to the reader's Schemas property:

 

vReader = new XmlValidatingReader(xmlReader);
vReader.ValidationType = ValidationType.Auto;
if (schemaCol != null) vReader.Schemas.Add(schemaCol);

 

The code shown in Figure 4 finishes by using a ValidationEventHandler delegate to attach the ValidationCallBack method discussed earlier to the XmlValidatingReader's ValidationEventHandler event. It then calls the XmlValidatingReader's Read method in a loop until the whole document has been processed. Any errors found while the XML document is read are directed to ValidationCallBack.
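For comparison, here is a sketch of the same schema-validation flow written against the XmlReaderSettings and XmlSchemaSet classes that replaced XmlValidatingReader and XmlSchemaCollection in .NET Framework 2.0. It is a port under stated assumptions, not the article's XmlValidator: the XsdValidatorSketch and SketchValidationStatus names are hypothetical, ValidationType.Schema stands in for ValidationType.Auto, and because XmlSchemaSet handles only XSD, XDR schemas are not covered.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Schema;

// Mirrors the article's ValidationStatus struct under a hypothetical name
public struct SketchValidationStatus
{
    public bool Status;
    public string ErrorMessages;
}

public static class XsdValidatorSketch
{
    public static SketchValidationStatus Validate(TextReader xml,
        XmlSchemaSet schemas)
    {
        bool valid = true;
        var errors = new StringBuilder();
        var settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas = schemas;
        // Plays the role of ValidationCallBack: record every error
        settings.ValidationEventHandler += (sender, e) =>
        {
            valid = false;
            errors.AppendFormat("{0} Line: {1} Column: {2}\n",
                e.Message, e.Exception.LineNumber, e.Exception.LinePosition);
        };
        try
        {
            using (XmlReader reader = XmlReader.Create(xml, settings))
            {
                while (reader.Read()) { }
            }
        }
        catch (XmlException ex)
        {
            // Malformed XML throws rather than raising a validation event
            valid = false;
            errors.AppendLine(ex.Message);
        }
        return new SketchValidationStatus
        {
            Status = valid,
            ErrorMessages = errors.ToString()
        };
    }
}
```

A caller builds the XmlSchemaSet once (for example with schemas.Add(null, schemaPath)) and can reuse it for every dropped document.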

 

Add Validation Capabilities to XmlImportService

 

Now that you've seen how the XmlValidator class works, let's look at how you can use it within the XmlImportService Windows service. When an XML document is dropped into the watch folder, the XmlValidator class is invoked to validate it (see Figure 1). If the XML document is valid, it is parsed with the XmlTextReader class, and SQL statements are generated and executed.

 

Figure 5 shows the code used to invoke the XmlValidator class from within the XmlImportService's Folder_OnChanged event handler. The FileSystemWatcher class calls this event handler when an XML document is dropped into the watch folder. Many of the settings, such as the folder to watch and the schema to use for validation, are stored in the application's configuration file (see Figure 6). Notice how much XmlValidator simplifies the validation process: the code passes the path to the XML document, an XmlSchemaCollection object containing the XSD schema, and the path to the error-log file to the ValidateXml method. After the method returns, the code checks the ValidationStatus struct's Status field to see whether validation succeeded. You can download the complete code for the XmlImportService application as well as the XmlValidator class.

 

//*** Validate XML document against schema
//Get schema document path
string schemaPath =
    ConfigurationSettings.AppSettings["XmlSchemaFile"];
if (schemaPath != null) {
    //XML doc should be validated against schema
    //Create XmlValidator object
    XmlValidator validator = new XmlValidator();

    //Build schema collection
    XmlSchemaCollection schemaCol =
        new XmlSchemaCollection();
    schemaCol.Add(null,schemaPath);

    //Get error log file (null if the key is not configured)
    string logFile =
        ConfigurationSettings.AppSettings[
        "XmlValidationErrorLogFile"];

    //Validate Document
    ValidationStatus validationStatus =
        validator.ValidateXml(filePath,
        schemaCol,null,logFile);

    //Check validation status
    if (!validationStatus.Status) {
        //Log that errors occurred during validation and
        //stop processing on this document
        this.WriteToLog("Validation of " + filePath +
            " failed. " +
            "See error log file for more details.");
        File.Move(filePath,filePath + "." +
            Guid.NewGuid().ToString() + ".notValid");
        return;
    } else { //Validation successful
        this.WriteToLog(filePath +
            " validated successfully.");
    }
}

Figure 5. The XmlValidator class can be used by any application that needs to validate XML documents. The code shown here is added to the XmlImportService application's Folder_OnChanged event handler.
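Figure 5 moves a rejected document aside by appending a GUID and a .notValid suffix to its name. Pulled out into a helper, that step might look like the sketch below; the QuarantineSketch class and Quarantine method are hypothetical names. The GUID gives each rejected file a unique name, so rejecting a second document dropped under the same file name cannot collide with the first.

```csharp
using System;
using System.IO;

public static class QuarantineSketch
{
    // Renames a rejected file to <original>.<guid>.notValid and
    // returns the new path so the caller can log where it went.
    public static string Quarantine(string filePath)
    {
        string target = filePath + "." + Guid.NewGuid().ToString() + ".notValid";
        File.Move(filePath, target);
        return target;
    }
}
```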

 

    

        

            value="c:\Program Files\Wahlin Consulting\XmlImportServiceSetup\Data" />
        
            value="customers.xml" />
        
            value="c:\Program Files\Wahlin Consulting\XmlImportServiceSetup\Data\Schemas\Customers.xsd" />
        
            value="c:\Program Files\Wahlin Consulting\XmlImportServiceSetup\Data\Errors\ErrorLog.xml" />
        
            value="server=localhost;uid=sa;pwd=;initial catalog=Northwind" />

    

Figure 6. The XmlImportService application's XML configuration file contains the location of the folder to watch as well as the schema to use in validating the XML documents that are dropped into the watch folder. By storing this information in a configuration file, you can make changes without recompiling the XmlImportService application.

 

By using XSD schemas, you can perform fine-grained structural and data-type validation within .NET XML applications. Validating XML documents up front can prevent many errors that would otherwise surface when the document is parsed or when its data is moved into a database. The obvious benefit is that your applications keep running smoothly as other organizations exchange XML data with you.

 

The sample code in this article is available for download.

 

Dan Wahlin (a Microsoft Most Valuable Professional in ASP.NET) is president of Wahlin Consulting and founded the XML for ASP.NET Developers Web site (http://www.XMLforASP.NET), which focuses on using XML and Web Services in Microsoft's .NET platform. He also is a corporate trainer and speaker, and he teaches XML and ASP.NET training courses around the United States. Dan co-authored Professional Windows DNA (Wrox) and ASP.NET Tips, Tutorials & Code (Sams), and he authored XML for ASP.NET Developers (Sams). E-mail Dan at [email protected].

 

Tell us what you think! Please send any comments about this article to [email protected]. Please include the article title and author.

 

 

 
