
Generate XML Schemas Programmatically in .NET

Leverage classes in the System.Xml.Schema namespace and gain control over the schema generation process.

XML schemas are an excellent way to describe the structure and data types of an XML document (for more information, see "Using XML Schemas"). As a result, schemas are used throughout the .NET Framework in everything from Web services to DataSets to XML resource files. The .NET Framework even includes a tool named xsd.exe that lets you create schemas, convert between schema types (XDR to XSD), generate strongly typed DataSets, and create specialized schema-based classes. Although tools such as xsd.exe can reduce development time in many cases, there may be situations where you need more control over your schemas. In this article I'll demonstrate how you can gain complete control over the schema generation process by leveraging classes in the System.Xml.Schema namespace. To get the most out of this article, you should have a good understanding of XML schemas.

Although the System.Xml.Schema namespace plays an important role in .NET, several of its classes are used to support other classes, such as XmlSchemaCollection. As a result, you may not be aware of all the great schema-oriented features this namespace contains. Classes within the System.Xml.Schema namespace can be quite useful when an existing schema needs to be edited or when a schema needs to be created from scratch based on a given database structure, object hierarchy, or XML document.

There are over 60 classes within the System.Xml.Schema namespace that represent virtually every aspect of schemas - from regular expression pattern tags to complex and simple types. The main class that you'll use to start creating a customized schema is named XmlSchema; it can be used to create the root of a schema document (and associated namespaces, attributes, and so on) plus add elements, complex types, and more. Other important classes in this namespace include XmlSchemaElement, XmlSchemaAttribute, XmlSchemaComplexType, and XmlSchemaSimpleType - to name a few.  

 

Create a Schema Generator

Now that you've been introduced to a few of the main classes in the System.Xml.Schema namespace, let's examine how to create a custom class named SchemaBuilder that's capable of generating a schema from an existing XML document. SchemaBuilder contains a single public method named BuildSchema, whose signature looks like this:

public string BuildSchema(string xml,NestingType type) {}

BuildSchema can create two styles of schema: Russian doll style (nested complex types that mirror the XML document structure) and globally declared complex types, which allow for better type reuse. The style of schema to build is determined by passing a value of an enumeration named NestingType to BuildSchema:

public enum NestingType {

    RussianDoll,

    SeparateComplexTypes

}

In addition to a NestingType value, the caller of the BuildSchema method passes either a string containing the XML document to base the schema upon, or a path/URL pointing to an existing XML document.

Upon being called, BuildSchema creates a schema root element similar to the one shown here:

<xsd:schema attributeFormDefault="unqualified"

    elementFormDefault="qualified" version="1.0"

    xmlns:xsd="http://www.w3.org/2001/XMLSchema">

</xsd:schema>

The code to accomplish this task is shown in Figure 1. If you look through the code you'll see that it creates a new XmlSchema object and then sets properties such as Version and ElementFormDefault to produce the attributes on the root element. The qualified schema namespace (held in a constant named SCHEMA_NAMESPACE) is added by creating an instance of the XmlSerializerNamespaces class, located in the System.Xml.Serialization namespace, and assigning it to the schema's Namespaces property.

XmlSchema schema = new XmlSchema();

schema.ElementFormDefault = XmlSchemaForm.Qualified;

schema.AttributeFormDefault = XmlSchemaForm.Unqualified;

schema.Version = "1.0";

//Add additional namespaces using the Add() method shown

// below if desired

XmlSerializerNamespaces ns = new XmlSerializerNamespaces();

ns.Add("xsd", SCHEMA_NAMESPACE);

schema.Namespaces = ns;

Figure 1. When creating a schema root element, call properties on the XmlSchema class such as Version and ElementFormDefault. This example adds a qualified schema namespace, sets the schema version, and sets the elementFormDefault and attributeFormDefault attributes.
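To check what the root element looks like once serialized, the Figure 1 setup can be wrapped in a small method that writes the XmlSchema object to a string. This is only a minimal sketch: the class name SchemaRootDemo, the method name BuildEmptySchema, and the SchemaNamespace constant are placeholders invented for this illustration and are not part of the article's downloadable code.

```csharp
using System;
using System.IO;
using System.Xml.Schema;
using System.Xml.Serialization;

public static class SchemaRootDemo
{
    // Placeholder standing in for the article's SCHEMA_NAMESPACE constant
    const string SchemaNamespace = "http://www.w3.org/2001/XMLSchema";

    // Builds a schema that contains only the root element and returns it as text
    public static string BuildEmptySchema()
    {
        XmlSchema schema = new XmlSchema();
        schema.ElementFormDefault = XmlSchemaForm.Qualified;
        schema.AttributeFormDefault = XmlSchemaForm.Unqualified;
        schema.Version = "1.0";

        // Map the "xsd" prefix to the XML Schema namespace
        XmlSerializerNamespaces ns = new XmlSerializerNamespaces();
        ns.Add("xsd", SchemaNamespace);
        schema.Namespaces = ns;

        // Serialize the schema object to a string
        StringWriter sw = new StringWriter();
        schema.Write(sw);
        return sw.ToString();
    }
}
```

Calling Console.WriteLine(SchemaRootDemo.BuildEmptySchema()) should print an XML declaration followed by an empty xsd:schema element similar to the one shown earlier.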

After the schema root element is created, BuildSchema determines how to load the XML document that serves as the basis for the schema. This step relies on the XmlDocument class (in the System.Xml namespace), as shown in Figure 2.

//Begin parsing source XML document

XmlDocument doc = new XmlDocument();

try {

    //Assume string XML

    doc.LoadXml(xml);

}

catch {

    //String XML load failed.   Try loading as a file path

    try {

        doc.Load(xml);

    }

    catch {

        return "XML document is not well-formed.";

    }

}

XmlElement root = doc.DocumentElement;

Figure 2. The code first assumes the input is an XML string and calls the LoadXml method. If that fails, it treats the input as a file path/URL and calls the Load method instead. If both attempts fail, an error message is returned.
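One drawback of the nested try/catch approach is that a missing file and malformed markup both end up reported as "not well-formed." As a hedged alternative, the input can be inspected before loading: anything whose first non-whitespace character is "<" is treated as markup, and everything else as a path or URL. The class and method names below (XmlSourceLoader, LoadFromStringOrPath) are hypothetical and do not appear in the article's code.

```csharp
using System;
using System.Xml;

public static class XmlSourceLoader
{
    // Loads XML from a literal string or from a file path/URL.
    // Exceptions (XmlException, FileNotFoundException, and so on)
    // propagate to the caller so the real cause stays visible.
    public static XmlDocument LoadFromStringOrPath(string xml)
    {
        if (xml == null) throw new ArgumentNullException("xml");

        XmlDocument doc = new XmlDocument();
        if (xml.TrimStart().StartsWith("<", StringComparison.Ordinal))
        {
            // Looks like markup: parse the string directly
            doc.LoadXml(xml);
        }
        else
        {
            // Otherwise assume a file path or URL
            doc.Load(xml);
        }
        return doc;
    }
}
```

With this variation the caller decides how to report each kind of failure rather than receiving one generic message.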

After the XML document is loaded and the document's root element is found, the process of creating the different schema definitions is started by passing the root node (named root) to a private method named CreateComplexType:

XmlSchemaElement elem = CreateComplexType(root);

Here's the signature for CreateComplexType:

private XmlSchemaElement CreateComplexType(XmlElement el){}

CreateComplexType is the workhorse of the SchemaBuilder class. It recursively walks through the source XML document and identifies all of the element and attribute nodes that should be added into the schema document. Figure 3 shows a portion of the CreateComplexType method code that identifies the XML elements and attributes in the XML document and creates corresponding schema types.

//Create complexType

XmlSchemaComplexType ct = new XmlSchemaComplexType();

if (el.HasChildNodes) {

    //loop through children and place in schema sequence tag

    XmlSchemaSequence seq = new XmlSchemaSequence();

    foreach (XmlNode node in el.ChildNodes) {

    if (node.NodeType == XmlNodeType.Element) {

        if (namesArray.BinarySearch(node.Name) < 0) {

            namesArray.Add(node.Name);

            namesArray.Sort(); //Needed for BinarySearch()

            XmlElement tempNode = (XmlElement)node;

             XmlSchemaElement sElem = null;

            //If node has children or attributes then

            //create a new complexType container

            if (tempNode.HasChildNodes ||

                tempNode.HasAttributes) {

                //Recursive call

                sElem = CreateComplexType(tempNode);

            } else {

                //No complexType needed...add SchemaTypeName

                sElem = new XmlSchemaElement();

                sElem.Name = tempNode.Name;

                if (tempNode.InnerText == null ||

                   tempNode.InnerText == String.Empty){

                   sElem.SchemaTypeName =

                     new XmlQualifiedName("string",

                     SCHEMA_NAMESPACE);

                } else {

                    //Try to detect the appropriate

                    //data type for the element

                    sElem.SchemaTypeName =

                       new XmlQualifiedName(CheckDataType

                          (tempNode.InnerText),

                          SCHEMA_NAMESPACE);

                    }

               }

                //Detect if node repeats in XML so

                //we can handle maxOccurs

                if (el.SelectNodes(node.Name).Count > 1) {

                    sElem.MaxOccursString = "unbounded";

                }

                //Add element to sequence tag

                seq.Items.Add(sElem);

            }

        }

    }

    //Add sequence tag to complexType tag

    if (seq.Items.Count > 0) ct.Particle = seq;

}

if (el.HasAttributes) {

    foreach (XmlAttribute att in el.Attributes) {

        XmlSchemaAttribute sAtt = new XmlSchemaAttribute();

        sAtt.Name = att.Name;

        sAtt.SchemaTypeName =

          new XmlQualifiedName(CheckDataType(

            att.Value),SCHEMA_NAMESPACE);

        ct.Attributes.Add(sAtt);

    }

}

Figure 3. This code walks through the source XML document and creates schema definitions that match up with elements and attributes. If elements are found to have child nodes, CreateComplexType is recursively called to walk through all the descendants.

The code starts by creating a new XmlSchemaComplexType object. Then, it checks whether the current element in the XML document (the root element when the method is initially called) has any child nodes by reading the HasChildNodes property of the XmlElement class. If children are found, a new XmlSchemaSequence object is created to hold the child element definitions. After the sequence object is created, the code loops through the children and processes each one.

Because XML is extensible, it's quite possible that several children of a given parent node have the same name. For example, an <orders> parent node may have multiple <order> child nodes. Since each child node needs to be defined only once in the schema, an ArrayList named namesArray is used to track child node names that have been defined; this prevents duplicates from showing up in the schema. Each child (that isn't a duplicate) has an associated XmlSchemaElement object that is created to represent it in the schema. This XmlSchemaElement object is added to the sequence tag with the following code:

seq.Items.Add(sElem);

Attributes are handled in a similar way: the code loops through the element's attributes and creates an XmlSchemaAttribute object to represent each one. Each XmlSchemaAttribute object is added to the initial XmlSchemaComplexType object (discussed earlier) through its Attributes collection.
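The duplicate check and the maxOccurs detection can be tried in isolation with a much smaller method. The sketch below is not the article's CreateComplexType: it handles only one level of children, types every child as xsd:string, and tracks names with a HashSet scoped to a single parent, whereas the article uses an ArrayList named namesArray. The names ChildSequenceDemo and BuildSequence are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;
using System.Xml.Schema;

public static class ChildSequenceDemo
{
    const string Xsd = "http://www.w3.org/2001/XMLSchema";

    // Builds a sequence holding one definition per distinct child element name
    public static XmlSchemaSequence BuildSequence(XmlElement parent)
    {
        XmlSchemaSequence seq = new XmlSchemaSequence();
        HashSet<string> seen = new HashSet<string>();

        foreach (XmlNode node in parent.ChildNodes)
        {
            if (node.NodeType != XmlNodeType.Element) continue;
            // Add returns false when the name has already been defined
            if (!seen.Add(node.Name)) continue;

            XmlSchemaElement sElem = new XmlSchemaElement();
            sElem.Name = node.Name;
            sElem.SchemaTypeName = new XmlQualifiedName("string", Xsd);

            // A name that occurs more than once under the parent repeats
            if (parent.SelectNodes(node.Name).Count > 1)
            {
                sElem.MaxOccursString = "unbounded";
            }
            seq.Items.Add(sElem);
        }
        return seq;
    }
}
```

For a document such as <orders><order>1</order><order>2</order><note>x</note></orders>, the returned sequence holds two definitions, and the one for order is marked as unbounded.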

 

Handle Schema Nesting

Once the complex type element and related sequence element are created, an element representing the parent node is created and the complex type is assigned to the element (see Figure 4). This code generates the proper nesting of complex types based on the NestingType enumeration value passed to the BuildSchema method.

//Now that complexType is created, create element and add

//complexType into the element using its SchemaType property

XmlSchemaElement elem = new XmlSchemaElement();

elem.Name = el.Name;

if (ct.Attributes.Count > 0 || ct.Particle != null) {

    //Handle nesting style of schema

    if (generationType == NestingType.SeparateComplexTypes) {

        //Name the type and reference it from the element

        string typeName = el.Name + "Type";

        ct.Name = typeName;

        complexTypes.Add(ct);

        elem.SchemaTypeName =

            new XmlQualifiedName(typeName,null);

    } else {

        //Russian doll style: nest the type inside the element

        elem.SchemaType = ct;

    }

} else {

    //No children or attributes...assign a simple type instead

    if (el.InnerText == null ||

        el.InnerText == String.Empty) {

        elem.SchemaTypeName =

            new XmlQualifiedName("string",SCHEMA_NAMESPACE);

    } else {

        elem.SchemaTypeName =

            new XmlQualifiedName(CheckDataType(

                el.InnerText),SCHEMA_NAMESPACE);

    }

}

return elem;

Figure 4. The code shown here attaches the complex type created earlier to its parent element. The nesting style requested by the client is determined by checking the NestingType enumeration value passed to the BuildSchema method. If the complex types are to be separated (as opposed to nested), each complex type is added to an ArrayList named complexTypes, which is later enumerated to add each complex type definition to the schema.
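To compare the two nesting styles side by side, the following sketch builds a schema for a single order element in either style and returns the serialized result. It is an illustration under simplifying assumptions rather than the article's implementation: the names NestingStyleDemo, MakeOrderType, and Build are hypothetical, and a bool parameter stands in for the NestingType enumeration.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class NestingStyleDemo
{
    const string Xsd = "http://www.w3.org/2001/XMLSchema";

    // A complex type containing a single <id> child of type xsd:int
    static XmlSchemaComplexType MakeOrderType()
    {
        XmlSchemaElement id = new XmlSchemaElement();
        id.Name = "id";
        id.SchemaTypeName = new XmlQualifiedName("int", Xsd);

        XmlSchemaSequence seq = new XmlSchemaSequence();
        seq.Items.Add(id);

        XmlSchemaComplexType ct = new XmlSchemaComplexType();
        ct.Particle = seq;
        return ct;
    }

    // separate == true  : named, globally declared complex type
    // separate == false : Russian doll (type nested inside the element)
    public static string Build(bool separate)
    {
        XmlSchema schema = new XmlSchema();
        XmlSchemaElement order = new XmlSchemaElement();
        order.Name = "order";
        XmlSchemaComplexType ct = MakeOrderType();

        schema.Items.Add(order);
        if (separate)
        {
            ct.Name = "orderType";
            order.SchemaTypeName = new XmlQualifiedName("orderType");
            schema.Items.Add(ct);
        }
        else
        {
            order.SchemaType = ct;
        }

        StringWriter sw = new StringWriter();
        schema.Write(sw);
        return sw.ToString();
    }
}
```

Printing Build(false) and Build(true) shows the difference: in the first case the complexType appears inside the order element, and in the second the element carries a type attribute that points at a top-level complexType named orderType.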

 

Handle Data Types

You may have noticed a call to a method named CheckDataType back in Figure 3. This method attempts to determine what data type should be assigned to an element or attribute type definition based on the element's inner text or attribute's value. Figure 5 shows the code for CheckDataType; it can easily be extended to support other data type checks as needed.

private string CheckDataType(string data) {

    //Int test

    try {

        Int32.Parse(data);

        return "int";

    } catch {}

    //Decimal test

    try {

        Decimal.Parse(data);

        return "decimal";

    } catch {}

    //DateTime test

    try {

        DateTime.Parse(data);

        return "dateTime";

    } catch {}

    //Boolean test

    if (data.ToLower() == "true" ||

        data.ToLower() == "false") {

        return "boolean";

    }

    return "string";

}

Figure 5. The CheckDataType method attempts to determine what data type should be assigned to a schema element or attribute definition.
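Two details of the parse-based checks are worth keeping in mind: Decimal.Parse and DateTime.Parse follow the current culture, and values such as "True" or "5/1/2004" are accepted even though they are not valid lexical forms for xsd:boolean or xsd:dateTime. One possible stricter variation, sketched below, uses culture-invariant TryParse calls and exact date formats. It requires .NET 2.0 or later, the names XsdTypeGuesser and Guess are hypothetical, and the list of date formats is an assumption that covers only common xsd:dateTime shapes.

```csharp
using System;
using System.Globalization;

public static class XsdTypeGuesser
{
    // Assumed set of accepted xsd:dateTime shapes (not exhaustive)
    static readonly string[] DateTimeFormats = {
        "yyyy-MM-dd'T'HH:mm:ss",
        "yyyy-MM-dd'T'HH:mm:ssK",
        "yyyy-MM-dd'T'HH:mm:ss.FFFFFFF",
        "yyyy-MM-dd'T'HH:mm:ss.FFFFFFFK"
    };

    // Returns the local name of a built-in XSD type for the given text
    public static string Guess(string data)
    {
        if (data == null) return "string";
        string s = data.Trim();

        int i;
        if (Int32.TryParse(s, NumberStyles.AllowLeadingSign,
                CultureInfo.InvariantCulture, out i)) return "int";

        decimal d;
        if (Decimal.TryParse(s,
                NumberStyles.AllowLeadingSign | NumberStyles.AllowDecimalPoint,
                CultureInfo.InvariantCulture, out d)) return "decimal";

        DateTime dt;
        if (DateTime.TryParseExact(s, DateTimeFormats,
                CultureInfo.InvariantCulture, DateTimeStyles.None, out dt))
            return "dateTime";

        // Only the lowercase literals are treated as xsd:boolean here
        if (s == "true" || s == "false") return "boolean";

        return "string";
    }
}
```

The trade-off is that fewer values are recognized as dates or Booleans, but a type reported by this method is less likely to reject the very document it was inferred from.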

After definitions for all elements in the source XML document are created, processing returns to the BuildSchema method, which finishes the schema by adding the root element definition to the XmlSchema object (refer back to Figure 1). Adding the root element definition to the XmlSchema object involves referencing its Items collection, as shown in Figure 6. After the root element definition is added to the schema root tag, the schema is compiled by calling the XmlSchema object's Compile method to see if any errors exist. Assuming no errors are found, the schema is written to a StringWriter object, and the resulting string is returned from the BuildSchema method.

//Add root element definition into the XmlSchema object

schema.Items.Add(elem);

//Reverse elements in ArrayList so root complexType

//appears first where applicable

complexTypes.Reverse();

//In cases where the user wants to separate out the

//complexType tags loop through the complexType ArrayList

//and add the types to the schema

foreach(object obj in complexTypes) {

    XmlSchemaComplexType ct = (XmlSchemaComplexType)obj;

    schema.Items.Add(ct);

}

 

//Compile the schema and then write its contents

//to a StringWriter

try {

    schema.Compile(

       new ValidationEventHandler(ValidateSchema));

    StringWriter sw = new StringWriter();

    schema.Write(sw);

    return sw.ToString();

} catch (Exception exp) {

    return exp.Message;

}

Figure 6. After all the elements and associated complex types have been created, the root element is added to the XmlSchema object's Items collection through the Add method. The schema is then compiled to see if any errors exist. If none are found, it's returned from the BuildSchema method.
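The ValidateSchema handler passed to Compile is not shown in this article. Also note that XmlSchema.Compile is marked obsolete in .NET Framework 2.0 and later, where XmlSchemaSet is the recommended way to compile a schema. The following sketch shows one way the compile step might look with XmlSchemaSet, collecting any warnings and errors in a list instead of throwing. The names SchemaCompileDemo and Compile are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;
using System.Xml.Schema;

public static class SchemaCompileDemo
{
    // Compiles the schema and returns any problems reported along the way.
    // An empty list means the schema compiled cleanly.
    public static List<string> Compile(XmlSchema schema)
    {
        List<string> problems = new List<string>();

        XmlSchemaSet set = new XmlSchemaSet();
        // With a handler attached, problems are reported here
        // instead of being thrown as exceptions
        set.ValidationEventHandler +=
            delegate(object sender, ValidationEventArgs e)
            {
                problems.Add(e.Severity + ": " + e.Message);
            };

        set.Add(schema);
        set.Compile();
        return problems;
    }
}
```

A caller could write the schema to a StringWriter only when the returned list is empty, and otherwise report the collected messages.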

 

Putting it Together

By using classes found in the System.Xml.Schema namespace, you can see that it's possible to create dynamic schemas from existing XML documents. This same process can be extended to create customized schemas for other sources, such as database tables, classes, and so on. By leveraging the schema classes shown here, any type of XML schema can be generated for use in applications.

For more information on schemas, check out the document "XML Schema Part 0: Primer" or the SoftArtisans Knowledge Base article "Working with XML Schemas: Comparing DTDs and XML Schemas." To view a live demo of the SchemaBuilder class in action, visit the XML for ASP.NET Developers website.

Note: To run the downloadable code with version 1.1 of .NET you need to set the validateRequest attribute to false in the web.config file (or on the Page directive). See the .NET SDK for more details.

The sample code in this article is available for download.

 
