Transform Your Data - 30 Oct 2009

Use .NET classes to convert flat files to XML.

XtremeData

LANGUAGES: C# | XML | XPath

ASP.NET VERSIONS: 1.0 | 1.1

 

 

By Dan Wahlin

 

Data can come in all shapes and sizes, from flat files to EDI to XML. Converting between different data formats to accommodate both new and old systems often can be a challenging and time-consuming process. By using classes in the .NET platform, you greatly simplify conversions between different data formats. Because .NET is based on object-oriented programming techniques, the code you write can be reused easily. In this article, I'll introduce you to .NET classes you can use to convert between different data file formats. Specifically, I'll focus on converting delimited and fixed-length flat files to XML.

 

So, why would you want to convert flat files to XML? After all, doing so introduces another step in the processing pipeline. Aside from the fact that many applications now can work with XML directly, converting flat files to XML offers several practical benefits.

 

First, you can validate XML data using XSD schemas with little effort. With flat files, you must write a custom validation mechanism. XML data also can be parsed easily using several different .NET XML-parsing APIs. Flat files require you to write a custom parsing mechanism to parse out the data. XML can be transformed into a variety of output formats using XSLT. With flat files, you also must write a custom transformation mechanism (as shown in this article). XML is a global standard many application platforms support. Although many applications and databases support flat files, by no means are they standardized from company to company.

 

Another great benefit of working directly with XML is that your back-end processes don't have to change much (if at all) to accommodate the different file formats you might receive (flat file, EDI, etc.).

 

Convert Flat Files to XML

Although XML is in widespread use now, many "legacy" systems still don't know a thing about XML or how to use it. Many of these systems work with different types of flat files that delimit data using commas or tabs, or they have fixed-length fields. The keyword here, of course, is "delimit." Aside from a potential header row, flat files do not describe data in any great detail; they simply delimit it.

 

In cases where flat file data must be converted into XML data so an XML-aware application can use it or to standardize overall business operations, you can use several different techniques, including developing custom XmlTextReaders or even custom StreamReaders. One of the most efficient and flexible techniques I've come across employs the .NET platform's StreamReader and XmlTextWriter classes to perform this transformation process. By using the StreamReader and XmlTextWriter classes, relatively little work is required on your part (which means you can focus on more important tasks such as your golf swing).

 

To convert a flat file to XML, the file can be read with a StreamReader, which splits each line of data based on a split character - comma, tab, pipe, etc. - or field lengths. The resulting array can be iterated through and used to generate well-formed XML using different methods associated with the XmlTextWriter class. Figure 1 contains a simple flat file that delimits product data; Figure 2 shows the same data transformed into an XML document.

 

Elbow Joint,12930430,6,25,06/28/2000,1238 Van Buren,

  B2B Supply,1111236894,Walters

Valve,39405938,3,40,06/20/2000,4568 Arizona Ave.,

  A+ Supply,2221236894,Tammy

PVC,234954048,6,20,06/14/2000,49032 S. 51,A+ Supply,

  2221236894,Walters

Figure 1. This data is delimited using commas (line breaks have been added to accommodate the page width).

 

<?xml version="1.0" encoding="utf-8"?>

<supplies>

  <item partID="12930430">

    <description>Elbow Joint</description>

    <numberInStock>6</numberInStock>

    <numberOnOrder>25</numberOnOrder>

    <deliveryDate>06/28/2000</deliveryDate>

    <supplierStreet>1238 Van Buren</supplierStreet>

    <supplierCompany>B2B Supply</supplierCompany>

    <supplierPhone>1111236894</supplierPhone>

    <orderedBy>Walters</orderedBy>

  </item>

  <item partID="39405938">

    <description>Valve</description>

    <numberInStock>3</numberInStock>

    <numberOnOrder>40</numberOnOrder>

    <deliveryDate>06/20/2000</deliveryDate>

    <supplierStreet>4568 Arizona Ave.</supplierStreet>

    <supplierCompany>A+ Supply</supplierCompany>

    <supplierPhone>2221236894</supplierPhone>

    <orderedBy>Tammy</orderedBy>

  </item>

  <item partID="234954048">

    <description>PVC</description>

    <numberInStock>6</numberInStock>

    <numberOnOrder>20</numberOnOrder>

    <deliveryDate>06/14/2000</deliveryDate>

    <supplierStreet>49032 S. 51</supplierStreet>

    <supplierCompany>A+ Supply</supplierCompany>

    <supplierPhone>2221236894</supplierPhone>

    <orderedBy>Walters</orderedBy>

  </item>

</supplies>

Figure 2. Using the StreamReader and XmlTextWriter classes, you can convert the flat file shown in Figure 1 into an XML document that XML-aware applications can use more easily.
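Before looking at the reusable class, here is a minimal sketch of the basic idea: read a line, split it on the delimiter, and write each token with an XmlTextWriter. The class name DelimitedSketch and the hard-coded element names are assumptions made for illustration only, and a StringReader stands in for the StreamReader so the sketch runs without a file on disk. Only the first three fields are mapped.

```csharp
using System;
using System.IO;
using System.Xml;

public static class DelimitedSketch
{
    // Hypothetical helper, not part of FlatFileToXmlTransform.
    // Converts comma-delimited lines into a <supplies> document.
    public static string ToXml(string flatData)
    {
        StringWriter output = new StringWriter();
        XmlTextWriter writer = new XmlTextWriter(output);
        StringReader reader = new StringReader(flatData);
        try
        {
            writer.WriteStartElement("supplies");
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                string[] tokens = line.Split(',');
                if (tokens.Length < 3) continue; // skip blank or short lines

                writer.WriteStartElement("item");
                // Attributes must be written before any child elements
                writer.WriteAttributeString("partID", tokens[1]);
                writer.WriteElementString("description", tokens[0]);
                writer.WriteElementString("numberInStock", tokens[2]);
                writer.WriteEndElement(); // </item>
            }
            writer.WriteEndElement(); // </supplies>
            writer.Flush();
            return output.ToString();
        }
        finally
        {
            writer.Close();
            reader.Close();
        }
    }
}
```

Hard-coding the element names like this works for a single file layout, but every new layout would mean a code change. That limitation is the motivation for driving the output from a mapping file instead.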

 

Create a Reusable Class

A class named FlatFileToXmlTransform reads each line in the flat file shown in Figure 1, and it splits the data into separate parts based on a specific delimiter character. This class also can work with fixed-length flat files as you'll see later in this article.

 

The constructor for the FlatFileToXmlTransform class initializes private fields that store information needed by the class:

 

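string _csvPath;

XmlDocument _mapDoc;

XmlTextWriter writer; //Shared by Transform and GenerateXml

 

public FlatFileToXmlTransform(string csvPath,string xmlMap)

{

    _csvPath = csvPath;

    try {

        _mapDoc = new XmlDocument();

        _mapDoc.Load(xmlMap);

    }

    catch {

        //Map could not be loaded; Transform checks for null

        _mapDoc = null;

    }

}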

 

The FlatFileToXmlTransform constructor accepts two parameters that contain data representing the path to the flat file as well as the path to an XML mapping file (discussed in a moment). The XML mapping file is loaded into a DOM structure by calling the XmlDocument class's Load method.

 

FlatFileToXmlTransform contains a method named Transform, which transforms the flat file to an XML document and returns the output document as a stream. The complete code for the Transform method is shown in Figure 3. After creating MemoryStream and XmlTextWriter object instances, the code splits each line of data (based upon either a delimiter or predefined field lengths).

 

public Stream Transform() {

  StreamReader reader = null;

  string line = null;

  MemoryStream stream = null;

  try {

    //Check the map before dereferencing it

    if (_mapDoc == null) return null;

    XmlElement root = _mapDoc.DocumentElement;

    char[] delimiter = null;

    if (root.GetAttribute("delimiter") != null &&

     root.GetAttribute("delimiter") != String.Empty) {

      delimiter =

       root.GetAttribute("delimiter").ToCharArray();

    }

    stream = new MemoryStream();

    writer = new XmlTextWriter(stream,Encoding.UTF8);

    writer.Formatting = Formatting.Indented;

    writer.WriteStartDocument();

    writer.WriteStartElement(root.GetAttribute("root"));

    reader = new StreamReader(new FileStream(_csvPath,

     FileMode.Open,FileAccess.Read));

 

    //Read each line of file

    while ((line = reader.ReadLine()) != null) {

      string[] tokens = null;

      if (delimiter == null) { //Handle fixed-length files

        tokens = SplitLine(line);

      } else {          //Handle delimited files

        tokens = line.Split(delimiter);

      }

       //Filter() method available in code download

      GenerateXml(Filter(tokens));

    }

    writer.WriteEndElement(); //Close root element

    writer.Flush();

    //CopyStream() method available in code download

    return CopyStream(stream);

  }

  catch (Exception exp) {

    throw new ApplicationException("Flat-file parsing " +

     "errored out",exp);

  }

  finally {

    if (reader != null) {

      reader.Close();

    }

    if (writer != null) {

      writer.Close();

    }

  }

}

Figure 3. The Transform method is responsible for splitting the data contained within a flat file into an array. The array is mapped to the desired XML output document.
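Figure 3 calls a CopyStream method that ships only with the code download. Because the finally block closes the XmlTextWriter (and with it the underlying MemoryStream), the caller needs a stream that is still readable afterward. The following is a guess at what such a helper might look like, not the downloadable implementation; the class name StreamSketch is an assumption.

```csharp
using System;
using System.IO;

public static class StreamSketch
{
    // Hypothetical stand-in for the CopyStream() helper in the code download.
    public static Stream CopyStream(MemoryStream source)
    {
        // ToArray copies every byte written so far, regardless of the
        // current Position, and works even after the source is closed.
        byte[] bytes = source.ToArray();

        // The new stream starts at Position 0 and is independent of source.
        return new MemoryStream(bytes);
    }
}
```

Copying the bytes costs some memory for large files, but it keeps the returned stream independent of the writer's lifetime.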

 

After each line of data in the flat file is split into an array named "tokens" within the Transform method, the array is passed to a method named GenerateXml, which processes each array item and maps it either to an XML element or an attribute. This mapping is accomplished by reading from the XML mapping file (passed to the FlatFileToXmlTransform class constructor shown earlier).

 

Figure 4 shows a sample XML mapping file that maps the flat file data shown earlier in Figure 1 to the XML document shown in Figure 2. When looking through the mapping file's elements and attributes, you'll see that each piece of data in the flat file is identified by position and mapped either to an element or to an attribute node through the type attribute. The delimiter used within the flat file is identified in the mapping file by an attribute named "delimiter".

 

<?xml version="1.0" encoding="utf-8" ?>

<mappings root="supplies" child="item" delimiter=",">

  <mapping pos="0" name="description" type="Element" />

  <mapping pos="1" name="partID" type="Attribute" />

  <mapping pos="2" name="numberInStock" type="Element" />

  <mapping pos="3" name="numberOnOrder" type="Element" />

  <mapping pos="4" name="deliveryDate" type="Element" />

  <mapping pos="5" name="supplierStreet" type="Element" />

  <mapping pos="6" name="supplierCompany" type="Element" />

  <mapping pos="7" name="supplierPhone" type="Element" />

  <mapping pos="8" name="orderedBy" type="Element" />

</mappings>

Figure 4. This XML file "maps" flat file data to XML elements or attributes.
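To make the lookup concrete, the sketch below loads a trimmed-down mapping (two entries taken from Figure 4, inlined as a string) and finds the mapping node for a given position with XPath. The MappingSketch class and its Describe method are hypothetical names used only for this illustration.

```csharp
using System;
using System.Xml;

public static class MappingSketch
{
    // Two entries from Figure 4, inlined so the sketch is self-contained.
    public const string SampleMap =
        "<mappings root='supplies' child='item' delimiter=','>" +
        "<mapping pos='1' name='partID' type='Attribute' />" +
        "<mapping pos='2' name='numberInStock' type='Element' />" +
        "</mappings>";

    public static XmlElement LoadSample()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(SampleMap);
        return doc.DocumentElement;
    }

    // Returns "type:name" for the mapping at pos, or null if none exists.
    public static string Describe(XmlElement root, int pos)
    {
        XmlElement node = (XmlElement)root.SelectSingleNode(
            "./mapping[@pos='" + pos.ToString() + "']");
        if (node == null) return null;
        return node.GetAttribute("type") + ":" + node.GetAttribute("name");
    }
}
```

Positions with no mapping node simply return null, which is how a field in the flat file can be left out of the XML output.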

 

The GenerateXml method uses the Document Object Model (DOM) and XPath to match array items to mapping nodes within the XML mapping document. First, data items that should be mapped to attributes are identified so they can be added to the container node (identified by the child attribute on the "mappings" root element; see Figure 4). Next, each item in the array is iterated through, and the associated mapping node is found using a simple XPath statement. The data is written to the output XML document by calling the XmlTextWriter's WriteElementString method. Finally, the container's closing tag is written out by calling WriteEndElement. This process is shown in Figure 5.

 

private void GenerateXml(string[] tokens) {

  XmlElement root = _mapDoc.DocumentElement; //Get map root

  //Create container

  writer.WriteStartElement(root.GetAttribute("child"));

  //First add attribute nodes while child container

  //is still open

  XmlNodeList atts =

   root.SelectNodes("./mapping[@type='Attribute']");

  foreach (XmlNode attNode in atts) {

    writer.WriteAttributeString(

     attNode.Attributes["name"].Value,       

     tokens[Int32.Parse(

      attNode.Attributes["pos"].Value)].ToString());

  }

  for (int i=0;i<tokens.Length;i++) {

    XmlElement node =

      (XmlElement)root.SelectSingleNode("./mapping[@pos='" +

     i.ToString() + "' and @type='Element']");

    if (node != null) { //mapping exists

      writer.WriteElementString(node.GetAttribute("name"),

       tokens[i].ToString());

    }

  }

  writer.WriteEndElement(); //Close container

}

Figure 5. After the XML mapping file is loaded into an XmlDocument object, XPath is used to map array items to <mapping> nodes.

 

Use Transform for Fixed-length Flat Files

The Transform method also can handle fixed-length flat files. If the delimiter attribute in the XML mapping file is missing or empty, the code within Transform automatically calls a method named SplitLine, which is responsible for reading field lengths from the XML mapping file and using them to split up each line of the file. This is accomplished by iterating through the mapping nodes within the XML mapping file and reading the length attribute. A sample XML mapping file designed to work with fixed-length files is shown in Figure 6; the code for the SplitLine method is shown in Figure 7.

 

<?xml version="1.0" encoding="utf-8" ?>

<mappings root="supplies" child="item" delimiter="">

    <mapping pos="0" length="25"

      name="product" type="Element" />

    <mapping pos="1" length="10"

      name="partID" type="Attribute" />

    <mapping pos="2" length="5"

      name="numberInStock" type="Element" />

    <mapping pos="3" length="5"

      name="numberOnOrder" type="Element" />

    <mapping pos="4" length="12"

      name="deliveryDate" type="Element" />

    <mapping pos="5" length="25"

      name="supplierStreet" type="Element" />

    <mapping pos="6" length="25"

      name="supplierCompany" type="Element" />

    <mapping pos="7" length="12"

      name="supplierPhone" type="Element" />

    <mapping pos="8" length="20"

      name="orderedBy" type="Element" />

</mappings>

Figure 6. The Transform method is capable of converting both delimited and fixed-length flat files into XML documents. This XML mapping document defines the lengths of fixed-length fields.

 

private string[] SplitLine(string line) {

  XmlElement root = _mapDoc.DocumentElement;

  int currPos = 0;

  string[] tokens = new String[root.ChildNodes.Count];

  //Read through mapping file to know how to parse

  //string in order to generate token array

  for (int i=0;i<root.ChildNodes.Count;i++) {

    XmlElement mapping = (XmlElement)root.ChildNodes[i];

    int endPos =

      Int32.Parse(mapping.GetAttribute("length"));

    string field = line.Substring(currPos,endPos);

    tokens[i] = field.Trim();

    currPos += endPos;

  }

  return tokens;

}

Figure 7. SplitLine iterates through the XML mapping file nodes and finds length attributes that define the length of each field in the fixed-length flat file. The lengths are used to split up file lines and generate a string array.
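One thing to watch for with fixed-length data is a line that is shorter than the sum of the field lengths, because Substring throws when asked to read past the end of the string. The sketch below shows one possible way to guard against that by clamping each read. The FixedWidthSketch class is a hypothetical name, and it takes the lengths as an int array rather than reading them from the mapping file, purely to keep the example self-contained.

```csharp
using System;

public static class FixedWidthSketch
{
    // Splits a line into trimmed fields using the supplied field lengths.
    // Fields that fall beyond the end of the line come back as empty strings.
    public static string[] Split(string line, int[] lengths)
    {
        string[] tokens = new string[lengths.Length];
        int currPos = 0;
        for (int i = 0; i < lengths.Length; i++)
        {
            // Clamp the read so a short line does not make Substring throw
            int remaining = line.Length - currPos;
            int available = Math.Max(0, Math.Min(lengths[i], remaining));

            tokens[i] = (available > 0)
                ? line.Substring(currPos, available).Trim()
                : String.Empty;

            currPos += lengths[i];
        }
        return tokens;
    }
}
```

Whether a short line should produce empty fields or be rejected outright depends on the system that produces the file, so treat the clamping here as one option rather than a rule.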

 

You can see that it does take some work to convert flat files into XML. All this complexity is encapsulated, however, within the FlatFileToXmlTransform class. A consumer of the class can perform the conversion quite easily (see Figure 8).

 

FlatFileToXmlTransform xmlOutput =

 new FlatFileToXmlTransform(filePath,xmlMap);

try {

  Stream s = xmlOutput.Transform();

  if (s != null) {

    s.Position = 0;

    StreamReader reader = new StreamReader(s);

    this.txtXml.Text = reader.ReadToEnd();

    reader.Close();

  }

}

catch (Exception exp) {

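  //InnerException is null unless Transform wrapped the error

  this.txtXml.Text = (exp.InnerException != null)

    ? exp.InnerException.Message : exp.Message;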

}

Figure 8. Converting a flat file to XML is a snap with the FlatFileToXmlTransform class. Although this example writes the generated XML document to a textbox control for display, it could write the output stream to a file just as easily.
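As a rough sketch of the file-based alternative, the stream returned by Transform could be copied to disk in chunks. The SaveSketch class, the SaveToFile method, and the 4096-byte buffer size are all assumptions made for illustration.

```csharp
using System;
using System.IO;

public static class SaveSketch
{
    // Copies the XML stream to the given path, overwriting any existing file.
    public static void SaveToFile(Stream xml, string path)
    {
        // Rewind first so the whole document is written
        if (xml.CanSeek) xml.Position = 0;

        FileStream file = new FileStream(path, FileMode.Create, FileAccess.Write);
        try
        {
            byte[] buffer = new byte[4096];
            int read;
            while ((read = xml.Read(buffer, 0, buffer.Length)) > 0)
            {
                file.Write(buffer, 0, read);
            }
        }
        finally
        {
            file.Close();
        }
    }
}
```

A caller would pass the stream from xmlOutput.Transform() along with a destination path, after checking the stream for null as Figure 8 does.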

 

Although the FlatFileToXmlTransform class is not perfect by any means - for instance, it handles only quote text qualifiers - it does provide a flexible starting point for converting various types of flat files to XML documents. By changing the XML mapping file, different types of XML output can be generated, which allows changes to be made without recompiling and deploying .NET classes.

 

A note on the code: To post data using the ASP.NET test form included with this article's downloadable code in version 1.1 of the .NET Framework (see the Download box for details), you need to add validateRequest="false" to the Page directive as shown here:

 

<%@ Page validateRequest="false" ... %>

 

No code changes are necessary if you're using version 1.0 of the .NET Framework.

 

The sample code in this article is available for download.

 

Dan Wahlin (a Microsoft Most Valuable Professional for ASP.NET and XML Web Services) is president of Wahlin Consulting and founded the XML for ASP.NET Developers Web site (http://www.XMLforASP.NET), which focuses on using XML and Web Services in Microsoft's .NET platform. He also is a corporate trainer and speaker, and he teaches XML and .NET training courses around the United States. Dan co-authored Professional Windows DNA (Wrox) and ASP.NET Tips, Tutorials & Code (Sams), and he authored XML for ASP.NET Developers (Sams). E-mail Dan at [email protected].

 

 

 

 
