XML is hardly a compact file format. In fact, XML as a markup language is inherently redundant. When you write particularly large and complex XML documents, the amount of repeated information can escalate significantly, which might invalidate the XML solution in the customer's eyes.
Let's look at a typical example of this tendency toward huge XML documents. Consider an XML document that contains payment information, including the companies that pay, the invoices you send, and perhaps the currencies the companies use. If you organize the document as a list of payments, each payment node must repeat all the information you hold about the related invoice (number, description, date), the customer (name, address, bank), and the currency (name, current exchange rate). Depending on how complex your document is, this layout probably produces a much larger document than you need, especially if the data can be cross-referenced in several ways and you need to retrieve efficiently such information as the invoices a customer paid or the invoices paid in a certain currency. The more closely related your data is, the more you need an efficient way of cross-referencing XML data. After all, this is exactly the problem you solve by establishing separate tables when you design a database structure.
To port the same logic into the land of XML, you need to know about the ID and IDREF data types, two special XML types that let you build a cross-reference between blocks of information within an XML document. The structure of the document looks like this:
<xml>
  <payments>
    <data:payment id="po1">
      <ref:company ref="C001" />
      :
    </data:payment>
    :
  </payments>
  <companies>
    <data:company id="C001">
      <name>First, Inc</name>
      :
    </data:company>
    :
  </companies>
</xml>
You have two blocks of information: payments and companies. Instead of including direct information about the company that made a payment, each <data:payment> node cross-references a node in the <companies> list. With databases and tables, you don't insert company-related fields in the same table as the payments. Likewise, in XML you define two separate groups of nodes and establish a link between them.
To put this logical schema to work, you need to take two steps. First, write a couple of XML schemas to tell the parser that you're using attributes of type ID and IDREF. In the code snippet above, I use two namespaces, data and ref. The ID attribute is "id," whereas the IDREF attribute is "ref." Second, when you write script code for this type of document, use the XML Document Object Model (XMLDOM) nodeFromID method to retrieve the cross-referenced node:
set nodeCompany = xmldom.nodeFromID("C001")
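To make the lookup concrete outside MSXML, here is a minimal sketch in Python that mimics what nodeFromID does for the sample document. It is not the MSXML API: it simply treats every attribute named "id" as an ID and builds a dictionary from it, whereas a real parser learns which attributes are IDs from the schema. The namespace URIs, the <doc> root element, and the helper names build_id_index and company_name_for are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs; the article does not say which ones it uses.
NS = {"data": "urn:example:data", "ref": "urn:example:ref"}

# Same shape as the article's sample. The root is named <doc> here because
# names beginning with "xml" are reserved by the XML specification.
DOC = """\
<doc xmlns:data="urn:example:data" xmlns:ref="urn:example:ref">
  <payments>
    <data:payment id="po1">
      <ref:company ref="C001" />
    </data:payment>
  </payments>
  <companies>
    <data:company id="C001">
      <name>First, Inc</name>
    </data:company>
  </companies>
</doc>
"""


def build_id_index(root):
    """Map each "id" attribute value to its element (a stand-in for nodeFromID)."""
    return {el.get("id"): el for el in root.iter() if el.get("id") is not None}


root = ET.fromstring(DOC)
index = build_id_index(root)


def company_name_for(payment_id):
    """Follow a payment's ref attribute to the company node and return its name."""
    payment = index[payment_id]
    target_id = payment.find("ref:company", NS).get("ref")
    return index[target_id].findtext("name")


print(company_name_for("po1"))
```

The point of the design is the same as in the VBScript line: the payment stores only the key "C001", and one indexed lookup returns the full company node, so company details are written once no matter how many payments refer to them.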
The ID type marks an attribute as a piece of information that another attribute can reference.
<AttributeType name="id" dt:type="id"/>
<ElementType name="company" content="eltOnly">
  <attribute type="id"/>
</ElementType>
In contrast, an attribute of type IDREF holds the ID value of the target node, which must be declared elsewhere in the same document, as in
<AttributeType name="ref" dt:type="idref"/>
<ElementType name="company" content="eltOnly">
  <attribute type="ref"/>
</ElementType>
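Declaring the two types implies two rules that a validating parser checks for you: every ID value is unique within the document, and every IDREF value matches some ID. The following Python sketch spells out those two checks by hand. It is only an illustration under stated assumptions: the attribute names "id" and "ref" are hard-coded rather than read from a schema, namespaces are left out for brevity, and the function name check_cross_references and the sample data (including the unresolved key "C999") are hypothetical.

```python
import xml.etree.ElementTree as ET


def check_cross_references(xml_text, id_attr="id", ref_attr="ref"):
    """Return (duplicate_ids, dangling_refs) found in the document text."""
    root = ET.fromstring(xml_text)

    # Rule 1: each ID value may appear only once.
    seen, duplicates = set(), []
    for el in root.iter():
        value = el.get(id_attr)
        if value is None:
            continue
        if value in seen:
            duplicates.append(value)
        seen.add(value)

    # Rule 2: each IDREF value must match an ID somewhere in the document.
    dangling = [
        el.get(ref_attr)
        for el in root.iter()
        if el.get(ref_attr) is not None and el.get(ref_attr) not in seen
    ]
    return duplicates, dangling


# Hypothetical sample: payment po2 points at a company that does not exist.
SAMPLE = """\
<doc>
  <payments>
    <payment id="po1"><company ref="C001"/></payment>
    <payment id="po2"><company ref="C999"/></payment>
  </payments>
  <companies>
    <company id="C001"><name>First, Inc</name></company>
  </companies>
</doc>
"""

duplicates, dangling = check_cross_references(SAMPLE)
print("duplicate ids:", duplicates)
print("dangling refs:", dangling)
```

Here the first payment resolves cleanly, while the second is reported as dangling, which is the XML counterpart of a foreign key that points at a missing row.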
You can use ID and IDREF with Microsoft's MSXML parser version 2.0 and later; version 2.0 ships with Internet Explorer (IE) 5.0. These data types are handy tools for cross-referencing XML data.