In my last column, I demonstrated how to store all the Active Server Pages (ASP) session variables in an XML string that you could save to a database for persistence. In doing so, I faced the problem of defining an XML schema. When defining a schema, the best advice is to map it as closely as possible to the data you want to describe. But the data itself doesn't tell you whether a given piece of information is better rendered as an attribute or as a node. The point might seem minor, but the right choice can improve performance and make the schema easier to extend later.
Attributes describe properties of nodes (also frequently referred to as elements). Nodes are more complete and flexible objects than attributes. Nodes represent a hierarchy and contain attributes. By design, attributes are more limited than nodes. For instance, attributes can't contain subelements, and you can't specify that attributes appear in a particular order. You can specify whether an attribute is required or optional, but an attribute can appear only once per node.
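The once-per-node rule is easy to see with any conforming parser. The sketch below uses Python's standard `xml.etree.ElementTree` (not the Microsoft XML parser this column discusses), and the `order`, `item`, and `row` names are illustrative only: child elements with the same name can repeat, while a repeated attribute is a well-formedness error.

```python
import xml.etree.ElementTree as ET

# An element can hold several child elements with the same name...
order = ET.fromstring("<order><item>pen</item><item>ink</item></order>")
items = [child.text for child in order.findall("item")]

# ...but each attribute name may appear at most once per element.
# ElementTree exposes attributes as a plain dict: one value per name.
row = ET.fromstring('<row field1="one" field2="two"/>')
attrs = dict(row.attrib)

# A repeated attribute makes the document malformed, so parsing fails.
try:
    ET.fromstring('<row field1="one" field1="two"/>')
    duplicate_rejected = False
except ET.ParseError:
    duplicate_rejected = True

print(items)               # ['pen', 'ink']
print(attrs)               # {'field1': 'one', 'field2': 'two'}
print(duplicate_rejected)  # True
```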
However, attributes have some capabilities that nodes lack. For example, attributes can accept enumerated values and can include a default value to be used if the attribute is omitted in a certain node. The following code illustrates an enumerated type attribute:
<AttributeType name="payment" dt:type="enumeration" dt:values="cash card check" />
To specify a default value for an attribute, you use the default attribute as follows:
<AttributeType name="payment" dt:type="enumeration" dt:values="cash card check" />
<attribute type="payment" default="cash"/>
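To make the effect of these two declarations concrete, here is a minimal sketch of what a validating reader does with them: apply the default when the attribute is omitted and reject values outside the enumeration. It assumes Python's `xml.etree.ElementTree`, which does not read XDR schemas, so the enumeration and default are hard-coded; `PAYMENT_VALUES`, `PAYMENT_DEFAULT`, and `read_payment` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Hypothetical constants mirroring the XDR declaration above:
# dt:values="cash card check" and default="cash".
PAYMENT_VALUES = ("cash", "card", "check")
PAYMENT_DEFAULT = "cash"

def read_payment(elem: ET.Element) -> str:
    """Return the payment attribute, falling back to the default when it
    is omitted and rejecting values outside the enumeration."""
    value = elem.get("payment", PAYMENT_DEFAULT)
    if value not in PAYMENT_VALUES:
        raise ValueError(f"invalid payment value: {value!r}")
    return value

doc = ET.fromstring('<orders><order payment="card"/><order/></orders>')
print([read_payment(o) for o in doc.findall("order")])  # ['card', 'cash']
```

A schema-aware parser would perform this check itself; the point here is only that defaults and enumerations are attribute features, which is why the helper reads an attribute rather than a child element.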
Different node types may have attributes with the same name. Each such attribute is scoped to the node type that declares it, so the two are independent and unrelated.
Nodes and attributes are completely different things. Nevertheless, when you design a schema for a certain piece of information, you might wonder whether to structure it as an attribute of an existing node or as a new child node of that node. In short, you need to decide which of the following solutions to implement:
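Solution 1 (all attributes):
<z:row field1="one" field2="two" />
Solution 2 (one child node per field):
<z:row> <field1>one</field1> <field2>two</field2> </z:row>
Solution 3 (generic child nodes with a name attribute):
<z:row> <field name="field1">one</field> <field name="field2">two</field> </z:row>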
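All three layouts carry the same information, which is why the choice is a design decision rather than a correctness issue. The sketch below, assuming Python's `xml.etree.ElementTree`, reads each layout (written as well-formed XML) into the same dictionary. The `z` prefix is bound to a placeholder namespace URI (an assumption, not the URI ADO uses) so the snippets parse standalone, and `row_to_dict` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace so the "z:" prefix is declared (an assumption).
NS = 'xmlns:z="urn:example:rowset"'
SOLUTIONS = [
    f'<z:row {NS} field1="one" field2="two" />',
    f'<z:row {NS}><field1>one</field1><field2>two</field2></z:row>',
    f'<z:row {NS}><field name="field1">one</field>'
    f'<field name="field2">two</field></z:row>',
]

def row_to_dict(xml_text: str) -> dict:
    """Hypothetical reader that accepts any of the three layouts."""
    row = ET.fromstring(xml_text)
    # Namespace declarations are not reported in attrib, so a non-empty
    # attrib means the fields are stored as attributes (solution 1).
    if row.attrib:
        return dict(row.attrib)
    children = list(row)
    # Solution 3: generic <field> elements named by an attribute.
    if children and children[0].get("name"):
        return {c.get("name"): c.text for c in children}
    # Solution 2: one element per field, named after the field.
    return {c.tag: c.text for c in children}

for text in SOLUTIONS:
    print(row_to_dict(text))  # {'field1': 'one', 'field2': 'two'} each time
```

Note that solution 1 needs the least reader logic per field, while solution 3 is the only one that would accept field names that are not legal XML names.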
I deliberately chose the <z:row> node for demonstration because <z:row> is one of the elements that ADO 2.x uses when it persists recordsets to XML. It offers a good model to follow because recordsets are fairly complex data that must be rendered in a context where performance always matters.
Microsoft opted for solution 1 above, the all-attributes solution. The reasoning rests on some assumptions about how you will use XML recordsets. One assumption is that you won't need to extend the format in the future; another is that the application requires the highest possible performance and should favor a component-oriented view of the data. Attributes make you think in terms of properties of the object behind the node. Using attributes, and thereby minimizing the number of nodes, makes the parser's job easier. For the Microsoft XML parser, attributes and nodes are both COM objects with much the same interface, but instantiating a node requires more work.
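To get a feel for how many more nodes the element-based layout creates, this sketch serializes the same small recordset both ways and counts the elements produced. It uses Python's `xml.etree.ElementTree` rather than ADO or the Microsoft XML parser; the `data` and `row` element names, the `ROWS` data, and the helper functions are hypothetical, and an element count is only a rough proxy for parser work, not a benchmark.

```python
import xml.etree.ElementTree as ET

# Hypothetical recordset: each row maps field names to values.
ROWS = [
    {"field1": "one", "field2": "two"},
    {"field1": "three", "field2": "four"},
]

def rows_as_attributes(rows):
    """Solution 1 style: one element per row, fields as attributes."""
    data = ET.Element("data")
    for r in rows:
        ET.SubElement(data, "row", r)
    return data

def rows_as_elements(rows):
    """Solution 2 style: one element per row plus one per field."""
    data = ET.Element("data")
    for r in rows:
        row = ET.SubElement(data, "row")
        for name, value in r.items():
            ET.SubElement(row, name).text = value
    return data

def count_elements(root):
    """Count elements in the tree; iter() includes the root itself."""
    return sum(1 for _ in root.iter())

print(count_elements(rows_as_attributes(ROWS)))  # 3
print(count_elements(rows_as_elements(ROWS)))    # 7
```

With r rows of f fields, the attribute layout produces 1 + r elements and the element layout 1 + r(1 + f), so the gap grows with every field added to the recordset.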
How do you decide whether attributes or nodes work better in your case? Let me offer two very simple rules that I use. Rule 1 is "Use as many attributes as possible if you know that the schema isn't going to change much. Use as many nodes as possible if you know that the schema is going to change significantly." In general, keep in mind that, in terms of system resources,
<element> <field>one</field> </element>
is more expensive than
<element field="one" />
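One way to make "more expensive" concrete is to count the DOM nodes each snippet produces. The sketch below uses Python's `xml.dom.minidom`, a lightweight DOM that is similar in spirit to, but not the same as, the Microsoft XML parser; `count_dom_nodes` is a hypothetical helper that counts elements, text nodes, and attribute nodes.

```python
from xml.dom import minidom

ELEMENT_FORM = "<element><field>one</field></element>"
ATTRIBUTE_FORM = '<element field="one" />'

def count_dom_nodes(node) -> int:
    """Count this node, its attribute nodes, and all descendants."""
    total = 1
    attrs = getattr(node, "attributes", None)  # None for text nodes
    if attrs is not None:
        total += len(attrs)
    for child in node.childNodes:
        total += count_dom_nodes(child)
    return total

for text in (ELEMENT_FORM, ATTRIBUTE_FORM):
    root = minidom.parseString(text).documentElement
    print(count_dom_nodes(root), len(text))  # prints "3 37", then "2 23"
```

The element form needs an extra element node plus a separate text node to hold "one", while the attribute form needs a single attribute node; the serialized text is also longer because of the closing tag.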
What about Rule 2? Mmm, after all, it's always (also) a matter of preference.