The XML Query Language

Right now, XML Path Language (XPath) is the closest thing the XML community has to a query language for XML content. XPath provides a decent selection capability, but it's limited. Apart from node sets, it supports only Boolean, string, and numeric data types, and it doesn't work effectively with case-insensitive strings or regular expressions. XPath doesn't let you select part of a node or combine different results to produce new nodes. It doesn't provide the ability to build new data. And finally, XPath expressions aren't written in XML syntax.
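
To make these limits concrete, here is a minimal Python sketch that uses the standard library's xml.etree.ElementTree module, which implements only a subset of XPath. The catalog document and its element names are invented for illustration. The sketch shows that a path expression returns existing nodes as they are, and that a case-insensitive match has to be done in the host language.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
CATALOG = """
<catalog>
  <book id="b1"><title>XML Basics</title></book>
  <book id="b2"><title>Advanced xml</title></book>
  <book id="b3"><title>SQL Primer</title></book>
</catalog>
"""

root = ET.fromstring(CATALOG)

# Path-based selection: the result is a list of existing <book> nodes.
# The expression cannot return part of a node or build a new one.
selected = root.findall(".//book[@id='b2']")
selected_titles = [book.findtext("title") for book in selected]

# A case-insensitive match is not expressed in the path itself here;
# the comparison is done in Python after the nodes are selected.
xml_titles = [
    title.text
    for title in root.iter("title")
    if "xml" in title.text.lower()
]

print(selected_titles)
print(xml_titles)
```

The filtering logic lives outside the path expression, which is the kind of gap a fuller query language is expected to close.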

Extensible Stylesheet Language Transformations (XSLT) is another query language, of sorts. XSLT is an XML language, and it includes all of XPath's capabilities plus some additional ones, such as template support. Currently, however, XSLT requires that well-formed XML documents be loaded into memory, and it doesn't improve on XPath's data type support.

Both XPath and XSLT are difficult to work with, and they certainly aren't easy to learn. Thus, the introduction of XML Query Language could be even more important than XML itself.

Many developers expect XML Query Language to be to XML what SQL is to relational databases. All the major software vendors are expected to fully support XML Query Language the way they support and use XML now. Making data searchable through XML Query Language will become essential, just as exposing data through XML is today. Unfortunately, XML Query Language is still in the draft stages with a World Wide Web Consortium (W3C) working group and probably will be for the next couple of years. You can monitor W3C's XML Query Language site for the latest breaking news.

In general, the current W3C XML Query Language specification focuses more on the theoretical model than on concrete implementation details. As often happens, these details will be left to the creativity of vendors. The specification includes four parts: requirements, data model, algebra, and syntax.

The requirements portion of the XML Query Language draft defines the scenarios in which the language must be usable. You should be able to use it to select, transform, and index the XML fragments that underlie the data. Note that this data can come from physical XML documents and from the computer-oriented representation of stored data. XML Query Language improves on XPath's selection capabilities by adding support for more data types and by adding the ability to treat externally linked documents as a subtree of the existing document. Transformation will introduce the means to project a subset of an existing tree as a separate tree. In general, XML Query Language's capabilities are superior to those of XPath and XSLT.
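
Because the draft doesn't yet define a syntax, the projection idea can only be imitated in host code. The following Python sketch, which assumes an invented orders document and a hypothetical project() helper, copies a subset of an existing tree into a separate tree and leaves the source untouched. It illustrates the requirement and is not XML Query Language itself.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical source document, invented for this illustration.
SOURCE = """<orders>
  <order id="1" status="open"><item>widget</item></order>
  <order id="2" status="closed"><item>gadget</item></order>
  <order id="3" status="open"><item>gizmo</item></order>
</orders>"""


def project(root, predicate, new_root_tag):
    """Copy the children of root that satisfy predicate into a new tree."""
    result = ET.Element(new_root_tag)
    for child in root:
        if predicate(child):
            # Deep copy, so the projected tree is independent of the source.
            result.append(copy.deepcopy(child))
    return result


source_root = ET.fromstring(SOURCE)
open_orders = project(
    source_root,
    lambda order: order.get("status") == "open",
    "open-orders",
)
projected_ids = [order.get("id") for order in open_orders]

print(ET.tostring(open_orders, encoding="unicode"))
```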

The data model portion of the XML Query Language draft describes the set of information available to a query string in an essentially node-centric fashion. The information is seen as a collection of nodes where each node can be a different data type. Actually, the data model isn't very different from the Document Object Model (DOM) representation of some XML content.
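
For comparison, this is roughly what a node-centric view looks like through the DOM today. The Python sketch below uses the standard library's xml.dom.minidom module on an invented note document and lists each node with its kind. The describe() helper is hypothetical, and the draft's data model may classify nodes differently.

```python
from xml.dom import minidom

# Hypothetical sample content, invented for this illustration.
doc = minidom.parseString(
    '<note priority="high"><to>Ann</to><body>Hello</body></note>'
)


def describe(node, depth=0):
    """Return one line per node, indented by depth, naming the node kind."""
    indent = "  " * depth
    lines = []
    if node.nodeType == node.ELEMENT_NODE:
        lines.append(indent + "element: " + node.tagName)
        # Attributes are nodes too, reached through a NamedNodeMap.
        for i in range(node.attributes.length):
            attr = node.attributes.item(i)
            lines.append(indent + "  attribute: %s=%s" % (attr.name, attr.value))
    elif node.nodeType == node.TEXT_NODE:
        lines.append(indent + "text: " + node.data)
    for child in node.childNodes:
        lines.extend(describe(child, depth + 1))
    return lines


node_lines = describe(doc.documentElement)
print("\n".join(node_lines))
```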

The XML Query Language algebra provides an abstract representation of the query in terms of the basic operations defined in the XML Query Language. Whatever the algebra's final form, you will work with it through a higher-level language, namely the XML Query Language syntax. The syntax will be just what you expect: rigorous, declarative commands that, once processed, produce a set of nodes.

Currently, the W3C working group is far from having a complete draft available. In other words, nobody can really be sure about XML Query Language's definitive form. What appears to be certain is that XPath isn't dead and that it will influence, to some extent, the forthcoming XML Query Language. The XML Query Language seems to fall somewhere in between XPath and XSLT: It's simpler than XSLT with a richer syntax than XPath, and it's more powerful than either.
