Using XML to Build Internet Solutions

Get a jump on the future of data transfer

Standardization of data-exchange mechanisms enables a variety of applications to read from and write to the mechanisms' data streams. Applications can work with a standardized mechanism's data stream in many ways, including ways the mechanism's developers can't foresee. To date, programmers haven't standardized most data-exchange mechanisms; they've developed individual mechanisms to work only with specific applications. Extensible Markup Language (XML) is different.

What Is XML?
XML is a specification for storing and exchanging data. The World Wide Web Consortium (W3C) began developing it in 1996 to standardize information delivery across the Internet and published XML 1.0 as a Recommendation in February 1998. The W3C defines XML as a subset of Standard Generalized Markup Language (SGML), which is a standard markup language for documents. XML isn't itself a new markup language with its own set of tags; instead, you can think of it as a specification for defining such languages.

XML looks similar to HTML. XML uses tags and attributes to define data in the same way that HTML uses tags to define formatting. However, instead of having a fixed set of tags, as HTML has, XML lets you define the tags your XML streams use. Therefore, you can use tags to make records' content self-evident.

For example, suppose you're selling a course of instruction. You can define a record for your course using the following syntax:

<Courses>
<Course ID="ACD1000">
<Title>Advanced Component Development</Title>
<startdate>November 2, 1998</startdate>
<duration>3</duration>
</Course>
</Courses>

Most people who read this data stream can understand its content and structure. The data stream defines your course and specifies its course ID, title, starting date, and duration. You don't need to know a proprietary language to understand this data.

XML is also useful for working with multiple records. For example, you can easily use XML to describe a situation in which a client orders two or more courses at the same time.

<Courses>
<Course ID="A1000">
<Title>Advanced Component Development</Title>
<startdate>November 2, 1998</startdate>
<duration>3</duration>
</Course>
<Course ID="B1000">
<Title>Using XML</Title>
<startdate>November 12, 1998</startdate>
<duration>2</duration>
</Course>
</Courses>

This example repeats the Course tag within the Courses tag to account for multiple user selections. XML also lets you nest tags in multiple layers to define hierarchies of data.
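
As a minimal sketch of how an application might walk the repeated Course tags, the following Python code uses the standard library's xml.etree.ElementTree module. The function name list_courses and the dictionary keys are assumptions made for illustration; they are not part of the article's examples.

```python
import xml.etree.ElementTree as ET

# The two-course order from the example above.
ORDER_XML = """<Courses>
<Course ID="A1000">
<Title>Advanced Component Development</Title>
<startdate>November 2, 1998</startdate>
<duration>3</duration>
</Course>
<Course ID="B1000">
<Title>Using XML</Title>
<startdate>November 12, 1998</startdate>
<duration>2</duration>
</Course>
</Courses>"""


def list_courses(xml_text):
    """Return one dict per Course element found under the root."""
    root = ET.fromstring(xml_text)
    courses = []
    for course in root.findall("Course"):
        courses.append({
            "id": course.get("ID"),                  # attribute on the tag
            "title": course.findtext("Title"),       # nested element text
            "startdate": course.findtext("startdate"),
            "duration": int(course.findtext("duration")),
        })
    return courses


courses = list_courses(ORDER_XML)
for c in courses:
    print(c["id"], c["title"], c["duration"])
```

The same loop handles one course or twenty, because the code asks for every Course element rather than counting on a fixed number of records.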

Defining XML tags. HTML specifications explicitly define all the tags you can use in HTML code; this convention makes browsers work. When a browser reads an H1 tag, the browser knows it needs to render the text between the H1 start and end tags as a level-1 heading. In contrast, XML doesn't have any predefined tags. This characteristic makes XML well suited to transporting information between applications. When two applications exchange information, both must understand the tags the XML stream uses; interoperability requires nothing else.

Applications can rely on only the data in the XML stream for definitions of XML tags, or they can refer to a Document Type Definition (DTD). A DTD describes an XML vocabulary: a set of definitions of the elements you can use within XML data streams that are based on that vocabulary. Each definition declares one element, and definitions can specify which child elements or text an element may contain and which attributes it may carry. However, the definitions don't supply the elements' actual content.
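
The idea of checking a stream against a vocabulary can be sketched in Python. This is a hand-rolled stand-in, not DTD syntax and not a validating parser: the VOCABULARY table and the check_vocabulary function are hypothetical names invented for this illustration, and Python's standard xml.etree.ElementTree module only parses the stream.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary table: each element name maps to the child
# elements and attributes it may carry. It mimics what a DTD declares.
VOCABULARY = {
    "Courses": {"children": {"Course"}, "attributes": set()},
    "Course": {"children": {"Title", "startdate", "duration"},
               "attributes": {"ID"}},
    "Title": {"children": set(), "attributes": set()},
    "startdate": {"children": set(), "attributes": set()},
    "duration": {"children": set(), "attributes": set()},
}


def check_vocabulary(xml_text):
    """Return a list of problems; an empty list means the stream fits."""
    problems = []
    root = ET.fromstring(xml_text)
    for element in root.iter():
        rule = VOCABULARY.get(element.tag)
        if rule is None:
            problems.append(f"unknown element <{element.tag}>")
            continue
        for name in element.attrib:
            if name not in rule["attributes"]:
                problems.append(
                    f"unexpected attribute {name} on <{element.tag}>")
        for child in element:
            if child.tag not in rule["children"]:
                problems.append(
                    f"<{child.tag}> not allowed inside <{element.tag}>")
    return problems


GOOD = ('<Courses><Course ID="ACD1000"><Title>Advanced Component '
        'Development</Title><duration>3</duration></Course></Courses>')
BAD = '<Courses><Course ID="ACD1000"><Price>10</Price></Course></Courses>'

print(check_vocabulary(GOOD))
print(check_vocabulary(BAD))
```

Note that the table constrains structure only. As with a DTD, nothing in it says what text a Title or duration element must hold.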

The data stream in my previous examples uses the Courses, Course, Title, startdate, and duration tags to identify its data. The Course tag has an attribute (ID) that provides additional information about each record. By adding the ID attribute to the Course tag, I can include the course ID as part of the Course tag so that I don't need to add an ID tag to the record.

The ability to create a vocabulary for an XML data stream is handy because you can make the data structure of that vocabulary's streams clear. You can create a vocabulary to match your database definitions, your corporate procedures, and your organization's documentation. You can mold XML to fit your company's needs. This method is different from other approaches that require you to use rigid, predefined data structures.

Separating content from formatting. Web servers today use HTML to transport pages' content to viewers. HTML transports, in one data stream, both a page's data and the formatting information that tells browsers how to display the page. HTML's mixing of data and presentation information complicates the process of transporting information. For instance, a Web server might send the course information in my first example as the following HTML code:

<BODY>
<P>&nbsp;</P>
<P>Course ID: <STRONG>ACD1000</STRONG></P>
<P>Title: <STRONG>Advanced Component Development</STRONG></P>
<P><STRONG></STRONG>Start Date: <STRONG>November 2, 1998</STRONG></P>
<P>Duration: <STRONG>3</STRONG></P>
</BODY>

You probably have trouble identifying the data in this example because the code interleaves the data with the formatting information. For the same reason, creating applications that automatically separate HTML records' data from their formatting information is difficult.

XML makes separating formatting information and data simple. The W3C recently released a draft for Extensible Style Language (XSL), which is a specification for describing how XML documents are formatted and presented. XSL works with XML in the same manner that Cascading Style Sheets (CSS) technology works with HTML. XSL and CSS define a document's formatting, but XSL is intended to offer more functionality than CSS offers. CSS is aimed mainly at documents' onscreen formatting; XSL is intended to cover documents' onscreen and print formatting.
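
To illustrate the separation of data from formatting, here is a minimal Python sketch that keeps the course data in XML and applies the presentation rules in a separate step. It is not XSL and does not use a style sheet; the LABELS table and the render_html function are assumptions made for this illustration, built on the standard library's xml.etree.ElementTree and html modules.

```python
import xml.etree.ElementTree as ET
from html import escape

# The data: the course record, with no formatting information in it.
COURSE_XML = """<Courses>
<Course ID="ACD1000">
<Title>Advanced Component Development</Title>
<startdate>November 2, 1998</startdate>
<duration>3</duration>
</Course>
</Courses>"""

# The presentation rules, kept apart from the data:
# (label to display, function that fetches the value from a Course).
LABELS = [
    ("Course ID", lambda course: course.get("ID")),
    ("Title", lambda course: course.findtext("Title")),
    ("Start Date", lambda course: course.findtext("startdate")),
    ("Duration", lambda course: course.findtext("duration")),
]


def render_html(xml_text):
    """Combine the XML data with the LABELS rules to produce HTML."""
    root = ET.fromstring(xml_text)
    lines = ["<BODY>"]
    for course in root.findall("Course"):
        for label, fetch in LABELS:
            value = escape(fetch(course) or "")
            lines.append(f"<P>{label}: <STRONG>{value}</STRONG></P>")
    lines.append("</BODY>")
    return "\n".join(lines)


page = render_html(COURSE_XML)
print(page)
```

Changing how the page looks means editing LABELS or render_html only; the XML stream stays untouched and remains usable by applications that never display it.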

Many organizations are discussing and designing specifications and tools for XML. The W3C is developing more XML specifications. Vendors are building tools that work with XML. And industry organizations are working to define XML specifications for medicine, finance, and other industries. This interest in XML suggests that the specification will likely get the broad support it needs to become a successful standard.

XML in the Real World
Many vendors are creating products that support XML. Microsoft, Netscape, and Adobe Systems are among the companies presently exploring the benefits of XML. Microsoft already uses XML in Internet Explorer (IE) 4.0 and Commerce Server.

To use XML, an application must have a parser, which is a software engine that can read XML and extract data from an XML stream. For instance, a parser would read my first example and pick out ACD1000 as the value for the Course ID field. Many specifications for exchanging data use positional information to define fields or use explicit rules about data's position within a data stream. Because XML parsers depend only on a DTD or the data definition in a stream to extract the data from the stream, XML is easier to use and more flexible than other specifications. You can build a parser today that will work in future applications you haven't yet planned. The parser doesn't need to understand the XML streams it parses; it must understand only XML's simple rules.
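
The contrast between name-based and position-based extraction can be sketched in Python with the standard library's xml.etree.ElementTree parser. The extended stream (with an instructor element), the pipe-delimited positional records, and the function names are all hypothetical; they exist only to show what happens when a format gains a field.

```python
import xml.etree.ElementTree as ET

FIRST_EXAMPLE = """<Courses>
<Course ID="ACD1000">
<Title>Advanced Component Development</Title>
<startdate>November 2, 1998</startdate>
<duration>3</duration>
</Course>
</Courses>"""

# Hypothetical later version of the stream: a new element appears
# and the child elements arrive in a different order.
EXTENDED_EXAMPLE = """<Courses>
<Course ID="ACD1000">
<instructor>To be announced</instructor>
<duration>3</duration>
<Title>Advanced Component Development</Title>
<startdate>November 2, 1998</startdate>
</Course>
</Courses>"""

# Hypothetical positional records for contrast: field 3 is the duration
# only as long as nobody inserts a field ahead of it.
POSITIONAL = "ACD1000|Advanced Component Development|November 2, 1998|3"
POSITIONAL_EXTENDED = ("ACD1000|To be announced|"
                       "Advanced Component Development|November 2, 1998|3")


def read_course(xml_text):
    """Look fields up by name, so their position does not matter."""
    course = ET.fromstring(xml_text).find("Course")
    return course.get("ID"), course.findtext("duration")


def read_positional(record):
    """Look fields up by position, as many older formats require."""
    fields = record.split("|")
    return fields[0], fields[3]


print(read_course(FIRST_EXAMPLE))            # ('ACD1000', '3')
print(read_course(EXTENDED_EXAMPLE))         # ('ACD1000', '3')
print(read_positional(POSITIONAL))           # ('ACD1000', '3')
print(read_positional(POSITIONAL_EXTENDED))  # ('ACD1000', 'November 2, 1998')
```

The XML reader returns the same answer for both streams, whereas the positional reader silently returns the start date as the duration once a field is inserted.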

Several XML parsers are available. IE 4.0 contains a parser (msxml.dll) that the browser uses to read Channel Definition Format (CDF) and Open Software Description (OSD) streams. IE uses CDF streams for channel information and uses OSD streams for downloadable files. Developers can use the IE 4.0 parser in custom applications, but the applications' users will need to have IE 4.0 installed on their system. Microsoft's Commerce Server also contains an XML parser, and several parsers are free for downloading from the Internet.

XML is useful in other ways, too. Figure 1 demonstrates how a user can browse a Web site and interact with a custom XML application on the site. The user makes a selection that sends an XML stream to the Web server. (The XML stream you see in Figure 1 is this article's first sample of XML code.) When the server receives the code, it needs only an XML parser to read the stream. The server's application can use a commercially available parser or a custom parser to access the XML stream's content.
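
The round trip described here, in which a selection becomes an XML stream that the server then reads, might look like the following Python sketch. It leaves out the network transfer entirely. The function names build_course_stream and read_course_ids and the shape of the selection dictionaries are assumptions for illustration; only the xml.etree.ElementTree calls come from Python's standard library.

```python
import xml.etree.ElementTree as ET


def build_course_stream(selections):
    """Client side: turn the user's selections into an XML stream."""
    root = ET.Element("Courses")
    for sel in selections:
        course = ET.SubElement(root, "Course", ID=sel["id"])
        ET.SubElement(course, "Title").text = sel["title"]
        ET.SubElement(course, "startdate").text = sel["startdate"]
        ET.SubElement(course, "duration").text = str(sel["duration"])
    # encoding="unicode" returns a str rather than bytes.
    return ET.tostring(root, encoding="unicode")


def read_course_ids(xml_text):
    """Server side: parse the stream and pull out each course ID."""
    root = ET.fromstring(xml_text)
    return [course.get("ID") for course in root.findall("Course")]


# Hypothetical selection matching the article's first example.
stream = build_course_stream([{
    "id": "ACD1000",
    "title": "Advanced Component Development",
    "startdate": "November 2, 1998",
    "duration": 3,
}])

print(stream)
print(read_course_ids(stream))
```

Because both sides agree only on the tag names, the client and the server code can be written, replaced, or upgraded independently.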

This use of XML becomes more interesting when you up the ante from one application to multiple applications that share data. Passing data between a browser and a server that belong to the same application is easy. Even so, many developers believe that HTML features such as forms are the quicker route for single-application data transfer, because adopting XML as the transfer mechanism adds overhead to the development process.

However, XML is definitely a better choice than HTML for transferring data among applications. Figure 2 shows how data transfer becomes more complex when Figure 1's Web site handles user requests through an added inventory server, which runs a different application than the Web client and server run. The component server in Figure 2 executes the components that are common to both applications. The inventory server executes components and application features that are specific to inventory procedures. XML can serve as the data transport specification for each part of the request-fulfillment process in Figure 2. If each server's application includes an XML parser that understands the data stream's DTD, each application can pull data from the XML stream without requiring custom code for data interpretation.

Future implementations of many software products will use XML. For instance, IE 5.0 will permit XML data islands within HTML pages (i.e., blocks of XML code that you embed into the HTML code that defines a page). I can turn the XML code from my first example into a data island by placing XML tags before and after the code.

<XML ID="CourseXML">
<Courses>
<Course ID="ACD1000">
<Title>Advanced Component Development</Title>
<startdate>November 2, 1998</startdate>
<duration>3</duration>
</Course>
</Courses>
</XML>

This code defines the XML stream as an object. If I place the code in an HTML page, a script running in the page can access the XML elements by using CourseXML as the object's name. This feature lets developers work directly with XML data without using a parser or creating parser code.

Of course, users need browsers that support XML to view a site feature that uses an XML data island. This restriction creates a problem, but it need not stop you from using XML. For now, you can create XML vocabularies and use XML behind the scenes in your applications. Then, when XML support becomes more prevalent in browsers, you'll be ready to benefit from XML's ability to transfer data to clients.

Get Ready for the Future
XML offers developers new options for defining and transporting information. This specification will become popular in the future as more applications take advantage of its simplicity and power. If you're developing applications now, you don't need to wait for the XML market to mature. You can find ways to use XML today. For more information about implementing XML solutions, see "Related Reading Online." If you begin developing XML applications, you'll position yourself to take advantage of XML as other applications support it in the future.
