URL, URI, and URN

We use URLs so frequently that it's easy to find people who think URL is a proper noun rather than an acronym. URL stands for Uniform Resource Locator. HTTP, which you regularly use to connect to remote sites, requires that the target machine you want to talk to be unambiguously identified through a name. A URL follows a syntax that mandates this format:

<protocol>://<host>[:<port>][<path>[?<query>]]

<protocol> is the name of the protocol used to access the resource. The resource isn't necessarily a remote resource and the protocol isn't necessarily HTTP. For example, both the following URLs are perfectly legitimate:

http://www.wintellect.com
file://c:\winnt\system32\shell.dll

<host> is the name or the IP address of the server.

<port> is the number of the TCP port where the conversation takes place. For HTTP, the port defaults to 80, which is the port where systems usually exchange HTTP handshakes and packets; other protocols have their own defaults. The following is another perfectly legitimate URL:

http://www.wintellect.com:80

<path> and <query> are optional parameters. The path locates a specific document within the host machine, and the query supplies arguments that help the system find or build that document.

Of all this syntax, only the protocol and host parts are mandatory; everything else is optional. The path portion of the string usually has a protocol-specific format.
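To make the parts of the format concrete, here is a minimal Python sketch that splits a URL with the standard library's urllib.parse.urlsplit. The function name describe_url, the small DEFAULT_PORTS table, and the /docs/page.aspx?id=42 path and query are assumptions made up for illustration; they do not come from the text above.

```python
from urllib.parse import urlsplit

# Illustrative defaults only; this table is an assumption for the sketch.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}

def describe_url(url):
    """Split a URL into the parts named in the format above."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        # parts.port is None when the URL carries no explicit port,
        # so fall back to the protocol's usual default if we know it.
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "path": parts.path,
        "query": parts.query,
    }

# Hypothetical path and query appended to the host used in the text.
print(describe_url("http://www.wintellect.com:80/docs/page.aspx?id=42"))

# No explicit port: the sketch reports the HTTP default, 80.
print(describe_url("http://www.wintellect.com")["port"])
```

With no path in the URL, the path entry comes back as an empty string, which matches the idea that path and query are optional.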

All this sounds so familiar that we sometimes forget that there's more. A URL is nothing more than a special instance of a slightly more general mechanism for identifying resources over the Web. This basic mechanism is the Uniform Resource Identifier (URI). By definition, a URI is a formatted string that uniquely identifies a resource. URIs are commonly classified into two types: URLs and Uniform Resource Names (URNs). URNs are often used, for example, to identify XML namespaces and schemas. The URN syntax is as follows:

urn:<namespace>:<string>

<namespace> is the namespace identifier.

<string> is a namespace-specific string.

For example,

urn:schemas-microsoft-com:rowset

is the URN that identifies the schema of an ADO recordset.
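The URN syntax is simple enough to take apart by hand. The following Python sketch is only an illustration under the three-part format shown above; the function name parse_urn is hypothetical, and the check is deliberately loose rather than a full validator.

```python
def parse_urn(urn):
    """Split a string of the form urn:<namespace>:<string>.

    Returns (namespace, string). Raises ValueError if the text does
    not have the three-part shape described above.
    """
    # Split on the first two colons only, because the
    # namespace-specific string may itself contain colons.
    prefix, namespace, specific = urn.split(":", 2)
    if prefix.lower() != "urn" or not namespace or not specific:
        raise ValueError(f"not a URN: {urn!r}")
    return namespace, specific

namespace, specific = parse_urn("urn:schemas-microsoft-com:rowset")
print(namespace)  # schemas-microsoft-com
print(specific)   # rowset
```

Note that nothing in the result says where the resource lives or how to fetch it; the two parts are just a name.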

Basically, you have two flavors of identifiers for Web resources: address-specific (URL) and name-based (URN). URLs contain the underlying protocol you need to use to access the specified resource. URLs, therefore, encode the physical path to the resource and identify the resource through its location. By contrast, a URN is location-independent and has no notion of the protocol you might use to access a specified resource. In other words, the identifier is only a name, albeit a unique one.

Roughly speaking, the difference between URLs and URNs is like the difference between a person's Social Security number and that person's name. The Social Security number not only identifies the person unequivocally, but it is also the key to all the data recorded about that person. When you use a name, you have only a unique name that doesn't always give you the key to access the right information.

A URL takes you straight to the resource and the data. A URN is only a unique name you can use to identify any resource you want. URNs are related to namespaces. Both URLs and URNs can uniquely identify resources over the Web. Use URLs when you need to know or specify location information. Use URNs if the resource is location-independent.
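As a rough illustration of the distinction, the Python sketch below looks only at the leading scheme of an identifier to decide which flavor it is. The function name kind_of_uri is hypothetical, and treating every non-urn scheme as a URL is a simplifying assumption of this sketch, not a rule stated in the text.

```python
from urllib.parse import urlsplit

def kind_of_uri(uri):
    """Classify an identifier as 'URN', 'URL', or 'unknown'.

    Simplifying assumption: a 'urn' scheme means a name-based
    identifier, and any other scheme means a locator.
    """
    scheme = urlsplit(uri).scheme.lower()
    if scheme == "urn":
        return "URN"
    if scheme:
        return "URL"
    return "unknown"

print(kind_of_uri("http://www.wintellect.com"))         # URL
print(kind_of_uri("urn:schemas-microsoft-com:rowset"))  # URN
```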
