Understanding the Session Initiation Protocol

The popularity of Instant Messaging (IM) and videoconferencing is growing rapidly both on a commercial level, in which organizations use these technologies to conduct business, and on a casual level, in which individuals increasingly rely on IM and personal WebCams to communicate with friends and family. As a result, Microsoft has decided to align its strategic approach to these realtime collaborative technologies with new and evolving standards. One such standard that Microsoft is helping to shape and define is the Internet Engineering Task Force's (IETF's) Session Initiation Protocol (SIP, which is pronounced as the word "sip"). Microsoft is releasing the Real Time Communications (RTC) Server, which uses SIP to provide IM and videoconferencing capabilities. (For information about how Microsoft intends to release the RTC Server, see the Web-exclusive sidebar "Packaging of the RTC Server," http://www.winnetmag.com, InstantDoc ID 27398.) Microsoft will ship the RTC Server around the same time it ships Windows .NET Server (Win.NET Server) 2003. At the time of this writing, the RTC Server is about to enter beta testing and is generally referred to by its code name, Greenwich.

To understand how RTC Server and similar products work, you need to know about the protocol that underlies the server's functionality. Here's a look at the basic concepts behind and the components of SIP.

SIP Basics
SIP is an end-to-end and client/server protocol that facilitates the creation, modification, and termination of communications sessions between one or more participants. These communications sessions can include different forms of interaction: basically any form of peer-to-peer or multipoint communication, including multimedia conferences and telephone calls. The participants can be either humans (who use endpoints such as SIP-enabled telephones or videoconferencing clients) or automation components (e.g., a voicemail server, a media-archiving server).

Although the telephony industry conceived SIP, this protocol is designed to simplify Internet-based communication. (For information about SIP's beginnings, see the Web-exclusive sidebar "A Short History of SIP," http://www.winnetmag.com, InstantDoc ID 27399.) Any SIP-based communications session typically involves at least three separate activities and protocols:

SIP provides the basic signaling between participants to set up the session.

SIP uses the Session Description Protocol (SDP) to define the nature of the communication used within the session, including the type of media (e.g., video, audio), transport protocol (e.g., IP, UDP, Real-time Transport Protocol, or RTP), and media format (e.g., H.261 video, Moving Picture Experts Group, or MPEG, video).

A separate protocol, chosen to suit the media, transfers the information in the session. For example, RTP carries realtime information, and Real Time Streaming Protocol (RTSP) controls the delivery of streaming media.

SIP is defined in IETF Request for Comments (RFC) 2543, which you can retrieve from the IETF repository (http://www.ietf.org/rfc.html). RFC 2543 wasn't published until March 1999, so the IETF working groups had the opportunity to incorporate concepts from other Internet protocol architectures that were well established and successful. The SIP architecture borrows many concepts from SMTP. For example, in SIP, users are designated with a SIP address (i.e., a SIP URL) that's similar to an SMTP address (i.e., a mailto URL). SIP also borrows concepts from HTTP. Information about the communications session that SIP is controlling is similar to the information that you would expect to see with HTTP. For example, a SIP packet might look something like the one that Figure 1, page 26, shows. In this packet, user George on the PC named gpc.yankees.com invites user Jerry to a session.
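
To make the HTTP resemblance concrete, here is a minimal Python sketch that splits a SIP request into its start line and header fields. The message is hypothetical: the example.com and example.org addresses are placeholders, not the addresses that Figure 1 shows, and the parser is deliberately simplified (a repeated header such as Via would overwrite the earlier value).

```python
def parse_sip_request(raw: str) -> dict:
    """Split a SIP request into method, Request URI, version, headers, and body."""
    # A blank line separates the header section from the optional body, as in HTTP.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    # Start line: METHOD, Request URI, protocol version.
    method, request_uri, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()  # simplification: last value wins
    return {"method": method, "uri": request_uri, "version": version,
            "headers": headers, "body": body}


# Hypothetical INVITE; every address here is a placeholder.
SAMPLE = (
    "INVITE sip:callee@example.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP pc1.example.org\r\n"
    "From: sip:caller@example.org\r\n"
    "To: sip:callee@example.com\r\n"
    "Call-ID: 12345@pc1.example.org\r\n"
    "CSeq: 1 INVITE\r\n"
    "\r\n"
)
parsed = parse_sip_request(SAMPLE)
```

The start line carries the method, the Request URI, and the version; everything after it is a name: value header, just as in an HTTP request.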

SIP provides four key functions. Two functions—name mapping and redirection, and capabilities negotiation—occur during a session's setup. The other two functions—participant management and capabilities management—occur during the session.

Name mapping and redirection. SIP translates participants' descriptive naming information to SIP location information that's consistent with directory or other services. SIP facilitates personal mobility so that users can establish a SIP session when they're on the move (e.g., moving from their connected desktop PCs to their cars), thereby making their mobile telephone or wirelessly connected PDA their preferred communications device.

Capabilities negotiation. SIP determines the various media capabilities of all the participants in a session and agrees on the media facilities to be used during the session. For example, if two participants in a session have video capability but a third participant has only audio capability, SIP determines that a video stream can be used in the session but only the audio stream should be transmitted to the nonvideo participant.

Participant management. During a session, SIP lets participants bring new participants into a session or terminate or suspend connections with existing participants.

Capabilities management. During a session, SIP monitors the media capabilities and makes adjustments if necessary. For example, suppose a session consists of two participants, both of whom have only audio capability. If another participant joins the session and has video capability, SIP adds a video stream for the new participant.
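
The capabilities-negotiation example can be sketched in a few lines of Python. This illustrates the outcome only, not how SIP and SDP actually exchange offers: the rule that a medium is used when at least two participants support it is an assumption made for the sketch, and the participant names are hypothetical.

```python
from collections import Counter


def negotiate(capabilities: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return the media streams each participant receives."""
    # Count how many participants support each medium.
    counts = Counter(m for media in capabilities.values() for m in media)
    # Assumption: a medium is worth using only if two or more participants share it.
    session_media = {m for m, n in counts.items() if n >= 2}
    # Each participant receives only the session media it can handle.
    return {who: media & session_media for who, media in capabilities.items()}


streams = negotiate({
    "first": {"audio", "video"},
    "second": {"audio", "video"},
    "third": {"audio"},          # audio-only participant
})
```

Here the first two participants receive both streams, and the third receives audio only.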

The SIP Components
SIP consists of five components: user agent client (UAC), user agent server (UAS), proxy server, redirect server, and registrar server. The UAC and UAS are client-side components, whereas the proxy, redirect, and registrar servers are server-side components.

The UAC
The UAC is an application that initiates a SIP request to a UAS. (A SIP message is either a request from a UAC to a UAS or a response from a UAS to a UAC.) The original SIP specification defines six possible types of requests (i.e., methods) that a UAC can issue: INVITE, ACK, OPTIONS, BYE, CANCEL, and REGISTER. (Extensions of SIP-related RFCs define additional methods, such as MESSAGE, INFO, and NOTIFY.) For example, the SIP request that Figure 1 shows uses the INVITE method and a Request Uniform Resource Identifier (URI) of sip:[email protected]. The request ends with SIP's version number.

When the UAC initiates a SIP session, the UAC determines the protocol, port, and IP address of the UAS to which to send the request. In the absence of any locally configured proxy-server information (more about this topic later), the UAC uses information in the Request URI to determine how to route the SIP request to its endpoint. The Request URI always specifies a host but doesn't always specify the port and protocol. If the specified host is an explicit IP address, the UAC attempts to contact the UAS at the specified address. If the Request URI specifies a Fully Qualified Domain Name (FQDN), the UAC queries DNS to resolve the name by using an address (A), CNAME, or other resource record. (The OS on which the UAC resides might provide alternative mechanisms for host-name resolution, such as local HOSTS files, WINS, or broadcasts, but RFC 2543 doesn't specify their use. Regardless, if the local OS resolves the host name and returns an address to the UAC, the UAC uses that address in the subsequent SIP communication.)

If the Request URI specifies a port, the UAC attempts to contact the UAS at that port. If the Request URI doesn't specify a port, the UAC contacts port 5060 by default. If the Request URI specifies a transport protocol (either TCP or UDP), the UAC uses that protocol. If the Request URI doesn't specify a transport protocol, the UAC attempts to use UDP; if that attempt fails, the UAC tries TCP.

If an INVITE request includes the port and protocol, it might look like

INVITE sip:[email protected]:5050;transport=TCP SIP/2.0

However, most applications use the shortened format (i.e., host name only), which looks like the first line in Figure 1.
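
The port and transport rules can be sketched as a small Python function. The function name and the example addresses are hypothetical, and the parsing is simplified (it ignores IPv6 literals and URI parameters other than transport).

```python
def resolve_target(request_uri: str) -> dict:
    """Work out the host, port, and transports to try for a SIP Request URI."""
    if not request_uri.startswith("sip:"):
        raise ValueError("not a SIP URI")
    rest = request_uri[len("sip:"):]
    # URI parameters such as transport=TCP follow the address, separated by ';'.
    addr, *params = rest.split(";")
    hostport = addr.rsplit("@", 1)[-1]      # drop the optional user part
    host, _, port = hostport.partition(":")
    transport = None
    for param in params:
        name, _, value = param.partition("=")
        if name.lower() == "transport":
            transport = value.upper()
    return {
        "host": host,
        "port": int(port) if port else 5060,             # default port
        # No explicit transport: try UDP first, then fall back to TCP.
        "transports": [transport] if transport else ["UDP", "TCP"],
    }


explicit = resolve_target("sip:user@host.example.com:5050;transport=TCP")
shortened = resolve_target("sip:user@host.example.com")
```

The explicit form yields port 5050 over TCP only; the shortened form falls back to port 5060 and the UDP-then-TCP order.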

The UAS
The UAS is an application that receives SIP requests from a UAC and returns responses to those requests. The UAS can be an application with which a user interacts, so upon receipt of a SIP request, some form of notification from the UAS to the user might take place.

Like HTTP status codes, SIP responses consist of a three-digit integer result code coupled with a textual phrase. The SIP result codes fall broadly into six categories: 1xx (informational), 2xx (success), 3xx (redirection), 4xx (client error), 5xx (server error), and 6xx (global failure). Section 7 in RFC 2543 defines the result codes in their entirety.
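
As a rough illustration, the following Python sketch splits a SIP status line into its parts and maps the result code to one of the six categories. The function name is hypothetical; the status lines in the usage are standard SIP responses.

```python
CATEGORIES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
    6: "global failure",
}


def classify(status_line: str) -> tuple[int, str, str]:
    """Return (result code, textual phrase, category) for a SIP status line."""
    _version, code_text, reason = status_line.split(" ", 2)
    code = int(code_text)
    # The first digit of the three-digit code selects the category.
    category = CATEGORIES.get(code // 100)
    if category is None:
        raise ValueError(f"unknown result code: {code}")
    return code, reason, category


ringing = classify("SIP/2.0 180 Ringing")
busy = classify("SIP/2.0 486 Busy Here")
```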

Whatever the nature of the UAC's request and the UAS's resulting action, the UAS sends a response to the UAC. Sometimes the UAS issues more than one response to a request. For example, when the UAC issues an INVITE request to participate in a session, the UAS might issue three responses to the UAC's initial request. The SIP transaction is complete only when the UAS fully responds to the initial request. The UAC issues a final request, which uses the ACK method, to confirm that it has received the final response to its INVITE request. The only time the UAC uses the ACK request is in conjunction with the INVITE request.
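
A minimal Python sketch of that exchange follows, assuming the three responses are 100 Trying, 180 Ringing, and 200 OK (a common sequence, but an assumption here). Responses below 200 are provisional; the first response of 200 or higher is the final one, and only then does the UAC send its ACK.

```python
def invite_transaction(status_codes):
    """Return (provisional codes, final code) for the responses to one INVITE."""
    provisional = []
    for code in status_codes:
        if code < 200:
            provisional.append(code)    # 1xx: the UAS is still working on the request
        else:
            return provisional, code    # first final response ends the wait
    return provisional, None            # no final response received yet


provisional, final = invite_transaction([100, 180, 200])
# The UAC confirms receipt of the final response with an ACK request.
next_request = "ACK" if final is not None else None
```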

Because SIP is a peer-to-peer protocol and a client/server protocol, a SIP endpoint must be able to initiate and respond to SIP session requests. Accordingly, such an endpoint must possess both UAS and UAC functionality. The user agent (UA) does just that—it has both UAC and UAS functionality. Another term commonly used for the UA is the SIP client.

The Proxy Server
Up to this point, the communication between the UAC and UAS has been effectively peer-to-peer, although strictly speaking it's client/server. However, proxy servers are commonly used in SIP implementations. The proxy server acts as an intermediary that can service requests or forward them to other UASs or UACs for servicing. In this approach, a user's UA (i.e., the client) routes all its SIP transactions through an explicit proxy server (i.e., the server). Figure 2 shows a typical intraorganizational configuration in which a user, when initiating a SIP session to another user within the same organization, has messages routed through a proxy server before those messages are relayed to the destination SIP client.

This intraorganizational architecture can extend to an interorganizational one. Figure 3 shows the interorganizational, or federated, architecture. Users in each organization have their UA configured to point to their respective proxy servers, and the proxy servers communicate with each other to relay messages.

Another common reason for using a proxy server is name mapping. For email services, users often have an address by which they're known externally and a private internal email address for internal mail routing. A similar concept is available with SIP. A proxy server can query a location service, such as a Lightweight Directory Access Protocol (LDAP) directory service, and map an external SIP identity to an internal SIP identity.

In addition to showing the federated architecture, Figure 3 shows the interactions between SIP components when name-mapping services are in effect. When the Acme proxy server receives the INVITE request, the server consults the location service to perform name mapping if necessary. In this case, the Acme proxy server submits [email protected] and the location service returns the SIP address's internal form [email protected].

The implementation of a location service isn't in any way tied to the SIP specification. Nor does SIP mandate any means of interaction between a proxy server and a location service. How you choose to set up the SIP system is entirely up to you. You can choose whether or not to use a location service. If you decide to use one, you can choose the type of location service to use. The RTC Server uses a proprietary location service, not Active Directory (AD).

The location service can specify multiple SIP addresses for users if necessary. For example, you might need multiple SIP addresses if a recipient logs on at different locations (e.g., a telephone and a computer terminal). When the location service returns multiple SIP addresses, the proxy server relays the INVITE request to each identified UAS. Proxy servers try each SIP address either sequentially or in parallel until the call is successfully established, the callee declines to accept the call, or the callee can't be reached.
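
The sequential case can be sketched in Python as follows. Everything here is illustrative: the location service is a plain dictionary, the addresses are placeholders, send_invite stands in for whatever actually transmits the request, and treating a 6xx response as "the callee declines" is an assumption of the sketch.

```python
from typing import Callable, Optional


def fork_sequentially(targets: list[str],
                      send_invite: Callable[[str], int]) -> Optional[str]:
    """Try each SIP address in turn; return the one that accepts, if any."""
    for target in targets:
        code = send_invite(target)
        if 200 <= code < 300:
            return target    # call successfully established
        if code >= 600:
            return None      # callee declined; trying other addresses is pointless
    return None              # callee couldn't be reached at any address


# Hypothetical location service: one external identity, two registered devices.
LOCATION = {
    "sip:pat@example.com": ["sip:pat@phone.example.com", "sip:pat@pc.example.com"],
}
replies = {"sip:pat@phone.example.com": 480, "sip:pat@pc.example.com": 200}
tried = []


def fake_send(target: str) -> int:
    """Stand-in for the network: record the attempt and return a canned result code."""
    tried.append(target)
    return replies[target]


answered = fork_sequentially(LOCATION["sip:pat@example.com"], fake_send)
```

A parallel search would send the INVITE to every address at once and keep the first success, at the cost of canceling the outstanding requests afterward.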

When a proxy server forwards a SIP request, it adds its name to the beginning of the list of forwarders in the SIP message header's Via field. This field lets SIP responses take the same return path as requests. During the return path, each proxy server removes its name from the Via field after it processes the SIP response.
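
The Via handling behaves like a stack, which a short Python sketch makes visible. The host names are hypothetical, and real Via values also carry the protocol and transport (e.g., SIP/2.0/UDP), which the sketch omits.

```python
def forward_request(via: list[str], proxy: str) -> list[str]:
    """A proxy adds its own name to the front of the Via list before forwarding."""
    return [proxy] + via


def forward_response(via: list[str], proxy: str) -> list[str]:
    """On the way back, a proxy removes its own name, which must be first."""
    if not via or via[0] != proxy:
        raise ValueError("response did not arrive at the expected proxy")
    return via[1:]


# Request path: caller -> proxy-a -> proxy-b -> callee.
via = ["pc1.example.org"]
via = forward_request(via, "proxy-a.example.org")
via = forward_request(via, "proxy-b.example.com")
request_via = list(via)

# Response path: the same hops in reverse order.
via = forward_response(via, "proxy-b.example.com")
via = forward_response(via, "proxy-a.example.org")
```

After the response passes back through both proxy servers, only the caller's own entry remains.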

The Redirect Server
One of SIP's fundamental functions is redirection. Redirection lets users temporarily change locations and still be contactable through the same SIP identity. For example, if user [email protected] is out of the office and has registered his SIP identity at the hotel at which he's staying, a SIP message routed to [email protected] might ultimately get routed to [email protected].

In such cases, having the proxy server relay the original SIP message to the new SIP address is inappropriate because the network traversal would likely be suboptimal. Instead, you should have a redirect server in Acme inform the caller that a different SIP address should be used to contact the intended recipient. The redirect server needs to have access to the location service for Acme users, as Figure 4 shows.

After the proxy server receives the initial INVITE request, the proxy server sends the request to the redirect server. After the redirect server finds out about the new temporary address from the location server, the redirect server returns a SIP/2.0 302 Moved Temporarily response. The response's Contact field contains the temporary address. The original caller then reissues its INVITE request, but this time to the temporary address. To complete the transaction, the caller issues an ACK request to the redirect server. (Figure 4 doesn't show this last step.)
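
From the caller's side, handling a redirection response might look like the following Python sketch. The function name and addresses are hypothetical, the Contact value is assumed to be a bare SIP address, and the order of the two resulting requests is a simplification.

```python
def after_final_response(code: int, headers: dict, target: str) -> list:
    """List the requests a caller sends after a final response to its INVITE."""
    # The ACK confirms receipt of the final response and goes to the original target.
    actions = [("ACK", target)]
    # A redirection (3xx) response names the address to try next in Contact.
    if 300 <= code < 400 and "Contact" in headers:
        actions.append(("INVITE", headers["Contact"]))
    return actions


redirected = after_final_response(
    302,                                        # Moved Temporarily
    {"Contact": "sip:user@hotel.example.net"},  # temporary address
    "sip:user@example.com",
)
```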

Although redirect servers and proxy servers are separate components, the RTC Server implements the components on one server. A server that includes a combination of functions is called a SIP server. Configuration settings on the SIP server determine how SIP messages for various SIP recipients are processed (e.g., whether the messages go to a proxy server or to a redirect server).

The Registrar Server
Users can inform a proxy or redirect server about the addresses at which they can be contacted. When a user wants to change an address, the SIP client issues a REGISTER request. A registrar server accepts REGISTER requests from the SIP client and records the user's new information. Typically, the SIP server uses this information to populate the location service so that the location service can redirect subsequent requests to the correct address.

SIP clients can contact a registrar server in two ways. They can directly contact the registrar server by using address information that's configured into the client. Or they can indirectly contact the registrar server by sending a registration request to the multicast address sip.mcast.net (224.0.1.75), which reaches the "all SIP servers" group. SIP servers listen on this address and process SIP REGISTER requests accordingly.

Figure 5 contains a sample REGISTER request. The Request URI specifies the destination of the REGISTER request (i.e., the registrar server) and must contain only a host name or domain name—it can't contain a username. The To field identifies the user's SIP address to be registered. When the user is performing the registration, the addresses in the To and From fields usually match. In the case of a third-party registration (i.e., a third party is making a registration on another user's behalf), the From field contains the third party's SIP address.

Although not always required, the REGISTER request might include a Contact field. In the sample REGISTER request in Figure 5, notice that the Contact field's address differs from that in the To field. In such cases, a proxy or redirect server directs future requests for the user specified in the To field to the address specified in the Contact field.

The Expires field specifies the registration's lifetime in seconds. The sample REGISTER request expires in 12 hours. However, if the client doesn't specify a value for the registration lifetime, SIP servers typically register the client for 1 hour. Users can cancel a registration by reissuing a REGISTER request with an expiration time of 0 seconds.
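
The lifetime rules can be modeled with a small in-memory registrar, sketched here in Python. The class and addresses are hypothetical, the clock is passed in explicitly to keep the example deterministic, and a real registrar would also handle multiple contacts per user.

```python
DEFAULT_EXPIRES = 3600  # 1 hour, used when the client gives no lifetime


class Registrar:
    """Toy registrar that maps a user's SIP address to one contact address."""

    def __init__(self):
        self.bindings = {}  # To address -> (Contact address, expiry time in seconds)

    def register(self, to, contact, expires=None, now=0.0):
        lifetime = DEFAULT_EXPIRES if expires is None else expires
        if lifetime == 0:
            self.bindings.pop(to, None)            # Expires: 0 cancels the registration
        else:
            self.bindings[to] = (contact, now + lifetime)

    def lookup(self, to, now=0.0):
        entry = self.bindings.get(to)
        if entry is None or entry[1] <= now:
            return None                            # never registered, or lifetime elapsed
        return entry[0]


registrar = Registrar()
# 43,200 seconds is 12 hours.
registrar.register("sip:user@example.com", "sip:user@hotel.example.net", expires=43200)
before_expiry = registrar.lookup("sip:user@example.com", now=43199)
at_expiry = registrar.lookup("sip:user@example.com", now=43200)
```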

In the same way that you can combine the proxy and redirect functions on one SIP server, you can add the registrar function to the same SIP server. The RTC Server implements the proxy, redirect, and registrar functions on one SIP server.

Core Concepts Covered
I've introduced you to the core concepts of SIP as specified in RFC 2543. You need to fully understand these concepts before you look at how real products implement SIP. In an upcoming article, I'll show you how Microsoft's SIP client (Windows Messenger) and Microsoft's SIP server (RTC Server) interact and work.
