Traditional web pages and applications rely on synchronous communications. In response to user actions, the web browser makes requests to the server, which processes each request and returns a full web page that the browser renders. This cycle repeats for as long as the user stays engaged with a website. Information shown on a page can quickly become stale, and if it's data that the user can update, the stale copy might cause conflicts with existing back-end data or state information.
Ajax provides a solution for some applications, but it's primarily a one-way exchange: the user or browser has to trigger each request to the server in some way. Fundamentally, this still uses the traditional request/response model of the web, albeit implemented in a way that enhances the user experience. So this approach really doesn't handle real-time updates. You can contort it into something like long-term communication, but the result tends to be bulky and to put a heavy load on the server.
People have created several options over the years to meet this need. Some of these techniques rely on long-established browser features, so they work with virtually all browsers in use today, even minority ones. Others take advantage of brand-new features, such as WebSockets in HTML5, which only modern versions of browsers support. Four of the best available options are the following:
1. HTTP Polling
Polling is one of the more reliable and fail-safe types of real-time communication. It really isn't real-time, but it comes close enough for most practical purposes. That reliability, however, comes at a potentially significant cost in network traffic: clients make constant requests, and the server must process each one, no matter how small and efficient that processing is, even when there's nothing new to send to the client. As a result, the server spends a lot of time handling requests that return no data at all. Nevertheless, this workaround can work very well if it's the only option you have.
The big downside to this technique is the potentially huge number of requests from clients relative to the responses that contain meaningful data, unless server events are happening fast and furiously. Stated differently, polling is hugely wasteful. Another problem is that client and server events aren’t in sync, so it’s possible for many server events to occur between client requests. You have to be careful not to create -- and, in fact, to actively protect against -- a denial of service attack against your own server!
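The polling loop is simple enough to sketch. In the TypeScript below, the server endpoint is simulated by an in-memory stub; checkForUpdates and startPolling are hypothetical names standing in for an actual HTTP request (via XMLHttpRequest or fetch, for example). Note that most iterations come back empty, which is exactly the waste described above.

```typescript
type Update = string | null;

let pending: Update = null; // simulated server-side state

// Hypothetical server endpoint: returns whatever is pending, usually nothing.
// In a real app this would be an HTTP round trip to the server.
function checkForUpdates(): Promise<Update> {
  const result = pending;
  pending = null;
  return Promise.resolve(result);
}

// Poll on a fixed interval, invoking onData only when a response
// actually contains data. Returns a function that stops the polling.
function startPolling(
  intervalMs: number,
  onData: (data: string) => void
): () => void {
  const timer = setInterval(async () => {
    const data = await checkForUpdates();
    if (data !== null) onData(data); // most iterations: nothing to do
  }, intervalMs);
  return () => clearInterval(timer);
}
```

Even in this toy version, the structural problem is visible: the interval fires regardless of whether the server has anything to say.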
2. HTTP Long Polling
There’s an interesting variation of HTTP polling that can alleviate some of the drawbacks of that technique. HTTP long polling keeps each Ajax request waiting until something interesting happens, essentially by not generating a response immediately. So the client makes a request, and the server keeps the connection open until it has something meaningful to return to the client. Compare this to the regular polling technique, where the client makes a request, the server immediately responds (usually with no data), and the client immediately makes a new request.
The client makes each request with the expectation that there’s going to be some future server event that will generate a response. So instead of immediately generating a response to each client request, the server blocks the incoming request until a server event occurs, the request times out, or something happens to the connection. Regardless of which one occurs, the client immediately initiates a new request that’s blocked until something interesting happens on the server.
There are a lot of different ways to implement long polling, and browser support for the various techniques differs.
The problem with this technique is that HTTP requests and responses weren't built for long-lived connections. A traditional web request keeps the connection between client and server open only long enough for the server to generate a response, after which the connection closes. When a response takes a while to generate -- often because of a long-running database query or network latency -- the open connection tends to be unreliable. HTTP just wasn't designed for leaving connections sitting open, and long-lived connections frequently get broken. That's actually part of the long-polling workflow: if a connection breaks without a response, the client simply makes a new request, just as it does after a successful response that returns data. But all this adds a level of complexity that doesn't help the technique's reliability.
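The client side of that workflow can be sketched as a loop that waits on one request at a time and immediately issues the next one when the previous settles. Here the held-open request is simulated by a promise; waitForEvent and longPoll are hypothetical names, not a real API.

```typescript
// Simulated long-poll endpoint: the promise doesn't settle until the
// "server" has data, standing in for an HTTP response the server blocks on.
function waitForEvent(delayMs: number, event: string): Promise<string> {
  return new Promise((resolve) => setTimeout(() => resolve(event), delayMs));
}

// The client loop: request, wait (possibly a long time), consume the
// response, then immediately issue the next request.
async function longPoll(
  rounds: number,
  onData: (data: string) => void
): Promise<void> {
  for (let i = 0; i < rounds; i++) {
    try {
      const data = await waitForEvent(10, `event-${i}`);
      onData(data);
    } catch {
      // A timeout or dropped connection lands here; per the workflow
      // above, the client just re-requests on the next iteration.
    }
  }
}
```

The key difference from plain polling is that every completed round trip carries data; the cost moves from wasted requests to connections held open.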
3. Server-Sent Events
Server-sent events is yet another variation on polling. It's similar to long polling in that the client makes a request and the server responds when -- and if -- it has meaningful data to send to the client. But rather than closing the connection after each response, the server keeps the connection open indefinitely. This lets the server send additional responses as meaningful data becomes available.
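Under the hood, that open connection carries the text/event-stream format: one field per line, with a blank line terminating each event. The browser parses this for you, so the simplified parser below is purely illustrative (parseEventStream is a hypothetical name), and it ignores fields such as "id:" and "retry:".

```typescript
interface SseEvent {
  event: string; // event name; "message" when the server sends none
  data: string;  // data lines joined with newlines
}

// Parse a chunk of text/event-stream into discrete events.
function parseEventStream(chunk: string): SseEvent[] {
  const events: SseEvent[] = [];
  // A blank line terminates each event.
  for (const block of chunk.split("\n\n")) {
    const dataLines: string[] = [];
    let name = "message";
    for (const line of block.split("\n")) {
      if (line.startsWith("data:")) dataLines.push(line.slice(5).trimStart());
      else if (line.startsWith("event:")) name = line.slice(6).trimStart();
      // other fields (id:, retry:, comments) ignored in this sketch
    }
    if (dataLines.length > 0) {
      events.push({ event: name, data: dataLines.join("\n") });
    }
  }
  return events;
}
```

So a stream such as "data: hello", a blank line, then "event: tick" and "data: 1" yields two events, and the server can keep appending more to the same open connection.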
This technique enables one-way communication from the server to the client. This means that the client isn’t able to send any additional requests or data to the server after the initial request on the same connection. Instead, the client needs to make additional Ajax requests on a new connection. Fortunately, this doesn’t usually require closing the existing connection to the server, but it means that you can have multiple connections open between client and server.
This technique often uses the EventSource API that's part of HTML5, which you can use to manage server-sent events. Modern versions of most widely used browsers support this API, with the notable exception of Internet Explorer (IE), including IE 10. As of right now, I have no idea whether Microsoft will see fit to include it in a future version of IE.
The main downside of server-sent events is that it only supports communications from the server to the client, so it’s not bidirectional. But clients can still communicate with the server using additional Ajax requests, which can use any of the communication models, including the traditional HTTP request/response.
4. WebSockets
The fourth common real-time communications model is the only one that's truly real-time and bidirectional: WebSockets. This is a new protocol that's part of HTML5. It essentially upgrades a standard HTTP request connection into a full-duplex TCP communications channel, with a secure communication option. A lot of people are betting that this will be the real-time communication technology of choice.
The exchange between client and server starts with an upfront negotiation. In essence, the client asks the server whether it supports WebSockets and is willing to use them for this connection. If the answer to both is yes, the server responds, passing back a key derived from the client's request. From then on, client and server can pass information back and forth as needed, efficiently and effectively.
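That key exchange is small enough to show concretely. Per RFC 6455, the server appends a fixed GUID to the client's Sec-WebSocket-Key header value, SHA-1 hashes the result, and returns it base64-encoded in the Sec-WebSocket-Accept header. A sketch in TypeScript using Node's crypto module (acceptKey is a hypothetical name):

```typescript
import { createHash } from "node:crypto";

// Fixed GUID specified by RFC 6455 for the WebSocket handshake.
const WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

// Derive the Sec-WebSocket-Accept value from the client's
// Sec-WebSocket-Key: SHA-1 of key + GUID, base64-encoded.
function acceptKey(clientKey: string): string {
  return createHash("sha1").update(clientKey + WS_GUID).digest("base64");
}
```

Running acceptKey on the sample key from RFC 6455, "dGhlIHNhbXBsZSBub25jZQ==", produces "s3pPLMBiTxaQ9kYGzzhZRbK+xOo=", the value the specification shows for its example handshake. Echoing this derived key is how the server proves it actually understood the upgrade request rather than blindly replying to an ordinary HTTP request.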
Browser support is growing, but WebSockets aren't yet a feature of all modern browsers. Probably a bigger problem is that even if a user's browser supports WebSockets, things such as antivirus software, firewalls, and HTTP proxies can interfere with their use. I expect that over time WebSockets will become more widely supported and less widely interfered with, but who knows how fast that will happen.
Is Real-Time Communications Really This Complicated?