Memories of Microsoft IT's 7-node mega-Exchange 2003 cluster

Depending on your taste and the topic under discussion, nostalgia can be either mind-bendingly boring or terrifically interesting. A few weeks ago, I was shooting the breeze with a friend about some of the old Microsoft Exchange Conference (MEC) events that we had attended and our attention focused on some of the EMEA events that happened at the Acropolis in Nice, France.

Amongst the stories about parties on yachts in Nice harbour, we talked about some of the headline sessions that were popular at the time, which brought us to Microsoft’s Exchange mega-cluster, described with gusto by luminaries of the product group such as Paul Bowden to massive audiences, all of whom seemed mightily impressed by the concept.

Fifteen years ago, Exchange began its relationship with clustering with the Exchange 5.5 and “Wolfpack”/Windows NT4.0 combination. Clustering was expensive and never really caught on. Moving forward a couple of years and software versions, Microsoft IT wanted to show what could really be done and so deployed a 7-node Exchange 2003 cluster.

As I recall, four of the nodes were serving client connections while two were passive and waited patiently to swing into action should another node fail. The seventh node was used for backups and other administrative operations. In the context of the times, the mega-cluster was a strange and exotic beast that was capable of supporting 16,000 active clients. Each server ran twenty databases arranged in four storage groups. Outlook 2003 (newly equipped with cached Exchange mode capability) was the client of choice – mobile clients were a minor concern. Building such a monster was a financial and technical challenge (the SAN at its core was a massive SPOF), but it seemed pretty cool too.
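To put those numbers in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the figures recalled above and assumes that clients were spread evenly across the active nodes and databases, which is my simplification rather than anything Microsoft IT documented. The class and property names are purely illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterLayout:
    """Figures as recalled in the text; even distribution is an assumption."""
    active_nodes: int = 4
    passive_nodes: int = 2
    admin_nodes: int = 1
    total_clients: int = 16_000
    databases_per_node: int = 20
    storage_groups_per_node: int = 4

    @property
    def total_nodes(self) -> int:
        return self.active_nodes + self.passive_nodes + self.admin_nodes

    @property
    def clients_per_active_node(self) -> float:
        # Assumes clients are balanced evenly across the active nodes.
        return self.total_clients / self.active_nodes

    @property
    def clients_per_database(self) -> float:
        # Assumes mailboxes are balanced evenly across each node's databases.
        return self.clients_per_active_node / self.databases_per_node

    @property
    def databases_per_storage_group(self) -> float:
        return self.databases_per_node / self.storage_groups_per_node


layout = ClusterLayout()
print(f"Nodes in cluster:            {layout.total_nodes}")
print(f"Clients per active node:     {layout.clients_per_active_node:.0f}")
print(f"Clients per database:        {layout.clients_per_database:.0f}")
print(f"Databases per storage group: {layout.databases_per_storage_group:.0f}")
```

On those assumptions, each active node carried roughly 4,000 clients, or about 200 per database, with five databases in each storage group.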

As I am reminded of all too frequently, the Internet has a memory, and I found a blog post by Scott Schnoll from 2005 titled “How Microsoft achieves 99.99% uptime with Exchange 2003.” Its content is a good reminder of how far the technology has come over the last ten years, notably in reducing the need for disk I/O and in building native high availability into the product rather than depending on hardware.
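It is worth pausing on what a 99.99% target actually allows. The short Python sketch below converts an availability percentage into a downtime budget. The function name and the choice of a 365-day year are my own assumptions for illustration; the original post may well have measured availability over a different period or excluded planned maintenance.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a non-leap year


def downtime_budget_minutes(availability_percent: float,
                            period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Return the downtime (in minutes) permitted by an availability target."""
    if not 0 < availability_percent <= 100:
        raise ValueError("availability_percent must be in the range (0, 100]")
    return period_minutes * (1 - availability_percent / 100)


yearly = downtime_budget_minutes(99.99)
weekly = downtime_budget_minutes(99.99, period_minutes=7 * 24 * 60)
print(f"99.99% over a year: {yearly:.2f} minutes of downtime")
print(f"99.99% over a week: {weekly:.2f} minutes of downtime")
```

Four nines leaves a little under 53 minutes of downtime in a year, or about one minute per week, which goes some way to explaining why the team reviewed service data so regularly.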

One of the interesting aspects described in the post is the weekly service review. This kind of human-intensive review was common practice at the time, but as systems have scaled up and become more complex, much more attention is paid today to automating monitoring, reporting, and rectification (as in Managed Availability). As the post notes, “regular reviews are very important to achieving high availability.” Even with today’s tools, great value is gained when humans cast a cold eye over the results of automation.

Today’s mega-cluster might be a sixteen-member Database Availability Group (DAG). Such a monster would be capable of supporting many more than 16,000 mailboxes, so a fairer comparison might be a four-node DAG where each database would have at least three copies. Exchange 2013 needs more CPU and memory than Exchange 2003, but today’s server hardware is much more capable too. I rather think that a four-node DAG would be cheaper in real terms by a considerable factor, perhaps as much as ten, depending on the selected hardware.

The investments made in Exchange over the last ten years have been of enormous benefit to on-premises customers and created the technical and economic foundation for Exchange Online/Office 365. Looking back at the mega-cluster and the excitement it engendered at the time is a real reminder of just how far we’ve come. I wonder what progress will be made in the next decade?

Follow Tony @12Knocksinna