Our company recently performed a business continuity drill at our offsite disaster recovery location. Our goal was to simulate losing our entire building (and all our servers) to a disaster, yet restore our business functions within 72 hours. Months of planning went into the successful drill. Although we encountered and overcame many problems in testing our plan, one of the nastiest surprises came when we tried to restore our Exchange Server 2003 Service Pack 2 (SP2) active/passive cluster and found that no cluster hardware was available. With a bit of quick thinking, I solved that problem by setting up the cluster in a virtualized environment and restoring the Exchange data there.
Failed Restore Attempts
As our company's Windows and Exchange administrator, it was my job to document the restore process for each server. I've restored standalone Exchange servers many times by using the Setup command with the /disasterrecovery switch, so I figured that restoring our Exchange cluster would be a snap. How wrong I was.
A couple of weeks before the 72-hour drill, I decided that I'd better run through my documentation to ensure that I hadn’t missed anything. I restored our Active Directory (AD) to a test network and proceeded to install Exchange by using the /disasterrecovery switch. Imagine my surprise when the Setup program displayed the message Operation Successful after only a few seconds of running! Clearly, something was wrong. I took a quick look at the setup.log file and saw the message that Figure 1 shows. I knew right away that I was in trouble. My plan had a huge hole: It simply wasn't possible to restore an Exchange store from a cluster to a standalone server.
My first thought was to create a single-node cluster. As long as I had a second SCSI controller, the Cluster service would think that I was using a shared disk. Unfortunately, all the servers at our disaster recovery site had only one SCSI card. No matter what I tried, I couldn't get the Cluster service to create anything except a local quorum. Desperate, I attempted to install Exchange by using a local quorum. But as luck would have it, there's a known issue with the Microsoft Distributed Transaction Coordinator (MS DTC) cluster resource on Windows Server 2003 SP1. (For more information about this problem, see the Microsoft article "FIX: The MSDTC service does not start in a stand-alone cluster after you install Windows Server 2003 with SP1" at http://support.microsoft.com/?kbid=899426.)
Now I was stuck. I didn’t have the hardware that I needed, and installing Exchange on a cluster with a local quorum entailed addressing problems that I didn’t have time to solve.
VMware to the Rescue!
I needed to come up with a solution—fast. Our disaster recovery drill was only a few days away, and not restoring our central communication system wasn't an option. I had to find a way to restore Exchange.
The first idea I had was to use VMware to set up and run the Exchange cluster in a virtual machine (VM). (You could probably create the same solution by using Microsoft Virtual Server instead of VMware.) In February 2006, VMware rebranded its GSX Server product as VMware Server and offered it as a free download (http://www.vmware.com/products/server). I’ve set up Microsoft clusters with VMware in the past. If I encountered problems, I knew that I could post a call for help on a forum I actively participate in, Mark Minasi’s Reader Forum (http://www.minasi.com/forum).
Because I use VMware for testing almost daily, I keep “Sysprepped” copies of all the current Windows OSs ready to go. The Windows Sysprep utility strips out the unique SID that's assigned to each Windows machine when it's built. When the machine is copied and then powered on again, a new SID is generated during a mini-setup. This is the only way that Microsoft supports copying or “cloning” Windows machines, and this method works great. Having these OS copies on hand means that I don't have to install an OS onto a VM from scratch, which saves me hours. Instead, I just copy the VM to a new folder and start it up. The OS runs a quick setup routine, asking for information such as the username, server name, IP address, and license key.
After I copied Windows Server 2003, Enterprise Edition to its new folder, I started to configure the server to support a Microsoft cluster. The most important thing to watch out for is how the hard drives are configured. The recovery server must have the same configuration as the production server(s). For example, our Exchange binaries (D:\Program Files\Exchsrvr) are installed on the D drive, not C. The Information Store (IS) and transaction logs are also on separate drives. This type of Exchange configuration information is stored in AD, so missing an important detail like that will cause the installation to fail.
To convince the Cluster service that the server has shared drives, I had to set the cluster drives to use the second SCSI controller (VMware Server comes with four SCSI controllers by default). To do so, when I created each virtual drive, I selected the Advanced options in the Add Hardware Wizard. So although the drive that hosts C and D is set to SCSI ID 0:0, the drives for the quorum, logs, and IS need to be on SCSI IDs 1:0, 1:1, and 1:2, respectively, as Figure 2 shows. Omitting this important step will cause the Cluster Setup Wizard to choose a local quorum.
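I made these settings through the wizard, but the result ends up in the VM's .vmx configuration file. As a rough Python sketch (not the procedure I used in the drill), the following helper prints the kind of entries I'd expect to see for three disks on the second controller. The .vmdk file names are hypothetical, and the key names are my assumption about how VMware records these settings, so compare them against a VM built with the wizard before relying on them.

```python
def shared_disk_vmx_lines(vmdk_files, controller=1):
    """Build .vmx entries that place each disk on one SCSI controller.

    vmdk_files: disk file names, in the order they should get unit numbers.
    controller: SCSI controller number (1 is the second controller).
    """
    # Enable the controller itself, then attach one disk per unit number.
    lines = [f'scsi{controller}.present = "TRUE"']
    for unit, filename in enumerate(vmdk_files):
        node = f"scsi{controller}:{unit}"
        lines.append(f'{node}.present = "TRUE"')
        lines.append(f'{node}.fileName = "{filename}"')
    return lines


# Hypothetical file names for the quorum, log, and IS disks (SCSI 1:0, 1:1, 1:2).
vmx_lines = shared_disk_vmx_lines(["quorum.vmdk", "logs.vmdk", "store.vmdk"])
print("\n".join(vmx_lines))
```

The disk order matters here: the first file name lands on 1:0, the second on 1:1, and so on, which mirrors the quorum, logs, and IS order described above.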
I then started the new server, logged on to it, and formatted the hard drives, paying special attention to ensure that I had the correct drive sizes and letters. It’s important that the drive letters match the Exchange production servers exactly. If they don’t, the Exchange setup on the cluster will fail. After completing the server setup, I took a snapshot of it in VMware, just in case I needed to go back and start again. (A VMware snapshot backs up the state of all the VM's disks, the contents of the VM's memory, and the VM settings.)
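Because a mismatched drive letter is so easy to miss, it can help to write the production layout down and compare it with the recovery server before running Setup. Here is a minimal Python sketch of that comparison. The C, D, and Q assignments follow this article; the letters shown for the logs and the IS are hypothetical, and the sketch compares letters only, not drive sizes.

```python
def layout_mismatches(production, recovery):
    """Return a description of every drive letter whose role differs.

    Both arguments map a drive letter to the role of that drive.
    """
    problems = []
    for letter in sorted(set(production) | set(recovery)):
        expected = production.get(letter)
        actual = recovery.get(letter)
        if expected != actual:
            problems.append(f"{letter}: expected {expected}, found {actual}")
    return problems


# L and M are made-up letters for illustration; use your own documentation.
production = {
    "C": "OS",
    "D": "Exchange binaries",
    "Q": "quorum",
    "L": "transaction logs",
    "M": "Information Store",
}
# The recovery VM below has the IS on N instead of M (a deliberate mistake).
recovery = dict(production)
recovery["N"] = recovery.pop("M")

for problem in layout_mismatches(production, recovery):
    print(problem)
```

An empty result means the letters line up. Anything else is worth fixing before you take the snapshot.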
Now that I had the (virtual) hardware that I needed, it was time to configure the cluster to support Exchange. Windows 2003 has a much-improved process for installing a cluster compared with Windows 2000 Server's cluster-installation process. Instead of requiring you to use the Add/Remove Programs applet to install the Cluster service, Windows 2003 Enterprise has the Cluster service already installed and ready to configure. I called my cluster "Standby" and gave it an appropriate IP address. This IP address need not be the same as the production-server IP address(es).
The Cluster Configuration Wizard should choose the Q drive for the quorum. If it chooses a local quorum instead of one of the SCSI drives, something is wrong. Check your hard-drive SCSI settings, make any necessary changes, and try again. The Wizard will also warn you about not having a second network-adapter card, as Figure 3 shows. It's safe to ignore this warning because we won't need a heartbeat connection to a second server.
If I had been installing the cluster in a production environment, now would have been a good time to add that second node and try to move the resources back and forth between the two servers to ensure that the cluster was working correctly. However, this exercise was for a disaster recovery drill. I just needed to ensure that I could re-create the original environment and restore the data. Adding a second node wasn’t necessary.
Install Exchange on the Virtual Cluster
I was almost ready to start the Exchange installation, but first I needed to install a few prerequisites: ASP.NET, Microsoft IIS 6.0, Network News Transfer Protocol (NNTP), SMTP, and the MS DTC cluster resource.
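In a drill with the clock running, a simple checklist keeps you from launching Setup with something missing. The Python sketch below is only that, a checklist: it doesn't query Windows, and the list of installed components is typed in by hand as an example.

```python
# Prerequisites named in this article, in the order listed above.
REQUIRED = ["ASP.NET", "IIS 6.0", "NNTP", "SMTP", "MS DTC cluster resource"]


def missing_prerequisites(installed):
    """Return the required items that are not in the installed list.

    The comparison ignores case so "asp.net" counts as "ASP.NET".
    """
    have = {name.lower() for name in installed}
    return [name for name in REQUIRED if name.lower() not in have]


# Example: what I might have ticked off partway through the setup.
todo = missing_prerequisites(["IIS 6.0", "SMTP", "asp.net"])
print(todo)
```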
If you've set up Microsoft SQL Server to run on a cluster, you know that MS DTC is an important component that requires its own separate dependencies. Figure 4 shows these dependencies as they're listed in the MS DTC cluster-resource properties. However, in Exchange we can get by with using the cluster IP address, cluster name, and quorum disk for the MS DTC dependencies. Explaining why you can do so is beyond the scope of this article, but the Microsoft Exchange Team Blog has an in-depth discussion about the MS DTC resource and Exchange at http://msexchangeteam.com/archive/2005/01/17/354497.aspx.
Now I was ready to install Exchange. The first time I installed it, I used the default C:\Program Files\Exchsrvr that the setup program presented me with, forgetting that we had installed the Exchange files on the D drive on our production Exchange server. Unfortunately, I didn't know that I had made this mistake until I tried to create the Exchange System Attendant Cluster Resource in the Cluster Administrator Tool (as I explain a bit later), as the screen in Figure 5 shows. (This example shows why having a snapshot of your setup is a good idea: It can save you a lot of time by sparing you from starting over if you make a crucial mistake.) The second time I installed Exchange, I remembered to install it into the correct location. (Tip: Remember that you don't have to run Forestprep or Domainprep. If you restored AD from backup, your AD already knows all about your Exchange installation. You just need to bring the server back from the dead.)
After Exchange was installed, I created two cluster resources, IP Address and Network Name, to simulate the Exchange cluster in production. This network name and IP address are the same ones that are listed in our users' Outlook profiles. I could have used a different IP address, but the network name must match the network name that's used in the production cluster because this name is listed in AD. I found that if I used a different IP address, some of the Exchange services (e.g., SMTP, POP) either didn't start or stopped running. If this happens to you, follow the advice in the Microsoft article "Events are logged after an IP address change on an Exchange cluster" to correct the problem.
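The rule here is asymmetric: a wrong network name is fatal, whereas a different IP address merely invites trouble. A short Python sketch makes that distinction explicit. The server name and addresses are hypothetical, and treating names as case-insensitive is my assumption about how you'd want to compare them.

```python
def check_virtual_server_identity(prod_name, prod_ip, new_name, new_ip):
    """Compare the recovery Exchange virtual server with production.

    Returns (errors, warnings). A name mismatch is an error because AD
    lists the production name; an IP mismatch is only a warning.
    """
    errors, warnings = [], []
    if prod_name.strip().lower() != new_name.strip().lower():
        errors.append(f"Network name {new_name!r} must match {prod_name!r}")
    if prod_ip != new_ip:
        warnings.append(
            f"IP {new_ip} differs from {prod_ip}; watch SMTP, POP, and similar services"
        )
    return errors, warnings


# Hypothetical values: same name, different IP address at the recovery site.
errors, warnings = check_virtual_server_identity(
    "EXCHVS1", "10.0.0.50", "exchvs1", "192.168.10.50"
)
print(errors)
print(warnings)
```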
Remembering how the production Exchange cluster was configured, I knew that many Exchange resources needed to be created. Fortunately, Exchange System Attendant creates them for you. To access Exchange System Attendant, in Cluster Administrator I selected File, New, Resource, then selected Exchange System Attendant from the drop-down list. I followed the rest of the wizard, making sure to choose the IP Address, Network Name, and Physical Disks, so that they were listed as dependencies.
Apply the Service Pack and Restore
The last step before I could restore the data was to apply Exchange 2003 SP2. It’s important to apply the same service pack as the one that's on the production Exchange server before you attempt to mount databases because the physical structure of database pages sometimes changes in service packs. Attempting to restore an Exchange database from a server running Exchange 2003 SP2 to a server with no service pack simply doesn't work.
I downloaded Exchange 2003 SP2 and attempted to run update.exe from it but received the error message that Figure 6 shows. When I received this message, at first I thought I'd need to set up a second node. Then I thought, “I wonder if I can simply take the cluster offline and apply the service pack to it that way?” I took the node offline and, sure enough, I was then able to run update.exe and apply SP2 successfully.
After the installation finished and I restarted the VM, I opened Cluster Administrator and upgraded Exchange by right-clicking an Exchange resource and clicking Upgrade Exchange Virtual Server. I now had an exact replica of our production cluster, running in a VM, ready to accept a database restore.
This story has a happy ending: Our disaster recovery drill went off without a hitch, and now management feels better knowing that we can restore our data center. This example is just one of many ways that virtualization technology has helped me over the years. If you haven't tried virtualization software yet, I encourage you to experiment with it for software testing, server consolidation, extending the life of legacy OSs, or disaster recovery, as my company has done. Now that both Microsoft and VMware offer free virtualization products, you've got an excuse to at least give the technology a try.