I want to create a high-availability environment for Microsoft Exchange Server 2003, but I'm worried that clustering alone won't be sufficient. My concern is that clustering doesn't adequately protect the Exchange databases that reside on the clustered servers' C drives. I'm aware of a few Exchange-replication products that might provide the additional protection I want, but I worry that the administrative overhead associated with those solutions will be more than my small IT group can manage. I need a solution that provides near real-time data replication to volumes attached to another server. Can you point me in the right direction?
I agree that clustering is aimed more at companies looking for either a load-balancing or a redundant-server solution. That said, keep in mind that in a Microsoft clustering configuration, the root volume isn't shared between servers, and when a system crashes because of a software problem, the culprit is usually the OS. The good news is that products that do what you want are available. Many Storage Area Network (SAN) devices can copy volumes or maintain mirrored volumes between two or more servers. Some of these solutions let you use scripts or embedded controls to break and rebuild the mirror at a set interval, so you can keep volumes in sync without a high risk of corruption spreading across volumes.
Just for grins, I tried this approach on my XIOtech Magnitude SAN array. (Because SAN array interfaces vary so widely, the following description of my experiment avoids details specific to the Magnitude array.) For my test, I connected two identical servers to the array and created three new volumes for the server on which I intended to run the initial OS and Exchange installations. I decided to use the first volume for the OS installation (booting from the SAN), the second volume for the Exchange installation, and the third volume for the Exchange mailbox databases.
Next, I determined which data I wanted to replicate and when. Aside from the Microsoft IIS metabase (%systemroot%\system32\inetsrv\metabase.bin), I wasn't concerned with any data on the server's C drive being more than a few days old when I brought the second server online. For this test, I decided that I could live with a metabase that was a few days old as well. Furthermore, nothing on the D drive absolutely needed to be in sync with the Exchange databases on the E drive, so replicating the C and D drives every 2 days would be fine.
I installed the OS and Exchange. During the Exchange installation, I chose to place the default Badmail and Queue folders on the E drive, which would also contain my Exchange database and transaction logs. (Be aware that best practice is to separate the database and logs. For my test, I didn't make this separation, but doing so would involve less than a minute's worth of additional setup.) I opted for synchronization between mirrored volumes to occur every 10 minutes. (The acceptable time lapse between replications of the email databases will be different for each company. For some organizations, losing 10 minutes' worth of email is fine; for others, a mirror would need to be maintained in real time.) I created a mailbox and made sure email was flowing as it should.
The next step was to set up the second machine and establish mirrored volumes. I wrote a script that would break and rebuild the mirrors at set intervals (some arrays might not require a script). I set the C and D drives to break and rebuild every 48 hours and the E drive to break and rebuild every 10 minutes. (In a high-volume environment, you'd need to play with these numbers a bit because the number of transactions can be substantial and rebuilding the mirror can require much more time than I needed.)
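To make the scheduling idea concrete, here is a minimal sketch in Python of how such a script might decide which mirrors are due for a break-and-rebuild cycle. It is not the script I used. The names SCHEDULE, volumes_due, and resync are hypothetical, and the break and rebuild commands are passed in as placeholders because every array exposes them differently.

```python
from datetime import datetime, timedelta

# Hypothetical schedule using the intervals from the test:
# drive letter -> how often its mirror is broken and rebuilt.
SCHEDULE = {
    "C": timedelta(hours=48),
    "D": timedelta(hours=48),
    "E": timedelta(minutes=10),
}


def volumes_due(schedule, last_sync, now):
    """Return the volumes whose mirror is due for a break-and-rebuild cycle."""
    due = []
    for volume, interval in sorted(schedule.items()):
        previous = last_sync.get(volume)
        # A volume that has never been synchronized is always due.
        if previous is None or now - previous >= interval:
            due.append(volume)
    return due


def resync(volume, now, last_sync, break_mirror, rebuild_mirror):
    """Break, then rebuild, one mirror using caller-supplied array commands."""
    break_mirror(volume)    # placeholder for the array-specific break command
    rebuild_mirror(volume)  # placeholder for the array-specific rebuild command
    last_sync[volume] = now  # record when this volume was last cycled
```

A scheduled task could call volumes_due every minute or so and pass each returned volume to resync. In a high-volume environment, the rebuild time would also need to be measured and the intervals adjusted to allow for it.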
I let this setup run for 3 days. On the third day, I used the SAN array-management console to disconnect the volumes from the primary Exchange server. This action simulated about as dramatic a failure as you could experience, effectively removing all traces of the OS and Exchange from the primary server.
I'd already assigned the mirrored volumes used in the scheduled rebuild to the second server, an approach that might not be advisable in a production environment. (If I'd turned on the second server while the first server was online, I would have ended up with two copies of the same server online at the same time.) In a production environment, you'd map the volumes to the second server only after the primary server fails. Depending on the efficiency of your array-management console, that step could take from 2 minutes to 10 minutes or longer. Because my volumes were already mapped, all I needed to do after my simulated failure was turn on my secondary server.
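The ordering described above can be written down as a small decision function. This is only an illustrative sketch in Python: next_failover_step and the step names it returns are hypothetical, and how you detect that the primary is down or map volumes depends entirely on your array and monitoring tools.

```python
def next_failover_step(primary_online: bool, volumes_mapped: bool) -> str:
    """Return the next safe action for bringing up the secondary server."""
    if primary_online:
        # Starting the secondary now would put two copies of the same
        # server online at the same time, so do nothing.
        return "wait"
    if not volumes_mapped:
        # Production-style order: map the mirrored volumes to the
        # secondary only after the primary has failed.
        return "map_volumes"
    # Primary is down and the volumes are already mapped, as in the test.
    return "power_on_secondary"
```

In my test the volumes were already mapped, so the only step left after the simulated failure corresponds to "power_on_secondary".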
The server booted, and at logon everything appeared to be working. All the Exchange services started, and the Exchange database mounted. Nothing in the event log was out of the ordinary. When I opened Microsoft Outlook, all my email was there and I could send and receive email from the Internet. I checked everything on the system and couldn't find any significant errors, leading me to conclude that my test was a success. I've since tried this configuration successfully on a few other products, such as Microsoft SQL Server and Symantec AntiVirus Corporate Edition.