Q. How does cluster continuous replication (CCR) work in Microsoft Exchange Server?

A. A Microsoft Exchange Server store consists of two fundamental elements: the transaction logs and the database. Changes are written first to the Extensible Storage Engine (ESE) memory cache. The log writer then records them in a transaction log, and only after that are they written to the database (.edb file). Writes to the database go through a separate set of in-memory buffers, the Information Store cache. The lazy writer flushes those buffers to disk. A Microsoft blog post covers this process in more detail.
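
The ordering matters because the log, not the database, is what makes a change durable. The following is a minimal Python toy model of that write-ahead pattern. It is not Exchange or ESE code, and the class and method names are hypothetical. It shows that if the server fails before the lazy writer flushes the cache, replaying the log still rebuilds the data:

```python
from dataclasses import dataclass, field


@dataclass
class StoreModel:
    """Hypothetical toy model of the store write path: cache -> log -> database."""
    cache: dict = field(default_factory=dict)     # dirty pages held in memory
    log: list = field(default_factory=list)       # sequential transaction log records
    database: dict = field(default_factory=dict)  # pages persisted to the .edb file

    def write(self, page: str, value: str) -> None:
        # The change lands in the in-memory cache first.
        self.cache[page] = value
        # The log writer records the change; the database is not touched yet.
        self.log.append((page, value))

    def lazy_flush(self) -> None:
        # Later, the lazy writer flushes dirty cached pages to the database.
        self.database.update(self.cache)
        self.cache.clear()

    def recover(self) -> dict:
        # After a crash the cache is lost, but replaying the log rebuilds state.
        recovered = dict(self.database)
        for page, value in self.log:
            recovered[page] = value
        return recovered


store = StoreModel()
store.write("msg1", "Hello")
store.write("msg2", "World")
# Simulated crash before lazy_flush(): the database file is still empty,
# yet recover() rebuilds both changes from the log.
print(store.database, store.recover())
```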

In Exchange 2007, each transaction log file is 1MB. When the current log fills up or a certain amount of time has elapsed, Exchange creates a new log file. It renames the current log to the next name in a sequence that uses the format E(storage group number)(8-digit hexadecimal number).log. As described above, various Exchange components write the contents of these logs into the database. Once all of a transaction log's content has been written to the database, Exchange updates the checkpoint file (Edb.chk). The update records that those transactions are already in the database, so they do not need to be re-read during recovery.
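
The naming scheme and the role of the checkpoint can be sketched in a few lines of Python. The helper names are hypothetical. The sketch assumes a two-digit decimal storage group number (E00, E01, ...) and simplifies the checkpoint to a single log generation. It builds a closed log's file name and selects the logs that recovery would still have to replay:

```python
def log_file_name(storage_group: int, generation: int) -> str:
    """Build a closed log name: "E" + 2-digit storage group + 8-digit hex generation."""
    return f"E{storage_group:02d}{generation:08X}.log"


def logs_needed_for_recovery(generations: list[int], checkpoint: int) -> list[int]:
    """Return generations at or after the (simplified) checkpoint generation.

    Logs before the checkpoint are fully written to the database, so recovery
    does not need to re-read them.
    """
    return sorted(g for g in generations if g >= checkpoint)


print(log_file_name(0, 26))                              # E000000001A.log
print(logs_needed_for_recovery([24, 25, 26, 27], 26))    # [26, 27]
```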

With CCR, you create a two-node Windows Server 2003 or Windows Server 2008 cluster. The passive node runs Exchange with its own copy of the database, but it accepts no client connections and offers no services. Each node holds its own copy of the data, so this is a shared-nothing model in which the data is not a single point of failure. The initial database copy on the passive node's storage is created through a process called seeding, which copies the existing database from the active node. Seeding can take a significant amount of time, depending on the size of the database. Once the database copy is in place, the passive node pulls transaction logs from the active node via a hidden, secured file share. After each log is closed, the passive node copies it to an inspector folder and inspects it. It then replays the log into its copy of the mailbox database, which keeps that copy up to date. This structure is illustrated here.
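
The copy, inspect, and replay cycle can be modeled in Python as follows. This is a hypothetical simulation, not how the Microsoft Exchange Replication service is implemented. It assumes seeding leaves the copy at generation 0 and represents inspection as a single `checksum_ok` flag. The model copies only closed logs, and it replays them strictly in generation order. It stops at the first log that fails inspection or leaves a gap in the sequence:

```python
from dataclasses import dataclass, field


@dataclass
class LogFile:
    generation: int
    closed: bool
    records: list              # (page, value) pairs
    checksum_ok: bool = True   # stand-in for the real log inspection


@dataclass
class PassiveCopy:
    """Hypothetical model of the passive node's log shipping cycle."""
    database: dict = field(default_factory=dict)  # seeded database copy
    inspector: list = field(default_factory=list)  # copied, not yet replayed
    last_replayed: int = 0

    def pull(self, active_share: list) -> None:
        # Copy only closed logs that are newer than anything already handled.
        known = {entry.generation for entry in self.inspector}
        for log in active_share:
            if log.closed and log.generation > self.last_replayed and log.generation not in known:
                self.inspector.append(log)

    def inspect_and_replay(self) -> None:
        for log in sorted(self.inspector, key=lambda entry: entry.generation):
            # Logs must be replayed in sequence; a bad or missing log stops replay.
            if not log.checksum_ok or log.generation != self.last_replayed + 1:
                break
            for page, value in log.records:
                self.database[page] = value
            self.last_replayed = log.generation
        # Keep anything not yet replayed for a later attempt.
        self.inspector = [e for e in self.inspector if e.generation > self.last_replayed]


share = [LogFile(1, True, [("a", "1")]), LogFile(2, True, [("b", "2")]),
         LogFile(3, False, [("c", "3")])]  # generation 3 is still open
copy = PassiveCopy()
copy.pull(share)
copy.inspect_and_replay()
print(copy.database, copy.last_replayed)
```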

In a planned failover, the passive node becomes the active node. It connects to the previously active (now passive) node and copies any remaining log files, including the log that has just been closed. It then replays those logs into its database and comes online.
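
A planned failover is lossless because every log, including the one just closed, is still reachable on the old active node. The hypothetical Python function below illustrates this point. It is a conceptual model, not an Exchange API. It replays every generation the passive copy has not yet seen. If a generation is missing, it raises an error rather than bringing up a database with a gap:

```python
def planned_failover(active_logs: dict[int, dict], passive_db: dict,
                     last_replayed: int) -> tuple[dict, int]:
    """Return (new active database, last replayed generation).

    active_logs maps generation -> {page: value} and includes the log that was
    closed as part of the move. Hypothetical model, not an Exchange API.
    """
    db = dict(passive_db)  # leave the caller's copy untouched
    for generation in sorted(g for g in active_logs if g > last_replayed):
        if generation != last_replayed + 1:
            raise RuntimeError(f"missing log generation {last_replayed + 1}")
        db.update(active_logs[generation])
        last_replayed = generation
    # Every log was replayed, so the database can come online with no data loss.
    return db, last_replayed


logs = {1: {"a": "1"}, 2: {"b": "2"}, 3: {"c": "3"}}
print(planned_failover(logs, {"a": "1"}, 1))
```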

In an unplanned failover, the active node may be unavailable, so some transaction logs will most likely be lost. Because the two nodes are part of a cluster, the failover occurs automatically. The passive node then queries the transport dumpster on the Hub Transport servers in the same Active Directory site as the cluster to find any data it may be missing. Any missing messages are redelivered into the database. This dependence on the site's Hub Transport servers is why Exchange 2007 CCR clusters can span multiple subnets but not multiple Active Directory sites.
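
The transport dumpster recovery step can be pictured as a set difference. The Python sketch below is hypothetical. It assumes the request covers every message delivered since the time of the last log the passive copy replayed, and that messages the database already holds are skipped by message ID. The dumpster only holds mail that passed through Hub Transport, so the model does not cover other kinds of mailbox changes:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DumpsterItem:
    message_id: str
    delivered_at: float  # hypothetical delivery timestamp, in seconds


def recover_from_dumpster(db_message_ids: set[str], dumpster: list[DumpsterItem],
                          last_log_time: float) -> set[str]:
    """Return the message IDs to redeliver after a lossy failover (hypothetical)."""
    # Ask for everything delivered after the last replayed log...
    candidates = [item for item in dumpster if item.delivered_at >= last_log_time]
    # ...then skip messages the recovered database already contains.
    return {item.message_id for item in candidates if item.message_id not in db_message_ids}


dumpster = [DumpsterItem("m1", 100), DumpsterItem("m2", 105),
            DumpsterItem("m3", 110), DumpsterItem("m4", 120)]
print(sorted(recover_from_dumpster({"m1", "m2"}, dumpster, 104)))  # ['m3', 'm4']
```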

A storage group protected by CCR can only contain one database. You can use the CCR copy as the source to perform backups instead of backing up the active node. More detailed information about CCR is available in the Microsoft article "Cluster Continuous Replication."

Check out hundreds more useful Q&As like this in John Savill's FAQ for Windows. Also, watch instructional videos made by John at ITTV.net.