Putting Together Your High Availability Puzzle

With every release of SQL Server, Microsoft has emphasized one area of technology. For SQL Server 7.0, that area was scalability; for SQL Server 2000, it was security. For SQL Server 2005, the emphasis is system and database availability. Microsoft has not only added one completely new technology, database mirroring, to achieve higher availability, but also substantially improved existing availability features.

SQL Server 2005 provides four high-availability technologies: failover clustering and database mirroring, both with supported automatic failover; and log shipping and replication, with either manual or custom-coded failover. Because Microsoft supports automatic failover for both failover clustering and database mirroring, they're clearly the technologies of choice to maximize uptime. If you don't need automatic failover or you're willing to custom-code your automatic failover processes, log shipping and replication might provide the availability you need.

These four availability solutions address system and database failures. However, Microsoft has also addressed another aspect of availability in SQL Server 2005: the availability of data in a highly concurrent system. If you can't access the data you need because another process has it locked, you have an availability problem. Microsoft has added several new features to support data availability in highly concurrent environments, including snapshot isolation and online index building.

In addition, some enhancements to the database restore process can make your data available more quickly. Although you probably think first about restoring a database as part of recovery from a failure, keep in mind that you might perform a database restore for other reasons, such as when you move to new hardware or create a test system with data from an earlier backup. Two new features that make your data available more quickly during a restore are online restore and fast recovery (see "Faster Restoring" in this article). Let's look at what you can expect from these new and improved high-availability features.

Failover Clustering

Of SQL Server's high-availability solutions, failover clustering remains the technological leader. A failover cluster consists of a set of redundant servers (called nodes) that share an external disk system. Clustering requires Windows clustering support (the Microsoft Cluster Service), which only certain editions of Windows Server provide. In addition, to be eligible for Microsoft support, your entire cluster configuration must be certified by Microsoft and listed in the cluster solution category of the Windows Catalog. During a cluster failover, a virtual SQL Server instance moves from one node to another.

As a result, a cluster failover appears to external applications as if the virtual SQL Server instance is briefly unavailable (usually for less than a minute), then available again. The instance seemingly just stops and restarts. Behind the scenes, an orderly process takes place quickly. One SQL Server instance located on one physical server becomes unavailable. Windows closes the database data files that the instance had open on a commonly shared disk space. Then, another SQL Server instance starts on another physical server, opens the same data files, and takes over the virtual server name and virtual IP address of the failed instance.

SERVER VS. DATA REDUNDANCY

The fact that SQL Server's cluster failover works at the SQL Server instance level is its essential advantage. Because an entire instance can fail over from one node of a cluster to another, all server settings remain the same. All data files are the same, including system databases; therefore, all logins, permissions, SQL Server Agent jobs, server configurations, and more are preserved. Failover clustering is the only SQL Server high-availability technology that provides such server redundancy.

Unfortunately for failover clustering, server redundancy doesn't imply data-file redundancy. Because failover clustering makes use of shared disks among the nodes of the cluster, even though those disks might be located in redundant arrays and on a SAN, that common drive system is a potential point of failure. Some SAN vendors provide methods for replicating SAN data over relatively long distances, but the technology can be costly and complex to administer.

SQL Server 2005 extends the range of clustering and uses the full capabilities of Windows clustering. The number of nodes that SQL Server 2005 Enterprise Edition supports is now limited only by the version of Windows you use. Perhaps the biggest news in SQL Server 2005 clustering is that the Standard Edition now supports a two-node cluster, whereas in earlier versions of SQL Server, only the Enterprise Edition supported clustering.

Database Mirroring

The most exciting new SQL Server 2005 high-availability feature is database mirroring. As discussed, failover clustering, which provides server redundancy, doesn't provide data-file redundancy. Although database mirroring doesn't provide server redundancy, it provides both database redundancy and data-file redundancy.

When you set up database mirroring, you use two servers with a database that will be mirrored from one to the other. The source server is called the principal server, and the database that you want to protect is called the principal database. The other server, which receives mirrored data from the source, is called the mirror server, and the copy of the principal database on it is called the mirror database. When mirroring is up and running, the principal SQL Server 2005 instance transmits copies of the principal database's transaction log records to the mirror SQL Server 2005 instance. The mirror writes those records to the mirror database's log, then redoes (replays) the logged changes against the mirror database. The result is that the mirror database undergoes the same transaction log activity as the principal, only slightly behind in time: it mirrors the principal's activity.

To enable automatic failover, you must specify that the transmission will be synchronous (with transaction safety set to FULL) and also designate a third SQL Server instance, called a witness, to act as an observer. In synchronous mode, the principal waits for the mirror to acknowledge that it has written the mirrored log records to disk before the principal confirms that the transaction has committed. Meanwhile, the principal, mirror, and witness all communicate periodically, indicating their online status to each other.
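To make the moving parts concrete, here is a minimal T-SQL sketch of a synchronous session with a witness. It assumes three SQL Server 2005 instances reachable at the hypothetical addresses principal.example.com, mirror.example.com, and witness.example.com, an endpoint named Mirroring on port 5022 (both arbitrary choices), and a copy of AdventureWorks already restored on the mirror WITH NORECOVERY. In the released T-SQL syntax, synchronous operation is requested with SAFETY FULL.

```sql
-- Run on EACH instance (principal, mirror, and witness).
-- The endpoint name and port 5022 are illustrative assumptions.
CREATE ENDPOINT Mirroring
    STATE = STARTED
    AS TCP (LISTENER_PORT = 5022)
    FOR DATABASE_MIRRORING (ROLE = ALL);  -- ROLE = WITNESS suffices on a witness-only instance
GO

-- 1. On the MIRROR instance: point the restoring copy at the principal.
ALTER DATABASE AdventureWorks
    SET PARTNER = 'TCP://principal.example.com:5022';
GO

-- 2. On the PRINCIPAL instance: point at the mirror, which starts the session.
ALTER DATABASE AdventureWorks
    SET PARTNER = 'TCP://mirror.example.com:5022';
GO

-- 3. Still on the PRINCIPAL: run synchronously and add the witness,
--    the combination that makes automatic failover possible.
ALTER DATABASE AdventureWorks SET PARTNER SAFETY FULL;
ALTER DATABASE AdventureWorks
    SET WITNESS = 'TCP://witness.example.com:5022';
GO
```

If the instances run under different domain service accounts, each account also needs CONNECT permission on the other instances' endpoints (for example, GRANT CONNECT ON ENDPOINT::Mirroring TO a login for that account); that step isn't shown.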

If the principal server suddenly fails, leaving both the mirror and witness servers still functional, an automatic failover will occur. After the mirror server detects that the principal is no longer available, the mirror server queries the witness to discover whether it detects the principal. If the witness also can't detect the principal, the mirror promotes itself to the principal role and brings its database online as the new principal. The witness then records the presence of a new principal in the configuration.

If the old principal is then brought back online, the former principal finds that the old mirror is now the new principal, and that it has been "outvoted." The new principal and the witness agree that the old principal is no longer the principal server. The old principal then takes on the mirror role and starts receiving the new principal's transaction log data. A database mirroring failover can occur in just a few seconds.
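You can watch these role changes from T-SQL. The query below, a sketch that assumes the AdventureWorks mirroring session described above, reads the sys.database_mirroring catalog view on either partner; the final statement shows how you'd perform a planned, manual failover (before maintenance, for example) from the current principal.

```sql
-- Run on either partner: this instance's role and the session state.
SELECT DB_NAME(database_id)         AS database_name,
       mirroring_role_desc,           -- PRINCIPAL or MIRROR
       mirroring_state_desc,          -- e.g. SYNCHRONIZED, SUSPENDED, DISCONNECTED
       mirroring_safety_level_desc,   -- FULL (synchronous) or OFF (asynchronous)
       mirroring_partner_name,
       mirroring_witness_name,
       mirroring_witness_state_desc   -- CONNECTED, DISCONNECTED, or UNKNOWN
FROM sys.database_mirroring
WHERE mirroring_guid IS NOT NULL;     -- only databases that are being mirrored

-- Planned (manual) failover: run on the current principal while synchronized.
ALTER DATABASE AdventureWorks SET PARTNER FAILOVER;
```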

You can also enable the client to automatically redirect its connections if a failover occurs. If your application connects to a principal database using ADO.NET or the Microsoft SQL Server Native Client (SQL Native Client), the driver will automatically redirect connections when a database mirroring failover occurs. You just specify the initial principal server and database in the connection string (and optionally the failover partner server). If a mirroring failover occurs and your application attempts to connect, the driver will detect the failover and redirect the connection to the former mirror server, which is now the principal.

Mirroring Restrictions

When you set up database mirroring, the principal database must be in the Full recovery model and the mirror database must be restored with NORECOVERY. Therefore, you can't read from the mirror database, although you can make a database snapshot of it on the mirror server. The principal, mirror, and witness must all be distinct SQL Server instances: you can't mirror a database on a single SQL Server instance. Related to that restriction, the principal and mirror databases must have the same name, and you can mirror only from one principal database to one mirror database. (However, a server that's a principal for one database can be a mirror in a different mirroring session for a different database.)
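A minimal sketch of preparing the mirror copy follows, assuming a hypothetical file share \\fileserver\backups that both servers can reach. If the data and log file paths differ on the mirror server, you'd also add MOVE clauses to the RESTORE DATABASE statement.

```sql
-- On the PRINCIPAL: mirroring requires the Full recovery model.
ALTER DATABASE AdventureWorks SET RECOVERY FULL;
BACKUP DATABASE AdventureWorks
    TO DISK = N'\\fileserver\backups\AdventureWorks_full.bak';
BACKUP LOG AdventureWorks
    TO DISK = N'\\fileserver\backups\AdventureWorks_log.trn';
GO

-- On the MIRROR: restore under the SAME database name and leave it in the
-- RESTORING state so that it can accept the principal's log records.
RESTORE DATABASE AdventureWorks
    FROM DISK = N'\\fileserver\backups\AdventureWorks_full.bak'
    WITH NORECOVERY;
RESTORE LOG AdventureWorks
    FROM DISK = N'\\fileserver\backups\AdventureWorks_log.trn'
    WITH NORECOVERY;
GO
```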

Database mirroring requires either Enterprise Edition or Standard Edition for the principal and mirror servers. The witness server, which is only an observer in a mirroring session, can be any edition of SQL Server 2005, including SQL Server 2005 Express Edition. The Standard Edition supports mirroring only in synchronous mode (with transaction safety set to FULL), whereas the Enterprise Edition also supports mirroring in asynchronous mode (with transaction safety set to OFF).

What's exciting about database mirroring is that it can provide very high availability, in most scenarios failing over from one server to another in just a few seconds. This failover is automatic, just like clustering, but much faster. And, unlike failover clustering, database mirroring doesn't require additional expensive and proprietary hardware for support. Database mirroring is supported on commodity hardware and is easy to manage and monitor. As a result, in some cases, it can provide higher availability than clustering at a significantly lower cost.

Of course, database mirroring provides redundancy only at the database level. Therefore, unlike failover clustering, when you have a database mirroring failover, you must ensure that the mirror server has all the proper logins, SQL Server Agent jobs, SQL Server Integration Services (SSIS) packages, and other supporting components and configurations.
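Orphaned database users are the classic symptom of a missed login. A SQL Server authentication login created independently on the mirror gets a new SID, so the user inside the mirror database no longer maps to it after failover. One common remedy, sketched here with a hypothetical login named AppLogin and placeholder values, is to re-create the login on the mirror with the principal's SID. (Windows logins don't have this problem, because their SIDs come from Windows.)

```sql
-- On the PRINCIPAL: capture the SID of each SQL Server login the database uses.
SELECT name, sid
FROM sys.server_principals
WHERE type = 'S'              -- SQL Server authentication logins
  AND name = N'AppLogin';     -- hypothetical application login

-- On the MIRROR: create the same login with the same SID so that the
-- database user in the mirrored database maps to it after a failover.
CREATE LOGIN AppLogin
    WITH PASSWORD = N'<same password as on the principal>',  -- placeholder
         SID = 0x0123456789ABCDEF0123456789ABCDEF;            -- placeholder: use the SID returned above
```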

In addition, if you have a SQL Server instance with many interdependent databases, enabling mirroring with automatic failover might not be appropriate. If only one database fails over, you could end up with one database online on one server and all the other databases online on another server. Then, the dependencies among the databases would break. As of this release, you don't have a way to bind a set of mirrored databases so that they all fail over together (although that's a natural next step in the evolution of database mirroring).

Log Shipping

You can think about log shipping as the opposite of failover clustering, at least from a technology standpoint. It's the low-tech, low-cost way to provide database redundancy, but without any automatic failover. You might be tempted to view log shipping as simply a slow method of database mirroring, but the underlying technologies are completely different. In log shipping, you automate the SQL Server process of backing up transaction logs from a primary server and restoring them to a secondary server. (Database mirroring uses a special endpoint transmission technology, and no intermediate files are involved.)
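Stripped of its scheduling and monitoring, log shipping is the backup, copy, and restore cycle sketched below with hypothetical file names. In SQL Server 2005, SQL Server Agent jobs run these steps for you on a schedule.

```sql
-- On the PRIMARY (normally run by a SQL Server Agent backup job):
BACKUP LOG AdventureWorks
    TO DISK = N'\\fileserver\logship\AdventureWorks_0800.trn';
GO

-- A copy job then moves the file to the secondary server.

-- On the SECONDARY: apply the log backup but stay ready for the next one.
-- STANDBY leaves the database readable between restores;
-- NORECOVERY would leave it inaccessible instead.
RESTORE LOG AdventureWorks
    FROM DISK = N'D:\logship\AdventureWorks_0800.trn'
    WITH STANDBY = N'D:\logship\AdventureWorks_undo.tuf';
GO
```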

In SQL Server 2005, you'll find several important changes in log shipping. First, the supported version of log shipping is now available in all editions of SQL Server that support SQL Server Agent, which means all editions except SQL Server Express. Additionally, SQL Server 2005 log shipping is based exclusively on stored procedures and SQL Server Agent jobs and doesn't use database maintenance plans. Finally, although a monitor server was required for SQL Server 2000 log shipping, that server is optional in SQL Server 2005.
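Because the configuration now lives in stored procedures and system tables, you can check its health directly in T-SQL. The sketch below assumes the msdb table log_shipping_monitor_secondary as it exists in SQL Server 2005; run it on a secondary or monitor server.

```sql
-- Summary status of the log shipping configurations known to this server.
EXEC master.dbo.sp_help_log_shipping_monitor;

-- How far behind is each secondary?
SELECT secondary_server,
       secondary_database,
       last_copied_file,
       last_restored_file,
       last_restored_date
FROM msdb.dbo.log_shipping_monitor_secondary;
```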

All of these changes are clearly improvements, but they come at a cost. SQL Server 2000 log shipping can't be directly upgraded to SQL Server 2005, because maintenance plans are no longer used. Instead, you must manually reestablish log shipping on an upgraded set of servers.

SQL Server 2005 log shipping doesn't support automatic failover. If the primary log shipping server fails, you must recover the secondary server yourself, either manually or based on your own custom-coded failure detection. You can set up a system to make role reversals easy, so that controlled failover and failback, although still manual, involve only a few steps.
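A controlled role reversal typically follows the sequence sketched below, with hypothetical file names. Before you start, disable the log shipping backup, copy, and restore jobs so they don't interfere; afterward, point the applications at the new primary.

```sql
-- 1. On the PRIMARY, if it is still reachable: back up the tail of the log
--    and leave the database RESTORING, ready to become the new secondary.
BACKUP LOG AdventureWorks
    TO DISK = N'\\fileserver\logship\AdventureWorks_tail.trn'
    WITH NORECOVERY;
GO

-- 2. On the SECONDARY: apply any log backups not yet restored,
--    then the tail-log backup, all WITH NORECOVERY ...
RESTORE LOG AdventureWorks
    FROM DISK = N'\\fileserver\logship\AdventureWorks_tail.trn'
    WITH NORECOVERY;

-- 3. ... and finally bring the database online as the new primary.
RESTORE DATABASE AdventureWorks WITH RECOVERY;
GO
```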

Like database mirroring, log shipping provides database redundancy only, not server redundancy. So just as with database mirroring, you must ensure that the secondary server is kept in sync with the primary for such matters as logins, permissions, and SQL Server Agent jobs. On the other hand, unlike database mirroring, you can ship logs to multiple secondary servers.

Replication

Replication, which has been available since SQL Server 6.0, is one of the oldest high-availability features in SQL Server. Although providing high availability isn't replication's primary purpose, in many cases, it does so successfully.

Merge replication. Microsoft designed merge replication for use by occasionally connected computers (e.g., laptops), but you can use it between database servers to support high availability. On systems with low to moderate activity, merge replication can provide redundant databases—although not with automatic failover. Merge replication offers two key benefits: It lets you update the same data on both the publisher and a subscriber, and it lets you manage any conflicts automatically. Also, merge replication offers the unique capability of automatic synchronization: When either a publisher or subscriber goes offline or is disconnected, each can work autonomously. When they're reconnected or brought back online, they automatically synchronize with each other. Merge replication can't, however, guarantee transactional consistency when multisite updates of the same data are involved.

Transactional replication. You often see transactional replication used for high availability because its performance can be much better than that of merge replication and because it can guarantee transactional consistency between the publisher and subscribers. Perhaps the most common high-availability scenario for transactional replication occurs when you copy data from one database, the publisher, to one or more subscribers through a distribution server. The subscribers are treated as read-only, and updates occur only on the publisher. If the publisher fails, one of the subscribers can become a read/write server and accept data updates, and it can even become a publisher to the other subscribers.
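For orientation, the publisher side of a simple transactional publication can be scripted with the replication stored procedures, as in this sketch. It assumes a distributor is already configured; the publication name AW_Orders and the subscriber STANDBYSRV are hypothetical, and the Snapshot Agent and Distribution Agent jobs (sp_addpublication_snapshot, sp_addpushsubscription_agent) are omitted for brevity.

```sql
USE AdventureWorks;
GO
-- Mark the database as available for publishing.
EXEC sp_replicationdboption
     @dbname  = N'AdventureWorks',
     @optname = N'publish',
     @value   = N'true';

-- A transactional publication that replicates changes continuously.
EXEC sp_addpublication
     @publication = N'AW_Orders',      -- hypothetical publication name
     @status      = N'active',
     @repl_freq   = N'continuous';

-- Publish one table as an article.
EXEC sp_addarticle
     @publication   = N'AW_Orders',
     @article       = N'SalesOrderHeader',
     @source_owner  = N'Sales',
     @source_object = N'SalesOrderHeader';

-- Push the publication to a standby subscriber.
EXEC sp_addsubscription
     @publication       = N'AW_Orders',
     @subscriber        = N'STANDBYSRV',     -- hypothetical subscriber server
     @destination_db    = N'AdventureWorks',
     @subscription_type = N'push';
GO
```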

Peer-to-peer transactional replication. SQL Server 2005 provides a new form of transactional replication, peer-to-peer, in which each server is both a publisher and a subscriber to the same data set. The replication is essentially two-way, similar to merge replication. Unlike merge replication, however, peer-to-peer transactional replication doesn't provide automatic conflict management. Instead, you must ensure either that updates occur to just one database or that the updates are partitioned so that the same data isn't updated at the same (or nearly the same) time.

Like log shipping, replication is supported in all editions of SQL Server 2005 that support the SQL Server Agent service, so only SQL Server Express is excluded. If you want to ensure that failover to a subscriber will occur, you need to manually intervene or write custom code to detect a failure and perform the failover procedures. Also, just as in log shipping, you must ensure that the servers are configured appropriately to support failover.

AVAILABILITY IN A HIGHLY CONCURRENT ENVIRONMENT

If another user has locked the data you need, then no matter how sophisticated your failover solutions are, your data is still unavailable. SQL Server 2005 provides a new technology called row-level versioning (RLV) to reduce the effect of locking on data availability. The most far-reaching feature that uses RLV is SQL Server 2005's new snapshot isolation, which Kalen Delaney describes in her article "Keeping Concurrent" (October 2005).

Snapshot isolation. You can enable snapshot isolation as a database setting in all editions of SQL Server 2005. Snapshot isolation lets SQL Server keep track of previous versions of all modified data. Therefore, even though the data is still locked while it's being modified, other transactions can access a previous committed version of the locked data. Data is more available. However, as always, you pay a price.
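Enabling and using snapshot isolation takes two steps, sketched below against AdventureWorks: allow it at the database level, then request it per transaction. The order number queried is just a sample value.

```sql
-- Allow transactions in this database to request SNAPSHOT isolation.
ALTER DATABASE AdventureWorks SET ALLOW_SNAPSHOT_ISOLATION ON;
GO

-- In a reader's session: read the last committed versions of the rows,
-- even while another session holds exclusive locks on them.
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
    SELECT SalesOrderID, OrderQty, UnitPrice
    FROM Sales.SalesOrderDetail
    WHERE SalesOrderID = 43659;
COMMIT TRANSACTION;
GO
```

A related database option, READ_COMMITTED_SNAPSHOT, applies row versioning to the default READ COMMITTED level instead, so readers get versioned data without any change to application code.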

The older versions of changed rows are stored in the tempdb database, and for systems that have a large amount of modified data, tempdb space requirements can grow dramatically. On any system that employs snapshot isolation, a DBA must carefully monitor the amount of row versioning that occurs and watch the size limits for the tempdb database. You see another cost of using row versioning when many changes are made to the same rows. SQL Server will maintain all changes to any row in a linked list as long as any open transaction or running statement might need the older versions. Additional changes to the same row will cause a new row version to be linked to the front of the list. A query that needs to select older versions of data might need to traverse an increasingly longer version chain, which means that a SELECT statement can take a long time to execute, even though the data is technically available. The data modification operations will also be slower because previous versions of the rows must be added to the linked list. For details about observing the versioned rows, see Kalen Delaney's "RLV with a View," February 2006.
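For that monitoring, SQL Server 2005's dynamic management views are a reasonable starting point. The following sketch reports the approximate size of the version store and lists the oldest active snapshot transactions, which are the ones that keep old versions from being cleaned up.

```sql
-- Approximate size of the version store in tempdb (8-KB pages converted to KB).
SELECT SUM(version_store_reserved_page_count) * 8 AS version_store_kb
FROM tempdb.sys.dm_db_file_space_usage;

-- Long-running snapshot transactions keep old row versions alive;
-- the oldest ones are the first candidates to investigate.
SELECT transaction_id,
       session_id,
       elapsed_time_seconds
FROM sys.dm_tran_active_snapshot_database_transactions
ORDER BY elapsed_time_seconds DESC;
```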

Online index creation. SQL Server 2005's RLV technology also supports another high-availability feature, online index creation, which is available only in the Enterprise and Developer editions. Typically, building or rebuilding an index makes the index unavailable. If you build or rebuild a nonclustered index, no modifications are permitted on the base table because the nonclustered index must be maintained with every data modification. If you rebuild the clustered index, which contains the data itself, the entire table is usually unavailable during the process.

With the new online index creation feature, the table and its indexes are fully available while indexes are being built or rebuilt. You must specifically request online index creation by using either the CREATE INDEX or the ALTER INDEX statement. For example, executing the following statement performs an online rebuild of the clustered index on the Sales.SalesOrderDetail table in the AdventureWorks database:

-- Rebuild the clustered (primary key) index while the table stays available
ALTER INDEX PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID
    ON Sales.SalesOrderDetail
    REBUILD WITH (ONLINE = ON);

Online index creation uses row versioning to keep the original index rows available even while changes are being made to the base table. Anyone selecting from the table sees the values as they were before the rebuild began. As with snapshot isolation, with online index building, you pay a price for the greater data availability. And again, part of that price is the space required in the tempdb database, which can be considerable if you're rebuilding the clustered index on a huge table. (Every row must be versioned as you build the new index, but you also need space to version any rows modified during the index-building process.) In addition, the actual building of the index might take more time than if the building were occurring offline.
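The CREATE INDEX form looks much the same. This sketch builds a new nonclustered index online; the index name is hypothetical, and SORT_IN_TEMPDB = ON is an optional choice that also places the intermediate sort runs in tempdb, adding to the space requirements just described.

```sql
-- Online creation of a new nonclustered index (Enterprise and Developer editions only).
CREATE NONCLUSTERED INDEX IX_SalesOrderDetail_CarrierTrackingNumber  -- hypothetical name
    ON Sales.SalesOrderDetail (CarrierTrackingNumber)
    WITH (ONLINE = ON, SORT_IN_TEMPDB = ON);
```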

FASTER RESTORING

You might need to restore a database as part of disaster recovery, but you might also perform this operation when you move a database to a new drive or copy it to a new machine. Restoring from a backup is also a way to revert a test database to an earlier point in time so you can resume testing from a known earlier state. To restore a database, SQL Server first copies the data and the log records from the backup media, then goes through a process called recovery.

Usually, recovery applies to all files and filegroups and involves two phases. In the first phase, called redo, the changes recorded in the transaction log are checked against the data files and redone, or rolled forward, if necessary; this includes changes made by transactions that hadn't yet committed. In the second phase, called undo, SQL Server rolls back, or undoes, the changes made by any transactions that were still uncommitted. In SQL Server versions before SQL Server 2005, the database wasn't available for any use until both the redo and the undo phases were finished.
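In T-SQL terms, the undo phase runs only when a restore sequence ends WITH RECOVERY; restores WITH NORECOVERY roll changes forward and then stop. The sketch below reverts a hypothetical test database to a known earlier state with a point-in-time restore, which assumes the database uses the Full recovery model; the file names and timestamp are placeholders.

```sql
-- Restore the full backup without recovering, so more log can be applied.
RESTORE DATABASE AdventureWorksTest
    FROM DISK = N'D:\backups\AdventureWorksTest_full.bak'
    WITH NORECOVERY;

-- Roll the log forward to a known-good moment, then recover.
-- Only this final RECOVERY step runs the undo phase and brings the database online.
RESTORE LOG AdventureWorksTest
    FROM DISK = N'D:\backups\AdventureWorksTest_log.trn'
    WITH STOPAT = '2006-01-15T09:00:00', RECOVERY;
```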

Fast recovery. A new restore feature available only in SQL Server 2005 Enterprise Edition is fast recovery, which makes the database available as soon as the redo phase is finished. The data involved in any transactions that were uncommitted when the backup was made remains locked and unavailable until the undo phase rolls those transactions back, but the rest of the data in the database is fully available. You needn't do anything to enable this feature other than use SQL Server 2005 Enterprise Edition.

Online restore. Another new restore feature available in SQL Server 2005's Enterprise and Developer editions is online restore. Online restore lets you restore damaged files or pages while the rest of the database remains fully available. For a database to be online, its primary filegroup must be online. Therefore, if any files in the primary filegroup are damaged, online restore isn't available. However, some or all of the secondary filegroups can be offline. You can restore the damaged files from backup while the rest of the database is online; only the file and filegroup being restored are offline. In addition, if your SQL Server 2005 database is running under the Full recovery model, you can also restore one or more individual pages from a file. Only the pages being restored are offline; the rest of the database is online.
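A page-level restore might look like the following sketch. The page IDs are placeholders for the file_id:page_id values you'd find in msdb.dbo.suspect_pages, and the backup file names are hypothetical. Any log backups taken after the full backup must be applied WITH NORECOVERY between the page restore and the final log restore; that step is elided here.

```sql
-- Pages that SQL Server has flagged as damaged are recorded in msdb.
SELECT database_id, file_id, page_id, event_type, last_update_date
FROM msdb.dbo.suspect_pages;

-- Restore just those pages from the most recent full backup
-- (the page list uses file_id:page_id; these values are placeholders).
RESTORE DATABASE AdventureWorks
    PAGE = '1:57, 1:202'
    FROM DISK = N'D:\backups\AdventureWorks_full.bak'
    WITH NORECOVERY;

-- After applying intervening log backups (not shown), back up and restore a
-- final log so that the restored pages catch up with the rest of the database.
BACKUP LOG AdventureWorks TO DISK = N'D:\backups\AdventureWorks_tail.trn';
RESTORE LOG AdventureWorks
    FROM DISK = N'D:\backups\AdventureWorks_tail.trn'
    WITH RECOVERY;
```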

Piecemeal restore. A final restore enhancement is piecemeal restore, which is new in all editions of SQL Server 2005 and enhances the SQL Server 2000 partial restore. A partial restore in either SQL Server 2005 or SQL Server 2000 lets you restore only selected filegroups within a database. After the initial partial restore of the primary filegroup and perhaps some of the secondary filegroups, piecemeal restore lets you restore additional filegroups. Filegroups that aren't restored are marked as offline and aren't accessible until they're restored. In SQL Server 2000, you can perform a partial restore from a full database backup only, but that's no longer a requirement for SQL Server 2005.
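The following sketch shows the two stages of a piecemeal restore for a hypothetical database named Sales with a filegroup named Archive2004 that was already read-only when the backup was taken; file names are placeholders. The PARTIAL option on the first RESTORE starts the sequence.

```sql
-- Stage 1: partial restore of the primary filegroup (plus any filegroups
-- needed right away). PARTIAL starts the piecemeal restore sequence.
RESTORE DATABASE Sales
    FILEGROUP = 'PRIMARY'
    FROM DISK = N'D:\backups\Sales_full.bak'
    WITH PARTIAL, NORECOVERY;
RESTORE LOG Sales
    FROM DISK = N'D:\backups\Sales_log.trn'
    WITH RECOVERY;   -- primary filegroup is now usable; unrestored ones stay offline
GO

-- Stage 2, later: bring back the read-only archive filegroup.
-- (A read/write filegroup would also need log backups applied before recovery.)
RESTORE DATABASE Sales
    FILEGROUP = 'Archive2004'
    FROM DISK = N'D:\backups\Sales_full.bak'
    WITH RECOVERY;
GO
```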

DATABASE SNAPSHOTS

One more new SQL Server 2005 feature that many people mention when they discuss high availability is database snapshots. However, by themselves, database snapshots aren't strictly an availability feature. Although it's beyond the scope of this article to go into any detail about database snapshots, be aware that making a snapshot of a database has some availability benefits. First, if you're running tests and want to revert to an earlier point in time, the database is unavailable while restoring from a backup. If you revert to a snapshot instead, the period of unavailability is drastically reduced. Second, you can use snapshots in conjunction with database mirroring to provide a copy of the database for reporting purposes. If you don't use snapshot isolation, locking in the source database can make data unavailable for short periods of time, but a read-only reporting database ameliorates some of that unavailability.
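Both scenarios start with the same CREATE DATABASE ... AS SNAPSHOT OF statement. The sketch below creates a snapshot before a test run and later reverts the database to it; it assumes AdventureWorks has a single data file whose logical name is AdventureWorks_Data, and the snapshot name and path are hypothetical. Run on the mirror server against the mirror database, the same CREATE DATABASE statement produces a read-only reporting copy.

```sql
-- Create a snapshot before a test run (Enterprise and Developer editions only).
-- Each source data file needs a sparse file; the logical NAME must match the source's.
CREATE DATABASE AdventureWorks_BeforeTest
ON ( NAME = AdventureWorks_Data,     -- assumed logical data file name
     FILENAME = N'D:\Snapshots\AdventureWorks_BeforeTest.ss' )
AS SNAPSHOT OF AdventureWorks;
GO

-- After the test: revert the source database to the snapshot's state.
-- Reverting requires that this be the only snapshot of the database.
RESTORE DATABASE AdventureWorks
    FROM DATABASE_SNAPSHOT = 'AdventureWorks_BeforeTest';
GO
```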

Final Words

The availability of your system, your databases, and your data is crucial to good performance in your environment. SQL Server 2005 has added new features at every level to improve availability and has enhanced many existing features to provide increased availability with more ease than ever before. This discussion of new high-availability features and enhancements to existing features should help you see which features will best support the availability of your systems, databases, and data.

Note: Database mirroring is disabled in the SQL Server 2005 release to manufacturing (RTM) version, reportedly to permit fuller testing. Although you can enable it by using the documented trace flag 1400 as a startup parameter, as of this writing, Microsoft doesn't support mirroring for production use. Most observers expect Microsoft to finish its testing and release database mirroring in the first half of this year. Database mirroring is one of the most eagerly awaited SQL Server high-availability technologies.
