In the event of a disaster, fast and effective recovery of your database environment is essential. You must be able to quickly implement your disaster-recovery plan—which must be tested and well documented before the disaster. A successful disaster-recovery plan lets you recover your database within your company's defined acceptable amount of downtime and data loss, if any. Because backup and restore are important and required parts of any disaster-recovery plan, your backup/restore strategy should minimize both data loss and downtime. And when your database is measured in hundreds of gigabytes—or even terabytes—the plan to minimize downtime and restore quickly becomes even harder to achieve as database size affects recovery times and backup complexities.
Creating, testing, and maintaining a database environment where little to no data is lost and downtime is entirely avoided in a disaster is no trivial task. More important, in your quest for high availability, remember that more than the database can fail. Many SQL Server features—such as failover clustering, log shipping, and replication—offer high-availability solutions. However, regardless of the options you choose and the levels of redundancy your hardware provides, you'll always need a solid backup strategy. No matter what size the database or the availability requirements, restore is always an option—and in some cases, such as accidental data modifications or deletions, the only option—that lets you restore the database to a state before the modification.
Whether you're recovering from accidental data deletion, hardware failure, natural disaster, or other unplanned incident, your backup strategy is the foundation of a solid recovery plan. You can restore backups to different servers or different file locations, and you can easily send them to geographically dispersed locations either electronically or on removable media. Backups offer the most options for the lowest cost and require little additional hardware, except maybe backup storage devices such as tape. But you still need to take time to fully understand all the features and potential pitfalls to ensure that your backup is as automated as possible and your restore is as fast as possible. (For more information about restoring databases, see Kalen Delaney's articles "All About Restore," May 2002 and "Safe Transit.") With SQL Server 2000, backup and restore options are easy to automate and combine to create a flexible, effective recovery path. All backup/restore features are included with all SQL Server 2000 editions and don't require Enterprise Edition. Moreover, by practicing some of the advanced options I discuss here, you can minimize downtime and reduce data loss even in the event of total server failure.
What's Your Backup Strategy?
On many production servers, the backup and restore strategy is periodic complete database backups (e.g., weekly) with frequent log backups (e.g., hourly); some shops add occasional differential backups (e.g., nightly except on the day when full backups occur). The basic strategy is acceptable, but adding differentials helps minimize downtime by reducing the number of logs you have to apply to roll forward the database; you need to apply only the full backup, the most recent differential backup, and the logs since the last differential backup. However, both strategies must use a full backup as the starting point for a restore. For details about SQL Server 2000's backup types, see Michael Reilly's Certifiably SQL column "Backup Strategies," September 2000.
If you have a very large database (VLDB)—measured in hundreds of gigabytes or even terabytes—you need to ask yourself a few questions about your backup strategy. How long does your VLDB take to back up? More importantly, how long does it take to restore? Are these times acceptable based on your business requirements? Unless your strategy uses storage-assisted backup such as split-mirror, a restore that uses one of the full-backupbased strategies is likely to take hours. Even if hours of downtime are acceptable, how much data can you afford to lose? And what's your site redundancy strategy—do you use log shipping or do you copy all your log backups to a secondary site?
SQL Server pauses log backups while a full database backup is running. The Web-exclusive sidebar "Log Backups Paused for Good Reason," explains the technical reason for this requirement, but you could lose a lot of data if you had a site failure during a full backup. This scenario might sound unlikely, but when your full database backup is measured in hours, your risk of data loss increases. If your database backup takes 8 hours to perform, your secondary site could be as much as 8 hours behind at the time of a failure, resulting in 8 hours of data loss if the backup wasn't complete or hadn't copied to the secondary location. Losing 8 hours of data is unacceptable in most cases. But what if a log backup could occur while you were performing a different backup? Such an approach would let you ship changes to another site even while your large backups are occurring. To let logs back up every minute (a common frequency for minimizing data loss), you might choose not to perform a full database backup at all. By using a file and filegroup backup strategy, you can completely avoid performing a full database backup, so log backups will never be paused. This strategy lets a secondary site's data stay as close as possible to the primary site's and minimizes potential data loss.
Some basic backup and restore strategies have few restrictions; for example, a full database backup has no real restrictions—it can be performed anytime. However, the more advanced file and filegroup backup strategies have some basic requirements you need to understand. Primarily, you must perform log backups regularly. Log backups are a crucial component of restore and are required for recovery if you use the file and filegroup backup strategy. To allow log backups and thereby minimize your work loss exposure, you must first set the database recovery model to either Full or Bulk_logged. Because the Simple recovery model doesn't let you back up the transaction log, you can't use this recovery model with the file and filegroup backup strategy.
The second step—optional but beneficial in setting up the file and filegroup strategy—takes some planning. When you create objects, take time to place your objects strategically within the database. In most databases and especially in VLDBs, your data will likely vary in the way it's used. For example, you'll probably have some data that's predominantly read-only and some that's predominantly read/write. In some tables, new rows might be read/write to accommodate frequent inserts, whereas old rows (historical data used mainly for analysis) would be read-only. For other tables, the distribution might vary based on corporate policy (for example, price changes are infrequent, so price information would be predominantly read-only—until price increases were put into effect). After reviewing the overall usage of your data, you can determine table usage and strategically place read-only tables (or tables with isolated or batch updates) on one filegroup and read/write tables on another. Doing so will not only save you time during a restore but can also save money and time during backups because although the read-only portion of your database might be very large, it doesn't need to be backed up as often as the read/write portion.
Additionally, partitioning your larger tables to split read activity from write activity can improve maintenance performance. Operations such as rebuilding an index take less time on smaller tables. For example, consider a typical sales table that holds the online sales information for a large company. Sales for the current month are read/write (which requires frequent index maintenance), but sales from the previous month or quarter are read-only because they're used only for analysis (maintenance occurs only once a month, when the data is moved to the read-only portion of the database). For tables whose usage differs, consider using separate filegroups for each type of data. For example, creating four separate filegroups—one for read-only data, one for read/write data, one for text and image data, and one for extremely large tables—can help when you're determining your backup strategies. Not only will you create smaller tables (partitions) that reduce maintenance times, you'll add numerous backup and especially restore strategy options that can reduce downtime and backup costs. To determine the correct number of filegroups and files for your database, you need to know your data; the sidebar "Filegroup Usage for VLDBs" contains some recommendations.
Backup by the Numbers
Once you're using good design and the Full (or Bulk_logged) recovery model, you're ready to start taking advantage of the file and filegroup backup strategy. To demonstrate the syntax and exact usage of this strategy, let's look at the case study that Figure 1 illustrates. To create the PubsTest database and all the backups in the case study, run the FileFilegroupStrategiesCaseStudy.sql script, which you can download at the top of the page. You can execute this script in its entirety, but I recommend working through the script slowly to review the syntax and understand the backup strategy.
The case study uses seven files: a primary file, three files in a filegroup named RWFG (used for read/write data), two files in a filegroup named ROFG (used for read-only data), and one transaction log file. After creating the PubsTest database, the script modifies data in between several backups. The diagram lists the backup types from left to right in this sequence (the number corresponds to the number along the time line):
- Full file backup of the primary file
- Transaction log backup
- Full filegroup backup of the RWFG filegroup
- Transaction log backup
- Full filegroup backup of the ROFG filegroup
- Transaction log backup
- Differential filegroup backup of the RWFG filegroup
- Transaction log backup
- Full file backup of primary file
- Transaction log backup
- Differential filegroup backup of the RWFG filegroup
- Transaction log backup
- Final transaction log backup after the disaster
Note that the time line, at number 13, includes a backup after the disaster. When a database becomes suspect, your first question should be, "Can I back up the tail of the transaction log?" The tail of the log is the up-to-the-minute information that lets you recover your database from the last backup until the database became suspect. For backing up the tail of the log, use BACKUP LOG with the NO_TRUNCATE option.
When you use the file and filegroup backup strategy, you must make sure that you back up every file so that you create a complete backup set (all of the database's data files) and can recreate the database framework if needed. You can create a backup set either by backing up the files individually, by backing up filegroups, or by backing up some combination of the two—as long as all files are backed up at some point. In SQL Server 2000, you can back up any file individually—even when it's a member of a filegroup. In SQL Server 7.0, you must back up the entire filegroup; you can't separately back up individual files that are members of a filegroup. This change in SQL Server 2000 gives much better backup-and-restore granularity: If an isolated failure occurs, you have to restore only that file. However, finding the file's pages in a larger filegroup backup takes more time than restoring the file from an individual file backup.
The PubsTest case study provides the database framework along with the most recent file backup for the primary file (backup 9) and backups of the filegroups (backups 3 and 5). The example recovery scenario involves recovering the entire database. However, one of the biggest benefits of the file and filegroup backup strategy is the ability to recover quickly when isolated corruption or media failure occurs. Instead of recovering the whole database or recovering partially from a full database backup (which is slower because the backup contains the entire database, not just the target file or filegroup), you can recover just the corrupted file or filegroup. To determine the correct strategy for recovering from isolated corruption, see the sidebar "Restoring After Isolated Disk Failure," page 28, which contains syntax, examples, and some important recommendations.
To recover the entire PubsTest database, you build the database framework, restoring the last full file and filegroup backups first. To build the example framework, you'd restore backup number 9 to create the primary file, then backup 3 to create the RWFG filegroup, then backup 5 to create the ROFG filegroup. Because the database recovery process is incomplete (the database files are all at different states and still need to be rolled forward to a consistent point), the database is said to be in a loading state. To continue to apply transaction logs (to get the database files to the same point in time), use the NORECOVERY option when you restore all file and filegroup backups. This option lets you restore backups without bringing the database online until it's recovered. When you're ready to bring the database online, you can use the following simple statement:
RESTORE DATABASE dbname WITH RECOVERY
Before you get to that point, you need to roll forward the data to the point in time when the disaster occurred. To minimize rollforward time, you might have been performing differential backups in addition to transaction log backups. Differential backups can occur at any level: database, file, or filegroup. The next step on the road to recovery is to restore the last differential backups of all files or filegroups. In the case study, you have a differential backup for RWFG at backup number 7 and again at backup 11. Because differentials contain all changes made since the last full file or filegroup backup, you need to restore only backup 11. (If the differential backup failed, you could easily use the next most recent differential backup, so you have some redundancy in your backup strategy.)
Now you have the bulk of the database populated. However, the database is still not ready for recovery. The files are still at different stages of modification: The primary file is at point in time 9, RWFG is at 11, and ROFG is at 5. The next step is to roll forward the database by applying the correct transaction log backups. Watch out—this step can be tricky. To determine the proper sequence of log backups to apply, you need to first figure out the oldest backup set that you've restored. In the case study, you restored the following:
- Backup set number 9 to get the latest full file backup of the primary file
- Backup set number 3 to get the latest full filegroup backup of the RWFG filegroup
- Backup set number 5 to get the latest full filegroup backup of the ROFG filegroup
- Backup set number 11 to get the latest differential of the RWFG filegroup
At this point, the ROFG filegroup is at the earliest point in time—point in time 5. To recover up to the minute, you need to calculate the minimum effective log to apply. For this, you can use the information from the backup history. Querying msdb for backup history can help speed the process; however, msdb might not be available at the time of the failure. If msdb is available, you can query it directly to gather the information about your database backups. Querying backup history from msdb can produce quite a bit of information because SQL Server doesn't automatically clear msdb's backup history. Historical information is kept permanently. You might want to periodically clear old information from msdb, but make sure to retain one or two complete backup sets of information. To clear the backup history, you can use the msdb.dbo.sp_delete_database_backuphistory or msdb.dbo.sp_delete_backuphistory procedures (see SQL Server 2000 Books Online—BOL—for syntax and parameters).
If msdb is unavailable, you must gather the backup history by reviewing the header information directly from your backup devices. To view the header information, use the LOAD HEADERONLY syntax. Web Listing 1, downloadable from http:\\www.sqlmag.com, contains some examples of how to use this syntax. LOAD HEADERONLY requires that the backup devices be online. Bringing all the tape devices online and gathering all of this information will probably add quite a bit of time to your recovery process. You'll probably load some backup devices and inevitably interrogate some of the wrong ones to find the correct set and the right sequence to restore. To ensure that msdb is always available for querying, see the sidebar "Msdb Quick Tip" for instructions on resetting msdb to allow transaction log backups and how to back it up frequently.
In the PubsTest case study, you can review the backup set information by using the query that Listing 1 shows. Executing this query produces the results that Table 1 shows. These results are fairly clean because the backups used good naming conventions. I recommend that you have a standard for backup names and descriptions so that you can more quickly find the correct sequence of backups to restore. In the case study, I used descriptive names, such as "PubsTest Backup, File = pubs" and left the backup description option blank. Additionally, the backup history was cleared before these backups occurred.
Now, the database framework has been restored and you know that ROFG is at the earliest state in this database's backup set. To roll the database forward up to the minute of the disaster, you need to restore the transaction logs in the correct sequence. To determine the correct set of transaction log backups to restore, you need to find backup 5's minimum effective log sequence number (LSN), which the First_LSN column in Table 1 contains. In this case, the minimum effective LSN of backup number 5 is 13000000248600001. To know which logs to restore, find the first transaction log whose minimum effective LSN is less than this number—backup number 6, in this case. You need to find the next lower LSN because you're looking for the transaction log (or logs) that contains the transactional information of what happened during this backup. Because log backups can occur concurrently with file and filegroup backups, multiple log backups could occur within the time frame of a large file or filegroup backup. In some cases, the first transaction log to load might be "before" the file or filegroup set you've loaded. So, always review the LSNs to ensure that you have the proper starting point.
To complete the full recovery of this database, you need to apply all transaction logs starting with backup number 6. In this case, you'd restore backup 6, then 8, then 10, then 12, and finally backup 13 to bring the data up to the minute. For backup number 13 (the last transaction log), you can either use the RESTORE WITH RECOVERY option to bring the database online or use the NORECOVERY option, then follow this restore with a RESTORE DATABASE PubsTest WITH RECOVERY statement to recover the database.
Recovering the entire database from file and filegroup backups is by far the most complex backup and restore strategy. This strategy takes the greatest amount of testing to fully understand and requires good practices such as strong naming conventions and regular backups of msdb to ensure an easy recovery. But by using this strategy, you can completely recover a database without ever doing a full database backup or having to pause your log backups. With this strategy, you can selectively choose when to back up large read-only portions of your database, an approach that can save backup time and money in backup media. More important, because the log is never paused, your secondary sites (for example, log-shipping destination sites) will always receive the transaction logs as quickly as possible, which can drastically reduce data loss in the event of a site failure. For best results, have your most senior DBA devise the backup plan and the least senior DBA test it.