Backup Strategies

Last month, in "The Transaction Log", I looked at the interaction among SQL Server's transaction log, checkpoints, and automatic recovery process. Knowing how these features work together gives you a firm foundation for planning your backup strategy. This month, I guide you through SQL Server's different backup options and how to choose the best option-or combination of options-for your SQL Server installation. (For information about how to set up a backup job, see Certifiably SQL, "Defining Jobs," published in the June 2000 issue of SQL Server Magazine and available online.)

The following sections of the SQL Server 7.0 Administration (70-028) online exam guide cover backup strategies:

Planning: Develop a data availability solution. Choose the appropriate backup and restore strategy. Strategies include full database backup; full database backup and transaction log backup; differential database backup with full database backup and transaction log backup; and database file backup and transaction log backup.
Managing and maintaining data. Back up system databases and user databases by performing a full database backup, a transaction log backup, a differential database backup, and a filegroup backup.

The Implementing a Database Design (70-029) online exam guide doesn't specifically mention the backup process, but that doesn't mean the exam has no backup-related questions. Your database design influences your backup strategy. For example, your design may have to take into account the need to place different parts of the database on different filegroups to help the administrator back up efficiently and in the allotted time.

Types of Backup

SQL Server supports four types of backup: full, transaction log, differential, and file or filegroup. A full backup backs up the entire database-the database schema as well as the data. When you restore a database that you have a full backup of, you don't have to recreate all the objects, then reload the data from the backup. The restore is a one-step process.

A transaction log backup backs up all changes recorded in the transaction log since the last backup, whether the last backup is a full backup or another log backup. Many organizations perform a full backup once a week and log backups daily. But if you have high transaction volumes, you might have to back up the database nightly and back up the transaction log several times a day. Because the log backup records changes since the previous backup-not necessarily since the previous full backup-to restore the database, you need both the full backup and every log backup since the full backup. Even one missing or corrupted log backup in the sequence will prevent a restore. However, having a series of log backups lets you restore to a specific point in time. So if an error or a malicious user trashes your data at 3 P.M., you can restore the data as it was at 2:59 P.M. (more about this feature in next month's article about restoring databases).

SQL Server 7.0 introduced differential backup, which backs up only pages that have changed since the last full backup. Suppose you have records that have changed several times since the last full backup. A transaction log backup would detail every record change since the last backup, but a differential backup would back up only the current value of the data. Restoring from a differential backup is faster than restoring from a transaction log backup because the restore process doesn't have to apply each of several changes to a record; the process just restores the data as it was when you ran the differential backup. Combined with full database and transaction log backups, differential backups can significantly reduce the time you spend restoring your database after a problem.

File or Filegroup Backups

SQL Server 7.0 stores data in files instead of in devices, as earlier releases do. SQL Server 7.0 also introduced filegroups, which let you designate tables and indexes for SQL Server to create on specific files on specific disks. Using filegroups can boost performance and help you manage your backup processes. For example, suppose you only have between midnight and 5 A.M. on weeknights to back up a sales database that takes 8 hours to back up. You could design your database so that product information is in one filegroup, customer data is in a second filegroup, and sales data is in a third filegroup. You could back up the entire database on Sunday. Then Monday night, you could back up only the products filegroup. Tuesday, you could back up the customers filegroup. And Wednesday, you could back up the sales filegroup. You could then repeat the filegroup backup process Thursday, Friday, and Saturday. Such a strategy would let you break your backup into pieces that fit into the allotted time. And as you'll see next month, if you lose one disk-say the one with the customers filegroup-you need to restore only this filegroup, not the entire database.

Let's say that after this backup scheme has been running for a month, you realize that users add only a few new products per week. However, users add several hundred new customers and thousands of new sales per day-thanks to your e-commerce Web site. So, you modify your backup strategy. You back up the sales filegroup Monday, Tuesday, Thursday, and Friday nights. You back up the customers filegroup Wednesday night. And you don't bother backing up the products filegroup at all during the week. If you lose a filegroup, you can restore it to its current state by using the most recent filegroup backup or the last full backup (I cover how to do this in next month's article about restoring databases). However, you do need the transaction log backups, which SQL Server uses to restore only the data for the relevant filegroup.

Some DBAs and designers like to place older, archival, or infrequently used data on a separate filegroup. Why keep backing up last year's sales figures over and over again? Although filegroups add a level of complexity to your database design, they can offer real performance and backup and restore benefits. If you prefer not to use filegroups, you can still have SQL Server back up designated files rather than the entire database-achieving the same backup benefits with a slightly simpler database design. And if you do place data and indexes on different filegroups, SQL Server backs up the indexes along with the data for internal consistency.

To back up either a file or a filegroup, expand the Management item from Enterprise Manager, then right-click Backup, select Backup a database, and select the database you want to back up. (Alternatively, you can expand Databases from Enterprise Manager, right-click the database you want to back up, then select All Tasks, Backup Database.) Select the File and Filegroup backup option, then to open a list of filegroups and the files they contain, click the ellipsis to the right of the File and Filegroup option. Select the check boxes for the files and filegroups you want to back up. Note that if the File and Filegroup option isn't available for selection, you likely have the truncate log on checkpoint option set.

Clustered Indexes and Differential Backups

Knowing that SQL Server always reads and writes entire pages, you might think that if your users change any row on a page, SQL Server backs up the whole page. However, the backup process backs up extents, not pages. An extent is eight pages. In fact, if any page-or any row-in the extent is changed, SQL Server backs up the whole extent.

If you're adding records, using a differential backup may or may not be a problem, depending on whether you have a clustered index on the table. If you build the clustered index on an incrementing value, such as those an identity column generates, then the backup process appends all the new data pages and extents to the end of the existing data structure. Without a clustered index, SQL Server just backs up new pages and extents that you've added since the last full backup. But if you have a clustered index based on a randomly distributed key and add records that are scattered throughout the data pages or if you're modifying many existing records, think carefully about using differential backup.

Suppose you have a table with 10,000 rows in 80 pages, or 125 rows per page. Eighty pages make up 10 extents. If you had changes to only 10 rows, scattered throughout the table, SQL Server could possibly mark all the extents for backup. If other tables had a similar pattern of changes, your differential backup could be backing up the whole database.

Differential backups are great for large databases in which only a small percentage of the records might have changed. But if you perform only differential backups between your full backups, then you're giving up the ability to perform a point-in-time restore because you aren't storing the intermediate values for your changing records. What works best in many cases is a combination of backup strategies.

Dynamic Backups

SQL Server 7.0's backup process is faster than SQL Server 6.5's process. Let's look at how these two processes differ. When SQL Server begins a backup, it notes the Log Sequence Number (LSN) of the oldest active transaction and performs a checkpoint, which synchronizes the pages on disk with the pages in the cache memory. Then, SQL Server starts the backup, reading from the hard disk-not from the cache. In SQL Server 6.5, if a user needs to update a record, SQL Server allows the update if the backup process has already backed up the record. Otherwise, SQL Server holds up the request for a moment-long enough for the backup process to jump ahead and back up the extent containing that record. SQL Server 6.5 then lets the update request proceed and resumes the backup process at the point it was when it was interrupted. When SQL Server 6.5 reaches this extent again, the backup process skips it, because the process has already backed up this extent.

SQL Server 7.0, in contrast, doesn't worry about whether users are reading or changing pages. SQL Server 7.0 just backs up the extents sequentially, which is faster than jumping around as SQL Server 6.5 does. Because SQL Server 7.0 doesn't jump ahead to back up extents before users change data, you could end up with inconsistent data. However, SQL Server 7.0 also introduced the ability to capture data changes that users make while the backup is in progress. When SQL Server 7.0 reaches the end of the data, the backup process backs up the transaction log, capturing the changes users made during the backup process. Although dynamic backup comes with a performance penalty, Microsoft promises no more than about a 6 percent to 7 percent performance reduction, which most users would never notice. Scheduling backups during low database activity is still a good idea, but if you have to back up the transaction log several times a day, you won't be able to avoid having some users connected.

Because a backup can take considerable time, SQL Server 7.0's process is a welcome enhancement. Pre-SQL Server 7.0 releases back up the data as it is when SQL Server begins the backup; SQL Server 7.0 backs up the data as it is when SQL Server finishes the backup.

If you're backing up just the transaction log, at the end of the backup process, SQL Server 7.0 truncates the log, removing all transactions before the LSN it recorded for the oldest ongoing transaction. Truncating the log frees up space in the log and keeps it from filling up. (The log could still fill up, however, if you have a long-running transaction that isn't completing.) Remember that SQL Server doesn't truncate any log entry that has an LSN greater than that of the oldest active transaction.

Backup Strategies

Which method or methods should you use to back up your databases? The answer depends on many factors, including database size, volume of database additions and changes, how much time you have to restore the data, and whether you can re-enter the data. Let's consider several scenarios and possible strategies for each.

Scenario 1. You have a small database that changes infrequently. Once or twice a day, you load data from another application or users make a few minor changes. You decide to do a full backup every night at 10 P.M. You don't care about the transaction log because, in an emergency, you can restore the last night's backup, then rerun the batch jobs to add today's changes. Because the full backup doesn't truncate the log, you need to prevent the log from filling up by backing up the log with the TRUNCATE_ONLY option after you run the full backup. The TRUNCATE_ONLY option throws away the old log entries without backing them up. Alternatively, you could set the truncate log on checkpoint option for the database, which continually clears out the log.

Scenario 2. You have a small to midsized database that changes during the day, and you would have a hard time reproducing the changes, which come mainly from telephone sales orders. You opt for full backups every night, and clear the transaction log by backing up the log with the TRUNCATE_ONLY option. You don't set the truncate log on checkpoint option for the database because you want the log to fill up during the day. If a problem occurs, you can then use the previous night's full backup plus the transaction log to restore the data as it was before the system went down.

This strategy works in theory. But suppose the disk on which your database resides is damaged (as recommended, you placed the log on a separate physical disk, so the log is still safe). According to SQL Server Books Online (BOL), you should first back up the log by using the NO_TRUNCATE option to ensure that it is safe. The NO_TRUNCATE option backs up the log but doesn't delete old transactions. When you're through with the log backup, you restore the last full backup and all transaction log backups since the last full backup.

With SQL Server 6.5, you have no problem following this procedure. But in SQL Server 7.0, Microsoft radically changed how SQL Server stores data. SQL Server 7.0 stores all information about the various database files, including secondary files and log files, in the system tables, which by default are on the primary file. And you just lost the disk with the database on it-so you don't have your primary file. In this case, SQL Server doesn't have a clue about where your transaction log is and, therefore, can't back it up. The best you can do is restore the database to the point of your last transaction log backup before the system crash. Fortunately, SQL Server 2000 fixes this design flaw, letting you follow the procedure as you can with SQL Server 6.5. If you don't have SQL Server 2000 and need a workaround, see Wayne Snyder, "Ensuring Up-to-the-Minute Database Recovery," published in the October 1999 issue of SQL Server Magazine and available online.

Scenario 3. You have a midsize to large database with many changes each day. You don't have enough time to perform a full database backup every night, so you perform a full backup on the weekend and back up the transaction log several times a day. By Friday, you have about 20 transaction log backups. Restoring them all in order would take considerable time, and management has set quick database recovery as one of the company's goals. In this scenario, consider a combination of full, log, and differential backups. Every night at midnight, run a differential backup. During the day, schedule transaction log backups at 9 A.M., noon, 3 P.M., and 6 P.M. When a problem occurs and you have to restore on Friday, for example, you can do so from the full backup, Thursday night's differential backup, and the log backups you made since Thursday's differential backup. The differential backup lets you bypass the transaction logs from Monday through Thursday.

Scenario 4. You have a very large data warehouse that the company uses only for decision support-users don't update it. Each weekend, you use a Data Transformation Services (DTS) package to load the preceding week's sales database data into the data warehouse. You can use several different strategies-including periodic full backups-to back up the data warehouse. But the best solution would probably be a full backup immediately after the data load on the weekend. In case of trouble, you could just restore that backup. If the data warehouse is extremely large and the amount of data added is relatively small, you might choose to do less frequent full backups, maybe once a month. Or as long as you guard the data sources with an effective backup strategy, you might rely on a full backup from a few weeks ago, then run the DTS tasks again to reload the data from the past two weeks. Alternatively, you could choose weekly differential backups.

Backing up Analysis Services and OLAP Services' data cubes is beyond this article. If you have Relational OLAP (ROLAP) cubes, you can store the cubes as SQL Server tables and back them up using the standard SQL Server backup options. But if you use Multidimensional OLAP (MOLAP) or Hybrid OLAP (HOLAP) cubes, you're outside the scope of SQL Server backup.

As you're preparing your backup strategy and for the SQL Server certification exams, note that SQL Server 7.0 and SQL Server 6.5 handle the TRUNCATE_ONLY and NO_LOG backup options differently. SQL Server 7.0 Books Online (BOL) states that these options do the same thing: truncate the log without backing it up first. However, in SQL Server 6.5, these options work slightly differently in that TRUNCATE_ONLY first writes a checkpoint to the log (NO_LOG doesn't write a checkpoint). You use these options in conjunction with a full database backup to prevent the transaction log from filling up or to clean out a log that is full. Because the backup process calls a checkpoint before it starts the backup, SQL Server needs to write to the log and indicate that a checkpoint happened. Thus, if the log is full, SQL Server can't run the backup.

Backing Up to File or Tape

Unlike previous SQL Server releases, SQL Server 7.0 lets you combine SQL Server backups and Windows NT backups on the same tape. SQL Server can back up to a file (either locally or on a network drive), to a local tape drive, or to a named pipe (an option used mainly by third-party software developers). You can't back up your databases to a remote tape drive. If the tape drive is on another computer, you can just back up the database to a file on either computer, then use the NT or another backup utility to back up to a tape. If you use a remote tape drive, restoring is also a two-step process. Backing up your database to a hard disk is faster than backing up to tape, unless you have a very fast tape drive, such as a DLT drive. In fact, because of the faster performance, you might want to back up to hard disk even if you have a local tape drive.

One limiting factor in any backup-but particularly with tape drive backups-is the speed at which you can transfer data. The growth of tape capacity has lagged behind the growth of hard disk size and of databases, so even if you have a tape changer, you might need several tapes to back up a database. However, SQL Server can stripe backup sets similarly to how NT stripes hard disks. To set up striping, just select multiple devices for the backup.

SQL Server can stripe to tape or to disk and write blocks of data to each backup device, speeding up the backup. If your backup takes three tapes, for example, you might use three tape drives instead of one tape changer because now you can back up to the three tapes in parallel rather than in sequence. You can also use the striping technique with backups to disk files. If your backup takes 100GB of disk space, you could stripe the backup across two 50GB disk drives-again writing the data in parallel to two disk drives instead of in sequence to one drive. Even if you back up across multiple drives, you can restore from just one drive. If your computer room burns down, for example, and you can access only one tape drive, SQL Server can put the data pages from each tape back in the proper order, even though the files are multiplexed across more than one tape. However, when you build a stripe set of backup devices, you must use those devices as a set for all backups directed to those devices. For example, you can't back up your Customers database to devices A, B, and C, then back up your Products database to just device C. If you need to use the devices separately, you have to break up the backup set by reinitializing the devices.

Not Allowed

Although SQL Server 7.0 performs dynamic backups to capture data changes that users make during backup, SQL Server doesn't allow certain actions during a backup. You can't change the database size, either by adding files, deleting files, or shrinking the database or its files. You can't create indexes during a backup, and you can't perform any nonlogged operations, such as a bulk copy program (bcp) to load data. If you try to start any of these operations while a backup is running, the operation will fail. And SQL Server won't let you start a backup while any of these operations is in progress. Note that the ban on increasing the database size also applies to the automatic growth option, so be careful if you're relying on that option to expand your database.

SQL Server's backup process provides options to cover almost every need. With proper backup planning and testing, you can rest assured that you'll always have an accurate and up-to-date backup of your data. I'll finish this series next month by covering the restore process.

Comments

Plain text