The Road to Recovery

A frequently asked question on the SQL Server public help forums is, "How can I recover just one of the tables in my database?" The need for single-table recovery can arise if someone issues an inappropriate UPDATE or DELETE statement that changes a substantial amount of the table's data in unintended ways. In this situation, an administrator might want to replace the data in that table with data that was previously backed up. Another time you might want to restore only a subset of the data in the database is when one of the disks that contain your SQL Server database fails but the other disks are still fully functional. If you have a database backup from before the unintended change or disk failure took place, one recovery strategy is to restore the entire database into a new location, then copy the appropriate data from the restored database into the original database. However, for very large databases (VLDBs) that contain tens or hundreds of gigabytes of data, the time and space requirements might make this solution impractical. Having SQL Server restore the data that belongs to one specific table or set of tables isn't as easy as it might seem. However, SQL Server 6.5 and later releases give you some options for recovering only parts of a database.

Single-Table 6.5

SQL Server 6.5's mechanism for restoring data from one table is the LOAD TABLE command. The source of the table data can be a backup file of just that table, which you create by using the DUMP TABLE command, or a complete database backup. This single-table restore capability isn't as useful as it sounds, however. To make a backup of one table, you have to anticipate the need for restoring that table. If your database consists of hundreds of tables, you need to decide which tables are most likely to need individual recovery. Or rather than trying to decide which tables to back up, you might decide to make individual backups of every table. But the time, space, and administrative resources required to manage these backups can make this technique even less practical than just restoring the entire database.

Alternatively, when you're using SQL Server 6.5, you could restore the table from a complete database backup. This solution sounds ideal: You probably have complete database backups, so you don't need any extra time, space, or administrative resources for the backup operations. However, this solution has a big downside. Table data in a database backup file won't necessarily be transactionally consistent, so if you restore just the data from a single table, that data might be inconsistent and reflect an in-progress transaction.

For example, suppose that while a database backup operation is running, you issue the following transaction in a much larger version of the Pubs database:

UPDATE titles
SET price = price * 0.80

This statement attempts to update the price of every title in the titles table with a price decrease of 20 percent. The backup operation in SQL Server 6.5 attempts to back up each page in page-number order, but the pages in a table aren't necessarily in any particular sequence. Let's say that the titles table occupies pages 100 to 200 and pages 800 to 1000 in the database. The backup operation might write pages 100 to 200 to the backup device before the UPDATE statement starts, so these pages would keep the original prices. Then, while the backup thread is writing pages 201 to 799, SQL Server updates the entire table according to the above statement. So by the time the backup thread writes pages 800 to 1000, those pages contain the new, updated data values. The titles table in the full backup is now transactionally inconsistent. If someone performs a LOAD TABLE operation for the titles table, some rows will contain the original price values and some rows will contain updated values. Thus, SQL Server 6.5 can't guarantee that transactions are atomic—that either all the changes happen or none of them happen.

However, if no one performs a LOAD TABLE operation, the inconsistent table data isn't a problem because the full database backup also contains part of the transaction log. If you restore the entire database (rather than one table), SQL Server will run the same recovery process that it runs every time it starts up. During recovery, SQL Server compares the transactions in the log with the data in the database. The recovery process will detect that some pages in the titles table aren't up-to-date and will roll forward the changes to the rows on pages 100 to 200. When the recovery process is complete and the database is fully restored, the titles table will be transactionally consistent and all the rows will reflect the values that the UPDATE statement changed. Because of these shortcomings in SQL Server 6.5's ability to back up and restore a single table, SQL Server 2000 and 7.0 don't have this capability. In these releases, Microsoft included the capability of backing up and restoring files and filegroups.

Backing up and Restoring Files and Filegroups

When you create a database in SQL Server 2000 and 7.0, you can choose to spread the data over multiple files. You can also create named filegroups, which contain one or more files. (For details about creating and managing databases with multiple files or filegroups, see Michael D. Reilly, Certifiably SQL, "Creating SQL Server 7.0 Databases," November 1999, and Wayne Snyder, "Ensuring Up-to-the-Minute Database Recovery," October 1999.) One reason you might want to use multiple files and named filegroups is that these options let you create a table within a particular filegroup. You can't specify which file to create the table in, but you can create a filegroup with just one file. Placing a table in that filegroup is equivalent to placing a table in the only file that filegroup contains.

When your database is on multiple files or filegroups, SQL Server 2000 and 7.0 let you back up and restore individual files or filegroups. This ability can be useful with extremely large databases. You can choose to back up just one file or filegroup each day so that you don't need to back up the entire database as often. The ability to back up and restore individual files or filegroups can also be useful when you have an isolated failure on one drive and restoring the entire database would take too long. Even if you've only done full database backups, you can restore individual filegroups from a full backup and replace the damaged file or filegroup with the backed-up copy.

With this capability to restore a single filegroup, you might think you'd have the option of restoring only one table if that table existed on its own filegroup. Suppose you created filegroup FG1 containing one file, File1, then you created one table called Table1 on that filegroup. If someone incorrectly modifies Table1 and you have a full database backup from just before the modification, you might try to restore just the filegroup containing Table1, hoping to bring back the old data in Table1. However, this approach won't work. After restoring a file or filegroup backup, you must also restore all the transaction logs made between the last time you backed up the file or filegroup and the time you restored it. This requirement is supposed to guarantee that the restored files are in sync with the rest of the database. But in this case, you don't want the restored files to be in sync with the rest of the database, because applying all the transaction-log backups will redo all the unintentional modifications to Table1 that you're trying to undo.

Although restoring a file or filegroup isn't a solution for recovering a table damaged by user error, it helps with the problem of isolated media failure. For example, suppose you back up filegroup FG1 at 10:00 a.m. Monday. As users access the database, changes happen to the data on FG1 and SQL Server processes transactions that change data in FG1 and other filegroups. You back up the log at 4:00 p.m., and SQL Server processes more transactions that change data in FG1 and other filegroups. But at 6:00 p.m., a media failure occurs and you lose one or more of the files that make up FG1.

To restore, you must first back up the tail of the log that contains all changes that occurred between 4:00 p.m. and 6:00 p.m. You can then restore filegroup FG1 by using the RESTORE DATABASE command, specifying just filegroup FG1. Your database won't be in a consistent state because the restored FG1 will have changes only through 10:00 a.m., but the rest of the database will have changes through 6:00 p.m. However, SQL Server knows when the last change was made to the database because each page in a database contains information about when it was last modified. When you restore a filegroup, SQL Server makes a note of the latest time any page in the database changed. You must then restore log backups until the log reaches at least that latest change time; in this example, you'll reach that point when you apply the 6:00 p.m. log backup that you made after the failure occurred.

Partial Restore

SQL Server 2000 adds to the restore operation another capability that lets you do a partial restore of a database in special situations. In SQL Server 2000 Books Online (BOL), the description and syntax for a partial restore seem similar to those for file and filegroup backups, but partial restore differs in one significant way. With file and filegroup backups, you start with a complete database and replace one or more files or filegroups with previously backed-up versions. With a partial database restore, you don't start with a full database. You restore individual filegroups—including the primary filegroup, which contains all the system tables—to a new location, essentially creating a new database containing a subset of the original database. Any unrestored filegroups don't exist in the new database, and SQL Server treats them as OFFLINE when you attempt to reference data stored on them. You can then restore log backups or differential backups to update the data in those filegroups with recent changes.

This partial restore process gives you the option of recovering the data from a subset of tables after an accidental deletion or modification of table data. You can use the partially restored database to extract the data from the lost or damaged table and copy it back into your original database.

When performing a partial restore, you must always restore the primary filegroup, which contains all the system tables. When you restore the primary filegroup, all system tables are restored, even those associated with files that aren't included in the restore operation. So sp_help or Enterprise Manager will show the tables because they exist in the sysobjects system table, but when you try to access a table on an unrestored filegroup, you'll get a message like this one:

Server: Msg 8653, Level 16, State 1, Line 1
Warning: The query processor is unable to produce a plan because
the table 'table2' is marked OFFLINE.

Let's look at an example of a partial restore. The operation is intended for use when your databases are too big to restore in their entirety, but for this example, let's create a very small database called testdb. You can run the code that Listing 1 shows to create a database on three filegroups: one primary filegroup and two user-defined filegroups, FG1 and FG2. The code in Listing 1 also creates two tables, one on each user-defined filegroup, and inserts a row of data into each table. You might want to change the physical paths for the files to reflect where your SQL Server 2000 instance is installed or to correspond with your disk configuration.

After creating and populating the testdb database, you can use the following command to back it up:

BACKUP DATABASE testdb TO DISK = 'c:\backup\testdb.bak'
WITH INIT

Now suppose that someone unintentionally or maliciously destroys Table1. You can perform a partial restore of testdb by creating a copy of it that contains only filegroup FG1, which contains Table1, and the primary filegroup and log. Listing 2 shows a command to perform a partial restore of testdb into a new database called testdb_partial. To run this code, change the physical filenames or paths as appropriate for your system configuration.

You can now use the sp_help system stored procedure to examine testdb_partial; note that the database shows both a Table1 and a Table2. You can select from Table1 and see the data, but when you select from Table2, you get the error 8653 shown earlier.

Variations on a Theme

Let's look at a couple of other situations involving partial backup. What does SQL Server restore if the backup you're using was made in the middle of a transaction involving multiple rows from Table1? And what's restored if the backup was made after a completed transaction involving both Table1 and Table2?

To simulate the first situation, you need to use Query Analyzer to run a transaction in multiple batches by highlighting and executing only a few statements at a time. Because SQL Server won't let you execute a backup command from a session that's in the middle of a transaction, you need to use another connection window to run a backup command. So in one query window, execute the beginning of a transaction:

BEGIN TRAN
INSERT INTO table1
  SELECT 'This is before the backup'
GO

And in a second connection window, back up the testdb database, using the INIT option to overwrite the previous backup:

BACKUP DATABASE testdb
TO DISK = 'c:\backup\testdb.bak'
WITH INIT
GO

After the backup is complete, return to the first connection window and complete the transaction:

INSERT INTO table1
  SELECT 'This is after the backup'
COMMIT TRAN
GO

Now let's look at what the backup captured. Make sure you have no connections to the testdb_partial database, then rerun the script in Listing 2 to recreate testdb_partial. If you select the data from Table1, you'll see only the original row:

a	b
1	This is table1 in FG1

With SQL Server 2000 backup and restore, you're not left with inconsistent data. Because the transaction was in progress when the backup occurred, only part of the transaction was reflected in the restored database. However, the restore process applied the portion of the transaction log that you backed up along with the database backup. This log restore operation undid any changes that you hadn't committed when the backup was made, so the table returns to its original state.

You can use the script in Listing 3 to simulate the situation in which the backup was made after a completed transaction involving both Table1 and Table2. The script drops testdb_partial and recreates the tables in FG1 and FG2 in the original testdb database. The code in Listing 4 shows a transaction that spans Table1 and Table2, which are on two different filegroups. The transaction commits, then the script starts the backup. Let's assume that sometime after this backup, someone damages or modifies the data in Table1 and you want to return Table1 to its condition at the last database backup. To get back the data, you can use the script in Listing 2 to restore testdb_partial.

When you examine the data in Table1, note that it contains the new row you inserted during the transaction in Listing 4, but Table2, which had a row inserted during the same transaction, is completely unavailable. You're performing the partial restore because Table1 on filegroup FG1 was unavailable or damaged, so the only important thing to restore is Table1. You can now copy the data in Table1 from testdb_partial into the original testdb, where Table2 still exists with the data that the code in Listing 4 changed. After you perform the partial restore and copy the data from the partially restored database to the original, the original testdb now has both Table1 and Table2 in a consistent state—both reflecting the changes from the committed transaction in Listing 4.

Being able to restore only a subset of the files or filegroups into an existing database helps you recover from media failure without the time and space requirements of restoring an entire database. And the ability to partially restore a database into a new location, then copy that data to your original database, helps you efficiently recover from user errors.

Comments

Plain text