Special Snapshots

One problem associated with managing a healthy 24 × 7 database server is that so many maintenance activities can block access to the data. And the interference cuts both ways: Ongoing database modifications can prevent you from concurrently running system health reports, such as database consistency checks and fragmentation reports.

Related: What's a Snapshot?

SQL Server 2005's new database snapshots feature lets you create a point-in-time, read-only copy of a database. You can even create multiple snapshots of the same database at different points in time. The snapshot typically requires much less space than the original database because the snapshot stores only pages that have changed in the source database. When you read from a snapshot, pages that haven't changed since the snapshot was created are read from the source database.

You can run queries on a snapshot to your heart's content without causing problems for database users. The benefits of snapshots are clear, so let's move on and look at how they work.

Creating a Snapshot

To create a database snapshot, you simply use the CREATE DATABASE command with the AS SNAPSHOT OF option. Because no graphical equivalent to this command is available through SQL Server Management Studio Object Explorer, you must use T-SQL syntax.

In the CREATE DATABASE command, you need to include each data file in the source database, specifying the original logical name and a new physical name. You can't specify other file properties, and you can't use a log file. Listing 1 shows the syntax for creating a snapshot of the AdventureWorks database and putting the snapshot file in SQL Server 2005's default data directory.

One use for a snapshot is running DBCC SHOWCONTIG—or, in SQL Server 2005, selecting from the sys.dm_db_index_physical_stats Dynamic Management Function. (Although you might get slightly different fragmentation information from a snapshot than you'd get from the original database, the difference shouldn't be significant and the ability to run the report should make the tradeoff well worth it.) The locking behavior of SQL Server 2005 tables is much improved over SQL Server 2000, but conflicts can still occur. If you try to run a fragmentation report while an application updates a table and holds onto exclusive locks, you'll be blocked until the transaction is committed.

Snapshots help you avoid such problems. To show how, let's update a table in AdventureWorks without committing the changes, then try to run a fragmentation report in AdventureWorks and in AdventureWorks_snapshot.

First, run the command shown in Listing 1 to create a snapshot of AdventureWorks. Then run the code in Listing 2 to update the Sales.SalesOrderHeader table without committing the transaction. In another connection, use the code in Listing 3 to try to run a fragmentation report on Sales.SalesOrderHeader. The report will be blocked until you return to the connection in which you're running the UPDATE transaction and either commit the transaction or roll it back.

Now run the code in Listing 2 again, then run the fragmentation report in the snapshot database, as Listing 4 shows.You get the same output as when you first ran the report, but the report isn't blocked and the SELECT from the sys.dm_db_index_physical_stats function proceeds immediately.

You'll notice that even though the AdventureWorks database is several hundred megabytes or more in size, creating the snapshot is extremely fast. In fact, you might be able to create a new snapshot every time you need to run a fragmentation report and drop the snapshot as soon as you're finished with it.

You can determine the number of bytes that each of the snapshot's sparse files uses on disk by looking at the fn_virtualfilestats system table-valued function, which returns the current number of bytes in a file in the BytesOnDisk column. Alternatively, you can look in the new sys.dm_io_virtual_file_stats Dynamic Management Function. This function takes database_id and file_id as parameters.The database ID of the snapshot database and the file ID of each sparse file are displayed in the sys.master_files catalog view.

You can also view sparse file sizes through Windows Explorer. Right-click the filename and select Properties. The Size value in the Properties window shows the maximum size, and the Size on disk value is the same value that you see using the fn_virtualfilestats or sys.dm_io_virtual_file_stats function. The maximum size should be about the same size as the source database was when the snapshot was created; the Size column in the sysfiles view also shows the size. The code in Listing 5 uses sys.dm_io_virtual_file_stats and sys.database_files to show the current and maximum size of the sparse file for AdventureWorks_snapshot.

Hidden Snapshots

In SQL Server 2000, transactions that hold locks can also block DBCC validation commands (e.g., CHECKTABLE, CHECKDB, CHECKALLOC). SQL Server 2000 DBCC commands can't finish running until the locks are released.

In SQL Server 2005, however, all the DBCC validation commands use database snapshot technology to prevent the validation operation from interfering with ongoing database operations. The snapshot technology also lets the validation operation see the data as it was when the DBCC command was initiated, no matter how many changes are made to the data while the operation is under way. The SQL Server Database Engine creates a snapshot of the database when you initiate the CHECK command, and no locks are acquired on the objects being checked. The actual CHECK operation runs against the snapshot.

Unlike typical database snapshots, the "snapshot file" that contains the original page images isn't visible in the file system and its location is unconfigurable—it always uses space on the same volume as the database being checked. You can use internal database snapshots with DBCC validation commands only if your data directory is on an NTFS partition. If you don't use NTFS or don't want to use the space that the snapshot requires, you can prevent snapshot creation by using the WITH TABLOCK option in the DBCC command. In addition, when you use one of the REPAIR options to DBCC, no snapshot is created.

If you don't use the TABLOCK option, the DBCC validation commands don't interfere with other work taking place in a database and so are considered online operations. With TABLOCK, however, a Shared Table lock is acquired for each table as it's processed and blocks concurrent modification operations. Similarly, if modification operations are in progress on one or more tables, a DBCC validation command that uses the TABLOCK option is blocked until the modification transaction is completed.

Because snapshots are usually created very quickly and need minimal disk space, you might be able to create a new snapshot every time you run certain reports. However, there are always exceptions. For example, snapshots of a very heavily updated database might grow quite large. As updates occur on the source database, the original version of the page is written to the snapshot, so lots of updates could not only take lots of space, but could also incur noticeable overhead due to the extra write operation that takes place for every write in the source.

In a heavily updated database, the snapshot-might not be created instantaneously. The first thing SQL Server does when creating a snapshot is to run recovery on the source database, in case transactions are in progress when the snapshot is created. Because a snapshot comprises only committed data, in-progress transactions are effectively rolled back in the snapshot database to their earlier committed state. Not only could this recovery take time on a very busy system, but your snapshot might start out with a substantial amount of used space instead of being almost empty, as it would be if the database weren't so heavily updated.

Mirror, Mirror

An upcoming new technology in SQL Server 2005 that will increase availability is database mirroring. Although the initial release of SQL Server 2005 doesn't support mirroring, the capability should be available in the first service pack, which might be out by the time you read this.

Mirroring keeps a hot standby of a production database ready and available by duplicating all write operations to a mirror location. Every time SQL Server writes to the log of the source database, which is called the principal, it writes the same information to the mirror copy. Mirrors are meant to be used as a standby only if the principal fails and can't be accessed directly, even for reading. However, you can create and read from a snapshot of the mirror, basically turning the mirror into a reporting server. Because the principal server always maintains the mirror, creating a snapshot of the mirror puts no extra load on the principal. An extra load is put on the mirror, but because the mirror isn't being used for any other purpose, the impact might be barely noticeable.

Database snapshots aren't a solution for every read-only database need. You can't refresh the data in the snapshot, and if you want a snapshot to reflect recent changes, you must create a new snapshot. However, because snapshots are usually quick to create, they're often a nice solution when you have an immediate need for data access, can't wait for online activities to stop, and don't want to interfere with online activities.

Comments

Plain text