One topic I see discussed frequently in public discussion forums is how to most effectively shrink a database, why it doesn't always shrink as much as you think it should, and how much overhead is involved. Most of the questions deal with the shrinking of the transaction log, probably because you have to understand many details about internal transaction-log usage to get the log file to shrink physically. I see far fewer questions regarding shrinking data files, even though the same command, DBCC SHRINKFILE, can shrink both data files and log files. Perhaps people don't have as many questions about shrinking data files because on the surface, shrinking a data file seems straightforward and the command doesn't have many subtleties. As long as you have free space in your data files, the SHRINKFILE command usually works for shrinking physical data files.
I think people should ask more questions about shrinking data files because even though the SHRINKFILE command appears to work in most cases, SQL Server is doing a lot of work behind the scenes that you might not be aware of. In addition, some ramifications of running this command can degrade performance. First, let's review how the shrink commands work on data files, then I'll look at some performance issues you need to be aware of.
Two commands—DBCC SHRINKDATABASE and DBCC SHRINKFILE—can shrink your database. In this article, I primarily discuss DBCC SHRINKFILE, but many internal mechanisms of the two commands are the same. In particular, with both commands, the shrinking of a data file takes place in two phases.
During the first phase—called the compaction phase—SQL Server attempts to move data from the end of the physical file to empty space near the beginning of the physical file. If it's trying to move indexes, SQL Server moves whole pages at a time, including data from the table itself, if the table has a clustered index. For data from a heap table, SQL Server moves individual rows wherever it can find room closer to the beginning of the file. During the second phase, SQL Server releases unused space at the end of the file back to the OS.
The syntax for both shrink commands includes two options that let you skip one of the two phases. If you issue the command with the NOTRUNCATE option, SQL Server skips the second phase and performs only the compaction. If you specify TRUNCATEONLY, SQL Server skips the compaction and only releases free space that already exists at the end of the physical file.
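For example, assuming a database whose data file has the logical name MyDB_Data (a hypothetical name; query sysfiles to find your own logical file names), the two option forms look like this:

```sql
-- Compaction only: move data toward the front of the file,
-- but don't release any space back to the OS
DBCC SHRINKFILE (MyDB_Data, NOTRUNCATE)

-- Truncation only: release free space already at the end of
-- the file, without moving any data
DBCC SHRINKFILE (MyDB_Data, TRUNCATEONLY)
```

TRUNCATEONLY is the cheap operation of the two; because it moves no data, it can't cause the fragmentation problems I discuss below.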
Now let's look at three major trouble spots to watch out for when you're shrinking a data file. The first problem is the amount of time and system resources required to perform all the data movement. SQL Server uses a lot of CPU resources to determine where space is available in the file to move data to, and it uses a lot of I/O resources for reading and writing data into new physical locations. In addition, you should be aware that if a table has no clustered index, the nonclustered indexes use the row's physical address (a row ID) within the data file to keep track of the data in the table. So if SQL Server has to move data from a heap table, it must adjust every nonclustered index for every row it moves so that the indexes reference the new physical location of the data. This adjustment increases the use of system resources considerably.
The second problem is the growth of the transaction log. SQL Server logs every data-movement operation, every page and extent allocation or deallocation, and every change to every index. Not only does this logging add to the use of system resources as part of the first problem, it creates additional problems if the log grows too large and you haven't accounted for all the space it will need. SQL Server MVP Tibor Karaszi has an excellent article on his Web site that demonstrates the enormous amount of log space required for data-file shrinking operations; you can read it at http://www.karaszi.com/SQLServer/info_dont_shrink.asp.
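You can observe this log growth yourself. As a sketch (the logical file name MyDB_Data is hypothetical), compare log usage before and after a shrink:

```sql
-- Report log size and percent used for every database
DBCC SQLPERF (LOGSPACE)

-- Shrink a data file to a 10MB target
DBCC SHRINKFILE (MyDB_Data, 10)

-- Check again; because every page move is logged, the
-- "Log Space Used (%)" figure can climb substantially
DBCC SQLPERF (LOGSPACE)
```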
The third problem you run into when shrinking a data file is that the shrinking will likely introduce fragmentation into your tables and indexes. You should be aware that fragmentation isn't always a terrible thing, and many types of operations are unaffected by fragmented data. Microsoft has produced a detailed whitepaper, "Microsoft SQL Server 2000 Index Defragmentation Best Practices" (http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/ss2kidbp.mspx), that provides information about the different types of fragmentation and the circumstances you need to be concerned about if you have fragmented data.
To see an example of data-file shrinking that causes severe fragmentation, you can run Listing 1's sample script against the Northwind database data file. If you want to be sure to get exactly the same behavior I did, start with a clean installation of the Northwind database. You can find a script to rebuild Northwind in your SQL Server 2000 installation directory. On my computer, the directory is c:\Program Files\Microsoft SQL Server\MSSQL\Install. You can load the file, instnwnd.sql, and run it in Query Analyzer to rebuild and repopulate the Northwind database.
You can then run the code in Listing 1 to create a large copy of the Northwind database's Order Details table (called OrderDetails, with no space in the name), containing about 275,000 rows. The script then copies the big table into one called BigDetails, adding an identity column to provide a primary key. Next, the script copies BigDetails into BigDetails2 and builds a clustered index on the identity column. Finally, it drops OrderDetails and BigDetails, leaving a lot of empty space in the data file.
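Listing 1 isn't reproduced here, but based on the description above, a sketch of its steps might look like the following. (The repeated-doubling loop and the index name are my assumptions; the actual listing may build the data differently.)

```sql
USE Northwind
GO
-- OrderDetails: a copy of Order Details, with no space in the name
SELECT * INTO OrderDetails FROM [Order Details]

-- Inflate the copy to roughly 275,000 rows by repeated doubling
-- (assumption: the original listing may use a different method)
WHILE (SELECT COUNT(*) FROM OrderDetails) < 275000
    INSERT INTO OrderDetails SELECT * FROM OrderDetails

-- BigDetails: the same data plus an identity column for a primary key
SELECT IDENTITY(int, 1, 1) AS ID, * INTO BigDetails FROM OrderDetails

-- BigDetails2: a copy of BigDetails with a clustered index on ID
SELECT * INTO BigDetails2 FROM BigDetails
CREATE UNIQUE CLUSTERED INDEX CI_BigDetails2 ON BigDetails2 (ID)

-- Drop the intermediate tables, leaving free space in the data file
DROP TABLE OrderDetails
DROP TABLE BigDetails
```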
The following command will try to shrink the data file to 10MB:
DBCC SHRINKFILE(Northwind, 10)
After running the SHRINKFILE command, I check the fragmentation of BigDetails2 as follows:
DBCC SHOWCONTIG (BigDetails2) WITH TABLERESULTS
I find a logical scan fragmentation of more than 98 percent, meaning that 98 percent of the pages in the table are out of order with respect to the previous page. If I needed to perform an ordered scan on the BigDetails2 table, SQL Server would have to jump all over the disk to access the pages in the desired sequence.
For this example, I knew which table would likely be fragmented; that table was also the only one in Northwind large enough for fragmentation to be a concern. However, in your own databases, you might not know in advance which tables the shrinking will affect. Listings 2 and 3 give you the DBCC SHOWCONTIG output for all user tables in your current database that have a clustered index. The script in Listing 2 builds a table, showcontig_results, that can hold all the output from DBCC SHOWCONTIG; the script in Listing 3 declares a cursor to access all the tables of interest. The code in Listing 3 then steps through the rows of the cursor, running DBCC SHOWCONTIG for each table the cursor accessed. After populating the showcontig_results table, you can search it as you would any other table. For example, to find the names of all tables that have a logical scan fragmentation value of more than 20 percent, you could execute the following query:
SELECT ObjectName FROM showcontig_results WHERE LogicalFragmentation > 20
Of course, you can change the percentage value to something bigger or smaller than 20, depending on how much fragmentation your system can handle.
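Listings 2 and 3 aren't reproduced here, but the core of Listing 3's cursor logic might look like the following sketch. It walks every user table that has a clustered index (index ID 1 in sysindexes) and runs DBCC SHOWCONTIG on each; Listing 3 additionally captures the output into the showcontig_results table with INSERT...EXEC.

```sql
-- Cursor over all user tables that have a clustered index
DECLARE @table sysname
DECLARE table_cursor CURSOR FOR
    SELECT o.name
    FROM sysobjects o JOIN sysindexes i ON o.id = i.id
    WHERE o.type = 'U' AND i.indid = 1

OPEN table_cursor
FETCH NEXT FROM table_cursor INTO @table
WHILE @@FETCH_STATUS = 0
BEGIN
    -- Dynamic SQL because DBCC SHOWCONTIG doesn't accept a
    -- string variable directly
    EXEC ('DBCC SHOWCONTIG (''' + @table + ''') WITH TABLERESULTS')
    FETCH NEXT FROM table_cursor INTO @table
END
CLOSE table_cursor
DEALLOCATE table_cursor
```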
Other Shrinking Techniques
In addition to DBCC SHRINKFILE, you can run the command DBCC SHRINKDATABASE, which attempts to shrink all the files in a specified database, including the log files. Because the mechanisms of data-file and log-file shrinking are so different and I usually want to shrink the data files for different reasons than I want to shrink the log files, I prefer not to shrink everything at once. I use DBCC SHRINKFILE and specify which file I want to shrink. With DBCC SHRINKFILE, I feel that I have more control over the final size of my files because the second parameter is the target size, in megabytes, to shrink the file to.
With DBCC SHRINKDATABASE, the second parameter is a number that tells SQL Server the percentage of free space you want to leave in the database after all the shrinking. You usually can't tell which input value will produce a given amount of shrinkage, or even what value you need to produce any shrinkage at all. The work SQL Server has to do to shrink each file is the same as with DBCC SHRINKFILE, but you have much less specific control over the final size of the database files.
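To see the contrast concretely (the database and logical file names here are hypothetical):

```sql
-- SHRINKFILE: the second parameter is a target size in MB,
-- so you know exactly what size you're asking for
DBCC SHRINKFILE (MyDB_Data, 100)

-- SHRINKDATABASE: the second parameter is the percentage of
-- free space to leave in each file after shrinking, so the
-- resulting file sizes are much harder to predict
DBCC SHRINKDATABASE (MyDB, 10)
```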
Another drawback of the DBCC SHRINKDATABASE command is that it can't shrink a file smaller than the file's minimum size. The minimum size of a file is the size you specified when creating the file or the last explicit size you set by using a file size–changing operation such as ALTER DATABASE with the MODIFY FILE option or DBCC SHRINKFILE. The ALTER DATABASE with MODIFY FILE command can only make a minimum file size larger, so you need DBCC SHRINKFILE to reduce the file size if you want it to be smaller than the previous minimum size.
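Before deciding which command to use, it helps to know each file's logical name and current size. The sysfiles table in the current database reports size in 8KB pages, so dividing by 128 gives megabytes:

```sql
-- Logical name, physical path, and current size in MB
-- for every file in the current database
SELECT name, filename, size / 128 AS size_mb
FROM sysfiles
```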
Shrink to Fit
A final mechanism for shrinking is the AUTOSHRINK database option. Enabling this option for a database means that approximately every 30 minutes, SQL Server checks whether the database has more than 25 percent free space and, if so, executes a shrink operation against its files. Rather than shrinking the files to the smallest size possible, the operation targets a size that leaves 25 percent free space in each file. The AUTOSHRINK option is intended only for use on personal or low-use systems where disk space usage is a critical concern. If system performance is more important, you probably don't want to enable the AUTOSHRINK option. And never enable this option on a production database.
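You can check and change the option with sp_dboption or, in SQL Server 2000, with ALTER DATABASE:

```sql
-- Check whether autoshrink is enabled for a database
EXEC sp_dboption 'Northwind', 'autoshrink'

-- Disable it (the safe setting for production databases)
EXEC sp_dboption 'Northwind', 'autoshrink', 'false'

-- Or, equivalently, in SQL Server 2000:
ALTER DATABASE Northwind SET AUTO_SHRINK OFF
```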
Shrinking a file manually, even without the AUTOSHRINK option, should be a carefully considered operation, in light of all the drawbacks I've described. DBCC SHRINKFILE might be preferable to DBCC SHRINKDATABASE, but even DBCC SHRINKFILE should be used only in cases of dire necessity. The preferred option is to spend some time planning your expected database growth, so you can create your database files at an appropriate size. Only when you've severely overestimated your space needs and created a database much larger than you required should you ever need to issue one of the shrink commands to shrink your data files.