Choosing the Right Defragmentation Method

Our company requires high availability, so we're concerned about the time we spend on defragmentation. We use the DBCC DBREINDEX command to defragment our SQL Server tables, but our transaction logs seem to fill up more frequently than before we used this method. Does DBCC DBREINDEX write more to the transaction log (i.e., send transactions to the log more often) than the DROP INDEX and CREATE INDEX statements or other reindexing approaches such as CREATE INDEX...DROP_EXISTING? Do the time savings and the optimizations DBCC DBREINDEX provides outweigh the increased number of transactions in the log? And can you recommend a better way to defragment tables and achieve better database performance?

Fragmentation is a natural side effect when you modify existing data. When you insert, update, and delete data, the physical pages that hold the table data and indexed data fill up. SQL Server reallocates new or expanded data to new pages. These new pages and the data on them can be physically distant from logically related data pages; this situation creates I/O inefficiencies when SQL Server scans the data. Fragmentation reduces the overall performance of applications and has the greatest effect on applications that perform frequent logical and physical scans of large data sets (e.g., online reporting systems, data warehouses). Fragmentation has less effect on transaction-based applications—the kind that usually perform random access of specific data rows. Defragmentation removes the physical distance between logically related data pages and improves I/O performance.

DBAs typically use the DBCC SHOWCONTIG command to track fragmentation and use the DBCC DBREINDEX command to eliminate fragmentation across a table and all its indexes. DBCC DBREINDEX is especially effective for defragmentation because it works with indexes created to enforce primary key and unique constraints.

Performance tests that the SQL Server development team has conducted internally show that defragmentation is most effective on large objects—objects that have 1000 or more pages. For objects with fewer than 1000 pages, the disk hardware and SQL Server read-ahead and caching features usually conceal fragmentation. Very small objects—those with fewer than 100 pages—can have some pages mixed in with pages from other objects without performance degradation, and DBCC SHOWCONTIG reports this harmless sharing as fragmentation. To be sure that you're benefiting from your defragmentation efforts, measure the performance of representative queries before and after defragmentation. You might discover that you can reduce the frequency of your defragmentation or even eliminate this data-maintenance task. (For more information about defragmentation, see the Microsoft white paper "Microsoft SQL Server 2000 Index Defragmentation Best Practices" at http://www.microsoft.com/technet/prodtechnol/sql/maintain/optimize/ss2kidbp.asp.)

Microsoft offers several options for defragmenting tables in SQL Server 2000. Table 1 compares the features of three defragmentation methods. DBCC DBREINDEX can increase the number of log entries if you're running SQL Server in Full recovery mode, which logs every index page. When you're indexing large objects, we recommend that you switch to Bulk-Logged recovery during the index operation, then switch back to Full recovery immediately after the operation. This switch causes SQL Server to log only space allocations. DBCC DBREINDEX makes parts of your database unavailable, so you must run it only during planned application maintenance windows.

In SQL Server 2000, Microsoft introduced a new defragmentation option called DBCC INDEXDEFRAG. This command works on fragmented data at the data- and index-page level and has the advantage of being an online operation (meaning you don't have to take your system offline to do this operation). DBCC INDEXDEFRAG lets you trade speed for data availability. Although the DBCC INDEXDEFRAG option is slower than DBCC DBREINDEX, users can access the data as usual while DBCC INDEXDEFRAG is running. You can also back up the transaction log while DBCC INDEXDEFRAG is running. Because the log is truncated during these log backups, you can limit the log-file size.

You can also stop and restart DBCC INDEXDEFRAG without losing any defragmentation work that you've already completed. However, DBCC INDEXDEFRAG doesn't migrate data between data files (although DBREINDEX does), so over time, you might end up with an unbalanced filegroup and have to rebuild the index. The "DBCC SHOWCONTIG" section of SQL Server Books Online (BOL) includes a script that lets you run DBCC INDEXDEFRAG automatically for fragmented indexes.

The third operation, CREATE INDEX...DROP_EXISTING, is similar to the DBCC DBREINDEX option in terms of logging and data availability. When you use the CREATE INDEX...DROP_EXISTING option against a clustered index, be sure the index key is unique; if it's not, you'll have to rebuild secondary nonclustered indexes. This command is more complex than DBCC DBREINDEX because it requires that you individually reference each index and code transaction logic to make multiple index statements atomic. In other words, you have to wrap each set of CREATE statements in a BEGIN transaction and a COMMIT or ROLLBACK transaction so that the code within each set works as a unit. Otherwise, you could end up with no clustered index, several secondary indexes, and terrible performance.

Comments

Plain text