Skip navigation

Piecing Together Fragmentation

SQL Server 2005 provides a wealth of information about fragmentation

Last month, we looked at the specifications for calling the new sys.dm_db _index_physical_stats() table-valued function (TVF) in SQL Server 2005. This function, which replaces DBCC SHOWCONTIG, returns information about the physical organization of your data structures.This function can return information about any SQL Server 2005 object—including tables, indexes, and indexed views—that needs physical storage. Let's look at the meaning of some of the returned information.

I've Got a Little List

To keep track of the pages that belong to a table or index, SQL Server uses separate lists of pages (organized according to the way SQL Server stores data on the page). In addition, SQL Server 2005 introduces the ability to create a table or index on separate partitions; each partition has its own lists of pages. A list of pages of one type for one partition is called an allocation unit. SQL Server 2005 supports as many as three allocation units for each table or index on each partition.The three allocation units map to three types of pages that SQL Server 2005 manages: IN_ROW_DATA (i.e., regular data or index rows), LOB_DATA (i.e., large object data such as text, ntext, or image data) and ROW_OVERFLOW_DATA (a new type of storage). In a future column, I'll discuss the data types that allow row-overflow data.

Return to Me

In addition to returning a row of information for each allocation unit on each partition for each table or index, the sys.dm_db_index_physical_stats() function also returns a row for each level of each index. Table 1 shows the columns that can help you identify what each row of output refers to.

In SQL Server 2000 and 7.0, DBCC SHOWCONTIG returns information about four types of fragmentation: internal fragmentation and three types of external fragmentation.(The Web sidebar"Going to Pieces" explains the difference between internal and external fragmentation in SQL Server and the relationship between the two.) But the values reported for two of these fragmentation types are meaningless when a table or index spans multiple files. In SQL Server 2005, all reported values are meaningful across multiple files. However, not every type of fragmentation is relevant to every structure.

In SQL Server 2005, the avg_page _space_used_in_percent value reports internal fragmentation.This float value is the average, across all pages, for one particular index level for one type of allocation unit in one partition.This is the only type of fragmentation that SQL Server 2005 reports for the LOB_DATA and ROW_OVERFLOW_DATA pages.

The avg_fragmentation_in_percent value reports external fragmentation.This float value represents the logical fragmentation for indexes or the extent fragmentation for heaps in the IN_ROW_ DATA allocation unit. SQL Server returns a value of 0 for LOB_DATA and ROW_ OVERFLOW_ DATA allocation units.

Logical fragmentation is the percentage of out-of-order pages in the leaf pages of an index. Each page in the leaf level has a pointer in the page header that indicates what should be the next page in logical order; an out-of-order page is one that has a lower page number than the previous page. For example, if an index contains page 86 and that page indicates that the next page in order is page 77, page 77 is out of order. If page 86 indicates that the next page is page 102, page 102 is in order. Logical fragmentation doesn't consider whether pages are contiguous, only whether the logical order of the pages matches the physical order on the disk. Logical order is meaningful only for an index, in which the datatype and collation of the index key column controls the order.

For heaps, the avg_fragmentation_in_ percent value represents extent fragmentation. Extent fragmentation measures the contiguousness of the data belonging to the heap. SQL Server allocates space to a table or index in units called extents, which are eight contiguous pages.The first page number of every extent is a multiple of 8, so the extent starting on page 16 is contiguous to the extent starting on page 8.The first extent in a file starts on page 0. Extent fragmentation counts the gaps between the extents that belong to the same object. If the extents starting on page 8 and page 24 belong toTable x but the extent starting on page 16 belongs to a different table,Table x contains one gap and two extents.The avg_fragmentation_in_percent value, whether it indicates logical or extent fragmentation, should be as close to 0 as possible for maximum performance when SQL Server is scanning a table or index.

You can observe a third type of fragmentation by looking at the fragment count and avg_fragment_size_in_pages values. A fragment consists of physically consecutive leaf pages in the same file for an allocation unit; every index has at least one fragment. The maximum number of fragments that an index can have is equal to the number of pages in the leaf level, assuming that the pages aren't contiguous or in order. If an index has larger fragments, it means that less I/O is needed to read the same amount of pages, because SQL Server can take advantage of Read Ahead. The larger the avg_fragment_size_in_pages value, the better the performance when scanning the data—up to a point.When avg_fragment_ size_in_pages is greater than 8 pages (i.e., 64KB—the size of an extent), scan performance will be reasonably good. However, performance improvement won't likely be measurable after the fragment size reaches more than 32 pages (i.e., 256KB).The fragment count and avg_fragment_size_in_ pages values are reported only for heaps and for IN_ROW_DATA allocation units for leaf levels of indexes. No fragment size information is stored or reported for nonleaf levels of an index or for LOB_DATA or ROW_OVERFLOW_DATA allocation units; the sys.dm_db_index_physical_stats() function will return NULL in those cases.

Last month I told you that you can call the sys.dm_db_index_physical_stats() function with a final parameter of LIMITED, SAMPLED, or DETAILED.The LIMITED mode scans all pages for a heap, but scans only the parent-level pages (i.e., the pages above the leaf level) for an index.Therefore, certain values—in particular, any value that requires SQL Server to examine the contents of the leaf-level pages—can't be computed and returned in LIMITED mode. For example, the function will return NULL for the avg_page_space_used_in_percent value in LIMITED mode. Other such values include record_count, min_record_size_in_ bytes, max_record_size_in_bytes, and avg_ record_size_in_bytes.

You can consider using a final parameter of SAMPLED when the table is very large, because a SAMPLED scan looks at only 1 percent of the pages. SQL Server basically just looks at the first out of every 100 pages to get this sample. However, if the table contains fewer than 10,000 pages, SQL Server considers it too small to worry about and will automatically convert a request for SAMPLED into DETAILED, then examine all the pages. If the table is a heap, a sampled scan won't be able to report any information about fragments or fragment size.You can analyze fragments only when SQL Server knows all the pages that belong to a table. For a table with a clustered index, the upper levels of the index give SQL Server the information it needs to determine the number of fragments and their average size, but for a heap, no additional structures provide this information.

The sys.dm_db_index_physical_stats() function returns two additional pieces of fragmentation information: forwarded_ record_count and ghost_record_count (or version_ghost_record_count). Forwarded records are possible only in a heap and occur when the size of a row with variable length columns increases because of updates and no longer fits in its original location. If the row were simply moved, all the nonclustered index pointers that referenced the row would have to be updated, making the operation expensive. Instead, SQL Server leaves a small pointer in the original location, indicating the location of the row in another page. If a table has lots of forwarded records, scanning the table won't be very efficient.

Ghost records are rows that physically still exist on a page, but logically have been removed.Until SQL Server's background processes clean up these ghost records, no new records can be inserted into the space that they occupy.When you have a lot of ghost records, your table has all the disadvantages and none of the advantages of internal fragmentation. The ghost_record_count value shows the number of rows that have been deleted but not yet physically removed from the pages. SQL Server typically cleans up these rows relatively efficiently.The version_ ghost_record_count counter shows the number of rows that an outstanding snapshot isolation transaction has retained. The background cleanup thread won't clean up these rows until all relevant transactions have been committed or rolled back.

Get Ready for Removal

The new dynamic management object sys.dm_db_index_physical_stats provides a lot of information about all kinds of fragmentation (both internal and external) that might exist in your tables and indexes. If you're performing ordered range scans and you detect fragmentation, you'll want to remove it. Next month, I'll tell you how.

Hide comments

Comments

  • Allowed HTML tags: <em> <strong> <blockquote> <br> <p>

Plain text

  • No HTML tags allowed.
  • Web page addresses and e-mail addresses turn into links automatically.
  • Lines and paragraphs break automatically.
Publish