More Fill-Factor Mysteries

In last month's column, "The Fill-Factor Truth," I demonstrated how you can use two undocumented commands—DBCC IND and DBCC PAGE—to answer the question, What is the difference between a fill factor of 0 and fill factor of 100? Using the DBCC IND command, I found the page numbers of the pages within each level of an index; I then used the DBCC PAGE command to examine each page's contents to see how many rows were on the page. For my investigations, I didn't even need to see the data on the page; I only needed to see the page header that contained a value called m_slotcnt, which denotes the number of rows on the page. Using the m_slotcnt value, I established that as of SQL Server 7.0, no difference exists between a fill factor of 0 and a fill factor of 100.

My next test was to run the same script I used to create the table and its clustered index with a fill factor of 100, but this time I added the PAD_INDEX option:

CREATE UNIQUE CLUSTERED INDEX
 indx1 on ff1(col1)
WITH FILLFACTOR = 100, PAD_INDEX
GO

Using DBCC IND and DBCC PAGE, I determined that the PAD_INDEX option caused SQL Server to completely fill all levels of the index instead of leaving an extra row—as it would if I didn't specify this option.

When you use DBCC PAGE to look at the m_slotcnt values on the pages in this new index, you'll see that all the pages except one are full (i.e., contain 20 rows). The table in my example had 1000 rows, so it should have 50 completely full data pages. The DBCC IND output from last month shows that this table did in fact have 50 data pages. (You can determine this fact by counting the number of pages in the output where the PageType value is 1.) However, the first level of index pages contains one index row for each data page, resulting in 50 index rows, so you'll have two full index pages and one page with 10 rows.

You can identify the last page of an index by looking at the m_nextPage and m_prevPage values in the page header. When I ran the above script and created a clustered index with a fill factor of 100 and PAD_INDEX enabled, the non-root pages of the index were 310, 321, and 284. However, page 321 isn't the last page of the index as you might expect it to be. Using DBCC PAGE to look at page 321, I found that the value for m_nextPage in the page header was (1:310), which means file 1 and page 310. The m_prevPage value for page 321 was (0:0), which means that 321 was the first page in its index level. Page 284 had an m_nextPage value of (0:0) and an m_prevPage value of (1:310), so it was the last page in the index level, and its value for m_slotcnt was 10, as I expected.

A New Mystery

Something interesting happened to me just as I was finishing last month's column. A colleague who teaches SQL Server classes called with a question from one of her students. According to the student, an instructor in a previous SQL Server class had told the class that if SQL Server inserts a row that causes a page to split but then rolls back the INSERT transaction, SQL Server usually leaves the page split. (I agreed with the statement.) But the former instructor had gone on to say that if the index into which you're inserting a new row was created with a fill factor of 100, SQL Server undoes the split if it rolls back the transaction. I was positive that the former instructor was incorrect in the second half of her premise, but I decided to prove it. For this month's column, I wanted to use some additional fill-factor examples about page splitting, so this problem was right on time.

First, let me give you some background about page splitting in SQL Server. SQL Server always keeps the keys in an index sorted, and an index's leaf level contains every key value in sorted order. Remember that for a clustered index, the leaf level is the data, so SQL Server sorts all the data by the clustered index key. If a new row is inserted into the table, that row must go in its proper sorted position in the leaf level of every index on the table. The problems I present in the next section are relevant both to the table itself, as the leaf level of a clustered index, and to the leaf level of any nonclustered indexes. But for simplicity, I'll discuss just the clustered index leaf level.

If a page is completely full and an INSERT operation tries to insert a new row on that page, SQL Server must split the page. This split involves finding another free page and linking it into the logical sequence of pages for this index. SQL Server needs to adjust the m_nextPage and m_prevPage values so that the new page now follows the split page, and the page that formerly followed the split page now follows the new page. Instead of using DBCC PAGE to look at the m_nextPage and m_prevPage values, you can refer to the output from DBCC IND, which shows the next page's Page ID in the NextPagePID column and the previous page's Page ID in the PrevPagePID column.

After the new page is part of the list of pages at the leaf level, SQL Server moves approximately half the rows from the full page onto the new page. Now both pages are about half full, and the new row can go onto one of the half-full pages.

Roll It Back

But what happens if, after all this work, the INSERT transaction is canceled and SQL Server must roll back the insert? In SQL Server releases before 7.0, SQL Server would undo the page split, put all the rows back where they came from, and unlink the new page from the list. But if another process immediately needed to insert into that full page, SQL Server would have to split the page all over again. Starting with SQL Server 7.0, after the page split has occurred and SQL Server has moved the rows from the full page to the new page, that activity is committed. Even if SQL Server rolls back the INSERT operation, the page remains split.

Here's what I suspect the other instructor might have been thinking. In general, you use a fill factor of 100 only for a table that is completely—or almost completely—static. Inserting a row might be a rare occurrence, and if SQL Server rolled back the insert after the page split, you'd no longer have perfectly full data pages. In that case, undoing the split might make sense because future inserts would be rare.

To demonstrate an example of splitting completely full pages, let's create a table like the one I created last month, but insert only even numbers as values. Then, insert rows with odd-numbered values between the even-numbered rows. The script that Listing 1 shows creates the table. On the table, the script builds a clustered index that has a fill factor of 100, so all the data pages will be completely full. We don't need many rows, so let's just insert rows containing all the even numbers from 2 to 100.

For this example, you only need to use the DBCC IND command and find out how many data pages are in the leaf level of the clustered index. Running DBCC IND on the table shows you five rows, representing five pages: One is the Index Allocation Map (IAM) page, one is an index page, and the other three are data pages. Listing 2 shows a transaction containing a statement to insert a row with a value of 11 on the first page of the table between the "10" row and the "12" row. Before terminating the transaction, the script executes DBCC IND again; you can see that the table now has four data pages. The script rolls back the transaction, and the table still has four data pages: SQL Server didn't undo the split.

Now run the script from Listing 1 again, but this time take out the comment markers to drop the previous table and set the fill factor to 100 for the clustered index. DBCC IND will initially show you three data pages again. Running the code in Listing 2 will give you identical results: The INSERT causes a page split and a new data page, and that new data page remains even after you roll back the transaction.

Listing 3 shows three more inserts you can use as a final test to see what happens after SQL Server splits the page. First, the script inserts the 11 again, then it inserts a 23. Because SQL Server had already split the first page of the table, it has room to insert both rows, so no additional page splitting occurs. However, when the script tries to insert a row containing 55, that row must go on the original second page of the table, which is still full. Another split occurs, and DBCC IND shows that the table now has five data pages.

One way you can know that a page has split is to run DBCC IND to determine how many pages are in the table after an INSERT or UPDATE operation. But you might want to know which rows moved to the new page and which ones stayed on the original page. You might also wonder where the split occurred. If SQL Server splits a page in half, does that mean half the number of rows or half the number of bytes? The two answers could be different if you have variable-length rows and the page contains both short and long rows. To answer these questions, I'll examine the DBCC PAGE command in more detail next month.

Comments

Plain text