Readers and colleagues frequently ask me how I learned everything I know about SQL Server. Some people assume I learned so many details by spending time with members of the Microsoft development team or from internal Microsoft documents describing the specifications for the SQL Server code. But I've learned substantially more than half of the details I know about SQL Server through more than 13 years of working with SQL Server every day—and by being challenged with student questions, which I always strive to answer on my own before going to other sources.
According to an old maxim, when you give someone a fish, you feed him for a day, but when you teach someone to fish, you feed him for a lifetime. Recently, I went on a fishing expedition to uncover an answer to a question that has popped up several times over the years. When I show you the tools I used to carry out my research and explain the meaning of the data the tools provide, you'll be able to add these tools to your own repertoire. And perhaps you can use them to perform your own research and learn how to figure out answers to the hard questions yourself.
Indexes and Fill Factor
The mystery involves SQL Server's fill factor, but before I tell you the exact question I answered, let's explore the concept of fill factor. First, you need to understand the basics of how indexes are organized. SQL Server organizes indexes as B-trees, with one page root as the starting point for index traversal. The root page might have pointers to two or more pages on the next index level, and each of those pages might have pointers to multiple pages at the next level. The last, or bottom, level of an index is the leaf level, which must maintain all the index key values in a sorted sequence. In SQL Server 2000, this sequence can be either ascending or descending order; earlier releases maintain all indexes in ascending order. In a clustered index, the leaf level is the data itself, so SQL Server stores the data in sorted order. In a nonclustered index, the leaf level contains pointers to the data.
The fill factor is a value you can specify when creating an index to tell SQL Server how full you want the index's leaf-level pages to be. You can specify fill factor in a CREATE INDEX statement or when you use a CREATE TABLE or ALTER TABLE statement to define a primary key or unique constraint on a table, because SQL Server automatically builds an index to enforce primary key and unique constraints. Of course, you can also specify a fill factor when you use Enterprise Manager to create an index or add one of these constraints to a table. If you don't specify a fill-factor value, SQL Server uses its serverwide default fill factor. To discover what the default fill factor is, you can run the command
EXEC sp_configure 'fill factor'
If you haven't changed the default value, you'll get a row of output showing that the current run_value (the value SQL Server is currently running with) is 0, as Figure 1 shows.
A fill factor of 0 means that the index's leaf-level pages will be completely full, but the index's upper levels will leave room for one or two index rows to be added. In fact, you can never explicitly assign a fill factor of 0 when building an index because 0 isn't a valid value for the CREATE INDEX command. The only way to get a fill factor of 0 is to rely on the default, if you haven't changed it. You can change the default fill factor by running the sp_configure command and adding a second parameter to serve as the new default fill factor:
EXEC sp_configure 'fill factor', 80 GO RECONFIGURE GO
You must stop and restart the SQL Server service for the change to take effect. The new fill factor will be in effect when you see it in the run-value column.
If you create an index with a fill factor of 80, the leaf-level pages will be 80 percent full and the upper levels of the index will have enough room left on them to allow for one or two more index rows. Note that SQL Server doesn't maintain the original fill-factor value as new rows are added to a table; the fill factor only controls how full your leaf-level pages should be when the index is initially created. SQL Server stores the initial fill factor in the sysindexes system table in a column called OrigFillFactor. If you're executing the command DBCC DBREINDEX, which rebuilds one or all indexes on a table, specifying a fill factor of 0 tells SQL Server to rebuild the indexes with the original fill factor that the index was built with, not necessarily with the default value that you configured for your server.
The fill-factor value can be important in an insert-intensive environment. An index's leaf level must maintain all the index key values in a sorted sequence, so if someone needs to insert a new row into a table, the index key value in that row determines the row's position in the index (or table, if the index is clustered). For example, if you have an index on last name, inserting a row with a last-name value of Marlin requires SQL Server to insert a new index row in the same page with the other names that start with Ma, possibly between Margolin and Martin. If the page where the new row belongs is completely full, SQL Server must split the page and link a new page into the page chain. SQL Server will move approximately half the rows from the original full page to the new page. Not only is page splitting a resource-intensive operation that can slow down the performance of your insert operations, but because the new page probably isn't physically contiguous to the original page, you introduce fragmentation into the index or table. (For a discussion of the various types of fragmentation and how to manage and remove them, see "Keep SQL Server Up and Running," December 2000.)
Creating an index with a low fill factor means that your table has room to grow before page splitting would become a cause for concern. However, if your pages are only partially filled, more pages are required to hold all your data, and the index can become quite large. Microsoft defined the fill-factor value of 0 as a compromise between having room to grow and not making the table and indexes much larger than necessary. With a fill factor of 0, the leaf level is full, but the pages in the upper levels of the index, which might also need to be split if they become full, have some room for growth.
In SQL Server releases before 6.5, a fill-factor value of 100 means that not just the leaf level of an index but all levels would be completely full. In general, you use this value only for tables that are primarily read-only. In SQL Server 6.5, Microsoft added another option to the CREATE INDEX command. If you also include the option WITH PAD_INDEX, whatever fill-factor value you specify applies to all levels of an index, not just the leaf level. So creating an index with a fill-factor value of 0 and including the PAD_INDEX option is the same as using a fill-factor value of 100. But what would a fill factor of 100 with PAD_INDEX mean? (Note that you can specify a FILLFACTOR value when defining a primary key or unique constraint on a table because SQL Server automatically builds an index to support those constraints. However, you can't specify the PAD_INDEX option along with a constraint definition; you can use it only in the CREATE INDEX statement.)
Starting with SQL Server 7.0, I began frequently seeing the question: What is the real difference between a fill factor of 0 and a fill factor of 100? Although SQL Server 2000 and 7.0 Books Online (BOL) specify that a fill factor of 100 means that all levels of the index are full, many people apparently don't think this is true. I kept meaning to figure it out for myself "one of these days," but that day took several years in arriving. Finally, on a recent cross-country flight, I ran some tests and found the answer to this question.
My testing consisted primarily of using two undocumented DBCC commands: DBCC IND and DBCC PAGE. These commands aren't listed in BOL, and Microsoft doesn't officially support them, although Microsoft articles occasionally mention DBCC PAGE. To use these undocumented commands, you must first enable trace flag 3604 to return all output from undocumented DBCC commands to the client screen. DBCC IND, which takes three arguments, tells you which pages belong to an index. The first two arguments are the database name and the table name; the third argument controls how much detail you want in the output. A value of -1 returns information about all levels of all indexes on a table. Listing 1 creates a table with 1000 rows in the Pubs database and generates sequentially increasing integer values from 1 to 1000. I wanted numeric values because they're easy to generate. However, I wanted to store them as character strings because the DBCC PAGE command displays character strings more clearly than numeric values, which would help if I used the DBCC PAGE command to see the pages' full contents.
After generating the 1000 rows, the script uses the default fill factor of 0 to create a unique clustered index, then uses DBCC IND to generate a list of all the pages that belong to the table and its indexes. DBCC IND returns 12 columns of output, but I was interested in only four of the columns for my testing:
- PagePID is the page ID; I used this value with the second DBCC command.
- IndexID is 0 for the data pages, 1 for pages in the upper levels of the clustered index, and greater than 1 for nonclustered indexes.
- PageType is 1 for data pages, 2 for index pages, and 10 for Index Allocation Map (IAM) pages.
- IndexLevel is the index level at which the page occurs. If IndexID is 0, IndexLevel is also 0. The highest IndexLevel value of any IndexID is for the root page; only one page will have that IndexLevel value.
The DBCC IND command in the script in Listing 1 returns page numbers for four pages with PageType 2; of those, three have IndexLevel 0 and one has IndexLevel 1. In addition, if you're working with a database that has more than one data file, you'll also be interested in the PageFID column, which identifies the file containing the page.
Even though the leaf level of a clustered index is the data, the output of the DBCC IND command doesn't treat the data pages as index pages. However, these data pages are the ones that the fill factor applies to because they're the leaf level of the clustered index. You can identify these data pages by their PageType of 1, their IndexID of 0, and their IndexLevel of 0.
On the Trail
The next step of my investigation was to choose one of the index's leaf-level pages and supply it as an argument to the DBCC PAGE command. In SQL Server 2000 and 7.0, this command takes four arguments. The first is the database name or database ID, the second is the file number, and the third is the page number, taken from the output of DBCC IND. The fourth argument is the desired format for the output. For example, a value of 0 means you want to see just the page header. Here's the command I ran:
DBCC PAGE (pubs, 1, 288, 0)
The value on the page header that I was interested in is called m_slotCnt; it tells the number of rows on that page. When I ran the script in Listing 1 and then used DBCC PAGE to look at one of the data pages, I saw that the number of rows on the page was 20. When I ran a similar DBCC PAGE on one of the index pages at the lower level (i.e., not the root), I saw that the number of rows was 19. This was a result I expected; the data pages were full. With rows containing 390 bytes of data, plus about 10 bytes of overhead, only 20 of them can fit on a page, and my data pages have that maximum number. The index rows are about the same size because the index is on the char(390) column, but instead of having 20 index rows on a page, I have only 19 because SQL Server left room for one additional index row.
Using these two tools, DBCC IND and DBCC PAGE, I was able to answer the question: What really is the difference between a fill factor of 0 and a fill factor of 100? Running the same script, but explicitly supplying a FILLFACTOR of 100, I found the number of rows in the data pages and the index pages to be the same as for a clustered index with a FILLFACTOR of 0. I believe I've given you enough information about how to use these tools so that you can confirm this answer for yourself and possibly do more thorough testing. You could also investigate what happens when you specify the PAD_INDEX option when you have a FILLFACTOR of 100. If you don't have the time right now and aren't planning any cross-country flights with your laptop, you can wait until next month to find out what happens with a FILLFACTOR of 100 and the PAD_INDEX option specified. I'll finish showing you the research I did and the results I came up with, and I'll also show you a couple of other questions I was able to answer by using the DBCC PAGE command.