Over the last few months, I've discussed how SQL Server 2005 keeps track of storage space that tables and indexes use and that the new metadata structures are a bit more complex than those in SQL Server 2000, but also offer greater flexibility. In "Managing Data Space," June 2006, I explain that each table (or index) can have data stored in three different forms: regular IN_ROW data, large object (LOB) data, and row-overflow (ROW_OVERFLOW) data, which is new in SQL Server 2005. You might find the row-overflow capability particularly useful if you have tables containing multiple variable-length columns, whose total size is usually within the maximum row size of 8060 bytes but may occasionally exceed that. Let's take a closer look at how SQL Server works with the new row-overflow data.
The new ability to store some of a row's data off the actual data page is the first change in the maximum row length that we've seen since SQL Server 7.0. In SQL Server 7.0, the size of data pages changed from 2K to 8K. Along with the increase in page size, the maximum size of a data row increased correspondingly: In SQL Server 7.0, the maximum row size increased to 8060 bytes from 1962 bytes in SQL Server 6.5. Both of these maximum-size values include several bytes of overhead stored with the row on the physical pages, so that the total size of all the table's defined columns needed to be slightly less than the maximum size. In fact, SQL Server 2005 changed the error message you get if you try to create a table with rows that exceed the maximum, which in some cases (as you'll see shortly) is still limited to the same maximum that it was in SQL Server 2000. For example, in SQL Server 2005, if you execute the CREATETABLE statement in Listing 1, using column definitions that add up to exactly 8060 bytes, you'll get the error message that Figure 1 shows. (Note that the error messages and some code lines in this article wrap to multiple lines because of space constraints.)
In this message, you can see the number of overhead bytes (7) that SQL Server 2005 wants to store with the row itself and realize that for some rows, the previous size limit still applies. SQL Server 2005 stores only variable-length columns in the special row-overflow pages, and only if all the fixed-length columns will fit into the regular IN_ROW data limit.The table in Listing 1 has all fixed-length columns.
Now let's look at a table that has all variable-length columns. Run the code in Listing 2 to create a table with rows that have a maximum defined length of much greater than 8060 bytes. In fact, if you ran this CREATETABLE statement on SQL Server 7.0, you'd get an error and the table wouldn't be created at all. In SQL Server 2000, the table will be created, but you'll get a warning.Although my columns are all varchar, varbinary, nvarchar and sql_variant columns, columns that use the (CLR) user-defined types can potentially also be stored on row-overflow data pages.
As you see when you run the code in Listing 2, you get no such warning in SQL Server 2005. If you then insert a row that has variable-length columns with sizes that add up to more than 8060 bytes, the row in the statement in Listing 3 will be successfully inserted.
Where's My Data?
To determine whether SQL Server is storing any data in row-overflow data pages for a particular table, we can use the DBCC IND command, which I discuss in detail in "Index Internal Information," January 2005.The SQL Server 2005 version of DBCC IND returns three additional columns that reflect the new structures, including partitions and row-overflow data. These three new columns are PartitionNumber, PartitionID, and iam_chain_type. iam_chain_type describes the type of data stored on the page and has one of these values: in-row data, row-overflow data, or LOB data.
The script in Listing 4 creates a new table, similar to one I created in "Index Internal Information," to hold the rows that DBCC IND returns. Note that the characters sp_ at the beginning of an object name in the master database mean that the object can be accessed from any database without qualifying it with the source database name. DBCC IND returns one row for each page belonging to a specified table or index. The following statement runs DBCC IND on the bigrows table and saves the result in sp_table_pages:
INSERT INTO sp_table_pages exec ('dbcc ind ( tempdb, bigrows, -1)' )
To see only relevant columns from the output—PageFID (the page's file ID), PagePID (the page ID), iam_chain_type, and pageType—you can run this statement:
SELECT PageFID, PagePID, iam_chain_type, pageType FROM sp_table_pages
Table 1 shows the output for the previous statement. (The actual page numbers you get will most likely be different.) In the DBCC IND output, both row-overflow pages and LOB pages come back with a pageType value of 3; thus we also need to see the iam_chain_type value to identify whether the page is row-overflow or LOB data. Index Allocation Map (IAM) pages report the same iam_chain_type value as the type of page they're keeping track of, and we need the pageType value to see the 10 that indicates an IAM page.The output in Table 1 shows that there's one page of regular in-row data, with its IAM page, and one page of row-overflow data,with its IAM page.
SQL Server stores variable-length columns on row-overflow pages only under certain conditions. The determining factor is the length of the row itself. It doesn't matter how full the regular page is into which SQL Server is trying to insert the new row. SQL Server will construct the row normally, and only if the row itself needs more than 8060 bytes will some of its columns be stored off the row and on the overflow pages. Each column in the table is either completely on the row or completely off the row.This means that a 4000-byte variable-length column can't have half its bytes on the regular data page and half on a row-overflow page. If a row is less than 8060 bytes and there's no room on the page where SQL Server is trying to insert it, SQL Server will apply its usual page-splitting algorithms.
In some cases, if a large variable-length column shrinks, SQL Server moves it into the regular row but doesn't bother checking whether the updated row will fit on the data page if the column isn't being reduced by more than 1000 bytes in length.You can see that behavior with this next example,assuming you've created the bigrows table and inserted only the one row in the earlier example that has 2100 characters in each column. The UPDATE statement in Listing 5 reduces the size of the first column by 500 bytes, thereby reducing the row size to 7900 bytes, which should all fit on the one data page.
However, if you check the DBCC IND output by using the code in Listing 6, you'll see that a row-overflow?type page still exists. Now run the UPDATE statement in Listing 7 to reduce the size of the first column by more than 1000 bytes, and check the DBCC IND output again.You should see only three rows of output now, since a rowoverflow data page no longer exists. The IAM for the row-overflow data pages hasn't been removed, but you should no longer have a page with pageType 3.
When a column is stored on a rowoverflow page, SQL Server has to store a pointer to the row-overflow page in the regular data row, which includes the offset value of the column data on the rowoverflow page. This pointer can take up to 24 bytes in the original row. For this reason, any variable-length columns of less than 24 bytes will never be stored on row-overflow pages.
My Data Runneth Over
Row-overflow data storage applies only to columns of variable-length data that don't exceed the normal variable-length maximum of 8000 bytes per column.Also,to store a variable-length column on a row-overflow page, the following conditions must be met:
- All the fixed-length columns, including overhead bytes, must add up to no more than 8060 bytes (and the pointer to the row-overflow data adds 24 bytes of overhead to the row).
- The actual length of the variable-length column must be more than 24 bytes.
If you have single columns that might need to store more than 8000 bytes, you need to use either LOB (text,image,or ntext) columns or use the new SQL Server 2005 varchar (MAX) data type. I'll tell you about LOB data and varchar(MAX) next month.