

These two little keywords are deceptively powerful

Readers often ask where I get the ideas for my columns. One plentiful wellspring of inspiration is the public newsgroups, where I frequently answer anywhere from 5 to 20 questions a day. If I find myself composing a long, detailed reply to a question, I wonder whether the topic is worth a column. In other cases, if I see similar questions repeatedly within a few days, I take that as a sign that many people want to know about the topic. Recently, I responded to questions on three different forums about what goes on inside SQL Server when it performs an ALTER TABLE operation. I realized I also had questions about which variations of ALTER TABLE require SQL Server to do more work than others do, so I rolled up my sleeves and started testing.

You can use an ALTER command in SQL Server 2000 and 7.0 to modify almost every kind of object that you can create with a CREATE command (e.g., database, procedure, view). For most objects, ALTER is a better choice than using DROP and CREATE to change an object, mainly because using ALTER maintains the object permissions. The new definition for procedures, functions, triggers, and views overwrites the old definition, but the permissions remain because you're not modifying the system table that stores permissions. Using the ALTER command to change a table is different from altering other kinds of objects—you can use ALTER TABLE to change just one table property without changing everything about the table.

SQL Server 2000 and 7.0 let you use the ALTER TABLE command to make many kinds of changes to a table, far more changes than earlier releases did. Before SQL Server 7.0, the only option available for the ALTER TABLE command was adding a new column, and then only if the new column allowed NULLs. Starting with SQL Server 7.0, ALTER TABLE lets you make any of the following changes inside one command:

  • Adding new columns, which must allow NULLs unless you specify a default
  • Adding new constraints
  • Changing a column's data type or NULL property
  • Dropping columns
  • Dropping constraints
  • Enabling or disabling foreign key or check constraints
  • Enabling or disabling triggers
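
In T-SQL, the variations above look something like this sketch, which uses a hypothetical table T purely for illustration:

```sql
-- Hypothetical table for illustration
CREATE TABLE T (c1 int NOT NULL, c2 int NULL)

ALTER TABLE T ADD c3 int NULL                      -- add a NULLable column
ALTER TABLE T ADD CONSTRAINT CK_c1 CHECK (c1 > 0)  -- add a constraint
ALTER TABLE T ALTER COLUMN c2 int NOT NULL         -- change the NULL property
ALTER TABLE T DROP COLUMN c3                       -- drop a column
ALTER TABLE T NOCHECK CONSTRAINT ALL               -- disable check/foreign key constraints
ALTER TABLE T CHECK CONSTRAINT ALL                 -- re-enable them
ALTER TABLE T DROP CONSTRAINT CK_c1                -- drop a constraint
ALTER TABLE T DISABLE TRIGGER ALL                  -- disable triggers
ALTER TABLE T ENABLE TRIGGER ALL                   -- re-enable triggers
```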

When you change an existing column's properties, you have to work within certain restrictions, which appear in SQL Server Books Online (BOL). One important thing to remember about modifying a column is that all changes must be compatible with existing data.

How Does It Do That?

Let's look at what SQL Server does internally when performing an ALTER TABLE command. SQL Server can carry out an ALTER TABLE command in any of three ways:

  1. SQL Server might need to change only metadata.
  2. SQL Server might need to examine all the existing data to make sure it's compatible with the change but then change only metadata.
  3. SQL Server might need to physically change every row.

For years, I've wanted to compile a definitive list of which ALTER TABLE options fall into each of these categories. This column is a start at building that list. I ran tests in an attempt to determine which of the three types of alterations was happening.

The first type is by far the easiest to track down; you just need to set STATISTICS IO ON in Query Analyzer before issuing an ALTER TABLE command. If SQL Server is changing only metadata, you get no output back for STATISTICS IO. The ALTER TABLE command updates metadata only when SQL Server makes the following changes to a table: dropping a column or constraint, disabling a constraint, or disabling or enabling a trigger. In addition, SQL Server changes metadata only when you add a new column to a table without specifying a default value (the new column must allow NULLs in that case) or when you modify a column definition to allow NULL values. If you set STATISTICS IO ON and then execute any of these ALTER TABLE commands, you'll see that the batch returns no statistics output.
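
For example, a batch along these lines (the column name is hypothetical) should return no STATISTICS IO output because every statement touches only metadata:

```sql
SET STATISTICS IO ON

ALTER TABLE NewOrders ADD note varchar(100) NULL  -- new NULLable column, no default
ALTER TABLE NewOrders DISABLE TRIGGER ALL         -- disable any triggers
ALTER TABLE NewOrders ENABLE TRIGGER ALL          -- re-enable them
ALTER TABLE NewOrders DROP COLUMN note            -- drop a column
```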

The second and third possibilities are more difficult to distinguish because both require SQL Server to read all the pages in a table. I originally thought that if STATISTICS IO reported the same number of logical I/O operations as the number of pages in the table, SQL Server was only examining the data. However, sometimes it's doing more: SQL Server might read each page only once even when it's changing every row. The reverse situation is more conclusive. If STATISTICS IO reports that SQL Server is accessing many more pages than the table contains, you can assume that SQL Server is doing more to the pages than reading all the rows.

So, how can you tell whether the table data (and not just the metadata) has changed? One way is to use fn_dblog() to look at the number of log records, as I did in my June 2003 column, "Inside Recovery Models." If SQL Server is only examining each row without making any changes, the log should contain few new records. However, if you're running tests in a database that other people are using concurrently, their operations will affect the log, and checking the number of new log records might not be useful. An alternative method is to examine the data pages, as I did in my April 2001 column, "The Fill-Factor Truth." That article contains details about using the DBCC IND command, which tells you which pages belong to a table, and using the DBCC PAGE command to see the actual data in the table.
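
As a rough sketch of the log-counting approach (fn_dblog() is undocumented, and the count is meaningful only in a database no one else is using):

```sql
-- Number of records currently in the active log
SELECT COUNT(*) AS log_records FROM ::fn_dblog(NULL, NULL)

-- A metadata-only change; the column name is hypothetical
ALTER TABLE NewOrders ADD note varchar(100) NULL

-- Only a handful of new log records should appear
SELECT COUNT(*) AS log_records FROM ::fn_dblog(NULL, NULL)
```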

Testing, Testing

In the DBCC IND output, find the page that has no value for previous page but does have a next page value—that's the first page of the table. Use that page number in place of XXX in the DBCC PAGE command. Now let's look at five tests I ran to see what SQL Server does when it performs different types of ALTER operations.
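
The page inspection looks something like the following sketch. Both commands are undocumented, and the database name, file number, and DBCC IND print option are assumptions:

```sql
DBCC TRACEON(3604)                      -- route DBCC output to the client
DBCC IND('Northwind', 'NewOrders', -1)  -- list the pages that belong to the table

-- Substitute the first data page's number for XXX (file 1 is assumed):
DBCC PAGE('Northwind', 1, XXX, 1)
```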

The first test changes a column's data type from int to smallint in NewOrders. (The listings assume that NewOrders is a copy of Northwind's Orders table and that the altered column is ShipVia, an int.)

ALTER TABLE NewOrders
ALTER COLUMN ShipVia smallint

And the following command changes a column in NewOrders to not allow NULLs:

ALTER TABLE NewOrders
ALTER COLUMN ShipVia smallint NOT NULL

In both of these tests, note that SQL Server needs to scan the table because it must ensure, in the first case, that no values are too big for a smallint and, in the second case, that no rows already contain NULLs. If any rows contain unacceptable data, the ALTER TABLE command will fail. Using DBCC PAGE to examine the page shows that the ALTER command made no changes on the page; it just changed the metadata describing the table structure.

Another change that requires SQL Server to examine every row is adding a new constraint. For unique and primary key constraints, SQL Server checks for existing duplicates and, if it finds any, rejects the ALTER. SQL Server must ensure, for check and foreign key constraints, that no existing data violates the constraint. For check and foreign key constraints, you can alternatively include the WITH NOCHECK option, which tells SQL Server not to verify existing data.
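
For example, assuming NewOrders has a Freight column, the two forms of a check constraint look like this; the first scans every existing row, and the second skips the scan:

```sql
-- SQL Server verifies every existing row before accepting the constraint:
ALTER TABLE NewOrders
ADD CONSTRAINT CK_Freight CHECK (Freight >= 0)

-- WITH NOCHECK skips the verification of existing data:
ALTER TABLE NewOrders WITH NOCHECK
ADD CONSTRAINT CK_Freight_nocheck CHECK (Freight >= 0)
```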

The next test adds to NewOrders a new column that has a default value:

ALTER TABLE NewOrders
ADD big_column CHAR(200) NOT NULL
  DEFAULT 'big_column'

And the following test changes a data type in NewOrders from int to bigint:

ALTER TABLE NewOrders
ALTER COLUMN ShipVia bigint

The last test changes an int data type to char(10):

ALTER TABLE NewOrders
ALTER COLUMN EmployeeID char(10)

For these three changes, SQL Server must change every row because the row will increase in size. As the altered columns grow, the new row might not fit on the original page, and the page will need to split. You'll notice a lot of page accesses in these cases, perhaps several hundred for the 21-page NewOrders table.

When I ran these tests, I noticed another interesting fact about the rows moving as pages need to split. If in a previous test I'd added a fixed-length column that allowed NULLs, SQL Server didn't change the rows at that time to include the space for the new column. But when a later ALTER TABLE command resulted in page splits and row movement, SQL Server adjusted the rows that were moving to new pages, making space for the previously added column. Rows that didn't move didn't reflect the space for the new column until SQL Server received an UPDATE for the row.

Keep in mind that an int column takes only 4 bytes of disk space even though it can store a 10-digit decimal number. So if you change an int column to char(10), the column must grow. However, if you change an int column to char(4) (assuming all existing data values are less than 10,000), the column doesn't need to grow. SQL Server will modify the rows, as you can verify by using DBCC PAGE, but the rows don't need any new space.
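
To see this for yourself, a change such as the following (assuming every ShipVia value is below 10,000) rewrites each row without growing it:

```sql
-- Rows are modified in place but need no extra space;
-- verify with DBCC PAGE before and after
ALTER TABLE NewOrders
ALTER COLUMN ShipVia char(4)
```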

I also tested changing a char column to varchar and vice versa. For these tests, the number of pages STATISTICS IO returned was much higher than the number of pages in the table, so I knew that SQL Server was modifying the pages. I also used the syscolumns table to examine what kinds of changes SQL Server made. SQL Server stores fixed-length and variable-length columns in different ways within the row. All fixed-length columns are in the first part of the row, at a known byte offset within the row. You can find this offset in the syscolumns table's xoffset column. For example, you'd use this query to find an offset for the NewOrders table:

SELECT name, xoffset FROM syscolumns
WHERE id = object_id('NewOrders')

SQL Server stores variable-length columns at the end of the physical row; they don't have the same byte offset in every row in the table. The syscolumns table shows their offset as a negative number, indicating a position counting from the end of the physical row in which SQL Server stores the column. For example, a column that has an offset value of -1 in syscolumns.xoffset is the last column; the column with an offset value of -2 is the next to last, and so on. (Additional metadata in the row reports exactly where each variable-length column starts, but I won't go into that level of detail here.) When changing a char column to varchar or vice versa, I noticed in the syscolumns table that the xoffset value changed from a fixed number to a negative offset position or from a negative offset position to a fixed number.
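
Here's a sketch of one such test; it assumes EmployeeID was already converted to char(10), as in the earlier test:

```sql
-- Switching a fixed-length column to variable-length
ALTER TABLE NewOrders
ALTER COLUMN EmployeeID varchar(10)

-- The xoffset value for EmployeeID should now be negative
SELECT name, xoffset FROM syscolumns
WHERE id = object_id('NewOrders') AND name = 'EmployeeID'
```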

This list of tests isn't complete, but it's a good starting point for you to create tests of your own. For an exhaustive list of how every ALTER TABLE variation behaves internally, you'd need to run tests both on tables that have clustered indexes and on heaps—particularly those cases where making the requested change to the table requires moving rows. For a table that has a clustered index, SQL Server splits a page if a new row doesn't fit on the original page. But in a heap, SQL Server moves the updated, enlarged row to a new location and leaves a forwarding pointer in the original location.

Other Changes

SQL Server 2000 and 7.0 also provide another command that you can use to make metadata changes to tables: the sp_tableoption stored procedure. I discussed the text in row option to this procedure in my March 2003 column, "Text in Row Internals." I showed that when you enable this option, the changes to the table are just metadata changes, and when you disable the option, SQL Server physically updates the pages. The other options to sp_tableoption (pintable and table lock on bulk load) change only the table's metadata when the sp_tableoption procedure is executed.
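
For reference, the procedure is invoked like this; the table names are illustrative, and the text in row option applies only to a table that has a text, ntext, or image column:

```sql
EXEC sp_tableoption 'NewOrders', 'pintable', 'true'       -- metadata-only change
EXEC sp_tableoption 'SomeTextTable', 'text in row', 'ON'  -- metadata-only change
EXEC sp_tableoption 'SomeTextTable', 'text in row', 'OFF' -- physically updates pages
```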

Be aware that if you use Enterprise Manager to change tables, SQL Server might end up working much harder than if you had used Query Analyzer to run the T-SQL statement. Sometimes, particularly in SQL Server 7.0, SQL Server changes an existing table by completely rebuilding the table, moving all the data, dropping the original table, and renaming the new one. Although this process produces the same result as if you merely altered the table, SQL Server does a great deal of extra work, especially if your table contains a lot of data.
