Diving In to Object-Storage Metadata

Downloads
50257.zip

As you learned last month in "Managing Data Space," InstantDoc ID 50009, SQL Server 2005 manages the storage space that database objects (i.e., tables, indexes, and large objects—LOBs) use differently than SQL Server 2000 does. The main difference is that SQL Server 2005 provides no single table or view that contains information about space-consuming objects that's analogous to the sysindexes system table in SQL Server 2000. Instead, SQL Server 2005 provides several new views that contain information formerly available through sysindexes. Let's take a closer look at these views and some examples of queries that you can use to examine the storage-related metadata the views contain.

A New View on Indexes

SQL Server 2005 replaces the sysindexes table with the sys.sysindexes compatibility view and the sys.indexes catalog view. As I explain in "Seek and You Shall Find," September 2005, InstantDoc ID 46929, SQL Server 2005 provides backward-compatibility views of SQL Server 2000 system tables that you can use to ensure that your old applications work when they're run against SQL Server 2005.The sysindexes view returns all the same columns as the old sysindexes table; however, many of the columns aren't even populated.

You can see the definition of the SQL Server 2005 sysindexes view by running the following statement:

SELECT object_definition (object_id('sys.sysindexes'))

(Some code lines in the article and listings wrap to multiple lines because of space limitations.) In the statement's results, notice that many of the columns are set to 0 or null, including all the reserved columns, plus pgmodctr, keys, and statblob.

You can also see an undocumented property value called keycnt80 that's used with the INDEXPROPERTY function. The sysindexes table in SQL Server 2000 had a property value called keycnt, but none of the SQL Server 2005 views that replace sysindexes have such a column. Although SQL Server 2005 Books Online (BOL) doesn't show keycnt80 as a possible value for INDEXPROPERTY, the definition of the sys.sysindexes compatibility view uses keycnt80 with INDEXPROPERTY to populate the keycnt column in the view. SQL Server 2005 provides key information in the sys.index_columns view but has no one column that keeps track of the number of keys, so you might find this undocumented INDEXPROPERTY parameter useful.

The new sys.indexes catalog view essentially shows only property information about each index. The view columns indicate whether the index is clustered or nonclustered, whether it's unique or nonunique, whether the index supports a primary key or unique constraint, what kinds of locks are allowed, and other similar information. Search sys.indexes in BOL to see the complete list of columns in the view. Note that sys.indexes contains no information about the index's structure or size or the location of the index pages.

Views on Partitions

As I discuss in "Managing Data Space," the sys.partitions view has one row for each partition of each table and each index in the database, along with the number of rows in that partition. All tables and indexes have at least one partition even if they aren't explicitly partitioned. To help familiarize you with the information in this view, I'll create a partitioned table, then compare the metadata for partitioned and nonpartitioned tables.

The code in Listing 1 creates two tables, each of which has three columns and 1000 rows of random data. The partition function PF1 defines five partitions based on the four range endpoints 1, 2, 3, and 4. The partition schema PS1 places all the partitions on the primary file group. (Be aware that putting all the partitions on the same file group isn't something you'd typically do to take full advantage of partitioning, but this partitioning structure is much easier to set up and still suits our purposes in examining the metadata in sys.partitions.) Test1 is created as a partitioned table, applying the partition scheme PS1 to the values in column a. The table test2 is a copy of the data in test1 but is nonpartitioned.

The following query examines the rows in sys.partitions for these two tables and returns the data that Table 1 shows. Be aware your data will be slightly different, due to the random data being generated.

SELECT object_name(object_id)
  AS name, partition_id, 
  partition_number AS pnum, 
  hobt_id, rows 
FROM sys.partitions 
WHERE object_name(object_id)
  LIKE 'test%'

As you can see, the query returns five rows for test1, which was created on five partitions, and one row for test2. Notice that the partition_id value is the same as the hobt_id value for all rows; you should see this relationship for all your objects. Of course, the actual values for hobt_id and object_id that you'll get will probably be different, and your number of rows in each partition can be different since they're generated randomly.

The reason that SQL Server 2005 has both the partition_id and hobt_id columns is to allow for future expansion. You can think of a partition as being a rowset, or a set of rows, that's stored in one location and a heap or B-tree (HOBT) as being a part of table or index stored in one location. In SQL Server 2005, there's a one-to-one relationship between rowsets and HOBTs, but future versions might let you have multiple HOBTs for the same rowset.

As I explained in "Managing Data Space," you can join sys.partitions with the sys.allocation_units view to determine the amount of space used for each rowset. Listing 2 shows the join of sys.partitions and sys.allocation_units; it turns out to be a one-to-one join because each row in sys. partitions has only one row in sys.allocation_units for the in-row data in the test1 and test2 tables. Table 2 shows the output for the join; again, the actual values you receive might be slightly different.

Also notice that Listing 2 returns four columns from sys.allocation_units: allocation_unit_id (aliased to au_id), a column called type (for which the value 1 indicates in-row data), container_id, and the number of pages in that allocation_unit. Note also the container_id matches the hobt_id from sys.partitions. Each row in sys.allocation_units has a unique allocation_unit_id.

To make the join between sys.partitions and sys.allocation_units a one-to-many join, you'll need to alter the test tables to allow them to contain additional kinds of data. The code in Listing 3 alters test1 to add a large varchar column and alters test2 to add a text column. The large varchar column will make the row size exceed the maximum of 8060 bytes, so it needs to be stored as out-of-row data, which is indicated in sys.allocation_units as type 3.The text data is also stored on its own pages and is indicated in sys.allocation_units as type 2. After running the code in Listing 3, run the code in Listing 2 again to see that the sys. partitions rows for the test1 table now join with 10 rows in sys.allocation_units. Each of the five partitions in sys.partitions joins with two allocation units, one for the in-row data and one for the off-row data. The one row for the single partition in test2 joins with two rows in sys.allocation_units, one for the in-row data and one for the text data.

Where's the Data Stored?

You've seen examples of queries that retrieve metadata about tables; the next step is to determine where in your database files the data for those tables is stored. SQL Server 2000's sysindexes table contains three columns that indicate where data is located: call first, root, and firstIAM. (For more information about these undocumented columns and how to interpret them, see "Index Internal Information," January 2005, InstantDoc ID 44572.) In SQL Server 2005, not only are these columns undocumented, the catalog view that contains them is also undocumented.

If you really want to see these columns containing page locations in a hexadecimal format, you can join sys.partitions with sys.system_internals_allocation_units instead of with sys.allocation_units. This view has the same columns as sys.allocation_units, plus three more: first_page, root_page, and first_iam_page. I'll leave it to you to write the query and examine the values on your own, if you're interested. I will point out that, just as in SQL Server 2000, it's unnecessary to retrieve page-location information in hex format because we can use the DBCC IND command to get page numbers in decimal. (I discuss DBCC IND in "Index Internal Information" and other Inside SQL Server articles.) I'll revisit this command in upcoming articles dealing with accessing physical-page information.

New and Better Metadata

As you start exploring SQL Server 2005's metadata, you might feel that accessing needed information is much more awkward than doing so in SQL Server 2000. I suspect that most of that awkwardness is merely a learning curve as you figure out new ways to perform old tasks. The fact that the data is in multiple views instead of just one system table gives SQL Server much greater flexibility and room for expansion. And the fact that almost all the metadata is available in views and property functions, instead of nontabular DBCC commands, means that you can use SQL queries to combine and filter the data any way you choose to create the queries that will be most useful for you.

Comments

Plain text