Accessing File and Filegroup Metadata

In "Don't Abandon System Tables," March 2002, I described a problem involving database metadata reporting that you can't solve by using information schema views or other Microsoft-supplied procedures or functions: You can extract the definition of a computed column in SQL Server 2000 only by directly accessing the system tables. A possible reason that no information schema view solution exists is that the construct of computed columns isn't part of the ANSI SQL standard, and information schema views are an ANSI-defined construct. Another construct in SQL Server that isn't part of the ANSI standard is the physical organization of a database into files and filegroups. SQL Server lets you group data files into filegroups and specify the filegroup that the table's data, large objects, and indexes will be located in. None of the information schema views provides an administrator with information about this database structure. Some system stored procedures such as sp_helpfile and sp_helpfilegroup provide such structural information. And if you run sp_help and provide the name of a table as an argument, the procedure will return the filegroup that contains the table (in addition to other information about the table). However, if you want information about files and filegroups besides the results that these system stored procedures supply, you have to directly access the system tables. Let's examine how SQL Server uses files and filegroups, then address some frequently asked questions about how to access file and filegroup metadata.

Database Files

A database file is simply an OS file. A database spans two or more database files, and the DBA specifies the names and locations of these files when creating or altering a database. Every database must span at least two files: one for the data, indexes, and allocation pages and one for the transaction log. SQL Server 2000 and 7.0 support the following three types of database files:

Primary data file—Every database has one primary data file that, in addition to storing data, keeps track of the rest of the files in the database. By Microsoft's convention, the name of a primary data file has the extension .mdf.
Secondary data files—A database can have zero or more secondary data files. You might want to add secondary data files on separate physical disks to exploit parallel I/O or when the disk that houses your primary data file is full. By convention, the name of a secondary data file has the extension .ndf.
Log files—Every database has at least one log file that contains the information necessary to recover all transactions in a database. By convention, a log file has the extension .ldf.

You can group one or more of a database's data files into filegroups for allocation of data or index space and administration purposes. In some cases, you can improve performance by controlling the placement of data and indexes into specific filegroups on specific disk drives. For example, when you create a table or index, you can specify in which filegroup SQL Server should allocate space for that object. You can specify only a filegroup, not an individual file. If you want to place an object in a particular file, you can create a filegroup that contains only that file. If you place an object in a filegroup that contains multiple files, SQL Server will allocate the file space proportionally. So if one file has twice as much free space as another, SQL Server will allocate two extents in the larger file for every one extent it allocates in the smaller file.

The filegroup that contains the primary data file is called the primary filegroup. In addition to the primary filegroup, a database can have one or more user-defined filegroups. Don't confuse the primary filegroup and the primary file. The primary file is always the first file listed when you create a database, and it has the file extension .mdf by convention. The one special feature of the primary file is that the system table sysfiles1, which contains information about all the other files in the database, is constrained to be located wholly in the primary file. In addition to the primary file, the primary filegroup contains files for which you don't explicitly specify another filegroup when you create the database. All pages for system tables are always allocated from files in the primary filegroup.

The Default Filegroup

One filegroup always has the property of DEFAULT. Note that DEFAULT is the filegroup's property, not its name. Only one filegroup in each database can be the default filegroup. Unless altered, the primary filegroup is also the default filegroup. When you create the database or add a file to it, the default filegroup contains the pages for all tables and indexes for which you don't explicitly specify a filegroup. If you don't assign the DEFAULT property to a filegroup other than the primary filegroup, all tables that have no filegroup specified will go in the primary filegroup along with all the system tables. This default setup creates a problem only if placing a table in the default filegroup doesn't suit your needs. You should determine where to place tables and indexes according to your maintenance and performance needs.

You can create user-defined filegroups by using the FILEGROUP keyword in the CREATE DATABASE or ALTER DATABASE statement. You can also use the CREATE DATABASE and ALTER DATABASE commands to add files to a filegroup. A database owner (DBO) can change the default filegroup by using the ALTER DATABASE statement. SQL Server Books Online (BOL) contains the syntax for the CREATE DATABASE and ALTER DATABASE commands. BOL also has some detailed examples that show how to create and alter databases that contain multiple files and filegroups. Although file and filegroup manipulation is too complex to cover completely in this article, you should be aware of a few syntax details that you might encounter when you use the ALTER DATABASE command.

Typically, the ALTER DATABASE command lets you make one type of change to a database. For multiple types of changes, you have to run ALTER DATABASE multiple times. For example, you can run one ALTER DATABASE command to add a data file or multiple data files, but to add a log file, you must run another ALTER DATABASE command. You can run ALTER DATABASE to add a filegroup, but to add any files to that filegroup, you must run a separate ALTER DATABASE command.

You can change a filegroup property only by using the ALTER DATABASE command. You can't specify one of the filegroup properties (i.e., READONLY, READWRITE, or DEFAULT) when you first create a database. However, if you try to use ALTER DATABASE to change a property of a filegroup to which you haven't yet added any files, as the code in Listing 1 shows, you'll get the following error:

Server: Msg 5050, Level 16, State 1, Line 1
Cannot change the properties of empty filegroup
'user_fg1'. The filegroup must contain at least one file.

Although SQL Server creates the database DB1 and the filegroup user_fg1, the error occurs when you try to change user_fg1 to the default filegroup. (For information about why an administrator might want to create a database on multiple files or filegroups, see the SQL Server Magazine article "The Road to Recovery," September 2001, and Michael D. Reilly's Certifiably SQL articles "Backup Strategies," August 21, 2000, and "Restoring Databases," September 24, 2000.)

Retrieving Metadata

As the DBA, you often need to investigate database structure, including file and filegroup layout and the placement of tables and indexes in filegroups. You might need to document your system, re-examine your database to make sure it's configured optimally, re-evaluate your database maintenance procedures, determine where to place a new table or index, and so on. To accomplish these tasks, you need to examine the system catalog for file and filegroup information. The following system stored procedures give you information about files and filegroups:

Sp_help—When you specify a table name, sp_help tells you which filegroup the object is in. In the section of the output that lists the table's indexes (which is the same output that sp_helpindex returns), the description field contains the name of the filegroup that the index is in.
Sp_helpdb—When you specify a database's name, sp_helpdb lists all the files belonging to that database and the filegroup that contains each file.
Sp_helpfile—When you don't specify a filename, sp_helpfile returns the section of the sp_helpdb results that lists all the files, their filegroups, and their properties. If you do specify a particular file, you get only one row of output that includes the specified file's properties and the filegroup that the file belongs to.
Sp_helpfilegroup—When you don't specify a filegroup, sp_helpfilegroup returns a list of all the filegroups and the number of files in each one. If you do specify a filegroup, the output includes all the filegroup's files.

But these system stored procedures don't give you all the information you might want about filegroups. As I mentioned earlier, some information isn't available unless you access the system tables directly. Let's investigate some questions that you can't answer without using the system tables.

How can I get a list of all the objects in a database that includes the filegroup that each object is in? How can I get a list all the indexes on a table that includes the filegroup that each index is in? The system table sysindexes has a column called groupid that's a smallint value referencing a row in the sysfilegroups table. You can join sysindexes with sysfilegroups to get the filegroup name for each object (table, index, or large object). The sysindexes table has a row for each table and each index. The column indid in sysindexes shows whether the row represents a table or an index. If indid's value is 0, the row is a table stored as a heap (i.e., a table without a clustered index). If indid's value is 1, the row is for a table with a clustered index. And if indid has a value from 2 through 250, the row represents a nonclustered index. Sysindexes also includes a column called id. If indid is 0 or 1, values in the id column are the object IDs for the tables; if indid is greater than 1, values in the id column are the object IDs for the index or large objects.

Listing 2 shows the code you can use to get a list of all the tables in a database and their filegroups. If you want to include the system tables, you can remove the condition that requires the IsMSShippedProperty value to be 0. If you want to include all the indexes for each table, you can add another column to the SELECT statement to get the name from sysindexes and remove the condition requiring that indid be either 0 or 1. Although sp_help and sp_helpindex give you the filegroup that each index is in, the filegroup's name is part of a larger description string. If you want only the name of the filegroup (e.g., for aggregation or other summary purposes), you need to parse that description string.

How can I list all the objects in a particular filegroup? Again, to produce this type of list, you can access the sysindexes table. Assuming that the filegroup name is in a variable, you can use the code that Listing 3 shows to list all objects in that filegroup. If you want to turn the code in Listing 3 into a stored procedure, you can easily convert the variable that contains the filegroup names into a parameter.

Besides the questions I've addressed in this article, you might ask many other file and filegroup questions that don't have obvious answers. However, not all the file and filegroup problems you encounter require that you directly access system tables. In my next column, I'll show you some T-SQL code that you can use to obtain other information about your files and filegroups.

Comments

Plain text