Focus on Files and Filegroups

If you're like most SQL Server DBAs today, you often wonder how many data files or filegroups you should have when you create a new user database. If you're not sure about the difference between a file and filegroup and the purpose of each, a data file is an OS file in which SQL Server stores the contents of the database (e.g., data, indexes, metadata) on the physical storage medium, and a file group is a logical container for managing or grouping one or more data files. You can choose which file group you want to associate a database object to, and SQL Server then stores the data or indexes in the associated file or files in that group. (For more information, refer to SQL Server Books Online—BOL—for these topics before proceeding.)

I generally see one of two scenarios regarding the number of files and filegroups for a given database. In the first, most common scenario, users utilize the default settings and create a single file in a single, primary filegroup. In the second, more extreme scenario, users create many, many files in one or more filegroups. Of course, these scenarios represent two ends of a spectrum of possibilities, but these are the most popular methods people use to create new databases. As you'll see, neither is desirable for most situations.

When Defaults Aren't Enough
Because this article series is geared primarily toward beginners, I'll assume that your databases aren't extreme either in size or usage. Most of you can operate effectively with just a single data file per filegroup. Having said that, I want to address the first scenario, in which the user creates databases with a single file in the default filegroup. To minimize recovery times and maximize uptime, you should avoid this scenario when using SQL Server 2000 or later.

Let's say you're doing a piecemeal restore: You must restore the primary filegroup first; so, if you have user data in the primary filegroup, you might hamper future actions or limit your possibilities by making the process longer or more complicated than necessary. At a minimum, you should create a secondary filegroup and make it the default for all user objects at the time of or immediately after you create the database. Because all the system objects get created in the primary filegroup when the database is first created, you'll effectively separate user objects from system objects, which is a sensible thing to do anyway.

So, does that mean if you create a secondary filegroup with a single data file, you're always good to go? The Microsoft article "Storage Top 10 Best Practices" says otherwise: "It is recommended to have .25 to 1 data files (per filegroup) for each CPU on the host server." Most DBAs read that and immediately begin creating lots and lots of files. If you've gone this route, you're certainly not alone. But the header actually states that the advice is for "Allocation Intensive Workloads"—which most SQL Server applications aren't. If you were creating thousands of tables or allocating thousands of new extents each second, the recommendation would apply to you. However, most well written applications don't resort to such behavior. Something that often does, however, is the tempdb database, and it's a good practice to have multiple files under those conditions, as outlined in the Microsoft article "Working with tempdb in SQL Server 2005".

Groups Are Good
Now that I've pointed out some of the fundamentals and misconceptions about files in the database, I want to focus on filegroups. In my opinion, you need to spend considerable time thinking about filegroups before you create the database in the first place. Filegroups give you the ability to separate user objects from one another. Because you can't span a filegroup with any one file, you can never have data from objects assigned to different filegroups residing in the same physical file. What does this buy you? Potentially, a lot.

Suppose you have multiple physical disk arrays available to your SQL Server system. You can place the file from filegroup A on one disk array and the file from filegroup B on another. You've now isolated the physical I/O—at the disk level—of objects in one filegroup from that of objects in another filegroup because an object can only reside in one filegroup. This scenario would be impossible if there were only a single filegroup for all the user objects in the database.

A common practice is to have one filegroup for clustered indexes and another for nonclustered indexes, which is fine—but the possibilities are endless. You need to predetermine your goals for the database and decide how you want to break out your objects. If improved performance is your key goal, and you have more than one disk array, you might want to place the tables that are joined the most onto separate filegroups. That way, you can take advantage of different I/O paths when you read from disk. If your goal is to maximize uptime or minimize backup and restore times, you can place static tables into a read-only filegroup and negate the need to back up the entire database each night.

Differing Needs
This is all food for thought; everyone has different needs. But clearly, most of you can benefit from focusing on how to better utilize filegroups than on how many files you need.

Comments

Plain text