The most important element of a database is its data, which is stored in data files. The size of these data files affects database availability, performance, and support. Effectively managing their size involves:
- Keeping size in mind when creating the database.
- Creating additional file groups for nonsystem objects.
- Proactively managing database growth.
To demonstrate this practical approach to managing the data file size, I'll walk you through an example. You can follow along by downloading the 129260.zip file, which contains the code and the other files I discuss. Go to the top of this page and click the Download the Code button.
Creating the Database
Creating databases in SQL Server is easy. In its simplest form, you can issue the CREATE DATABASE statement, with the database name as the only parameter. SQL Server will create the database using the default settings for various other parameters. Although the simplicity of this approach is appealing, it's best avoided in production systems.
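For example, this is the minimal form, which takes every setting from the model database's defaults (the database name here is purely illustrative):

```sql
-- Minimal form: file locations, sizes, and growth settings are all
-- inherited from the model database. Convenient for a scratch
-- database, but best avoided in production.
CREATE DATABASE ScratchDB;
```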
One parameter that you need to define is SIZE. How large should you make the database? SQL Server Books Online (BOL) makes this recommendation: "When you create a database, make the data files as large as possible, based on the maximum amount of data you expect in the database". In other words, the initial file size should match the expected size of the database. To estimate the expected database size, you need to estimate the size of the tables and their indexes. The "Estimating the Size of a Database" web page can help with these estimations.
In most production systems, database size is an upward moving target, so you also need to factor in expected growth. Again, BOL offers some guidance: "Permit the data files to grow automatically, but put a limit on the growth by specifying a maximum data file growth size that leaves some available space on the hard disk". To do that, you need to define the FILEGROWTH and MAXSIZE parameters. By default, the database will automatically grow in 1MB increments in SQL Server 2005 and later. (In earlier versions, the default is 10 percent.) However, you should determine whether this growth is acceptable for your database and, if necessary, customize the autogrow setting by using the FILEGROWTH parameter. In "Considerations for the 'Autogrow' and 'Autoshrink' Settings in SQL Server", Microsoft notes that the general rule is to set it to one-eighth the size of the file for testing purposes. The FILEGROWTH parameter's value can be expressed in KB, MB, GB, or TB, or as a percentage.
The MAXSIZE parameter's value should be smaller than the size of the disk hosting the file. By "disk" I mean whatever storage you use. These days this is often a SAN mount point.
The code in Listing 1 includes a CREATE DATABASE statement that defines the SIZE, FILEGROWTH, and MAXSIZE parameters. As callout A shows, the TestDB database is initialized to 10MB and will grow in 15 percent increments until it reaches 500MB.
-- BEGIN CALLOUT A
CREATE DATABASE TestDB
ON PRIMARY
  (NAME = TestDB_dat,
   FILENAME = 'C:\TestDB_PRIMARY_dat.mdf',
   SIZE = 10MB,
   MAXSIZE = 500MB,
   FILEGROWTH = 15%),
-- END CALLOUT A
-- BEGIN CALLOUT B
FILEGROUP DATA1 DEFAULT
  (NAME = TestDB_Data1_dat_01,
   FILENAME = 'C:\TestDB_DATA1_dat_01.ndf',
   SIZE = 10MB,
   MAXSIZE = 500MB,
   FILEGROWTH = 15%),
  (NAME = TestDB_Data1_dat_02,
   FILENAME = 'C:\TestDB_DATA1_dat_02.ndf',
   SIZE = 10MB,
   MAXSIZE = 500MB,
   FILEGROWTH = 15%),
FILEGROUP DATA2
  (NAME = TestDB_Data2_dat_01,
   FILENAME = 'C:\TestDB_DATA2_dat_01.ndf',
   SIZE = 512KB,
   MAXSIZE = 520KB,
   FILEGROWTH = 10KB),
  (NAME = TestDB_Data2_dat_02,
   FILENAME = 'C:\TestDB_DATA2_dat_02.ndf',
   SIZE = 512KB,
   MAXSIZE = 520KB,
   FILEGROWTH = 1KB)
-- END CALLOUT B
LOG ON
  (NAME = TestDB_log,
   FILENAME = 'C:\TestDB_log.ldf',
   SIZE = 5MB,
   MAXSIZE = 25MB,
   FILEGROWTH = 5MB);
Note that when a database file reaches full capacity, any attempt to insert or update records results in an error, so you should never let a file actually reach its MAXSIZE. Even before a file is full, you should avoid relying solely on autogrow. The main reason is that a transaction that requires the file to be expanded must wait for the expansion to finish before it can commit. Transactions that usually execute quickly can suddenly take as long as the I/O-intensive file growth operation itself, which can hurt performance and even result in deadlocks and timeouts. Indeed, in "Considerations for the 'Autogrow' and 'Autoshrink' Settings in SQL Server," Microsoft states, "For a managed production system, you must consider autogrow to be merely a contingency for unexpected growth. Do not manage your data and log growth on a day-to-day basis with autogrow."
If autogrow is bad, autoshrink is even worse, especially since it's often unnecessary. After all, a database will most likely have to grow again after being shrunk. As a general rule, the autoshrink option should be turned off in all production systems and file shrinking should be done only when absolutely necessary. This should always be done during a maintenance window and index defragmentation should be performed afterward.
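In practice that means disabling the option and, in the rare case a shrink really is needed, following it with an index rebuild. A sketch of those steps (the target size and table name are illustrative):

```sql
-- Turn autoshrink off: the recommended setting for production.
ALTER DATABASE TestDB SET AUTO_SHRINK OFF;

-- If a one-off shrink is truly necessary, run it during a
-- maintenance window; here the file is shrunk to a 100MB target.
DBCC SHRINKFILE (N'TestDB_dat', 100);

-- Shrinking moves pages around and fragments indexes, so rebuild
-- them afterward (shown for a single, illustrative table).
ALTER INDEX ALL ON dbo.tblData1 REBUILD;
```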
Adding File Groups
When discussing SQL Server databases, you can't avoid the topic of file groups. File groups are used to group files together. At a minimum, every database has one primary file group consisting of a single primary file. Microsoft recommends creating additional user-defined file groups to store user tables and indexes, leaving the primary file group for system objects only. This approach offers several advantages:
- You can move files containing nonsystem objects as an online operation. For an example, see "Move Database Files Without Taking the Database Offline."
- You can improve database performance by placing objects in different file groups. For example, you can isolate an I/O-intensive table by moving it to its own file group on a dedicated disk.
- You can improve database performance by spreading a file group across multiple files residing on different disks. Because SQL Server uses a proportional fill strategy for multiple files, this allows for a greater degree of I/O parallelism. For example, you can create eight files and place them on different disks. As data is read and written, eight physical reads/writes will be performed in parallel, improving performance. (You shouldn't expect an eightfold improvement, though, as SQL Server uses caching to minimize physical reads.)
Callout B in Listing 1 shows a sample implementation of this approach. This code creates two additional user-defined file groups (Data1 and Data2), each of which has two files. Data1 is made the default file group to prevent accidental placement of objects in the primary file group, which is automatically created. (Note that I intentionally initialized Data2 to a very small size for a reason that will become apparent shortly.)
After the database is created, you can place tables and indexes in the desired file groups. From a storage perspective, a table and its clustered index are one and the same, because the clustered index's leaf level contains the actual data pages of the underlying table. As a result, a table and its clustered index always reside in the same file group. If you specify a different file group for a clustered index, the table moves with it. This is very useful because SQL Server doesn't provide a Data Definition Language (DDL) statement for moving tables; instead, you recreate the clustered index on a different file group. If the table doesn't have a clustered index, you can move it by creating a new table in the desired location and migrating the data. Alternatively, you can create a clustered index just for the purpose of moving the table, then drop the index after the move.
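As a sketch of that technique, rebuilding the clustered index that backs a table's primary key on another file group moves the table with it. (The constraint name here is hypothetical; look up the real one in sys.key_constraints.)

```sql
-- Recreate the clustered index backing the primary key on the
-- DATA2 file group; the table's data pages move with it.
-- PK_tblData1 is a hypothetical name for the PRIMARY KEY
-- constraint's index.
CREATE UNIQUE CLUSTERED INDEX PK_tblData1
  ON dbo.tblData1 (Field1 ASC)
  WITH (DROP_EXISTING = ON)
  ON [DATA2];
```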
The code in Listing 2 creates two tables (one in each user-defined file group), then populates them with some sample data. Note that although you can control which file group the table is created on, the SQL Server engine uses a proportional fill algorithm to fill the files, so you have no control over that process.
USE [TestDB];
GO

CREATE TABLE [dbo].[tblData1](
  [Field1] [int] NOT NULL,
  [Field2] [varchar](100) NULL,
  PRIMARY KEY CLUSTERED ([Field1] ASC) ON [DATA1]
) ON [DATA1];

CREATE NONCLUSTERED INDEX IX_tblData1_Field2
  ON [dbo].[tblData1]([Field2]);

CREATE TABLE [dbo].[tblData2](
  [Field1] [int] NOT NULL,
  [Field2] [varchar](100) NULL,
  PRIMARY KEY CLUSTERED ([Field1] ASC) ON [DATA2]
) ON [DATA2];

CREATE NONCLUSTERED INDEX IX_tblData2_Field2
  ON [dbo].[tblData2]([Field2]);
GO

DECLARE @counter int;
SET @counter = 1;
WHILE (@counter < 10000)
BEGIN
  INSERT INTO dbo.tblData1 (Field1, Field2)
    VALUES (@counter, 'Some text ' + CAST(@counter AS varchar(20)));
  INSERT INTO dbo.tblData2 (Field1, Field2)
    VALUES (@counter, 'Some text ' + CAST(@counter AS varchar(20)));
  SET @counter = @counter + 1;
END
Proactively Managing Growth
Now that you have a database that has multiple file groups that contain multiple files, you need some tools and processes to proactively manage its growth. SQL Server has some very good standard reports you can use to monitor and manage databases. The Disk Usage report is particularly helpful for monitoring and managing database size. It displays total space usage, autogrow and autoshrink events, as well as disk space used by data files, as Figure 1 shows. To run this report, you just need to right-click the database in SQL Server Management Studio (SSMS) and select Reports, Standard Reports, Disk Usage.
Besides the Disk Usage report, I use two custom reports—the File Group Placement report (FileGroupPlacement.rdl) and the File Group Space report (FileGroupSpace.rdl)—that I created in SQL Server Reporting Services (SSRS). The File Group Placement report shows table and index placement. As Figure 2 shows, you can see how many objects exist on each file group and all the tables and indexes under their subtype grouping. This report uses SQL Server system views to retrieve the relevant information. (In case you want to see what this code looks like, it's in the File_Group_Placement_Report_Code.sql file in 129260.zip.)
To run the File Group Placement report, open the 129260.zip file and copy the FileGroupPlacement.rdl file into a folder. In SSMS, right-click the database, choose Reports, select Custom Reports, then browse to FileGroupPlacement.rdl. Select that file, and click Open. In the Run Custom Report dialog box, click Run. Note that the next time you want to run the report, it will be listed in the main Reports menu. Repeat this process for the FileGroupSpace.rdl file.
The File Group Space report shows space usage for every file group in the database. The report uses the DBCC SHOWFILESTATS command to retrieve a variety of information. (In case you want to see what this code looks like, it's in the File_Group_Space_Report_Code.sql file in 129260.zip.)
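If you just want the raw numbers, you can run the (undocumented) command directly. It reports allocation in extents, where one extent is eight 8KB pages, or 64KB:

```sql
USE TestDB;
-- Returns one row per data file: Fileid, FileGroup,
-- TotalExtents (space reserved), UsedExtents (space used),
-- Name, and FileName. Multiply the extent counts by 64 to get KB.
DBCC SHOWFILESTATS;
```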
As seen in Figure 3, the File Group Space report includes maximum size, current size (also known as space reserved), and the actual space used. The report also calculates how much space is left until the next autogrow interval occurs and how much space is left until the maximum size is reached. Color coding is used to alert you when the space remaining is below 25 percent (pink) and 10 percent (red). The color coding is based on reserved space, not maximum space available. In other words, the line will go red when the autogrow operation is approaching. In Figure 3, the Data2 file group is flagged as critical. This is the file group that I intentionally initialized to a very small size.
Reports are useful when you're working with a specific database. However, in a large environment, it isn't practical to regularly visit every database, so you need an alerting mechanism. You can use the usp_FileGroup_Space_Alert stored procedure for this purpose. It uses SQL Server system views and the DBCC SHOWFILESTATS command to find file groups that are approaching capacity. If the user-specified threshold is reached, it sends out an email.
The usp_FileGroup_Space_Alert stored procedure addresses a problem that commonly occurs when you use percentages to monitor very large files: in a very large file group, a 75 percent full threshold can still leave hundreds of gigabytes free, so the alert fires long before any action is needed. To avoid this problem, you can specify two alerting thresholds: megabytes remaining and percentage used. You also define what you consider a large file group. For large file groups, the stored procedure applies the megabytes-remaining threshold; for all other file groups, it applies the percentage-used threshold. For example, you can tell the stored procedure to alert you when a file group is 75 percent full, except for file groups larger than 20GB, for which it should alert you when the space remaining drops below 5GB. The command to execute the stored procedure with these settings would be
USE TestDB;
EXEC dbo.usp_FileGroup_Space_Alert
  @PercentageThreshold = .75,
  @MBRemainThreshold = 5000,
  @LargeCutoffMB = 20000,
  @EmailAddress = '[email protected]';
The 129260.zip file includes the code for the usp_FileGroup_Space_Alert stored procedure. You need to have database mail configured to use it.
You can use a SQL Server Agent job to call usp_FileGroup_Space_Alert frequently (e.g., every half hour). That way, you'll know as soon as the database starts approaching the autogrow event.
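A sketch of such a job, built with the msdb job procedures (the job, schedule, and email values are illustrative; the thresholds mirror the earlier example):

```sql
USE msdb;
GO
-- Create the job.
EXEC dbo.sp_add_job @job_name = N'FileGroup Space Alert';

-- Single T-SQL step that runs the alert procedure in TestDB.
EXEC dbo.sp_add_jobstep
  @job_name = N'FileGroup Space Alert',
  @step_name = N'Check file group space',
  @subsystem = N'TSQL',
  @database_name = N'TestDB',
  @command = N'EXEC dbo.usp_FileGroup_Space_Alert
    @PercentageThreshold = .75,
    @MBRemainThreshold = 5000,
    @LargeCutoffMB = 20000,
    @EmailAddress = ''[email protected]'';';

-- Run every 30 minutes, all day, every day.
EXEC dbo.sp_add_jobschedule
  @job_name = N'FileGroup Space Alert',
  @name = N'Every 30 minutes',
  @freq_type = 4,            -- daily
  @freq_interval = 1,
  @freq_subday_type = 4,     -- minute units
  @freq_subday_interval = 30;

-- Target the local server so the job actually runs.
EXEC dbo.sp_add_jobserver
  @job_name = N'FileGroup Space Alert',
  @server_name = N'(local)';
```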
The final piece of the growth management puzzle is a plan of action for when alerts occur. If additional storage is available, you can schedule files to be expanded during the next maintenance window. Listing 3 shows how to resize the Data2 file group.
USE [master]
GO
ALTER DATABASE [TestDB]
  MODIFY FILE (NAME = N'TestDB_Data2_dat_01', SIZE = 10MB);
ALTER DATABASE [TestDB]
  MODIFY FILE (NAME = N'TestDB_Data2_dat_02', SIZE = 10MB);
GO
If additional storage isn't available, you can add a disk and create more files in the newly allocated space. Listing 4 shows how to add a file to the Data2 file group. This approach has an advantage over simply resizing the existing files: I/O is spread over more disks, which can improve performance.
USE [master]
GO
ALTER DATABASE [TestDB]
  ADD FILE (
    NAME = N'TestDB_Data2_dat_03',
    FILENAME = N'C:\TestDB_DATA2_dat_03.ndf',
    SIZE = 10240KB,
    FILEGROWTH = 1024KB
  )
  TO FILEGROUP [DATA2];
GO
A Practical Approach
Managing the size of your databases isn't that difficult. You just need to take a practical approach. That approach involves initializing the database files to a size sufficient to store all your data, creating user-defined file groups for nonsystem objects, and proactively managing growth instead of relying on the autogrow option. You can manage growth by using reports, using alerts, and manually resizing a file group or creating a new one when necessary.