Have your customers or managers ever asked you how much their databases grew during the past year? Have you needed to plan how much disk capacity you'll need for the next year based on your database's average growth rate during the past 12 months? How long will your existing unallocated disk space last based on your current growth rate? To answer these kinds of database-growth questions or similar disk-space questions, you need some historical space-usage information about your databases. I've developed a process that you can use to automatically collect space-usage statistics for each of your databases. You can then use the collected space information to perform a simple growth-rate calculation.
Several months ago, I decided to build a process to capture space-usage information for each database on a system so that I could track disk-space consumption over time. I wanted to find the amount of space allocated and used for both the data and the log files. I was looking for the same information that you see in Enterprise Manager's Database Details pane when you're viewing Space Allocated information, but I needed the information to be available to T-SQL code. Using SQL Server Profiler, I discovered that Enterprise Manager obtains space-allocated information by using two DBCC statements. One of the statements, SQLPERF, is documented; the other DBCC statement, SHOWFILESTATS, isn't. By manually running DBCC SHOWFILESTATS on each database and comparing the output with what Enterprise Manager displayed, I determined that this command would give me the used-disk-space information I needed for each database.
Both SQL Server 2000 and SQL Server 7.0 use these DBCC statements to populate Enterprise Manager's Space Allocated display. The DBCC SQLPERF(LOGSPACE) statement returns transaction log space information—the allocated log size for each database in megabytes and the percentage of log space used for each database—for all databases. With some simple math, you can easily convert the percentage of log space used into megabytes. This DBCC statement helped me obtain the log file space information I wanted to track.
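The percentage-to-megabytes conversion amounts to one line of arithmetic. Here's an illustrative sketch, not the article's listing: it captures the DBCC output in a temporary table (called #tmplg here, matching the name the stored procedure uses) and derives the used log space in megabytes. Column types are assumptions.

```sql
-- Illustrative only: capture DBCC SQLPERF(LOGSPACE) output into a
-- temporary table, then convert the used-space percentage to megabytes.
CREATE TABLE #tmplg (
    DBName       sysname,
    LogSize      float,   -- allocated log size in MB
    LogSpaceUsed float,   -- percentage of the log in use
    Status       int
)
INSERT INTO #tmplg
    EXECUTE ('DBCC SQLPERF(LOGSPACE)')

-- Used log space in MB = allocated MB * (percentage used / 100)
SELECT DBName,
       LogSize,
       LogSize * LogSpaceUsed / 100.0 AS LogMBUsed
FROM #tmplg
```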
I used the undocumented DBCC SHOWFILESTATS statement, which returns space-usage information for one database's data, to obtain the rest of the disk-space statistics I wanted. This statement returns one record per physical data file. Each statistics record returned appears in six columns: Fileid, FileGroup, TotalExtents, UsedExtents, Name, and FileName. You can use the TotalExtents column to determine the total space allocated to data and the UsedExtents column to determine the total space used for data. By summing the TotalExtents and UsedExtents values of all files within a database, then converting the number of extents into megabytes, I calculated the total space allocated and total space used for data. These calculations gave me the data space usage information I wanted to track over time. Figure 1 shows sample output of the DBCC SHOWFILESTATS command after you run it against the master database.
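The extents-to-megabytes math follows from SQL Server's storage units: one extent is eight 8KB pages, or 64KB. The following sketch shows the idea against a temporary table (#tmp_sfs, the name the stored procedure uses later); the column types are assumptions, not the article's listing.

```sql
-- Illustrative only: one extent = 8 pages x 8KB = 64KB, so
-- megabytes = extents * 64.0 / 1024.
CREATE TABLE #tmp_sfs (
    Fileid       int,
    FileGroup    int,
    TotalExtents int,
    UsedExtents  int,
    Name         sysname,
    FileName     varchar(255)
)
INSERT INTO #tmp_sfs
    EXECUTE ('DBCC SHOWFILESTATS')

-- Sum across all files in the database, then convert extents to MB
SELECT SUM(TotalExtents) * 64.0 / 1024 AS DataAllocMB,
       SUM(UsedExtents)  * 64.0 / 1024 AS DataUsedMB
FROM #tmp_sfs
```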
I've built these two DBCC statements into a process that automatically collects space information by database. This process runs periodically and saves space-usage statistics in a database table. The process consists of a SQL Server Agent job that contains two steps. The first step executes a stored procedure named usp_get_dbstats, which generates a T-SQL script. The resulting script consists of a DBCC SQLPERF(LOGSPACE) statement to gather the log information for all databases, a DBCC SHOWFILESTATS statement for each database, and some code to manipulate the DBCC output into the format required for saving the space-usage information. The second step executes the T-SQL script that the first step generates. After extracting the space-usage information from SQL Server and formatting the data, this script populates a permanent database table with the current data and log space-usage information. You can then use this permanent table to answer a wealth of disk space allocation questions.
This process of gathering space-usage statistics is an example of using T-SQL code to generate T-SQL code. I used this two-step process to minimize the complexity of writing a stored procedure that would need to issue a USE statement to let me run the DBCC SHOWFILESTATS command against each database. Now, let's look at my homegrown disk-space collection process in a little more detail.
The Stored Procedure
The usp_get_dbstats stored procedure, which Listing 1 shows, is the main body of the space-usage statistics-gathering process. The stored procedure queries the system tables and programmatically generates and executes PRINT statements to produce a T-SQL script that, when executed, uses two DBCC statements to extract current space-usage information. Let's walk through this stored procedure one section at a time.
The code at callout A in Listing 1 gathers the log-space usage information. This block of code, like the others in Listing 1, dynamically generates and executes a series of PRINT statements that become the T-SQL script that gathers the space-usage statistics. The code at callout A produces a set of T-SQL statements that create a temporary table called #tmplg, then populates the table with the output from DBCC SQLPERF(LOGSPACE). The INSERT INTO statement that has the EXECUTE option puts the DBCC statement's output into the #tmplg table, which will eventually contain one record for each database on the server. Each record will contain information that goes into columns labeled DBName, LogSize, LogSpaceUsed, and Status. You can find the definitions of each of these columns in SQL Server Books Online (BOL) under the heading "DBCC SQLPERF."
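To see the code-generation style in action, here's a hedged sketch of the kind of PRINT statements the stored procedure emits; the exact statements live in Listing 1, so treat this as an illustration of the pattern rather than the listing itself. The printed text becomes the script that the job's second step executes.

```sql
-- Illustrative only: PRINT statements emit the T-SQL that will later
-- create #tmplg and fill it via INSERT INTO ... EXECUTE.
PRINT 'CREATE TABLE #tmplg ('
PRINT '    DBName       sysname,'
PRINT '    LogSize      float,'
PRINT '    LogSpaceUsed float,'
PRINT '    Status       int)'
PRINT 'INSERT INTO #tmplg'
PRINT '    EXECUTE (''DBCC SQLPERF(LOGSPACE)'')'
```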
Callout B shows the code that creates the #tmp_stats temporary table. Each record in this table will hold both the data and log space-usage information for a database. Later code blocks will populate and use this temporary table. This section of the code executes a series of PRINT statements to append to the T-SQL script that the code at callout A started.
The code at callout C generates the DBCC SHOWFILESTATS statement for each database. This chunk of code also generates the T-SQL statements that merge the DBCC SHOWFILESTATS information with records in the #tmplg table for each database to produce one record per database containing data and log disk-space statistics. Again, the code uses PRINT statements to generate T-SQL code that will gather DBCC SHOWFILESTATS information.
Remember that the DBCC SQLPERF statement generates log-size information for all databases, whereas the DBCC SHOWFILESTATS statement gathers the data sizes for only the current database. This scope limitation of the DBCC SHOWFILESTATS command requires that the stored procedure generate code that will execute the DBCC SHOWFILESTATS statement against each database on the SQL Server box.
The code at callout C uses a cursor, AllDatabases, to hold a list of the databases on the system. This cursor lets the stored procedure iterate through the list of databases inside a WHILE loop to generate a DBCC SHOWFILESTATS statement for each database. Inside the WHILE loop, the stored procedure generates code to create a temporary table, #tmp_sfs, to hold the output of the DBCC SHOWFILESTATS statement, followed by an INSERT INTO statement that populates it. Again, I used the INSERT INTO statement with the EXECUTE option to insert the DBCC SHOWFILESTATS statement information into the temporary table. The last several lines of code in this section generate the code that will convert the LogSpaceUsed column in table #tmplg from a percentage of used space into a megabyte value. Then, the code populates the #tmp_stats table with current data and log space-usage statistics for the current database.
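The cursor-driven loop can be sketched as follows. This is an illustration of the pattern, not Listing 1's code: it walks the databases in master.dbo.sysdatabases (the SQL Server 2000/7.0 catalog table) and prints a USE statement plus a DBCC SHOWFILESTATS block for each one.

```sql
-- Illustrative only: emit one generated DBCC SHOWFILESTATS block
-- per database on the server.
DECLARE @DBName sysname
DECLARE AllDatabases CURSOR FOR
    SELECT name FROM master.dbo.sysdatabases
OPEN AllDatabases
FETCH NEXT FROM AllDatabases INTO @DBName
WHILE @@FETCH_STATUS = 0
BEGIN
    -- The generated script must switch database context before
    -- running DBCC SHOWFILESTATS, which is scoped to the current db.
    PRINT 'USE [' + @DBName + ']'
    PRINT 'INSERT INTO #tmp_sfs EXECUTE (''DBCC SHOWFILESTATS'')'
    FETCH NEXT FROM AllDatabases INTO @DBName
END
CLOSE AllDatabases
DEALLOCATE AllDatabases
```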
The code at callout D generates the T-SQL statements to put the data and log space-usage statistics into a permanent table. This section uses a simple INSERT INTO statement to populate a permanent table, DBSTATS, with the current calculated database space-usage statistics that the temporary table #tmp_stats holds.
Web Listing 1 shows a sample of what the T-SQL script would look like if you executed usp_get_dbstats on a server that had only a few databases. This output was produced on a system that had only the standard SQL Server installed databases (master, model, msdb, Northwind, Pubs, and tempdb), plus one user-defined database (DBA). Note that in this listing, one chunk of code collects transaction log space information by using the DBCC SQLPERF(LOGSPACE) command. Seven sections of similar code, one for each database, use the DBCC SHOWFILESTATS statement to gather data space-usage statistics.
The Permanent Table
The usp_get_dbstats stored procedure assumes that the space-usage statistics it gathers will be stored in a permanent table called DBSTATS. So before executing the T-SQL script that usp_get_dbstats generates, you'll need to create the DBSTATS table. Running the code that Listing 2, page 30, shows creates the DBSTATS permanent table that will hold all the historic database space-usage information. Records are appended to the DBSTATS table each time you execute the commands that usp_get_dbstats generates, so the table accumulates a running history of space usage.
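A minimal DBSTATS layout might look like the following sketch. Listing 2 defines the real schema, so the column names and types here are assumptions; the essential idea is one row per database per collection run, stamped with the collection date.

```sql
-- Illustrative only: a minimal permanent table for the collected
-- statistics; Listing 2 contains the article's actual definition.
CREATE TABLE DBSTATS (
    STAT_DATE    datetime NOT NULL DEFAULT GETDATE(),
    DBName       sysname  NOT NULL,
    DataAllocMB  float,   -- space allocated to data files
    DataUsedMB   float,   -- space used within the data files
    LogAllocMB   float,   -- space allocated to the log
    LogUsedMB    float    -- space used within the log
)
```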
In my shop, we have a DBA database that contains the DBSTATS table and the usp_get_dbstats stored procedure. If your shop has a database that your DBAs use to hold stored procedures and tables such as DBSTATS, you can change the default database at the beginning of usp_get_dbstats (variable @DBSTATS_DB) to a database appropriate to your site. Note that if you do change the default database name, you need to change the @DBSTATS_DB declaration to match the size of your database name.
The SQL Server Agent Job
You could manually execute the usp_get_dbstats stored procedure to generate the T-SQL script to gather the space-usage statistics, then copy the generated T-SQL script into Query Analyzer to get the current statistics into the DBSTATS table. However, this manual approach would quickly become boring and waste your valuable time. Instead of generating your statistics manually, I recommend that you build a SQL Server Agent job like the Get DBSTATS job that Figure 2 shows.
The Get DBSTATS SQL Server Agent job has two steps. The first step, which Figure 3 shows, uses the osql command to execute the usp_get_dbstats stored procedure. Using osql lets the job capture the stored procedure's output in a file that the second step can execute. The -o option tells osql to write the output to a file called c:\temp\get_dbstats.sql. This file is the T-SQL script that the second step of the SQL Server Agent job will execute.
The second step of Get DBSTATS, which Figure 4 shows, executes the statements that usp_get_dbstats generated, extracting and saving disk-space usage information. Figure 4 shows the osql command that executes the script that the first step produced. The input (-i) parameter feeds into the osql process the T-SQL script that the first step built.
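The two job steps' command lines might look something like the sketch below. The server name, database name, and authentication switch are placeholders for your environment; only the output file path comes from the article, and Figures 3 and 4 show the actual commands.

```
REM Illustrative only: server, database, and security options are
REM placeholders for your environment.
REM Step 1 runs the stored procedure and captures its output with -o:
osql -E -S SQLPROD1 -d DBA -Q "EXEC usp_get_dbstats" -n -o c:\temp\get_dbstats.sql
REM Step 2 executes the generated script with -i:
osql -E -S SQLPROD1 -d DBA -i c:\temp\get_dbstats.sql
```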
In my shop, I've scheduled the SQL Server Agent job to run once a week so that I can capture the database space-usage statistics and monitor the growth of our databases week by week. You need to determine how frequently you should gather space-usage statistics for your environment. Capturing disk-space usage lets me perform several kinds of disk-space usage analysis: I can track monthly and yearly disk usage, both for individual databases and overall, and I can see how much additional disk space we consumed when we migrated data related to a particular project.
If you don't have any disk-space usage information, predicting an average database growth rate is extremely difficult. After you've implemented a disk-usage collection method such as the one I've outlined, you have statistics available to help you calculate a database's average growth rate. I produce a simple Microsoft Excel chart monthly that tracks our disk-space usage over time.
Figure 5 shows the monthly disk-space usage for one of our production servers, SQLPROD1. This graph represents the amount of disk space that all our production databases on SQLPROD1 were using on different dates over a period of 7 months. Note that I recorded several spikes in the graph. Over time, I can associate the peaks and valleys with specific events that cause unusual growth in our databases, so I can better predict growth rates for upcoming database work. In Figure 5, you can see when we added DB_TEST: The used space on server SQLPROD1 grew almost 3GB.
Although this graph represents disk-space usage statistics starting only in July 2001, getting a picture of the average disk-space growth rate for a more recent or longer period on this server is easy. I can determine the monthly growth rate by using the following simple formula:
MONTHLY_GROWTH_RATE = (SPACE_USED_END - SPACE_USED_BEGIN) / NUMBER_OF_MONTHS
The amount of disk space occupied on July 1, 2001 (SPACE_USED_BEGIN), was 6.5GB. By February 4, 2002 (SPACE_USED_END), the used disk space had grown to 7.66GB. The number of months between the July and February data points is a little more than 7. According to this formula, the monthly growth rate for our SQLPROD1 box is a little more than 0.16GB per month. Now that I can calculate the monthly growth rate from real statistics, I can predict how many months remain before database growth consumes our available free disk space, which gives me time to acquire more disk space in advance.
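The same arithmetic is easy to run directly in T-SQL once the data points are in hand. The following sketch plugs in the article's two measurements; in practice you'd select SPACE_USED_BEGIN and SPACE_USED_END from the DBSTATS table instead of hard-coding them.

```sql
-- Illustrative only: compute the monthly growth rate from the
-- article's two data points (hard-coded here for clarity).
DECLARE @SpaceUsedBegin float, @SpaceUsedEnd float, @Months float
SELECT @SpaceUsedBegin = 6.5,    -- GB used on July 1, 2001
       @SpaceUsedEnd   = 7.66,   -- GB used on February 4, 2002
       @Months = DATEDIFF(dd, '20010701', '20020204') / 30.4375
SELECT (@SpaceUsedEnd - @SpaceUsedBegin) / @Months
    AS MonthlyGrowthGB           -- a little more than 0.16GB per month
```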
Calculating a monthly growth rate for our SQLPROD1 server would be impossible without collecting statistics over time. This homegrown solution, using documented and undocumented DBCC statements, meets my organization's needs. Other organizations might find they need to collect more historical space-usage information, such as space usage by tables within a database. Whether you acquire canned software to track space usage or choose a homegrown solution, gathering database-growth information over time can give you valuable insight into the growth patterns of your databases. Without historical growth-rate information, you have no way to adequately understand a database's disk usage. Knowing the current growth rate of each database will help you more accurately plan for future disk acquisitions.