Use SSRS and SSIS to Create a DBA Repository

I work as the primary DBA on a network of more than 60 SQL Server machines. When I began work on a repository project to centrally store information about each of those machines, I had two goals in mind. First, I wanted to be able to use T-SQL or a SQL Server 2005 Reporting Services (SSRS) report to quickly answer any questions about the machines. This turned out to be the loftier of my two goals. Second, I wanted a solution that I could enhance over time, by modifying the schema if necessary, to accommodate new categories of information. This second goal led me to combine the twin powers of SSRS and SQL Server Integration Services (SSIS). I wrote about my solution in “SQL Server Reporting Services” (June 2007) and “SQL Server Integration Services” (May 2007).

Readers responded so well to these articles, and my original solution has progressed enough since then, that I wrote this follow-up article. In this article, I discuss the enhancements I’ve made to the original SSIS package, as well as the new SSRS reports that I use daily.

Background

My original solution includes a core set of tables in a single SQL Server 2005 database repository called DBA_Rep. With the exception of the table called ServerList_SSIS, the DBA_Rep database is populated via an SSIS package, called DBA_ Server_Load. You can populate the ServerList_SSIS table manually, which I recommend for initial testing. Alternatively, you can use a tool such as Sqlcmd. For more information about using Sqlcmd, see “Sqlcmd” (April 2007). Figure 1 shows the Server List_SSIS table that I used SQL Server Management Studio (SSMS) to populate.

Although you might find it helpful to refer to my original articles to understand the enhancements I describe in this article, my approach here is to assume that you’re not familiar with the original solution. Thus, the code for the enhanced projects and for the DBA_Rep database schema changes includes all the code from the original solution. The downloadable code for the new solution also includes detailed instructions for configuring the updated DBA_Rep database and for deploying the SSIS and SSRS projects. (To download the new solution, click the Downloads link above.) Within the article, I also provide tips to help you easily implement the solution. Once you have the new project up and running, you can download and use the previously published reports with the new solution. I made the following enhancements to the DBA Repository solution:

Added support for SQL Server 2005 and SQL Server 2000 queries to the msdb database
Added support for SQL Server authenticated domains via multiple data sources and duplicate objects
Added support for server location
Added information for reports and ad-hoc queries: ° SQL Server users ° SQL Server logins ° SQL Server group memberships ° Disk space allocation (the most useful enhancement by far)
Made enhancements to the SQL Server Agent job schedule

Support for SQL Server 2005 and SQL Server 2000

My original SSIS package was limited because it wasn’t designed to take full advantage of both SQL Server 2005 and SQL Server 2000. This limitation was most evident when it came to the schema changes in the msdb database that occurred with the release of SQL Server 2005.

Among other things, the msdb database stores information about all SQL Server Agent jobs. Previous queries of the msdb database for the DBA Repository solution worked fine for SQL Server 2005 and SQL Server 2000. But I ran into a problem when I tried to include information about job schedules. In SQL Server 2005, adding the sysschedules table to query the msdb database required a change to the SSIS package. Because this table doesn’t exist in SQL Server 2000, additional steps were necessary for me to acquire the schedule information that was crucial to my DBA analysis. I had to use two separate queries that returned equivalent data to the Jobs repository table. Specifically, these queries gather job frequency information.

The two queries return the same number of columns. I was most interested in the columns regarding the schedule frequency, such as freq_type and freq_interval. As I explain later, this information is vital for producing reports that show when the jobs are scheduled to run. In addition, combining this information with the data captured from the last_run_duration field lets you isolate jobs that aren’t performing as expected for their scheduled times.

In order to segregate the SQL Server 2005 and SQL Server 2000 servers so that the queries would execute against the correct version, I had to make a few crucial changes to the flow of the SSIS package. First, I set up multiple additional data sources to support SQL Server 2005 (version 9.X) and SQL Server 2000 (version 8.X) in demilitarized zone (DMZ) and non- DMZ environments. I also set up two additional variables to support SQL Server 2005 connections. These variables, as you might recall from the original solution, are populated by a SELECT query that reads the server name from a table (ServerList_SSIS) stored in the repository database. The result is stored in an object data type variable. This variable is then converted to a string data type variable so that it can be used with the ForEach Loop container objects as well as the ServerName expression for each data source. Adding support for SQL Server 2005 and SQL Server 2000 was simply a matter of repopulating the variables at several stages during the package’s execution. The following code contains the SELECT queries that repopulate the variables:

--Run for SQL Server 2000 (version 8)
SELECT LTRIM(RTRIM(Server)) AS servername
FROM ServerList_SSIS
WHERE Connect = 1) AND (Version = 8) AND
(DMZ = 0)

--Run for SQL Server 2005 (version 9)
SELECT LTRIM(RTRIM(Server)) AS servername
FROM ServerList_SSIS
WHERE (Connect = 1) AND (Version = 9) AND
(DMZ = 0)

Next is the code to update the SQL Server version :

Update Serverlist_SSIS set version =
cast(left(dbo.SQL_Servers
.ProductVersion,1) as smallint)
From ServerList_SSIS Inner Join SQL_
Servers on ServerList_SSIS.Server = SQL_Servers
.Server

This code takes advantage of an additional column, called “version,” added to the ServerList_SSIS table. This change allows automatic qualification for the version of newly added servers.

Now let’s look at a report that uses the accumulated data for the SQL Server Agent jobs gathered from the preceding package enhancements. This report, called Job Interval, is an enhanced version of a previously published report. The new report shows job schedule information and job status (e.g., whether the job is scheduled to run, whether the job failed or succeeded). I use parameters in the actual report to filter jobs based on this type of information. For example, in Figure 2, the report shows only jobs that are scheduled to run and only those of the type backup. The term “backup” here is a subjective value; in my organization, all our backup jobs are named “backup.” Thus, my report will contain jobs that are titled with the word “backup” in their names. Using runtime parameter values to apply a filter to a report is fairly straightforward. (For information about this procedure, see “SQL Server Reporting Services,” June 2007.) The end result is that I now have one location to query all SQL Server Agent job schedules and status information, regardless of the SQL Server version. The downloadable code includes this updated report.

DMZ and SQL Server Authentication

My next enhancement (i.e., support for SQL Server authentication for non-Windows domain accounts in DMZ environments in which Windows credentials might not pass through) was a rather simple addition. I added two data sources that would use SQL Server authentication to connect to the DMZ and duplicate the existing Control Flow tasks.

When the updated package runs, the first phase truncates the local repository tables (as in the original solution). Next, the Windows-authenticated objects execute and complete. Then, the package populates the same variable that was used for the Windows-authenticated server list, but now only the servers in the DMZ require SQL Server authentication. The following code contains the query that populates the DMZ servers:

SELECT LTRIM(RTRIM(Server)) AS Servername,
FROM ServerList_SSIS
WHERE 1(Connect = 1) AND (DMZ = 1)

Notice the DMZ flag is set to 1 for true.

After you load the DBA_Server_Load SSIS project that populates the DBA_Rep database, you must configure the appropriate DMZ SQL Server-authenticated credentials for this enhancement to work in your environment. Because the Connection Manager Server name property is dynamically configured at runtime, you can overwrite the two Connection Manger sources (i.e., MultiServer_DMZ and Multi- Server9_DMZ) with valid SQL Server account names at design time simply to save the SQL Server user account password. In fact, doing so is actually necessary the first time you run the SSIS package.

You need to enter a valid server name and SQL Server authenticated user name and password to initiate the first connection. I chose ‘sa’ because it’s a known, usable SQL Server administrative account. Your account might be a different SQL Server authenticated administrative account. After you save this account information, it will be used at runtime. However, the server is dynamically set based on the variables you use for the Server name connection property. Web Figure 1 shows the entire DBA_Rep SSIS package with annotations for the package flow.

Location

I’ve been hoping for a long time that future versions of SQL Server would include an area for descriptive data about the installation—information that can’t easily be gleaned elsewhere, such as a server’s geographical location. DBAs typically work around the unknown location problem by adhering to a naming standard that includes OS version, application, and location. For example, a Windows Server 2003 SQL Server machine located in Denver might be called W2K3SQLDEN1. Some quick string parsing would reveal DEN in the server name and could be used as a location trigger.

The problem with this solution is that even though you might have 90 percent compliance with the nomenclature, you’ll always have a few stragglers that lack a location-defining name. So until SQL Server includes this description or categorical storage area, you must manually track servers’ locations. Thus, I added a location field to the updated DBA_Rep SSIS package and repository database so that you can use locations in reports or queries.

The Server_Location table consists only of a LocationID and Location (string) column, which refers to the city in which the server resides. The LocationID values are stored in the ServerList_SSIS table—and although this table is initially populated manually, the package will update the SQL_Servers table after the package completes execution with the values from the ServerList_SSIS table. The following code contains the query that runs as the last step of the package to update the SQL_Servers table from the LocationID value stored in the ServerList_SSIS table:

UPDATE dbo.SQL_Servers
SET dbo.SQL_Servers.LocationID = dbo
.serverlist_ssis.LocationID
FROM dbo.serverlist_ssis
INNER JOIN dbo.SQL_Servers ON dbo
.serverlist_ssis.Server = dbo.SQL_
Servers.Server

New Data to Play With

In addition to the schema changes necessary to provide SQL Server 2005 and SQL Server 2000 support and DMZ connectivity, I added several new tables to the DBA_Rep repository database. These tables categorically store two of the most frequently requested pieces of information: users’ permissions and disk space utilization.

In any large organization, especially one that has a lot of turnover, quickly ascertaining database or server access rights at the user or group level is essential to safeguarding data. Large companies with many users typically have numerous SQL Server machines throughout the network. And even when preventive measures are implemented, unmitigated database and log file growth can occur because of poor planning, unexpected large data load volumes, or failed maintenance tasks to reduce file sizes.

To address both security and disk space management, I added several key reports to the repository. But before I discuss these reports, let’s look at how one of the tables, SQL_Drives, is populated within the SSIS package. The same technical methods are used to populate the new SQL Server user tables.

As you can see in Figure 3, two tasks are encapsulated in a ForEach Loop container. This container rolls through the tasks for each server derived from the Server name variables, and in this case initiates a SQL Server execution task called Fixed Drives to populate a temporary table, then runs a Data Flow task called Fixed Drives Load to pull the data from the temporary table to the DBA_Rep table called SQL_Drives. The reason I had to use a temporary table was because of the need to execute an extended stored procedure, xp_fixeddrives, for which there wasn’t a valid result set without a temporary table.

Figure 3: SSIS ForEach Loop Container to Populate Drive Space Usage

It wasn’t possible to execute the xp_fixeddrives stored procedure directly in the Data Flow task and provide all the data I wanted—specifically, the Server, Disk_Drive, and MBFree columns. The xp_ fixeddrives stored procedure returns only the latter two pieces of information. But because I had no way to tie a server name to the repository, this information would be meaningless. My solution, which Web Listing 1 shows, was to create a temporary table in an Execute SQL task and populate it with the required data. Next, I fed the results of this query as input columns to the Data Flow task called Fixed Drives Load, which populates the SQL_Drives table in the repository. I included the code that executes the xp_fixeddrives stored procedure into the temporary table, RESULT_STRING_FD, then uses SERVERPROPERTY(‘Servername’) to update the server information. The results of the query in Web Listing 1 are fed as input to the OLE DB destination, which is the SQL_Drives table in the DBA_ Rep database, via the Data Flow task in the DBA_Server_Load SSIS package.

After the package executes, you can use the following code to query the results from the SQL_Drives table:

select SD.Server,
Drive_Letter,
MbFree
from sql_drives SD

shows the results of this query. With the data already gathered for database information (which was part of the original solution), you can now tie the database sizes to the space remaining and create a threshold value that you can monitor.

Putting the Data to Use with Reporting Services

I needed an at-a-glance report that would tell me (1) which of the drives on each server was potentially getting low on free space and (2) which server was hitting a threshold of data and log file sizes as it pertained to the free space. The driving query for this report, which Web Listing 2 shows, returns only a few columns, mostly calculations of the free space and database file sizes per drive per server.

Note that the query includes the Size field from the Database_Info table. This size is shown in 8KB pages that require division by 128 so that the size matches the megabyte value of the MBFree column. The percentage calculation is derived by dividing the sum of the database size by the total of MBFree, plus the sum of the database size again, times 100. Note, also, that the outer join between the two tables, Database_Info and SQL_Drives, is accomplished by linking the Drive_ Letter field from SQL_Drives with the first character of the FileName field in Database_Info. The first letter, derived with the LEFT function, is the drive letter where the database or log file is located.

The SQL Drive Space report incorporates a couple of interesting uses for report parameters and expressions to display a bar chart and table containing the drive space inormation. The High_Usage parameter lets users select all servers or only the servers that have a threshold event (which I arbitrarily set to a disk file usage above 30 percent). If you select the value High Usage from the parameter dropdown list, you’ll see only those servers that have potential drive space issues. Otherwise, you see all servers by default if you don’t select a value. As Figure 4 shows, color coding helps you determine at a glance which of the servers has a threshold value exceeded.

Figure 4: Color-Coded SQL Drive Space Report

The SQL User Permission report has saved me hours of time investigating where access was granted—especially when a user was placed in a group and thereby hidden from the DBA. Even with access to Active Directory (AD), analyzing Windows domain group membership is tedious. Instead, I wanted one big result set that returned all the user information for all the SQL Server machines being reported on (especially production servers). The resultant query, which Web Listing 3 contains, and the subsequent report let you select a user by name or part of a name, as well as by group or by server.

The report, although not pretty, is quite useful. Web Figure 2 shows a report run for Location 4 (from the Location field), with a breakdown of permissions for each database. You can analyze Windows group and account names, although the report in Web Figure 2 doesn’t list any. Finally, the user parameter supports wildcard lookups. For example, you could look for a combination of all the servers in Location 4 and users with a name like “gue,” to return only the “guest” user. This option is handy when you know only part of a name.

Reader Kudos

Thanks to everyone who read my original articles and provided feedback to help improve my solution. I’ll continue to make enhancements to the tool (which I’ve dubbed SQL Snapper, for snapshot data analysis— plus, I live near the beach and I like to fish!). Although I created the tool for straight reporting, it’s useful for many other projects as well. Have fun tinkering with the code (in a test environment of course), and feel free to email me with any ideas you have for modifying or expanding it.

Comments

Plain text