SQL Server Integration Services

Gather information automatically

Suppose that you hire a new DBA and assign him or her to a large-scale SQL Server infrastructure that consists of more than 100 servers divided not only by purpose and criticality but also by geography. This new DBA would likely spend the entire first week getting to know the lay of the land—using SQL Server Management Studio (SSMS) to connect to each SQL Server machine, one by one, to gather essential information such as version, edition, server configuration, existing databases, and scheduled backup jobs. A daunting task indeed.

I learned early in my career that spending time up front to automate otherwise manual and time-consuming tasks can preserve your sanity. I therefore developed a fairly simple solution that connects to each available SQL Server machine, pulls information into a central repository database, and feeds the combined data to a report for DBAs and other IT staff to use. In this article I describe the solution I used. (Note that the solution I offer is intended to enhance, not replace, a preexisting full monitoring and alerting solution.)

Tool Time

Several tools are available to the intrepid DBA setting out to discover the uncharted server landscape. For example, you can use the Microsoft SQL Server Health and History Tool (SQLH2) to populate a repository database. However, this tool is outdated and not very flexible. (For more information about SQLH2, see "SQL Health and History Tool," November 2006.)

For my project, I employed the command-line tool Sqlcmd, SQL Server Integration Services (SSIS), a SQL Server database for the repository called DBA_Rep, and SQL Server 2005 Reporting Services (SSRS). (For more information about Sqlcmd, see "Sqlcmd"; for related articles about SQL Server 2005 and SSIS, see the Learning Path below.) In this article, I explain how to use Sqlcmd and SSIS to construct and populate the DBA_Rep repository database. In a later article I'll explain how to build and deploy the three SSRS reports designed to deliver the data from this repository. (Note: This article makes several nonstandard recommendations, such as querying system tables directly, employing pseudo temp tables, and using xp_cmdshell to run commands. If you stringently adhere to best practices and use only supported techniques, you'll need to withhold your judgment temporarily—until you see that the nonstandard methods I use are efficient and aren't detrimental.)

Create the Repository Database

Now that you have a list of servers to use as input, you might wonder how you can use that input directly in an SSIS package. But don't get ahead of yourself—first, you must store the information somewhere. As most DBAs know, the best place to store a list of data is in a table. Before we examine the SSIS package, let's take a look at the database that will be the repository for the SSIS load.

The table that stores the list of servers on the network from the Sqlcmd command is called ServerList_SSIS. In addition to this table, six other base tables in the DBA_Rep database store enough basic information to let a DBA answer some fundamental questions about the state of each server at a glance. These tables are SQL_Servers, Database_Info, Databases, Backup_History, Jobs, and Jobs_Archive; each holds specific information about each SQL Server instance. Web Listing 1 contains each table's schema and also serves as the script that builds the tables the SSIS package will populate. To run this script successfully, first create a blank database called DBA_Rep in your SQL Server 2005 instance; the script in Web Listing 1 then creates the necessary tables in that database.
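Web Listing 1 contains the authoritative schema, but the ServerList_SSIS staging table can be as simple as a single column. The sketch below is an assumption based on the queries shown later in the article, which select a Server column:

```sql
-- Minimal sketch of the staging table; Web Listing 1 has the real
-- schema. The column is nullable because the output of sqlcmd /Lc
-- includes header and blank (NULL) rows.
CREATE TABLE dbo.ServerList_SSIS (
    Server nvarchar(128) NULL
);
```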

The table ServerList_SSIS is empty, waiting to be filled with server names. Although I didn't automate this process, you can easily do so with SSIS techniques similar to those I discuss later. I used the special stored procedure xp_cmdshell to run the Sqlcmd shell to return the server list. Assuming that you've run the script to create the DBA_Rep database that contains the ServerList_SSIS table, you can use an Insert Into statement with the Sqlcmd /Lc command to insert the server data. In SSMS, open a new query window and enter the following command:

USE DBA_Rep
GO
INSERT INTO ServerList_SSIS
EXEC xp_cmdshell 'sqlcmd /Lc'

At this point, consider the following caveats. First, the result set returns NULL records. The table can accommodate them, and the SSIS package's logic will filter out these rows. You could build in logic to remove the NULL values before the insertion, but I chose to handle it in the SSIS package. Because this table has no indexes that enforce unique values, truncating it before each load also ensures that the Sqlcmd /Lc command doesn't insert duplicate rows. Finally, ensure that xp_cmdshell is configured with the proper permissions to execute. By default, xp_cmdshell is disabled in SQL Server 2005; you can use the SQL Server Surface Area Configuration tool to enable it.
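If you prefer a script to the Surface Area Configuration tool, sp_configure enables xp_cmdshell just as well. The sketch below also truncates the table before the load and strips the NULL rows immediately afterward; the article instead filters NULLs inside the SSIS package, so treat this as an alternative:

```sql
-- Enable xp_cmdshell (disabled by default in SQL Server 2005)
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'xp_cmdshell', 1;
RECONFIGURE;

-- Reload the server list without duplicates, then drop NULL rows
TRUNCATE TABLE dbo.ServerList_SSIS;
INSERT INTO dbo.ServerList_SSIS
EXEC xp_cmdshell 'sqlcmd /Lc';
DELETE FROM dbo.ServerList_SSIS WHERE Server IS NULL;
```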

Obtain the Data

The information that's most important to DBAs on a day-to-day basis is server information, such as version (e.g., SQL Server 2005 or 2000), service pack level, edition (i.e., Standard or Enterprise), and default collation. Fortunately, SQL Server 2005 and 2000 store this information in handy system tables. My solution queries each server and stores this information in the SQL_Servers table.
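The kind of per-instance query involved can be sketched with the SERVERPROPERTY function, which is available in both SQL Server 2000 and 2005. The package's actual source queries are in the web listings; this is only an illustration, with @@SERVERNAME tagging each row with the instance it came from:

```sql
-- Sketch of a per-instance query; the real source queries are in
-- the web listings.
SELECT
    @@SERVERNAME                     AS [Server],
    SERVERPROPERTY('ProductVersion') AS ProductVersion,
    SERVERPROPERTY('ProductLevel')   AS ProductLevel,  -- service pack level
    SERVERPROPERTY('Edition')        AS Edition,
    SERVERPROPERTY('Collation')      AS Collation,
    SERVERPROPERTY('IsClustered')    AS IsClustered;
```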

Database information is just as important as server information, if not more so; two tables hold it: Databases and Database_Info. The Databases table simply holds the name of the server and the name of the database. Database_Info holds details, such as size, file location, and recovery model for each database. You can use the Server field to join the Database_Info table to the Databases table. (For ease and simplicity, I used the Server field for joins rather than a key field or a server ID field that I would need to manually update. Although this solution wasn't ideal, it was sufficient for my purposes.)
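A report query against the two tables might look like the following sketch. The column names beyond Server are assumptions (the real ones are defined in Web Listing 1), and a database-name match is added alongside the article's Server join so that each detail row lines up with its own database:

```sql
-- Hypothetical join; real column names are in Web Listing 1.
SELECT d.[Server], d.DatabaseName, i.RecoveryModel, i.Size
FROM dbo.Databases AS d
JOIN dbo.Database_Info AS i
  ON i.[Server] = d.[Server]
 AND i.DatabaseName = d.DatabaseName;
```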

The final three tables store information about SQL Server Agent jobs and database backups, which I believe is the most important information for any DBA to have. For example, knowing which jobs are failing and need attention is imperative when you're working with tens or even hundreds of database servers and databases. Jobs often fail, and because most jobs perform routine full and transaction log backups, the response to a failure must be swift. The Backup_History table holds detailed information about backups that occurred in the past n days; I've found that 5 days of history is a good amount to analyze.
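The backup history lives in msdb on each server. A sketch of the kind of query that pulls the last 5 days follows; the package's actual source query is in the web listings:

```sql
-- Sketch: last 5 days of backup history from msdb (SQL Server
-- 2000 and 2005). The real source query is in the web listings.
SELECT
    @@SERVERNAME              AS [Server],
    bs.database_name,
    bs.type,                  -- D = full, I = differential, L = log
    bs.backup_start_date,
    bs.backup_finish_date,
    bmf.physical_device_name
FROM msdb.dbo.backupset AS bs
JOIN msdb.dbo.backupmediafamily AS bmf
  ON bs.media_set_id = bmf.media_set_id
WHERE bs.backup_finish_date >= DATEADD(day, -5, GETDATE());
```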

Populate the Repository

Now that you have the list of SQL Server machines, and the repository database is defined to store information about those servers' databases, jobs, and backup history, it's time to use SSIS to populate this simple repository. SSIS might seem like a complex design environment if you've never used it; many DBAs cut their teeth on extraction, transformation, and loading (ETL) with DTS in SQL Server 2000.

Figure 1 shows the full package that I used to populate the repository database. This package consists of three areas: (1) migrating and/or truncating repository tables to maintain the repository, (2) populating a variable with an ADO record set of server names derived from the command-line utility Sqlcmd, and (3) using this variable to programmatically connect to each server, one by one, and pull information into the repository.

Figure 1: Full package to populate the repository database

Truncating tables and migrating data to archive tables occur first in the SSIS package. The Execute SQL Task objects that run the Truncate Table statements are grouped together in a sequence container at the top of the package. When the package runs, all the tables are initially truncated; the only exception is the Jobs table. Before truncating the Jobs table, a Data Flow task moves its data to the Jobs_Archive table, because I wanted to maintain a history of job successes and failures to analyze over time. The other tables need only the most current data, and in my opinion, starting fresh each time is cleaner for this semi-static information. As mentioned previously, I'm pulling 5 days' worth of backup history that will repopulate with each run. Figure 2 shows the dialog box to configure the Execute SQL Task object that truncates the SQL_Servers table. When all the objects in the Truncate Tables and Populate Archives sequence container complete successfully, the package moves to the second phase, which is to populate a variable with an ADO record set.
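In T-SQL terms, this maintenance phase amounts to something like the following sketch; in the package, these steps are implemented as separate Execute SQL Task and Data Flow objects rather than one script:

```sql
-- Archive job history before clearing the Jobs table
INSERT INTO dbo.Jobs_Archive
SELECT * FROM dbo.Jobs;

-- Start fresh for the semi-static tables
TRUNCATE TABLE dbo.Jobs;
TRUNCATE TABLE dbo.SQL_Servers;
TRUNCATE TABLE dbo.Databases;
TRUNCATE TABLE dbo.Database_Info;
TRUNCATE TABLE dbo.Backup_History;
```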

Figure 2: Configuring the Execute SQL Task object to truncate the SQL_Servers table

Before I explain how to populate a variable from a SQL Server query to use within an SSIS package, let me explain why you might want to do so. If you have fewer than 10 servers, creating a separate connection to each server might be feasible. In SSIS you'd have 10 Connection Managers, each pointing to one SQL Server machine. More than 10 servers is problematic, but a tenacious DBA might be willing to create separate connections for, say, 20 servers—as long as no additional servers are expected, or the DBA is willing to manually add servers and maintain the package indefinitely. In my case, I had more than 100 servers, so I needed a better solution.

The task to populate the variable uses a simple SELECT statement to query the ServerList_SSIS table containing the server names previously derived from Sqlcmd. The query is:

SELECT RTRIM(Server) AS Server
FROM ServerList_SSIS

Figure 3 shows the dialog box to configure the Execute SQL Task object for the Populate ADO Variable task, including the Connection, SQLSourceType, and SQLStatement property values. The query is entered as direct input; the DBA_Rep database resides on the local SQL Server machine named QASRV, which is also where the package will run. To use the query results to populate a variable, you need to have a variable already set up.

Figure 3: Configuring the Execute SQL Task object for the Populate ADO Variable task

For my solution, I needed to set up two variables: one for the Execute SQL Task object, and one for the final third of the package, which uses Foreach Loop container objects. The five Foreach Loop containers (i.e., one for each database table in the DBA_Rep database) employ a combination of techniques to meet the defined goal of programmatically changing a connection string iteratively for each server and executing a Data Flow object to retrieve server-specific information.

To open the Variables toolbar, right-click anywhere in a blank area of the package and select Variables from the list. Figure 4 shows the Variables toolbar with two defined variables of two different data types. The first, SRV_Conn, is a simple string variable. The second, SQL_RS, is an object data type. These distinctions are important. Because the result set from the SELECT statement contains multiple records, a string variable doesn't work. I needed to use the SQL_RS variable, with a value type of System.Object, to hold the results, then map the two variables, object to string, in the Foreach Loop container. I used the following four simple steps to accomplish this task.

Figure 4: The SRV_Conn and SQL_RS variables

  1. Under Result Set in the Populate ADO Variable task, set the Result Name to 0 and the Variable Name to User::SQL_RS, as Figure 5 shows.
    Figure 5: Setting the result name and variable name
  2. In each Foreach Loop container, set the enumerator in the Collection area to Foreach ADO Enumerator, and set the ADO object source variable to User::SQL_RS, as Figure 6 shows.
    Figure 6: Setting the enumerator and ADO object source variable
  3. Under Variable Mappings in the Foreach Loop containers, set the Variable to User::SRV_Conn and the Index value to 0, as Figure 7 shows.
    Figure 7: Setting the Foreach Loop container variable and index value
  4. Use a property expression in the Connection Manager object to assign the string variable to one Connection Manager. When the Foreach Loop container executes the Data Flow tasks it contains, the connection string is dynamically built with each enumeration of servername. In this package, the Connection Manager called MultiServer serves this purpose. Setting the variable to the ServerName property, as Figure 8 shows, causes the connections to set themselves correctly for each server.
    Figure 8: Setting the variable to the ServerName property
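For step 4, the expression itself is just the string variable. On the MultiServer Connection Manager, open the Expressions collection, choose the ServerName property, and enter the variable in standard SSIS expression syntax:

```
@[User::SRV_Conn]
```

With that expression in place, every iteration of the Foreach Loop assigns the current server name to the connection before the Data Flow tasks run.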

After the variable mappings are in place, you can use Data Flow objects within each Foreach Loop container to load the tables. You can place Data Flow objects on the SSIS package's Control Flow tab, but these special objects have their own tab on which you can define their properties and sequencing. In general, a Data Flow task consists of a source and a destination object. In my solution, both the source and destination are OLE DB connections to a SQL Server machine. I configured the source to use the MultiServer connection that would enumerate through the list of servers, and I configured the local DBA_Rep connection as the destination to hold the data. Source and destination columns are mapped together. The source can be an object, such as a table or view, or as in my package, it can be a SQL query to be used as a derived table.

For four of the five tables, I sent one query to select values and loaded the results from the remote servers into the local DBA_Rep database. To examine the tables' source queries, right-click the Data Flow object within the Foreach Loop container and select Edit. Then, on the Data Flow tab, right-click the OLE DB Source object and select Edit again to display the source query. Web Listing 2 contains the query to load the Databases table.
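The shape of such a source query can be sketched as follows; Web Listing 2 has the actual query, and the alias names here are assumptions:

```sql
-- Sketch of the kind of source query that loads the Databases
-- table; the real query is in Web Listing 2. sysdatabases exists
-- on both SQL Server 2000 and 2005.
SELECT
    @@SERVERNAME AS [Server],
    name         AS DatabaseName
FROM master.dbo.sysdatabases;
```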

I needed to handle the Database_Info table, which loads detailed information about each database, a bit differently. Because the SQL Server 2000 master database stores only basic information about each database in the sysdatabases table, and each database stores the remaining important information, I needed to query each database individually. I could have used the stored procedure sp_MSForEachDB, which has been available since SQL Server 7.0, to easily accomplish this task. However, this stored procedure doesn't return a solitary result set, and I needed one, so I considered other alternatives. Using temp tables or table variables would have given me the full result set, but setting up and maintaining temp tables from SSIS requires special considerations. For example, you need to create the temp table beforehand, and you must set the connection's RetainSameConnection property so that the temp table survives between tasks.

My solution was to query each database and return database-specific information into a table that I created in the tempdb database. The table I created wasn't a true temporary table with a # or ## prefix; although it resides in tempdb, its size and location have minimal effect on the source server. Web Listing 3 contains the code to create and populate this table, called HoldForEachDB. Notice the syntax of the sp_MSForEachDB stored procedure, which substitutes a question mark for the database name. This command is quite useful and doesn't require you to wrap the logic in cursors to get similar functionality. (Note: When you save the task, you might receive a message such as "The DROP table construct statement is not supported." However, the query will still delete the table and repopulate it with crucial database information such as status, recovery model, size, and updateability.)
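The sp_MSForEachDB pattern looks like the following sketch. Web Listing 3 has the real code; the HoldForEachDB column list and the sysfiles query here are assumptions for illustration only:

```sql
-- Sketch of the sp_MSForEachDB pattern that populates the holding
-- table in tempdb. The ? placeholder is replaced with each
-- database name in turn. The actual code is in Web Listing 3.
IF OBJECT_ID('tempdb.dbo.HoldForEachDB') IS NOT NULL
    DROP TABLE tempdb.dbo.HoldForEachDB;

CREATE TABLE tempdb.dbo.HoldForEachDB (
    DatabaseName sysname,
    [FileName]   nvarchar(260),
    Size         int
);

EXEC sp_MSForEachDB
    'INSERT INTO tempdb.dbo.HoldForEachDB
     SELECT ''?'', filename, size
     FROM [?].dbo.sysfiles';
```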

Troubleshooting

The package runs against both SQL Server 2005 and 2000 servers. If no errors occur, the package runs in less than 2 seconds for 2 servers and in less than 2 minutes for 30 servers with varying amounts of data. If the package runs successfully and each package object turns green, you're ready to query the repository with custom queries and reports. Even if the package objects don't all turn green, don't worry: each task has a maximum error count that you can set to let the package continue running even if, for example, it can't connect to an individual server.

Servers fail to connect for various reasons—for example, they might not be online, or the user running the package doesn't have access. You can use event handling to send an email message or write to a log to report server failures. Event handling, logging, and scheduling the SSIS package for automatic runs are beyond the scope of this article. For more information about these tasks, see SQL Server Books Online (BOL).

After you load the database, you can run a simple query such as the following:

SELECT [Server],
       [ProductVersion],
       [ProductLevel],
       [IsIntegratedSecurityOnly],
       [EngineEdition],
       [Collation],
       [IsClustered],
       [IsFullTextInstalled],
       [SqlCharSet]
FROM [DBA_Rep].[dbo].[SQL_Servers]

This query provides the data that Table 1 shows, which includes useful information about the two servers I used to test the package.

You've Got the Power

The package I created is just one example of what you can do with SSIS. Combining the variable and expression values gives you unprecedented control and efficiency in designing packages. For my package, I used variables and expressions to programmatically control a connection string to query multiple servers without needing to manually define each connection.
