Consolidate Data on Executed SSRS Reports for Easy Querying

If you're a DBA and your company uses SQL Server Reporting Services (SSRS), in some capacity you're probably also a report administrator. Because SSRS is a SQL Server technology, DBAs often are asked to manage SSRS just as they would manage SQL Server Integration Services (SSIS), SQL Server Analysis Services (SSAS), and SQL Server itself. I've been in this position. Outside of routine tasks such as deploying reports, setting permissions, and creating report subscriptions, I've been called on to answer questions about how certain reports are performing or how many times they're run and by whom. I've also been asked to discover which reports are no longer being used.

If you have one or two SSRS servers, you can easily answer such questions with a quick ad hoc query of the views or tables provided in the SSRS metadata database, ReportServer. But what if you have many SSRS servers? Further, what if you want to easily compare how one server might be faring against another server for performance load? Having this data centralized for proper analysis would be ideal. I'll show you how to use an SSIS package to pull catalog and execution log data from any number of SSRS instances into a central repository. I'll also provide some queries that you can use to answer fundamental questions about report statistics.

I'd like to point out that the technique I'll use in the SSIS package to retrieve data from each SSRS ReportServer database is a derivative of the DBA Repository solution. In this solution, I populate a package variable with a list of servers from an underlying table named serverlist_ssis in the DBA_Rep database. To take full advantage of the solution in this article, having the basic DBA_Rep database is required. The DBA Repository solution is simple to set up. I included two scripts and instructions in the ReadMe.txt file so that you don't have to wade through previous articles to begin collecting SSRS data. I also included a sample record and description in the ReadMe.txt file for the serverlist_ssis table to help you get started. You can download these files by clicking the Download button. If you want to learn more about the DBA Repository solution, check out the SQL Server Pro articles "DBA Repository 2010" (July 2010) and "Use SSRS and SSIS to Create a DBA Repository " (February 2008).

The Queries for the SSIS Package

Let's begin by looking at the queries that you can use to gather report server data and that you'll ultimately plug into the SSIS package to retrieve the data from every SSRS server. Depending on what version of SSRS you're currently running, the ReportServer database has tables, views, or both that contain report execution information. In SQL Server 2000 and SQL Server 2005, a table named ExecutionLog contains all the execution information. In SQL Server 2008, a view named ExecutionLog2 was added, and with SQL Server 2008 R2 came another view, ExecutionLog3. In SQL Server 2008 and later, ExecutionLog is still available as a view for backward compatibility. (It's also available as a table, which has been renamed to ExecutionLogStorage.) Although there's additional useful information in ExecutionLog2 and ExecutionLog3, most of the information of interest is available in ExecutionLog, so that's what this solution uses. Using ExecutionLog has an added benefit: If you have a mix of SQL Server versions across your SSRS infrastructure, you don't have to write multiple queries and objects to accommodate the different column names and data types across all the versions.

The ExecutionLog columns of interest include InstanceName (what report server processed the report), ReportID, UserName, RequestType, Format, Parameters, TimeStart, TimeEnd, TimeDataRetrieved, TimeProcessing, TimeRendering, Source (e.g., live, cache, session), Status (e.g., success, abort), ByteCount, and RowCount. Most of the columns are self-explanatory, but I'll expand on a few of them later in the article.

Listing 1 shows the query that you'll use to populate the DBA_Rep table in the repository database. Notice that there's a ReportID, which has a data type of uniqueidentifier. To make ReportID useful, you need to also collect the report names and the path to where the reports reside in the folder structure.

SELECT [InstanceName]
  ,[ReportID]
  ,[UserName]
  ,[RequestType]
  ,[Format]
  ,[Parameters]
  ,[TimeStart]
  ,[TimeEnd]
  ,[TimeDataRetrieval]
  ,[TimeProcessing]
  ,[TimeRendering]
  ,[Source]
  ,
  ,[ByteCount]
  ,[RowCount]
  FROM [ReportServer].[dbo].[ExecutionLog] WITH (NOLOCK)

To get that information, you need to pull data from the Catalog table in the ReportServer database. That table's ItemID column will be the joining key field to the ReportID field from ExecutionLog. Be warned that Microsoft doesn't condone this activity, so it's with a heavy heart that I point you to the query that I use to get the catalog data. Listing 2 shows this query.

SELECT Cast(@@Servername AS nvarchar(100)) AS Server,
  [catalog].ItemID,
  [catalog].[Path],
  [catalog].Name,
  [catalog].Description,
  [catalog].CreationDate,
  [catalog].Modifieddate
FROM catalog WITH (NOLOCK)
WHERE [catalog].TYPE IN (2)
ORDER BY [catalog].[Path]

There are a few things to note about this query:

I added the server name to distinguish where the reports are coming from.
I added a NOLOCK table hint to avoid any contention while the SSIS package is running.
I included only the reports themselves, as indicated in the WHERE clause (Type = 2).

You can execute the queries in Listing 1 and Listing 2 against any ReportServer database, no matter what version of SSRS is installed. To consolidate the data, you can use an SSIS package that loops through all the report server databases and places the retrieved data into two local tables, which can be joined together through the ReportID and ItemID fields.

The SSIS Package

The SSIS package SSRS_Data_Consolidate.dtsx queries each ReportServer database. I kept this package as simple as possible so that I could use it as a template for other types of multi-server data collection processes in the future. To that end, the SSIS package has four core steps:

1. Populate an ADO variable with the list of report servers to query.

2. Truncate the local staging tables.

3. Gather the data from each report server one by one, using Data Flow tasks in a ForEach Loop container.

4. Merge the data in the staging tables with the data in the main tables.

Figure 1 shows these four core steps. Let's take a closer look at each one.

Figure 1: SSIS Package to Gather Report Data

Step 1. In the first step, an Execute SQL task named Populate ADO Variable For SQL Server Names uses a SQL query to assign a result set to the first of two variables that will be used to provide a list of server names to the ForEach Loop container. The first variable to get populated is User::SQL_RS, which has an object data type. It's configured on the Result Set tab, as you can see in Figure 2.

Figure 2: Configuration of the User::SQL_RS Object Variable

The User::SQL_RS variable will hold the results of the query shown in Listing 3.

SELECT DISTINCT
  serverlist_ssis.server
FROM databases
  LEFT OUTER JOIN serverlist_ssis
  ON databases.SERVER = serverlist_ssis.SERVER
WHERE databasename = 'ReportServer'
  AND serverlist_ssis.DMZ = 0
ORDER BY server

This query gets the server names by simply returning the serverlist_ssis.server column from the DBA_Rep database. The WHERE clause filters the server list so that it includes only servers that contain a database named ReportServer. It determines this by joining the serverlist_ssis table to the Databases table using the Server column.

To effectively use this query, you need to have the DBA_Rep database installed and populated. However, you can modify the query to include a list of your SSRS servers from whatever source you want. Note that the criteria used to determine whether a server is an SSRS database server isn't perfect because the ReportServer name could've been changed during a custom installation.

You'll ultimately have to determine the best query to populate your list of servers. I offer some guidance in the ReadMe.txt file to help you through any initial setup problems. Plus, I'm always available at [email protected] if you're really stuck. Hey, what can I say -- I want this to work for you. SQL Server Pro has entrusted me to deliver a useful solution, so I'll do all I can to make sure that happens.

Step 2. In the next step, a Sequence container encapsulates two Execute SQL tasks that do nothing more than truncate two local staging tables in the DBA_Rep database. These tables are SSRS_ExecutionLog_Stage and SSRS_Reports_Stage. They're nearly identical in design to the main tables SSRS_ExecutionLog and SSRS_Reports. Their function is to temporarily hold the combined result set of each server until that data is merged with the data in the main tables as part of the last step in the package.

Step 3. The ForEach Loop container named SSRS Report Reader is really the heart of the package. It contains two Data Flow tasks: one for pulling the report (or catalog) metadata from each ReportServer database and one for retrieving ExecutionLog data. In each Data Flow task, there's only one OLE DB source and one OLE DB destination. The destination is the DBA_Rep database, which is typically on the local instance of SQL Server 2008 or later. The destination will use a Connection Manager object named DBA_Rep. The source will use another Connection Manager object named MultiServer. To allow the ForEach Loop container to cycle through each server, you can use the variable User::SQL_RS as the Foreach ADO Enumerator and map the collection value to the second variable User::SRV_Conn, which has a data type of string. You can use the User::SRV_Conn variable to assign a value to an expression. In this case, the server name is assigned to an expression. The expression is then used as the value for the ServerName property of the MultiServer Connection Manager object. Once you understand the concept, you can use this technique to pull data from any number of servers and consolidate it locally. This was the concept behind the DBA Repository.

Step 4. The final two Execute SQL tasks merge the data from the staging tables with the main tables, which will be used for analysis. The data is merged mainly to avoid duplicates. However, the merge also lets you know if a report has been deleted. As I mentioned previously, the staging and main tables are nearly identical. The only difference is that one of the main tables, SSRS_Reports, has a column named Deleted, whereas its counterpart staging table, SSRS_Reports_Stage, doesn't have it. If the SSRS_Reports table has a record that isn't in the SSRS_Reports_Stage table, the Deleted column is assigned the value of 1 to signify that the report might have been deleted on the source ReportServer database. I didn't want to automatically delete the row so that I'd have a chance to investigate further to make sure it was an approved event. Listing 4 shows the code to merge the data in the staging and main tables for SSRS_Reports.

MERGE SSRS_Reports AS Target
USING SSRS_Reports_Stage AS Source
ON (Source.Server = Target.Server
  AND Source.ItemID = Target.ItemID
  AND Source.Path = Target.PATH
  AND SOURCE.NAME = TARGET.Name)

WHEN NOT MATCHED BY TARGET THEN
INSERT (Server,
  ItemID,
  [Path],
  Name,
  Description,
  CreationDate,
  Modifieddate)
VALUES (Server,
  ItemID,
  [Path],
  Name,
  Description,
  CreationDate,
  Modifieddate)

WHEN NOT MATCHED BY SOURCE THEN
UPDATE SET TARGET.DELETED = 1 ;

In my environment, the SSIS package ran in under 5 minutes for 42 servers. Depending on the amount of execution log data you have to pull, the times can be much longer. You can adjust the amount of time to retain execution log data on each SSRS server if needed. I recommend testing with one or two nonproduction servers to get a feel for how the package runs in your environment. Also, there's currently no error trapping or event handling. The package assumes that the servers will be online and available. For my run, I ended up with 5,226 reports in the SSRS_Reports table and 50,989 records in the SSRS_ExecutionLog table.

Queries to Run Against the Populated Tables

After the SSRS_Reports and SSRS_ExecutionLog tables are populated with data from your servers, you can answer some of the questions you receive regarding reports. One common question is, "How many reports are there on each server?" To answer that question, you can run the following query:

SELECT Server,
COUNT(Name) AS Num_Reports
FROM SSRS_Reports
GROUP BY Server
ORDER BY Num_Reports DESC

Figure 3 shows the results of this query when I ran it. The results revealed that there was a large disparity among the distribution of reports. In addition, only 23 servers out of the 42 servers polled had reports. In other words, 19 servers were deployed and configured as SSRS report servers but they never had reports deployed to them, which is a waste of resources and administration. So, with one query, I uncovered two problems that I needed to rectify.

Figure 3: Sample Results Showing the Number of Reports Per Server

It also might be worthwhile to answer the questions: Which reports are being run the most? When are those reports being run and by whom? How are those reports impacting server resources? The query in Listing 5 helps answers these questions by returning the total time to run each report (processing + rendering + data retrieval) across all users, reports, and servers.

SELECT SSRS_Reports.SERVER,
  SSRS_Reports.NAME,
  SSRS_ExecutionLog.ReportID,
  SSRS_ExecutionLog.username,
  COUNT(SSRS_ExecutionLog.ReportID) AS Run_Count,
  MAX(TimeStart) AS LastRun,
  MAX(SSRS_ExecutionLog.timedataretrieval
    + SSRS_ExecutionLog.timeprocessing
    + SSRS_ExecutionLog.timerendering) AS TotalTime,
  MAX(SSRS_ExecutionLog.timedataretrieval
    + SSRS_ExecutionLog.timeprocessing
    + SSRS_ExecutionLog.timerendering) / 1000
    AS Time_in_Seconds
FROM SSRS_Reports
  INNER JOIN SSRS_ExecutionLog
    ON SSRS_Reports.ItemID = SSRS_ExecutionLog.ReportID
WHERE SSRS_ExecutionLog.STATUS = 'rsSuccess'
GROUP BY SSRS_Reports.SERVER,
  SSRS_ExecutionLog.ReportID,
 SSRS_Reports.NAME,
  SSRS_ExecutionLog.username
ORDER BY SSRS_Reports.Server,
  SSRS_ExecutionLog.username,
  TotalTime DESC

It also shows the last time each report was run as well as how many times it has been run by the user on each server. For long running reports, the query adds a column for seconds, which is derived from the milliseconds that the report times are stored in. Figure 4 shows sample output.

Figure 4: Sample Query Results

Simple Yet Helpful

Although the SSIS package and the queries I showed you are simple, they provide a lot of information to help you uncover what changes you need to make to your reporting environment. For example, you might discover that you need to better balance the loads among your report servers or you might find out that you can remove some report servers that are no longer being used. I made several interesting discoveries when I put this package together and analyzed the data. In the future, I'd like to create queries that find duplicate reports across servers, determine which reports are being run live versus from a cache or snapshot, and find when and where reports are failing. Until then, if you have any feedback on the SSRS_Data_Consolidate.dtsx package or an idea for other queries, I would love to hear it.

Comments

Plain text