SQL Server 2008: Enter the Data Collector

The Performance Data Warehouse (PDW) isn't a new idea; however, it's rarely implemented. A PDW is the idea or concept of applying business intelligence (BI) solution architectures to a DBA’s informational requirements. A DBA needs information, such as query statistics, server activity, and disk usage that's relevant to his or her database systems.

The benefit of using a PDW correctly is that the DBA can routinely monitor and analyze various relational database metadata to assist him or her in performing database tuning and capacity planning. As with modern BI architectures, PDWs can vary in how they're implemented. Some PDWs actually collect data and place it into separate database servers, some don't. Some leverage a formal star or snowflake schema, some don't. Some use cubes (i.e., multidimensional structures) for analytical reporting, some don't (i.e., they use relational tables for reporting).

SQL Server 2008 includes many new relational engine enhancements. Data Collector is one such improvement; it's an easy-to-configure, out-of-the-box PDW that can help you manage your data more efficiently than ever.

An Out-Of-the-Box PDW
Data Collector is yet another composite feature built using other SQL Server components—namely SQL Server Agent and SQL Server Integration Services (SSIS). Data Collector is installed with the relational engine and three system (default) collection sets. A collection set is a logical container that represents a silo of metadata being collected. The three system collection sets are disk usage, query statistics, and server activity.

You use SQL Server Management Studio's (SSMS) Object Explorer to configure Data Collector. When you do so, Data Collector's metadata is stored in the msdb, SQL Server Agent jobs are created, and additional tables are defined in the Management Data Warehouse (MDW).

To create additional or custom collection sets, you must leverage T-SQL. The following are four collector options (i.e., data-sources) you can use to create custom collection sets:

T-SQL
SQL Trace
performance counters
query activity (which collects the same information as the query statistics system collection set)

You can use Data Collector for a many-to-one (M:1) server environment by configuring Data Collector to collect information from multiple SQL Server 2008 instances into one centralized repository.

When configuring Data Collector, you have a major decision to make: Do you want the collection sets to run in cached or non-cached mode? Cached mode essentially means that the generated SSIS packages will run on two different schedules (i.e., collection, upload) and use temporary storage on the file system.

Under the hood, this temporary storage is actually the SSIS binary file format. In non-cached mode, the entire data collection and loading process is accomplished in one generated SSIS package and doesn't use a temporary cache. Instead, the data is collected into "snapshot" tables inside the MDW; the MDW essentially serves as the Data Collector's PDW.

You facilitate reporting by hosting default SQL Server Reporting Services (SSRS) reports in SSMS. These reports are accessible in Object Explorer by right-clicking a system collection set and selecting the Report context option. Some of the default reports provide drill-through capabilities as well. In addition, you can use SSRS to create custom reports.

Data Collector's Pros
Data Collector is a great first step towards out-of-the-box PDWs for SQL Server. Configuration is easy, requiring just a few clicks in SSMS. Data Collector provides the M:1 configuration that lets you consolidate multiple collection systems into a centralized repository. In addition, Data Collector is designed to minimally absorb system resources. Last (but not least), the Data Collector is extensible; it gives you ability to create custom collection sets and reports.

Data Collector's Cons
New features often have several shortcomings, and Data Collector is no exception. First, there are no formal cubes or underlying de-normalized schemas in the MDW because it's a relational data warehouse. Second, the default reports don't facilitate a M:1 Data Collector configuration via a server parameter. Third, the Data Collector works only on SQL Server 2008 systems.

A Good Start
Data Collector as a new feature is pretty solid. There's quite a bit of room for improvement, but the feature’s current state is good. If the Data Collector could be configured to collect data from SQL Server 2000+ instances (i.e., clients), it would be much more attractive.

Comments

Plain text