If you need to parse Microsoft SQL Server server-side trace information so that you can aggregate the data to find your worst-performing and most-called queries, you might try turning to the T-SQL user-defined function called SQL_Signature.
You’re already capturing SQL trace information through the use of server-side traces, which send results directly to files on a locally attached disk on the server. Server-side traces not only minimize server load during the tracing process but also give you an easy way to parse and analyze the information later. By copying the files to another machine for processing, you can negate any impact on the production server while you analyze the data.
But what if you need to parse the trace information in a way that will let you aggregate the data to find your worst-performing and most-called queries, stored procedures, and so on? One simple solution is to use a T-SQL user-defined function called SQL_Signature (popularized by Microsoft’s Tom Davidson). There are more efficient but slightly more complicated methods for tackling this problem—I’ll follow up in the SQL Server Magazine Performance Tuning and Optimization forum (sqlforums.windowsitpro.com/web/forum/) with a few—but this solution should get you started.
In “Generating Server-Side Traces,” I established that you can use the fn_trace_gettable() function to read events and columns directly from the trace files. This function will be your primary means of retrieving trace data for direct processing (or placing it in a table for later processing).
SELECT * FROM ::FN_TRACE_GETTABLE('C:\YourTraceFile.trc', DEFAULT)
The goal of the SQL_Signature function, which Listing 1 shows, is to replace certain pieces of information (which might vary between calls to the database) with a constant—in this case, the # symbol. These pieces of information are typically the values in a search argument of the WHERE clause that vary from one call to the next, while the rest of the statement remains constant.
Replacing these values will let you get a core list of what I call Query Classes and will let you aggregate the data to get meaningful metrics. In the following example, the three statements are effectively the same except for the CustomerID value. If you tried to aggregate these with a simple GROUP BY clause, you would get three separate groups.
SELECT * FROM [dbo].[OrderHeader] WHERE CustomerID = 397
SELECT * FROM [dbo].[OrderHeader] WHERE CustomerID = 398
SELECT * FROM [dbo].[OrderHeader] WHERE CustomerID = 398
However, if you replace the IDs with a constant, you can group these together to get more meaningful metrics.
SELECT * FROM [dbo].[OrderHeader] WHERE CustomerID = #
The same is true for stored procedure calls. For example,
EXECUTE [dbo].[cp_test] 70
EXECUTE [dbo].[cp_test] 71
EXECUTE [dbo].[cp_test] 72
all become

EXECUTE [dbo].[cp_test] #
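Listing 1 isn’t reproduced here, but to make the idea concrete, here is a minimal, hypothetical sketch of a signature function of this kind. It is my own illustrative approximation, not the Listing 1 code: it walks the first @Len characters of the statement and collapses each run of digits into a single # symbol. (A production version would also mask quoted string literals, decimals, and hexadecimal values.)

```sql
-- Hypothetical sketch only; not the actual SQL_Signature from Listing 1.
CREATE FUNCTION dbo.SQL_Signature_Sketch
  (@SQL NVARCHAR(4000), @Len INT)
RETURNS NVARCHAR(4000)
AS
BEGIN
  DECLARE @Out NVARCHAR(4000), @i INT, @c NCHAR(1), @InNumber BIT
  SET @Out = ''
  SET @i = 1
  SET @InNumber = 0  -- tracks whether we are inside a run of digits
  WHILE @i <= LEN(@SQL) AND @i <= @Len
  BEGIN
    SET @c = SUBSTRING(@SQL, @i, 1)
    IF @c LIKE '[0-9]'
    BEGIN
      -- emit a single # for each run of digits, no matter how long
      IF @InNumber = 0 SET @Out = @Out + '#'
      SET @InNumber = 1
    END
    ELSE
    BEGIN
      SET @InNumber = 0
      SET @Out = @Out + @c
    END
    SET @i = @i + 1
  END
  RETURN @Out
END
```

With this sketch, SELECT dbo.SQL_Signature_Sketch('EXECUTE [dbo].[cp_test] 70', 1000) returns EXECUTE [dbo].[cp_test] #, matching the query-class form shown above.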
To use the function, you’d simply pass in the text you want to parse, along with a second parameter to tell it how many characters from the beginning of the statement to use during the parsing.
SELECT dbo.SQL_Signature('select * from orders where orderid = 10255 AND MyDate = ''20060101''', 1000)
You can also use the function directly when reading the data from the trace file; however, to process many rows efficiently, you should first place the data into a table. You can use the following code to create the table, insert the data, and parse the TextData column:
SELECT [EventClass],[TextData],[DatabaseID],
  [Duration],[Reads],[Writes],[CPU],
  dbo.SQL_Signature([TextData],1000) AS [SQLSignature],
  CAST(0 AS INT) AS [HashCode]
INTO YourTable
FROM ::FN_TRACE_GETTABLE('C:\YourTraceFile.trc', DEFAULT)
Next, to make the aggregations easier, compute a hashed value of the parsed text for each row. You might want to add an index on HashCode after the update:
UPDATE YourTable WITH (TABLOCK)
SET [HashCode] = CHECKSUM([SQLSignature])
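That index might look like the following (the index name is my own choice); it speeds up both the GROUP BY in the next step and the correlated lookups later:

```sql
-- Index the hash so grouping and per-class lookups don't scan the table
CREATE INDEX IX_YourTable_HashCode ON YourTable ([HashCode])
```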
Now, you can aggregate the data:
SELECT [EventClass],
  SUM([Duration]) AS [Total Duration],
  SUM([Reads]) AS [Total Reads],
  SUM([Writes]) AS [Total Writes],
  SUM([CPU]) AS [Total CPU],
  COUNT(*) AS [Total Counts],
  [HashCode],
  CAST('' AS NVARCHAR(4000)) AS [SQLSignature],
  CAST('' AS NVARCHAR(4000)) AS [Sample_Statement]
INTO AggTable
FROM YourTable
GROUP BY [EventClass],[HashCode]
Finally, add the query class (SQLSignature) and a sample statement for each class, and you’re ready to go.
UPDATE AggTable
SET [SQLSignature] = (SELECT TOP 1 a.[SQLSignature]
      FROM YourTable AS a
      WHERE a.[HashCode] = AggTable.[HashCode]),
    [Sample_Statement] = (SELECT TOP 1 a.[TextData]
      FROM YourTable AS a
      WHERE a.[HashCode] = AggTable.[HashCode])
SQL SERVER MAGAZINE RESOURCES
“Generating Server-Side Traces,” InstantDoc ID 99940
“SQL Server Profiler or Server-Side Trace?” InstantDoc ID 99365
“Dissecting SQL Server’s Top Waits,” InstantDoc ID 98112
“Are Your SQL Server Statements Performing Well?” InstantDoc ID 97761
“9 Steps to an Automated Trace,” InstantDoc ID 43014
You can easily add further metrics (e.g., averages, maximums) to this simplified example as you see fit. Now, to find your worst-performing queries, you can select the aggregated results in any order you want—by Reads, Writes, CPU, or Duration. Also, take into account the number of times the query class was run. For example, if you execute a query once a day, and it has 1,000,000 reads, is that one of the top 10 queries you should be concerned with? It isn’t if you have other queries that use 5,000 reads each but are called once per second. Slice and dice it as you want, but if you tackle the top 10 offenders in your system, you’ll probably tackle 90 percent of the overall problems and get the biggest bang for your effort.
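As one possible final step, the following query (my own example, using the column names created above) ranks query classes by total reads while also surfacing the call count and a per-call average, so a cheap-but-frequent query isn’t overlooked:

```sql
-- Rank query classes by total reads; the 1.0 multiplier avoids
-- integer division when computing the per-call average
SELECT TOP 10
    [EventClass],
    [Total Reads],
    [Total Counts],
    1.0 * [Total Reads] / [Total Counts] AS [Avg Reads],
    [Sample_Statement]
FROM AggTable
ORDER BY [Total Reads] DESC
```

Swapping the ORDER BY column to [Total CPU] or [Total Duration] gives you the same top-10 view along the other axes.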