Running Profiler on a Busy Server

I tried to use SQL Server Profiler to tune a busy SQL Server database for one of my customers. The 8GB database was running on an 8-CPU server running at 70 percent to 80 percent CPU utilization. The server promptly became nonresponsive, and I had to restart the server. Microsoft Developer Network (MSDN) support told me that Profiler can place a heavy burden on busy SQL Servers. To tune the database, I need the SQL text and reads that the queries generate. How do I gather this information from a busy, multiprocessor SQL Server without running the server into the ground?

I consider Profiler the most important tool in my SQL Server tuning arsenal, and I use it constructively almost every day. In the past few years, I've used Profiler at more than 100 customer sites, including on servers that perform tens of thousands of transactions per second. I've never been in a situation in which I couldn't use Profiler as long as I was careful. But like any powerful tool, Profiler can cause problems.

First, understand that Profiler is nothing more than a GUI front end that calls a series of functions and procedures collectively called SQL Trace. In most cases, using Profiler to capture system information is more expensive than calling SQL Trace procedures directly. To make sure you don't miss any events on a busy server, you need to select the server processes trace data option from the Profiler GUI. However, this option starts two separate traces that capture the same set of events. SQL Server sends one event stream to the Profiler GUI and the other event stream to a local file on the server. Worried about the effect of running a trace? If so, you certainly don't want to run two traces!

Another Profiler option lets you write directly to a SQL Server table. But don't use this option if you're concerned about performance. SQL Server will write to the table each event that Profiler captures and will also record each event in the transaction log, creating a huge load on a busy server. Instead, you should capture event data to a local file, then use the fn_trace_gettable() function to load the trace data into a table for analysis. Alternatively, you can write the trace data to a network drive, but I can usually write trace data to a local drive without creating additional I/O bottlenecks.

I rarely run Profiler on a busy production server because of the performance effects. Instead, I use a series of custom procedures that directly call SQL Trace procedures to stop, start, and control traces. Although I don't have space to explain my custom procedures in this answer, learning how to write your own procedures to control traces is fairly simple. You can use Profiler to understand how tracing works. First, start one instance of Profiler to remove the default filter that prevents the capture of Profiler activity. Then, start another instance of Profiler and create the trace you want to run manually. The first instance of Profiler will capture the procedures the second instance uses to perform your trace, and you'll have a nice model for creating your own trace procedures. You can also use the Script Trace option directly from Profiler to script running traces.

Traces, whether you run them through Profiler or through direct calls to SQL Trace, rarely degrade performance unless they grow too big too quickly. I've seen trace files that grew 20MB per second, which is likely to hurt performance. However, I often run traces on extremely busy servers without a significant performance penalty because I keep the growth rate of my trace files to a manageable level. You'll need to experiment to determine when traces are growing too big too fast for your particular hardware. I always specify a maximum file size when tracing on a busy server. If a trace starts to grow uncontrollably large and surpasses the maximum file size limit I set, the trace will stop. I find that 50MB is a safe maximum size; in my experience, if a trace is expensive enough to degrade performance, it will hit the 50MB limit within a few seconds. The proper maximum size for your situation will depend on your hardware and transaction volume.

Another way to control the trace file's size and reduce the potential for a trace-induced performance problem is to include only the events and data columns you need. Finding the proper mix of events and data columns is a complex subject; you can read more about selecting the best events and data columns for your needs in Kalen Delaney's "Tracking Down Event Clues," May 2001, InstantDoc ID 20159, and in my article "Working with Trace Filters," April 2003, InstantDoc ID 38040. You can also control the trace size by specifying a minimum CPU duration for transactions you want to capture. Most of your transactions will be fast, running in less than 20ms, so setting a minimum duration filter of 20ms would cut out a lot of the data you're writing to the trace file without losing many significant events.

My final word of advice about managing trace file size is to watch out for user-defined functions (UDFs), which can create massive amounts of trace data. If you have a SELECT statement that uses a UDF to return 10,000 rows and the UDF contains 10 statements, you just generated 100,000 trace events. And if 10 people run the SELECT statement at the same time, the UDF generates 1 million trace events. Often, you'll want to include trace filters to eliminate UDF activity once you're sure a particular UDF isn't contributing to the performance problem.

Comments

Plain text