I have two servers, Production and Development, that are identical in hardware and setup, although the execution plans are different. A query that I run on the Production server takes 3 seconds to complete; the same query on the Development server takes 66 seconds! I've tried everything I can think of to troubleshoot the slowdown, including using DBCC INDEXDEFRAG, DBCC DBREINDEX, and DBCC SHOWCONTIG. Please advise.
I often see questions like this in the newsgroups and hear them from my customers. It's impossible to provide a precise answer without being able to perform some live detective work on the system. But in the spirit of helping you learn to fish rather than serving you a plate of fish sticks, I'll walk you through a few of the areas that I'd investigate if I were researching the problem myself.
First, when someone tells you that everything is the same, it's almost always safe to assume that something is different. Troubleshooting performance problems often requires creative thinking more than it requires specific knowledge of SQL Server. I don't know how big the databases are in this case, but I recommend using BACKUP and RESTORE if possible to move the Production database to the Development server and the Development database to the Production server, then run the queries to see what happens. If the query on the "slow" Development database is suddenly fast when moved to the Production server, we might assume that some difference at the hardware level accounts for the difference in performance between the servers.
Subtle hardware differences might be likely if we were getting the same execution plan but experiencing widely different performance between the two servers. But in this case, performance is almost certainly slower on the Development server because the execution plans are different. So the real question isn't why performance differs, but why do the execution plans differ? If we could force the Development server to use the same plan as the Production server and Development's performance is then comparable to Production's performance, we'd have learned that the execution plan is likely causing the slowdown. Unfortunately, a full discussion of query hints that would force the same plan on both servers is beyond the scope of this column, but answering the following questions will get you started in understanding why the plans might differ:
- Are the numbers of rows in the key tables the same or comparable? Plans can be very different if data sizes vary.
- Do both servers have the same service pack? It's common for different service packs to have subtle optimizer differences.The latest service pack often produces the best performance, but that's certainly not always the case.
- Is the distribution of data different, even if the tables are the same size? For example, a query can behave quite differently if you're running it against a Last Name field where many of the last names are the same versus where there are many different names. Data cardinality can have a big effect on the plan.
- Is one plan parallel and the other plan serial? Seeing a plan that uses multiple CPUs versus a plan that uses one CPU can be a big clue to solving your problem. Don't assume that the parallel plan is always the best.
- Are the indexes on all involved tables the same?
- Are statistics—whether manually created or autocreated—the same on all relevant tables?
- How up-to-date are the statistics, and what are the default sampling rates on the statistics on the relevant tables in each database?
Most often, performance anomalies such as the one you're seeing boil down to subtle differences in indexes or statistics, assuming that the number of rows are comparable across the tables. It's important that you check both databases for differences, either manually or by using one of the schema comparison tools on the market.
I know that this discussion doesn't solve the problem directly, but I hope it helps you do your own troubleshooting.