I'm trying to execute a 180,000-line query through Query Analyzer in SQL Server 2000 Service Pack 3 (SP3). The query, which Listing 1 shows, is as simple as you can get, but it never seems to do anything. I killed it off after 54 hours, and it hadn't updated any records!
The ControlCounter table has approximately 6 million records and its primary key is RefNo. I tried running the query in sections, but the time it took each section to complete didn't seem to correlate to the number of lines in the section. When I profiled a section containing 25,000 lines, nothing seemed to happen for the first hour and 40 minutes, then suddenly, all of the procedure calls fired. I'm guessing the delay is caused by parsing, but I'm surprised at the results.
The original code is supposed to be running through a Visual Basic 6.0 ADO connection. I'm assuming both Query Analyzer and ADO share the same database technology. The only difference between the Visual Basic and Query Analyzer code is that the Visual Basic version is wrapped in a transaction. Is there anything I can do to force SQL Server to accept what it's given? Or is there another way around this problem?
You've correctly guessed that running this query in batches is a good first troubleshooting step. Each line in the batch must be syntactically verified before execution, so grouping the query into smaller batches separated by GO statements should improve performance. Also, running the query in the Query Analyzer environment means you might need to change the Set nocount setting from the default OFF to ON to reduce client/server traffic.
In addition, I'd rewrite the procedure slightly, as Listing 2 shows. Then, to further improve performance, you can try the following tips:
- Consider ensuring that RefNo sorts the data input. This change ensures that the query processes the data in physical order.
- Think about using explicit transactions. Right now, every call is generating a BEGIN and COMMIT transaction, which is expensive. Try combining each smaller batch with a BEGIN or COMMIT transaction (I usually start with about 500 statements as a reasonable batch size). With this change, the recovery interval setting comes into play because it controls data-flushing to the data files (log writes are synchronous).
- Locking is also coming into play. Creating and tearing down each row lock for this query will create overhead. If you can, you might want to use page locks (by means of a hint) or even a table lock if no one but you works with the table, Remember that a COMMIT will release locks. Also, if you have constraints that reference this table or that this table references, consider the constraint relationships and make sure that adequate indexing is in place. Alternatively, you can disable the constraints before running the query and enable them again afterwards.
- If the data is originating outside of SQL Server, you can BULK INSERT the data from the source file into a new scratch table, then perform an UPDATE..FROM operation to apply the updates to the base table. This approach provides efficient loading of the new values and an atomic SQL statement to apply the changes.
Finally, consider using the sp_executesql stored procedure rather than the EXEC statement because in some situations, it's faster. However, because you're calling a stored procedure directly, a more generic solution is to monitor syscacheobjects to ensure that you aren't suffering from recompilation.