I didn't intend to write this article. Let me rephrase that: I did not set out to write about this subject when I sat down in front of my computer this morning. I ended up doing so as I was working on an article about poor uses for the DISTINCT keyword when I had an unexpected discovery in the sample code I was constructing for that subject. Consider the following two situations:
Option 1: Mis-Matched Data Types
Consider the two tables below that are used in a SQL metadata collection database—pay special attention to the Server column in each table (which happens to also serve as the primary key in each table):
You'll notice that in dbo.ServerList_SSIS, the table listing which servers to collect metadata against, it's typed as an nvarchar(256) column. In the table used to store collected instance metadata, dbo.SQL_Servers, it's typed as a varchar(100) column. If we were to run the following query to return information from dbo.SQL_Servers for any database that is marked for collection and is listed as a SQL 2000 instance we'd see the execution plan and I/O statistics that follow the code:
--Inconsistent Types --(Causing Explicit Conversion Between nvarchar(256) and varchar(100) SELECT SS_BAD.Server , SS_BAD.ProductVersion , SS_BAD.ProductLevel , SS_BAD.EngineEdition FROM lifeboat_BAD.dbo.sql_servers AS SS_BAD INNER JOIN lifeboat_BAD.dbo.serverlist_SSIS AS SSIS_BAD ON SS_BAD.Server = SSIS_BAD.Server WHERE SSIS_BAD.Connect = 1 AND SSIS_BAD.version = 8 ORDER BY SS_BAD.Server;
(8 row(s) affected) Table 'SQL_Servers'. Scan count 1, logical reads 41, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'ServerList_SSIS'. Scan count 1, logical reads 8, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Option 2: Matching Data Types
By contrast what happens if we change the data type for the Server column in the dbo.Serverlist_SSIS table to match that of the Server column in dbo.SQL_Servers so that they're both matching as varchar(100)? I've staged a duplicate database and altered the data type in that table accordingly:
Now, if we run the same query using those tables we get a vastly different plan:
--Matching Data Types SELECT SS_GOOD.Server , SS_GOOD.ProductVersion , SS_GOOD.ProductLevel , SS_GOOD.EngineEdition FROM lifeboat_copy.dbo.sql_servers AS SS_GOOD INNER JOIN lifeboat_copy.dbo.serverlist_SSIS AS SSIS_GOOD ON SS_GOOD.Server = SSIS_GOOD.Server WHERE SSIS_GOOD.Connect = 1 AND SSIS_GOOD.version = 8 ORDER BY SS_GOOD.Server;
Not only did we convert the clustered index scan on dbo.SQL_Servers to a seek, but we also eliminated two sort operations because the results in each of operations touching the two tables brought back ordered results which remained ordered when running through the nested loop operation. The I/O statistics were also much better as a result, with a total of 22 logical reads versus 49—half the number of logical reads and then some:
(8 row(s) affected) Table 'SQL_Servers'. Scan count 0, logical reads 16, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'ServerList_SSIS'. Scan count 1, logical reads 6, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
The Importance of Planning Out Your Schema
Considering this is a small query, imagine the benefits on larger tables and result sets when your data typing is consistent. Decisions made at the beginning of the database development process are critical and long-lasting. Take the time to look at the type of data you're collecting and type it accordingly—then be consistent in those typing decisions.
We were lucky here—the data lent itself to changing type and alligning columns properly between objects without truncation.
When looking at performance issues it's not always about "bad code;" be sure to take a look at the structure of the database objects when tuning. You may be surprised what you find.