Query performance tuning is key to ensuring that SQL Server database applications run within acceptable performance metrics. The beauty of SQL Server is that it stores everything you need to know about the most recently run queries in the procedure cache.
Performance tuning is half science and half art. The science half comes from examining execution plans and seeing where the problems lie. The art half comes from being able to identify, just by looking at the queries, the indexes that need to be created. It's the science half that I'll discuss here. Specifically, I'll discuss how to find and eliminate unwanted parallelized queries, when to add filtered indexes, and when to compress data.
Parallel Plans Must Die
You should first look for parallel execution plans in the procedure cache. Parallel plans typically indicate queries that are consuming large amounts of CPU, disk resources, or both, which usually shouldn't happen within an OLTP application. In a typical OLTP application, there should be almost no parallel execution plans unless reports are being run against the OLTP database. The easiest way to query for parallel execution plans is with a query similar to that shown in Listing 1 (below).
The output from this query will include the execution plan, the text of the statement that's being parallelized, and the cost of the parallel execution plan. If you click the execution plan XML in the query result set, you'll see the graphical execution plan. By looking at the execution plans for the most expensive queries, you can begin looking for the indexes to add to the database. Some of the operations that you want to correct first include Key Lookups, Clustered Index Scans, and Table Scans.
Key Lookups are usually the easiest problem to resolve. That's because a Key Lookup occurs when an existing nonclustered index is simply missing the columns being returned as a part of the key or bookmark lookup. For example, looking at the portion of the execution plan shown in Figure 1, you can see that the Document.IX_124564_QUERY index is missing one or more columns.
By hovering the mouse pointer over the Key Lookup, you can quickly see which columns are being returned through it instead of the Index Seek, which should be returning them. As you can see in Figure 2, the Key Lookup is using the Document.PK_Document object to find the columns. The Output List, also shown in Figure 2, reveals that the Key Lookup is returning the ParentArtifactID_D column. The Seek Predicates section shows how the Key Lookup is identifying the values that it needs to find. In this case, the table's primary key column is also the clustering key, which is being used by the Key Lookup to find the rows needed in the clustered index.
By modifying the Document.IX_124564_QUERY index to include the ParentArtifactID_D column, you can remove the Key Lookup from the operation, saving the query a potentially large amount of logical I/O, physical I/O, or both. If you look at the rest of the output information for the Key Lookup, you'll see that it's being executed 217 times, as Figure 3 shows. This will translate into at least 434 I/O operations being performed against the database. Although they might be logical I/O operations (and therefore coming directly from the buffer pool), there's the possibility that they could be physical I/O operations coming from the physical disk, which would cause an even greater performance problem.
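The fix might look like the following sketch. The index's real key column list isn't shown in the article, so the key column used here is a hypothetical stand-in; only the added INCLUDE column comes from the execution plan.

```sql
-- Rebuild the existing index in place, adding the missing column.
-- DocArtifactID is a hypothetical stand-in for the index's real key columns.
CREATE INDEX IX_124564_QUERY
ON dbo.Document (DocArtifactID)
INCLUDE (ParentArtifactID_D)
WITH (DROP_EXISTING = ON, ONLINE = ON)
GO
```

Using DROP_EXISTING = ON replaces the index in a single operation rather than dropping and re-creating it separately.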
Unfortunately, the execution plan doesn't contain the actual number of I/O operations generated by the query. The only way you can accurately determine that number is to run the query against the database with the SET STATISTICS IO setting set to ON. Each execution will create at least two I/O operations because the Key Lookup will need to navigate the root page of the clustered index's B-Tree to get to the data page. In reality, the B-Tree for this table is probably more than two levels deep. If the B-Tree is:
- Two levels deep, the Key Lookup would generate at least 651 I/O operations.
- Three levels deep, the Key Lookup would generate at least 868 I/O operations.
- Four levels deep, the Key Lookup would generate at least 1,085 I/O operations.
- Five levels deep, the Key Lookup would generate at least 1,302 I/O operations.
As you can see, as the B-Tree gets larger, the number of I/O operations being generated increases quite quickly, causing the Key Lookup operation to run slower.
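To measure the real read counts rather than estimating them, you can run the statement with STATISTICS IO enabled, as mentioned above. The table and predicate below are illustrative only:

```sql
-- Reports logical reads, physical reads, and read-ahead reads
-- per table in the Messages pane
SET STATISTICS IO ON;

SELECT ParentArtifactID_D
FROM dbo.Document
WHERE ArtifactID = 1045476;  -- illustrative predicate

SET STATISTICS IO OFF;
```

Comparing the logical-reads figure before and after an index change is the most direct way to confirm that a Key Lookup has actually been eliminated.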
Something that you can't see in this execution plan is that this Key Lookup isn't the only one in the query that will be resolved by modifying the Document.IX_124564_QUERY index to include the ParentArtifactID_D column. There's another Key Lookup against another part of the query. It's being executed 508 times and returning the same column each time. So, by fixing the one Key Lookup, you're fixing the second Key Lookup automatically, with no additional cost to the database engine.
Clustered Index Scans and Table Scans
Clustered Index Scans and Table Scans are effectively the same thing. The main difference between them is that if there's no clustered index, a Table Scan is used, whereas if there's a clustered index, a Clustered Index Scan is used. In either case, unless the table is very small (a few hundred rows at most), you want to get rid of these scans. Depending on how large the table is and how expensive the other operators within the query plan are, the Clustered Index Scan might have a low or a high percentage cost. In the case of this sample query, the Clustered Index Scan has a cost of 0 percent as it relates to the rest of the query, as shown in Figure 4. However, the number of I/O operations being generated might still be quite high.
If you look at the specific statistics and the column information for the Clustered Index Scan in Figure 5, you can easily see what lookup is being performed. The predicate used to search the clustered index is BatchSetArtifactID = 10288078. The Output List shows that the ArtifactID column is being returned for use within the query.
In this case, the Clustered Index Scan is pretty easy to resolve by creating a small nonclustered index, such as:
-- The source doesn't name the table being scanned; dbo.Batch is assumed here
CREATE INDEX mrdenny_BatchSetArtifactID
ON dbo.Batch (BatchSetArtifactID)
INCLUDE (ArtifactID)
WITH (FILLFACTOR = 70, ONLINE = ON)
Because the Output List in Figure 5 includes the clustering key (ArtifactID), technically speaking the clustering key doesn't need to be an included column; SQL Server automatically includes the clustering key in every nonclustered index. However, for code readability, it's recommended that you list the clustering key explicitly in the CREATE INDEX statement.
Adding filtered indexes to a query can greatly improve query performance as well as reduce the amount of I/O being generated. However, creating filtered indexes requires detailed knowledge of the queries that will be sent to the database engine. To properly design the filtered indexes, you need to know all the possible values that will be statically searched against. For example, suppose you have an authentication table designed like this:
CREATE TABLE dbo.Users
(UserId INT IDENTITY(1,1) PRIMARY KEY,
UserName varchar(40) NOT NULL,
Password nvarchar(40) NOT NULL,
Active BIT NOT NULL)
GO
The stored procedure that will be run against this table is:
CREATE PROCEDURE dbo.VerifyAuthentication
@UserName varchar(40),
@Password nvarchar(40)
AS
SELECT UserId
FROM dbo.Users
WHERE UserName = @UserName
AND Password = @Password
AND Active = 1
GO
Adding the following index would speed up the authentication process:
CREATE INDEX SpeedUpAuthentication
ON dbo.Users (UserName, Password, Active)
INCLUDE (UserId)
GO
Although the performance of the queries using the dbo.VerifyAuthentication stored procedure would probably be acceptable, the index contains both deactivated and active accounts. If you remove the rows that have an Active value of 0 from the index, the index will contain fewer records, have a smaller B-Tree, and therefore take less memory from the buffer pool. You can also make the index narrower by removing the Active column from the indexed columns because you don't need it anymore. The filtered index looks like:
CREATE INDEX SpeedUpAuthentication
ON dbo.Users (UserName, Password)
INCLUDE (UserId)
WHERE (Active = 1)
GO
Another technique to save space and thereby improve performance is to remove non-nullable bit fields from indexes. Although removing a non-nullable bit field in a narrow index doesn't save a massive amount of space, the technique is sound and becomes even more useful on wider columns. For example, suppose you have two tables -- dbo.StudentGrades and dbo.Semester -- created with the code in Listing 2 (below). You want to query only the SemesterId column and only for the most recent SemesterId value.
When you query the dbo.StudentGrades table using the dbo.QueryGrades stored procedure shown in Listing 3 (below), you want to keep the index as small as possible, because the college has more than 100 years of student data in the table. (A business requirement mandates that data never be removed from tables.) The most efficient index that can be created to assist this query includes placing a filter on the index and simply changing the filter every semester.
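The index the article describes might look like the following sketch. The SemesterId value in the filter (252 here) is an assumed placeholder; each new semester, the index would be re-created with the new value, using DROP_EXISTING = ON.

```sql
-- 252 is an assumed SemesterId for the current term; re-create the index
-- with the new value (and DROP_EXISTING = ON) each semester
CREATE INDEX IX_StudentGrades_CurrentSemester
ON dbo.StudentGrades (StudentId)
INCLUDE (ClassId, Grade)
WHERE (SemesterId = 252)
GO
```

Because the filter restricts the index to a single semester, the B-Tree stays tiny no matter how many years of grades accumulate in the base table.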
Introduced in SQL Server 2008, data compression can improve query performance. However, it can be hard to tell from the execution plan alone when data compression will help. The execution plan is used to find the objects that have the highest I/O cost. From there, you can use the system stored procedure sp_estimate_data_compression_savings to figure out approximately how much space will be saved. Saving space by compressing the index or heap table also reduces the number of I/O operations needed to run T-SQL statements. Although compression slightly increases the amount of CPU power used on the server, this is typically acceptable because SQL Server machines usually become I/O-bound and memory-bound long before they become CPU-bound.
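For example, to estimate the page-compression savings for the dbo.StudentGrades table from Listing 2, you could run:

```sql
EXEC sp_estimate_data_compression_savings
  @schema_name = 'dbo',
  @object_name = 'StudentGrades',
  @index_id = NULL,          -- NULL = all indexes on the table
  @partition_number = NULL,  -- NULL = all partitions
  @data_compression = 'PAGE';
```

The procedure loads a sample of the table into tempdb, compresses it, and reports the current and estimated compressed sizes, so it's safe to run against production without changing any data.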
SQL Server can compress data at the row level or the page level. Let's look at each type of table compression.
Row compression. The first thing to know about row compression is that it isn't actually what it sounds like. Row compression is useful only on columns with numeric, float, or decimal data types -- and only when the column is defined much wider than the values actually being stored in it. For example, the following table would make good use of row compression:
CREATE TABLE dbo.StoredValues
(StudentId INT,
ClassId INT,
Grade numeric(18,9))
Row compression works by storing only the bytes that are actually needed, stripping off the rest. For example, in the case of the dbo.StoredValues table, the Grade column, defined as numeric(18,9), requires 9 bytes of storage for each row in the table, enough to hold a very large value. If a smaller value is put into the column, the same 9 bytes of storage is still needed. So, if you insert a row into the table with a value of 3, the column still takes up the full 9 bytes of space. When you turn on row compression, only the few bits needed to store the number 3 are stored, and the unused bytes aren't written. Row compression has no effect on character or Unicode data.
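Enabling row compression on the sample table is a single rebuild; a minimal sketch:

```sql
-- Rebuild the table (heap or clustered index) with row compression
ALTER TABLE dbo.StoredValues
REBUILD WITH (DATA_COMPRESSION = ROW);
```

Note that this rebuilds only the table itself; any nonclustered indexes must be rebuilt separately if you want them compressed as well.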
Page compression. In SQL Server, page compression is more like the traditional compression that has been around for years outside of the database engine. Page compression takes the existing data within the data page and compresses the rows so that more rows fit within the page. For an in-depth look at the internals of page-level data compression, see my blog post "Not All Data Compression is Created Equal."
Page compression looks for compressible data within the rows stored on the page, freeing up space within the page so that more rows can be written to it. There are a few things to remember about page compression:
- Page compression only compresses full pages. As new pages are allocated and rows are written to them, those pages aren't compressed; a page is compressed only after it fills up.
- Page compression doesn't compress any data that's stored out-of-row. This includes any varchar(max), nvarchar(max), and varbinary(max) data that overflows from the row to its own pages. Although the data stored in-row is compressed, the data stored out-of-row isn't compressed.
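Turning on page compression for a table's indexes is likewise a rebuild; a sketch, using the Listing 2 table as the example target:

```sql
-- Rebuild every index on the table with page compression
ALTER INDEX ALL ON dbo.StudentGrades
REBUILD WITH (DATA_COMPRESSION = PAGE);
```

Because the rebuild rewrites every page, it's worth running sp_estimate_data_compression_savings first to confirm the savings justify the one-time cost.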
A Variety of Tuning Techniques Available
As you've seen, there are a variety of ways to tune queries within the SQL Server database, including adding indexes to fix Key Lookups, Clustered Index Scans, and Table Scans. You can also use filtered indexes when appropriate. Finally, you can use the sp_estimate_data_compression_savings stored procedure to find data that can be compressed.
Listing 1: Query That Returns the Parallel Plans from the Execution Plan Cache
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
WITH XMLNAMESPACES
(DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT query_plan AS CompleteQueryPlan,
n.value('(@StatementText)', 'VARCHAR(4000)') AS StatementText,
n.value('(@StatementSubTreeCost)', 'VARCHAR(128)') AS StatementSubTreeCost,
dm_ecp.usecounts
FROM sys.dm_exec_cached_plans AS dm_ecp
CROSS APPLY sys.dm_exec_query_plan(plan_handle) AS dm_eqp
CROSS APPLY query_plan.nodes
('/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple') AS qp(n)
WHERE n.query('.').exist('//RelOp[@PhysicalOp="Parallelism"]') = 1
GO

Listing 2: Code to Create the dbo.StudentGrades and dbo.Semester Tables
CREATE TABLE dbo.StudentGrades
(StudentId BIGINT,
ClassId BIGINT,
SemesterId BIGINT,
Grade numeric(5,2),
CONSTRAINT PK_StudentGrades PRIMARY KEY (StudentId, ClassId, SemesterId))

CREATE TABLE dbo.Semester
(SemesterId BIGINT IDENTITY(1,1) PRIMARY KEY,
Year INT,
Semester INT,
Completed BIT)
Listing 3: The dbo.QueryGrades Stored Procedure
CREATE PROCEDURE dbo.QueryGrades
@StudentId BIGINT
AS
DECLARE @SemesterId BIGINT

SELECT TOP (1) @SemesterId = SemesterId
FROM dbo.Semester
WHERE Completed = 1
ORDER BY Year DESC, Semester DESC

SELECT ClassId, Grade
FROM dbo.StudentGrades
WHERE StudentId = @StudentId
AND SemesterId = @SemesterId
GO