Recently I had a reader reach out to me directly to ask me a question regarding perhaps my favorite topic about Microsoft SQL server: Dynamic Management Objects.
The question was a simple one:
“What tuning decisions can be derived from sys.dm_db_index_usage_stats?”
While the question was simple the answer can run long and deep. That’s why I’ll be partitioning my response across a two part article here at SQL Server Pro. In part one we will look at the information returned in its raw form and scope; building to that from the simple SELECT * syntax and improving the query through elimination of unnecessary columns. We will look at what information is (and is not) returned by default and how you can tie in additional system views such as sys.indexes and sys.tables to fill in those holes.
In part two I will show you the many decisions that can be gleaned from the output of sys.dm_db_index_usage_stats and how we can alter the base query to make additional tuning decisions. That article will cover:
- Indexes that are not used to satisfy reads (and why that matters.)
- Identification of heaps (tables with no clustered index) and explain both why it’s typically “best-practice” to have a clustered index on a table and also when a clustered index is not warranted.
- Identify your most-utilized indexes at both the database and table level of granularity.
- Identify poor clustering key choices and possible alternatives.
- Compare read and write activity between duplicate indexes.
- Identify non-clustered indexes with included columns that don’t provide significant value compared to the overhead they incur.
- Identify indexes and tables with no activity (read nor write.)
The Indexing Dynamic Management Objects
Sys.dm_db_index_physical_stats is just one of the seven (current) dynamic Management objects:
There are additional “indexing” DMOs but those are associated with Full-Text Indexing and In-Memory OLTP (aka “Hekaton”) and are far outside the scope of this series. In future articles I will look at each of these – the last four focused on missing index metadata will be covered as a single topic as they’re not valuable individually.
Transitory Nature of Results from dm_db_index_usage_stats
The aptly-named dm_db_index_usage_stats DMV provides information regarding index usage (and by omission non-usage) for every index on every database on a SQL instance. Like most DMOs the information does not persist after a service restart so usage information needs to be consumed with a cautious eye. I always like to present or identify the amount of time that has passed since the metadata was flushed through one of the following code constructs:
Using tempdb create_date:
SELECT create_date , DATEDIFF(dd, create_date, GETDATE()) AS days_metadata FROM sys.databases WHERE name = 'tempdb';
Using session_id login_time:
SELECT login_time , DATEDIFF(dd, login_time, GETDATE()) AS days_metadata FROM sys.sysprocesses WHERE spid = 1;
Either one is going to get you within the same minute and since we’re discussing making production decisions on indexing choices based upon this timeframe if you’re results are measured in minutes then you may want to question your servers’ uptime requirements.
You’ll want to also have a good understanding of the business environment/rules these databases server against. Some indexes may only be put into use for periodic operations (monthly, quarterly, seasonally, or annually.) Making a decision to drop an index because it has not been utilized for the last 4 months only to find out it’s used annually during tax season may be a critical error. Then again it may not be. Depending on the scale of the index involved in terms of both row count and activity for reads v. writes it may be beneficial to drop the index when not used and re-create it prior to that periodic use cycle.
Persisting Results from sys.dm_db_index_usage_stats
This is one of those DMVs that I tend to also persist to a physical table where I’m dealing with environments with periodic cycles of activity that differ from daily activity. If this is the case you may wish to periodically persist your usage statistics to a physical table on a scheduled basis with a simple SELECT INTO with a date stamp. I’d recommend using the raw data out of sys.dm_db_index_usage_stats rather any of the variations shown through the rest of this article because you’ll vary this base query to fit specific tuning inquiries (as will be demonstrated below and in part two.)
Structure of sys.dm_db_index_usage_stats
The results from sys.dm_db_index_usage_stats are segregated by:
- user v. system activity
- reads v. writes
Reads are broken-down into the various types of read operations available for an index: seeks, scans, and lookups. Writes are presented as “updates” to the index.
Additionally you get activity information regarding when the last read or write operation occurred. I tend to not pay much attention to these date columns unless I’m observing an index with very low read or write usage so I can gauge just how stale the index is. It may also point to the periodic nature of an index as discussed above.
Transact-SQL for Calling sys.dm_db_index_usage_stats
The first step in refining our query against sys.dm_db_index_usage_stats is examining the raw query and what is returned in terms of columns that identify the individual indexes.
SELECT * FROM sys.dm_db_index_usage_stats;
This DMV is instance-scoped, meaning if left unfiltered it will return results for all indexes on the instance. Likely you will want to narrow the focus when doing performance analysis as well as identify the database, index, and objects involved by name. This is easily accomplished by joins to two system views:
Additionally I make use of the DB_NAME() system function for converting the database_id column into a recognizable name, DB_ID() system function for a reciprocal of that operation – converting a database name to its corresponding database_id. I also employ filtering through using a search predicate on the database_id. After all it’s rare you’ll want to examine index usage across all databases on the instance. I’ve replaced
SELECT DB_NAME(ixUS.database_id) AS database__name , SO.name AS object__name , SI.name AS index__name , ixUS.* FROM sys.dm_db_index_usage_stats AS ixUS INNER JOIN lifeboat.sys.objects AS SO ON SO.object_id = ixUS.object_id INNER JOIN lifeboat.sys.indexes AS SI ON SI.object_id = ixUS.object_id AND SI.index_id = ixUS.index_id WHERE ixUS.database_id = DB_ID('
') ORDER BY OBJECT_NAME(ixUS.object_id), ixUS.index_id;
Notice a few things here as we look at the first few columns from this iteration of the t-sql code:
We limited the scope to just a single database through the search predicate of ixUS.database_id and fully-qualifying the objects for sys.objects and sys.indexes. Additionally we now have recognizable naming for the objects in question. In the query text you’ll see I am committed to using fully-qualified naming. It’s absolutely vital to use fully-qualified naming or else you’ll get invalid results since object_id and index_id are not unique across the instance – only across the database.
The other option would have been to first set the context to the lifeboat database through the USE
As we evolve the script preparation we still are performing a SELECT * against sys.dm_db_index_usage_stats. In doing so we’re likely returning unnecessary columns. Before we pare those down I want to present you with the full column listing for this DMV:
I’m only interested in the columns pertaining to last_user|system_action when observing low-usage indexes after initial analysis. Therefore we’re going to exclude those columns now and revisit them in the next article in this series. Likewise I’m not concerned with how the SQL OS is tapping into these indexes so I exclude all of the system action based columns as well (system_seeks, system_scans, etc.) them as well. Continuing with the trimming of columns it’s safe to eliminate the database_id and object_id columns now that we’re using proper names for identification.
The index_id column is valuable to identify the index type (0 = heap | 1 = clustered | > 1 = non-clustered).
I want to look both at individual read operation types and the sum of those read operations as compared to write operations to support the core metric of reads v. writes. Our query starts to clean up nicely and now provides some valuable insights in terms of reads v. writes where most of our decisions derived from this DMV are focused (again using “lifeboat” as the substitution for the template parameter of
SELECT DB_NAME(ixUS.database_id) AS database__name , SO.name AS object__name , SI.name AS index__name , ixUS.index_id , ixUS.user_seeks , ixUS.user_scans , ixUS.user_lookups , (ixUS.user_seeks + ixUS.user_scans + ixUS.user_lookups) AS total_reads , ixUS.user_updates AS total_writes FROM sys.dm_db_index_usage_stats AS ixUS INNER JOIN lifeboat.sys.objects AS SO ON SO.object_id = ixUS.object_id INNER JOIN lifeboat.sys.indexes AS SI ON SI.object_id = ixUS.object_id AND SI.index_id = ixUS.index_id WHERE ixUS.database_id = DB_ID('
') ORDER BY OBJECT_NAME(ixUS.object_id), ixUS.index_id;
At this point you’ve now learned how to query this DMV, limit the scope of the results, and provide meaningful identification within the results. Just two more tweaks to this base query and we will be ready for an initial analysis of the results before we close the chapter on this first of two parts to this article.
Add Read:Write Statistics and Appropriate Sorting
Who likes extra math? Not I – and I was an applied math major in University at one point in time! That means I want to avoid dumping these results to Microsoft Excel, where I do much of my calculating, and provide a metric for a read:write ratio for index activity. I also want to look at indexes with a high amount of maintenance (writes and the burden of index rebuilds and reorganizations that come as a result) compared to the corresponding read benefits of indexes:
/* ------------------------------------ ORIGINAL ARTICLE CODE, COMMENTED OUT AND UPDATED PER COMMENTS BELOW ------------------------------------ SELECT DB_NAME(ixUS.database_id) AS database__name , SO.name AS object__name , SI.name AS index__name , ixUS.index_id , (ixUS.user_seeks + ixUS.user_scans + ixUS.user_lookups) / IIF(ixUS.user_updates = 0, 1, ixUS.user_updates) AS [r_per_w] , ixUS.user_seeks , ixUS.user_scans , ixUS.user_lookups , (ixUS.user_seeks + ixUS.user_scans + ixUS.user_lookups) AS total_reads , ixUS.user_updates AS total_writes FROM sys.dm_db_index_usage_stats AS ixUS INNER JOIN lifeboat.sys.objects AS SO ON SO.object_id = ixUS.object_id INNER JOIN lifeboat.sys.indexes AS SI ON SI.object_id = ixUS.object_id AND SI.index_id = ixUS.index_id WHERE ixUS.database_id = DB_ID('
') ORDER BY 5 , OBJECT_NAME(ixUS.object_id), ixUS.index_id; */ /* -------------------------------------- REVISED CODE TO REDUCE JOINS AND MAKE COMPATIBLE WITH SQL VERSION 2005- 2016 -------------------------------------- */ SELECT DB_NAME(ixUS.database_id) AS database__name , OBJECT_SCHEMA_NAME(SI.object_id, ixUS.database_id) AS schema__Name , OBJECT_NAME(SI.object_id, ixUS.database_id) AS object__name , SI.name AS index__name , ixUS.index_id , CASE ixUS.user_updates WHEN NULL THEN (ixUS.user_seeks + ixUS.user_scans + ixUS.user_lookups) WHEN 0 THEN (ixUS.user_seeks + ixUS.user_scans + ixUS.user_lookups) ELSE (ixUS.user_seeks + ixUS.user_scans + ixUS.user_lookups) / ixUS.user_updates END AS [r_per_w] , ixUS.user_seeks , ixUS.user_scans , ixUS.user_lookups , (ixUS.user_seeks + ixUS.user_scans + ixUS.user_lookups) AS total_reads , ixUS.user_updates AS total_writes FROM sys.dm_db_index_usage_stats AS ixUS INNER JOIN sys.indexes AS SI ON SI.object_id = ixUS.object_id AND SI.index_id = ixUS.index_id WHERE ixUS.database_id = DB_ID() ORDER BY [r_per_w] , OBJECT_NAME(ixUS.object_id, IxUS.database_id) , ixUS.index_id;
(Note that schema__name is not represented in results above since that was added in the revised script above.)
On a small scale we can’t really make any decisions. But here is a glimpse at results from a production environment where we can draw some immediate conclusions (database, objects, and index names changed or eliminated to protect the author):
This is just the top 22 results of the 60+ returned by the query made against the “FOO” database. What conclusions can we draw just from these results? Here is my standard process for analyzing the results on my initial pass:
- First thing I’m going to do is look at the top records where there is no read activity. Likely I’m going to run a secondary query looking at the last write operation on these tables to see how active they are but I’m not going to make a rash decision to drop any at this point.
- I look for activity on heaps (tables without clustered indexes). Likely a second query will be run to look at all non-clustered and heap activity on the table(s) involved to determine if there is a need for a clustered index as well as examine candidates for the clustering key. Heaps can be identified by the index__name value of NULL and also by the index_id value of 0.
- Next I’d start examining any tables that have a significant number of indexes to see if they can be pared down. Sometimes it helps to run this query a second time after changing the ORDER BY clause to be sorted first by object__name followed by index_id in order to examine the results for this metric.
- If looking for improved clustering key options I’m going to examine tables where the read activity on the clustering key seems low compared to updates. We will look at this scenario (and how I tweak the query to assist in this process) in the next article in this series.
Summary of Part One
To summarize: sys.dm_db_index_usage_stats is the main repository of index (and heap) usage metrics for your SQL Server databases. By querying this DMV you’re able to get usage statistics broken down by reads; separated into seeks, scans, and lookups; and by updates (writes). The data is only persisted between service restarts so it’s important to take that into consideration when evaluating usage in an environment where you may have processes that may occur on a less frequent schedule (think monthly or annually) yet incur maintenance on a more frequent schedule that would take your services offline.