Top Insights You Can Gain from the Usage Stats Dynamic Management View (Part 2) http://www.waynebarry.com/blog/wp-content/uploads/2013/02/Magnifying-Glass.jpg

Top Insights You Can Gain from the Usage Stats Dynamic Management View (Part 2)

A Review of Part 1

In part one of this article I laid out the base query for which to return valuable information out of the sys.dm_db_index_usage_stats Dynamic Management View.  In part two here I’ll now so how we can use the base query for addressing specific questions about index usage (or lack of) and what you may be able to discern from minor adjustments to that base query.

For those of you who want to jump right in without all the initial theory here is that base t-sql statement from Part 1

SELECT
 DB_NAME(ixUS.database_id) AS database__name
	, OBJECT_SCHEMA_NAME(SI.object_id, ixUS.database_id) AS schema__Name
	, OBJECT_NAME(SI.object_id, ixUS.database_id) AS object__name
	, SI.name AS index__name
	, ixUS.index_id
	, CASE ixUS.user_updates
		WHEN NULL THEN (ixUS.user_seeks + ixUS.user_scans + ixUS.user_lookups)
		WHEN 0 THEN (ixUS.user_seeks + ixUS.user_scans + ixUS.user_lookups)
		ELSE
			CAST((ixUS.user_seeks + ixUS.user_scans + ixUS.user_lookups) / (ixUS.user_updates * 1.0) AS DECIMAL(15,1))
		END AS [r_per_w]
	, ixUS.user_seeks
	, ixUS.user_scans
	, ixUS.user_lookups
	, (ixUS.user_seeks + ixUS.user_scans + ixUS.user_lookups) AS total_reads
	, ixUS.user_updates AS total_writes
 FROM sys.dm_db_index_usage_stats AS ixUS
	INNER JOIN sys.indexes AS SI
		ON SI.object_id = ixUS.object_id
			AND SI.index_id = ixUS.index_id
 WHERE ixUS.database_id = DB_ID()
 ORDER BY [r_per_w] DESC
	, total_writes
	, total_reads DESC
	, OBJECT_NAME(ixUS.object_id, IxUS.database_id)
	, ixUS.index_id;

To review, the sys.dm_db_index_usage_stats DMV provides usage information for all indexes and heaps for every database.  If there is no activity for a specific table or index then it will not surface in the results from sys.dm_db_index_usage_stats.  Information partitioned by user and system (aka internals) activity are returned and that activity is then broken down into reads: (scans, seeks, and lookups) and writes (as “updates”).  Furthermore the last time that specific action took place against the index is also tracked.  Now the caveat that is common to all discussions around Dynamic Management Objects:

The information obtained through DMOs are only as good as the cached metadata since the last restart of services (with limited exceptions.)

… and one more important caveat as it pertains to this particular DMV:

The usage information is cleared when an index is rebuilt. You’ll also need to take this into consideration when analyzing these figures.  Reorganizations howerver do NOT affect the output of sys.dm_db_index_usage_stats.

With the platform laid out let’s move into the questions we’re going to highlight in this part of the series.

What Indexes and Tables are Being Used

This would the simplest question to answer and requires no changes to the base query above.  In addition to the columns provided through sys.dm_db_index_usage_stats I’ve also added result set columns that consolidate the various read types and a computed column to provide a metric for reads compared to writes (r_per_w).  What we want to see from our indexes is considerably more reads than writes.  This tells us that our indexes are being put to good use and the overhead of maintaining that index through writes (expensive operations of clustered/non-clustered inserts, updates, and deletes) are worth it.  Indexes should exist for one purpose only: to make reads faster.  If you’re not seeing evidence of a high read-to-write metric in any of these queries and the activity collected over the time since the last restart of services when the cached DMO metadata was flushed is a good representation of your typical transaction load then you need to seriously consider removing the index.

This question can be answered in two levels of granularity: index specific as well as table rollup.  We care about the index granularity for obvious reasons – specifically whether or not to keep the index or possibly consolidate similar indexes into a single index.  When it comes to the table-level granularity it pertains to whether the table should even be in the database to begin.  It could be indicative of changes in usage patterns that may occur over time or changes in your product that have since made certain parts of the database schema obsolete and therefore marked for archive or removal.

Question 1: Which indexes are being used in my database and give me that information with those most used for read activity first.

SELECT
 DB_NAME(ixUS.database_id) AS database__name
	, OBJECT_SCHEMA_NAME(SI.object_id, ixUS.database_id) AS schema__Name
	, OBJECT_NAME(SI.object_id, ixUS.database_id) AS object__name
	, SI.name AS index__name
	, ixUS.index_id
	, CASE ixUS.user_updates
		WHEN NULL THEN (ixUS.user_seeks + ixUS.user_scans + ixUS.user_lookups)
		WHEN 0 THEN (ixUS.user_seeks + ixUS.user_scans + ixUS.user_lookups)
		ELSE
			CAST((ixUS.user_seeks + ixUS.user_scans + ixUS.user_lookups) / (ixUS.user_updates * 1.0) AS DECIMAL(15,1))
		END AS [r_per_w]
	, ixUS.user_seeks
	, ixUS.user_scans
	, ixUS.user_lookups
	, (ixUS.user_seeks + ixUS.user_scans + ixUS.user_lookups) AS total_reads
	, ixUS.user_updates AS total_writes
 FROM sys.dm_db_index_usage_stats AS ixUS
	INNER JOIN sys.indexes AS SI
		ON SI.object_id = ixUS.object_id
			AND SI.index_id = ixUS.index_id
 WHERE ixUS.database_id = DB_ID()
 ORDER BY [r_per_w] DESC
	, total_writes
	, total_reads DESC
	, OBJECT_NAME(ixUS.object_id, IxUS.database_id)
	, ixUS.index_id;

You’ll see that I’m going to be dumping all output so Excel for ease of manipulating the results for presentation sake.  I’ve also eliminated some of the median (aka “vanilla”) results as well so we can focus on the extremes.  I’d like you to pay attention to the r_per_w column (column F) above.  You’ll see that there is a wide disparity in values here. The higher values denote that these indexes are used more for satisfying reads in comparison to writes than those that follow as you go down the results.  Ultimately that is only telling to a certain point since you also need to factor in the number of writes and whether the benefit from reads warrants the burden of writes.  Take for instance rows 23 v. 24.  Sure there is a higher read-to-write ratio for row 23 but at the same time it’s enduring the burden of 10001 more writes just to satisfy 1966 reads. Is it really worth it?  The question would require an A|B test of what those queries utilizing that index on line 23 see in terms of performance difference between having and not having that index on the table – in this case a primary key clustered index.

What this query gives you: Identifying reads v. writes on indexes and heaps encountering activity.

 

Question 2: Which tables are being used in my database and give me that information with those most used for read activity first.

Here I am not so concerned about the types of reads performed as I am whether tables are in use or not and if in use how disproportionate the reads and write are.  This is accomplished through aggregation on database_id and object_id.  Since we’re using the GROUP BY clause we also need to convert some of our syntax to use aggregation-friendly functions like SUM() versus what we used previously:

SELECT DB_NAME(ixUS.database_id) AS database__name
	, OBJECT_SCHEMA_NAME(ixUS.object_id, ixUS.database_id) AS schema__Name
	, OBJECT_NAME(ixUS.object_id, ixUS.database_id) AS object__name
	, CASE SUM(ixUS.user_updates)
		WHEN 0 THEN (SUM(ixUS.user_seeks) + SUM(ixUS.user_scans) + SUM(ixUS.user_lookups))
		ELSE CAST((
					SUM(ixUS.user_seeks) 
	+ SUM(ixUS.user_scans) 
	+ SUM(ixUS.user_lookups)
					) / (SUM(ixUS.user_updates) * 1.0) AS DECIMAL(12,1))
	END AS [r_per_w]
	, SUM(ixUS.user_seeks) + SUM(ixUS.user_scans) + SUM(ixUS.user_lookups) AS total_reads
	, SUM(ixUS.user_updates) AS total_writes
FROM sys.dm_db_index_usage_stats ixUS
WHERE ixUS.database_id = DB_ID()
GROUP BY ixUS.database_id
	, ixUS.object_id
ORDER BY [r_per_w] DESC
	, total_writes
	, total_reads DESC
	, OBJECT_NAME(ixUS.object_id, IxUS.database_id);

I’ve hidden the database column so I can provide a better display experience of the results. 

What this query gives you: Identifying reads v. writes at the table level which will perhaps help in determining not only whether objects are being queried or written to but also may provide some metadata towards balancing IO by splitting up IO load between file groups and separate disks.

 

What Tables and Indexes are Not in Use

Question 3: Which indexes are encountering no activity: read nor write.

Earlier in this article I mentioned that only those indexes and heaps encountering activity have records returned from sys.dm_db_index_usage_stats.  Therefore using the base structure of the query we’d never see results for where reads and writes are 0.  I build a common table expression (CTE) from the sys.dm_db_index_usage_stats DMV first to identify any activity. I then use that in-lieu of a correlated subquery in the next step.  That next step is simply a LEFT JOIN from the system view for indexes (sys.indexes) back to the CTE. I use the construct of picking a required column on the right-side-of-the-join’s table where that value is NULL to return the indexes/heaps where there is no recorded activity; in this case object_id.

WITH Used AS
	(
	SELECT ixU.object_id
		, ixU.index_id
	FROM sys.dm_db_index_usage_stats ixU 
	WHERE ixU.database_id = DB_ID()
	GROUP BY --ixU.database_id,
		ixU.object_id
		, ixU.index_id
	)

SELECT OBJECT_SCHEMA_NAME(I.object_id, DB_ID()) AS schema__name
	, OBJECT_NAME(I.object_id) AS table__name
	, I.index_id
	, COALESCE(I.name, 'Heap - No Index') AS index__name
FROM sys.indexes I 
	INNER JOIN sys.tables T ON I.object_id = T.object_id
	LEFT JOIN Used ON I.object_id = Used.object_id
		AND I.index_id = I.index_id
WHERE Used.object_id IS NULL
	AND OBJECT_NAME(I.object_id) <> 'sysdiagrams'
ORDER BY schema__name
	, table__name
	, I.index_id;

Additionally I added clarifying code (in the way of a COALLESCE() function to remove “NULL” for heaps and replacing that with the “Heap – No Index” phrasing.  This is completely up to the DBA in terms of preference however. (Results abbreviated for sake of readability.)

What this query gives you: Identifying indexes with no activity since the last time services were restarted or the index(s) were rebuilt.

Question 4: Which tables are encountering no activity: read nor write.

First of all, why do I care?  This query – really just a roll-up of the one above – provides me with insight into how the database’s use patterns may have shifted over time.  I support a very large SQL Server implementation and we’re employing continuous integration to deploy changes on a daily basis into production.  Over time we implement new features and move on from earlier product pathways in small steps.  Over time those small steps take us down new paths and tables may atrophy and go unused forever.  The last thing I, as a DBA, want to do is support unused tables that may possibly add gigabytes or terabytes of disk consumption for not just the live database but also all those iterations of backups we may keep in a variety of places.  We would also feel the impact of those “dead tables” when recovering databases or implementing any of a variety of HA/DR architectures. 

No, what I want to do is identify object that are not in use – monitor that state for a period of time – then script out all the referential integrity code in place, remove those constraints, then migrate these unused tables and their data to a separate database to keep online for a certain amount of time before eventually backing it up to a long term platform of your choice before either deleting or placing into a SQL instance you target for archival databases.

To accomplish this I tweak the previous query to roll the results up to the object level.  I have to make a few other modifications to remove any reference to sys.indexes and replacing some of the system functions to use sys.tables as their source instead of sys.indexes also:

WITH Used AS
	(
	SELECT ixU.object_id
	FROM sys.dm_db_index_usage_stats ixU 
	WHERE ixU.database_id = DB_ID()
	GROUP BY --ixU.database_id,
		ixU.object_id
		--, ixU.index_id
	)

SELECT OBJECT_SCHEMA_NAME(T.object_id, DB_ID()) AS schema__name
	, T.name AS table__name
FROM sys.tables T 
	LEFT JOIN Used ON T.object_id = Used.object_id
WHERE Used.object_id IS NULL
	AND T.name <> 'sysdiagrams'
ORDER BY schema__name
	, table__name;

What this query gives you: Identifying tables with no activity since the last time services were restarted or the index(s) were rebuilt for possible archiving, purging from your database to conserve space and reduce possible restore time in case of a disaster.

Conclusion

The sys.dm_db_index_usage_stats DMV on it’s own can provide answers to many critical questions surrounding how your indexing strategy is being put to use as well as to identify indexes and objects that are no longer employed through your current access patterns.  In and of itself these are valuable things to know about the databases you support.  As you’ll see in the next part of this series we can join the results here to additional Dynamic Management Object metadata to gain even further insight into these results.

Hide comments

Comments

  • Allowed HTML tags: <em> <strong> <blockquote> <br> <p>

Plain text

  • No HTML tags allowed.
  • Web page addresses and e-mail addresses turn into links automatically.
  • Lines and paragraphs break automatically.
Publish