Question: I've been reading your Q&As on partitioned tables and views and I'm wondering how filtered indexes fit in. Couldn't you get some of the same benefits of partitioning by using filtered indexes?
Answer: This is a really interesting question, even a bit challenging to answer. Filtered indexes are an incredibly powerful feature (one of my favorites) so I don't want to dissuade you from using them. However, just like many things in SQL Server - it depends.
A table can be partitioned using the partitioned tables feature. A table can be broken into multiple tables and then unioned (UNION ALL) in a view using the "partitioned views" feature. And, any of these base tables - either a partitioned table or a table that makes up a partitioned view - can have filtered indexes. For me, filtered indexes supplement your partitioning strategy they don't replace it.
What could/should you use filtered indexes for?
- Active vs. inactive: yes/maybe!
Smaller sets of data that you query and/or manipulate differently are great to filter. For example, imagine you have active/inactive data within a set. If you were to create a filtered index where active = 1 then only those rows would be present in the index. By having a smaller set in the index you have more accurate statistics (remember, you'll need to make sure that [filtered stats get updated regularly] ) and you'll take up less storage and have less maintenance. Having said all of this, if the inactive data is really going to become obsolete then partitioning might be a better option as you might be able to complete eliminate maintenance for the obsolete data.
- Different sets: yes (this is my favorite reason)!
Imagine salespeople working with their sales data. The salesperson value of 1 is actually a special value and it represents over-the-counter (OTC) sales. For the values 2 through 27, these salespeople are specific folks in the company. There might be queries for OTC sales that are different than what the salespeople query. You could create a filtered index WHERE Salesperson = 1. For a query that salespeople run you could create a filtered index WHERE Salesperson > 1. And, for any specific queries that an individual salesperson wants (and ONLY they want), you can create a filtered index WHERE Salesperson = 6. But, for things like sales, I also (generally) recommend this in addition to partitioning. So, you might end up with one table per year where each might be partitioned by month. Then, each of these partitioned tables will have filtered indexes. You can create on filtered index over the entire set (the entire year) and fast-switching is still allowed.
- Partitioning: maybe!
Some folks have mentioned that filtered indexes can replace partitioning by allowing you to create an index WHERE SalesDate >= ‘20120101’ AND SalesDate < ‘20120201’ to represent January. You could then do the same for February, March, April, etc. However, they are not exactly the same.
With filtered indexes each index only applies to a subset of the base table. As a result, fast switching is not allowed.
More specifically, this filtered index will not impact fast switching:
CREATE INDEX ActiveItems ON Table (col1, col2) WHERE Status = 1 ON PartitionScheme(PartitionColumn)
However, this filtered index will not allow fast switching:
CREATE INDEX JanuaryItems ON Table (col1, col2) WHERE SalesDate >= '20110101' AND SalesDate < '20110201'
You could create this second index on the PartitionScheme but then you’d have to create this same index for every new partition. Yes, it’s possible. However, there’s a bigger problem. The problem is tied to a concept known as interval subsumption. Interval subsumption is not a concern with partitioned views; however, it is a problem with filtered indexes. To explain this I’ll use two scenarios.
Scenario 1 – Single large table with filtered indexes
Scenario 2 – Partitioned view (individual base tables unioned together in a view: SalesView)
Now, what happens with queries?
Both of the following queries benefit from these indexes:
SELECT * FROM Sales WHERE SalesDate BETWEEN ‘20120110’ AND ‘20120120’
This query uses the filtered index.
SELECT * FROM SalesView WHERE SalesDate BETWEEN ‘20120110’ AND ‘20120120’
This query eliminates accessing any of the other tables and just uses JanSalesTable. Additionally, SQL Server leverages the index on this base table.
However, the following WHERE clause presents a problem for the filtered case:
SELECT * FROM Sales WHERE SalesDate BETWEEN ‘20120125’ AND ‘20120205’
The Sales table does NOT leverage the filtered index because the query’s predicate is not a subset of a SINGLE filtered index. In this case, SQL Server cannot find a SINGLE filtered index that “contains” the entire set. They could make this work but it is difficult. There is NO guarantee that the combination of the filtered indexes contains every value. Very possibly, there could be gaps. As a result, they cannot use the combination of the two indexes. Instead, they will use none.
SELECT * FROM SalesView WHERE SalesDate BETWEEN ‘20120125’ AND ‘20120205
SQL Server must push these predicates down to the base tables and as a result, CAN use the index(es) on the base tables.
So, by truly breaking your data sets down into separate tables, you are more likely to get the indexes to be used!
Keep those questions coming! I’m having great fun with this Q&A series!!
- Q1: Partitioned Tables v. Partitioned Views–Why are they even still around?
- Q2: Solutions to VLT concerns around statistics and maintenance!
- Q3: Did SQL Server eliminate any partitions?