Solutions to VLT concerns around statistics and maintenance!

Question: Partitioned tables were the new shiny feature in SQL Server 2005 – why are partitioned views even still available? Are there any benefits that they provide?

Answer: In the first post (Partitioned Tables v. Partitioned Views–Why are they even still around?) I raised some of the general questions and concerns that come with VLT (very large table). Today, I’ll start to tackle why partitioned views can be a fantastic choice for partitioning large sets – even for new design.

When I first posted, some folks emailed me and said that the most obvious reason to choose partitioned views is that they don’t require Enterprise Edition. While that’s a fantastic reason, it’s still not the only reason (or my favorite). The most important reason for me is that there are still a few concerns around VLT that even partitioned tables don’t solve. In this post, I’ll tackle two concerns with partitioned tables.

The first issue is related to statistics. Statistics always cover the entire set described by the index (or statistic). For the purpose of this discussion, I’m going to focus primarily on table-level statistics (meaning statistics that are NOT filtered). And, while you can use filtered statistics with partitioned tables – there are significant limits to fast-switching. As a result, they might solve one problem yet introduce another. So, why are table-level statistics an issue? Really, in concept, they’re not. But, remember, a statistic must be relatively small to be useful. With larger and larger tables (especially those that have a lot of uneven data distribution), it becomes more and more difficult for a statistic to be accurate.

Additionally, it also takes longer and longer for the statistic to get updated. There is a special trace flag available to change the invalidation threshold for statistics (so that they don’t wait as long to get updated) but there’s nothing that changes their accuracy (except filtered stats – which then eliminate fast-switching if the filtered statistic isn’t over the entire set). And, unfortunately there’s also an issue with regard to how/when filtered statistics get updated. Simply put, you’ll need to manage their updating in order to keep them more accurate. So, none of the solutions to make statistics for VLT more accurate really work.

However, what does solve the problem? Keeping your tables smaller and more manageable. Conceptually, this IS “partitioning” but partitioned tables don’t solve the problem. Instead, partitioning your data into “partitioned views” is exactly that solution. Instead of having 20 years’ worth of sales in one VLT, consider one table per year. These “yearly” tables could each be partitioned tables or just standalone, non-partitioned tables. Then, how do you query 20 tables? If you constrain the tables by date (possibly by ID as well if there’s a correlation) and then use a UNION ALL view to bring them together, then your queries can do partition elimination. As long as your queries supply either the date (or the ID, if the base tables are also constrained by ID) then the optimizer can compare your WHERE clause to the constraints that exist. If the tables have been “checked” against the constraint then the constraint is deemed trusted. Trusted constraints can be used by the optimizer to eliminate partitions. Tables without any relevant data will not be accessed. This is the definition of partition elimination. The end result for statistics is that each table’s table-level statistic will be more accurate because the data set over which it covers is smaller.

This brings me to the second issue – maintenance. While partitioned tables do offer some maintenance benefits (fast-switching for data loads and/or data removal) they do not offer online rebuilds at the partition level (in any release or edition – even SQL Server 2012). So, if you have a single, VLT of 20 years’ worth of data with only the most recent/current data being updated then you only need to defragment that last year. In a single partitioned table you’ll need to take the partition offline to rebuild it. If you have separated this into 20 individual years of data then you can rebuild the most current year (at the table-level) online. You can always do only reorganization of your data but a rebuild cannot be done at the partition level as an online operation. Online operations are only supported at the table-level.

So, for part 2 – those are two of my favorite reasons to consider partitioned views over partitioned tables! I’ll tackle a bit more around this scenario in part 3.

Keep those comments/emails coming!
kt

Comments

Plain text