Querying Partitions

Downloads
46207.zip

Native Partitioning is one of SQL Server 2005's most important enhancements for DBAs. This feature improves manageability, scalability, and performance of large tables and indexes. In this four-part series I've covered creating and maintaining partitions, as well as obtaining metadata information. Here I finish my partitioning series by covering querying partitioned tables and indexes. I'll discuss how querying partitions has improved compared to SQL Server 2000, more specifically, the improved partition elimination process, and the efficient collocated joins.

Partition Elimination

Partition elimination is the process that the optimizer applies to figure out which partitions are necessary to access and which aren't based on the filters you specify in your queries. In SQL Server 2000, the different tables belonging to a certain partitioned view don't have to be identical. As an example, each table might have a completely different set of indexes. Even though you typically create tables for a partitioned view with identical schemas and indexes, the optimizer can never take this for granted and must take each qualifying partition into consideration during the optimization process. By qualifying I mean that SQL Server 2000 performs partition elimination if you specify the partitioning column in your filter, but once the partitions included in the query filter (qualifying) are selected, each has to be optimized separately and be included in the execution plan. If you think that qualifying each partition separately is a problem, consider the following experience I had with a customer.

My customer partitioned a big table in SQL Server 2000 daily-meaning every day a new table was added to the partitioned view. From time to time, queries with filters over 20,000 characters long are submitted against the partitioned view. Such a query covers long periods of time, e.g., a whole year. More than 300 partitions qualify, and the whole 20,000-character filter must be applied to each separately. This means that the plan includes all 300+ partitions, duplicates a lot of data for each, and optimizes each separately. Such a plan can take several minutes to be generated. In some cases, this optimization attempt caused the server to stall. We ended up optimizing the queries and reducing the need for such lengthy filters by figuring out smarter, shorter ones. That's an example to emphasize the limitations of querying partitioned views in SQL Server 2000.

In SQL Server 2005, partitioning is native. You can't control individual partitions yourself; rather you deal with one unit that's the partitioned table or index. All individual partitions making a particular table are identical and have exactly the same schema, indexing, etc. Having identical schema to the partitions, the optimizer can logically deal with just one partition in the plan (e.g., determine access methods) and repeat the activity for all qualifying partitions after the partition elimination process. Run the code in Listing 1 to create and populate our partitioned Orders table. Then, run the following query and examine the execution plan that Figure 1 shows:

SELECT * FROM Orders WHERE orderdate >= '20030701' AND orderdate

The Constant Scan in our case represents the partition elimination process, i.e., determining which partitions meet the query's filter. This operator outputs the qualifying partition numbers. Then a nested-loops operator iterates through partition numbers that were picked and accesses the partitions that they represent. If you place your mouse pointer over the arrow between the constant-scan operator and the nested-loops operator, you'll see that one row is returned (meaning that one partition matched the query's filter). This makes sense because I requested data from only one year and I partitioned the Orders table by year. If you change the filter to span two years

WHERE orderdate >= '20031231' AND orderdate

You'll get the same plan, but the arrow will show two matching rows for two qualifying partitions. Now you should understand why the optimization process in SQL Server 2005 is simpler, takes less time, and results in a smaller plan.

Collocated Joins

Collocated joins are joins that are performed between individual partitions of the join's two inputs (tables or indexes) before the results of the joins are concatenated. In contrast, non-collocated joins are joins in which the different partitions of each join's input are first concatenated, then the concatenated result sets are joined. Collocated joins are efficient because individual partitioned indexes can be utilized efficiently for the join activity, based on access methods that rely on the order enforced by those indexes. To allow collocated joins, the two inputs must be partitioned the same way (same or equivalent partitioning function). Also, you must specify the partitioning column from both sides of the join in the join condition. If the meaning and benefits of collocated joins are not clear yet, bear with me while I demonstrate this with an example.

Listing 2 creates the partitioned OrderDetails table and populates it with data. I partitioned the OrderDetails tables and indexes the same way I partitioned the Orders table (partition per year) to allow collocated joins. Notice that to partition the OrderDetails table based on years, I had to denormalize it by including the orderdate column, populating it with the orderdate values of the corresponding orders. Now, run the following join and examine the execution plan that Figure 2 shows:

SELECT *
FROM Orders AS O
  JOIN OrderDetails AS OD
    ON O.orderid = OD.orderid;

Even though both tables and indexes are partitioned the same, the join isn't collocated because the partitioning column orderdate isn't specified in the join condition. Notice in the plan that a separate constant-scan operator performs partition elimination for each table separately, a nested-loops operator concatenates the results of the individual partitions of each table separately, then each result set is explicitly sorted, and finally a merge join is used to join the two sorted inputs. Depending on the size of the inputs, the optimizer might decide not to sort the concatenated inputs and use a hash join instead of merge. Next, make a small revision to the join query, matching also the orderdate column (the partitioning column) from both sides and examine the plan that Figure 3 shows:

SELECT *
FROM Orders AS O
  JOIN OrderDetails AS OD
    ON O.orderid = OD.orderid
    AND O.orderdate = OD.orderdate;

The plan shows a collocated join. Notice that only one constant scan performs partition elimination, figuring out the target partitions from both sides, then each pair of partitions is merge-joined efficiently, relying on the fact that the joined indexes are pre-sorted based on orderdate. Then, nested-loops operator concatenates the results of the joins. No need for explicit sorting here to support a merge join or for a hash join. As you can see, a collocated join is significantly more efficient than a non-collocated one, especially when good indexes for the join are in place.

In my previous articles about partitioning, I explained the concept of index-alignment (for details, please refer to "Native Partitioning Tables and Indexes" (InstantDoc ID 45153). Now you can understand why you might want to create non-aligned indexes. If two tables T1 and T2 are partitioned in a different way (e.g., T1 is partitioned by year, and T2 is partitioned by week), you might want to create a non-aligned covering index on one of them, based on the same partitioning as the other's to allow collocated joins.

Wrap Up

Querying partitions in SQL Server 2005 improves the partition-elimination process by using a significantly more simplified process than the one used in SQL Server 2000. The plan now takes less time to create and is smaller. Joins can be performed efficiently provided that the joined inputs are partitioned the same way and the partitioning column is used in the join condition. Meeting these requirements allows collocated joins where individual partitions from both inputs are joined before concatenating the results. Better partition elimination and collocated joins are two of the important partition-querying enhancements in SQL Server 2005. There are many operations that can be performed "per partition," such as grouping, filtering, data modifications, index creation and rebuilds, bulk inserts, and more. In general, the optimizer's goal is to perform as many operations as possible on a per-partition basis. As you probably realize, it's a whole new world, with a new set of tools, and a new vocabulary, that let you deal with existing problems more efficiently.

Comments

Plain text