SQL Server 2005 introduces support for native table and index partitioning. My first two articles in this series covered creating partitions and querying metadata information related to partitions. Here I'll show you how to maintain partitioned tables and indexes when you need to add or remove partitions. Run Listing 1 to create a testdb database that has a partitioned Orders table. The code populates the Orders table with three years worth of data, creates a primary key, and a couple of indexes. Once you've run Listing 1 you'll be at the same point where I stopped last month.
Adding a partition to a partitioned table is an important partitioning maintenance activity. You must add a new partition periodically depending on your partitioning needs. For example, our partitioned Orders table partitions the data by year, so every year you need to add a new partition. You add a new partition by splitting an existing partition-typically the last. The last partition covers the range
The new partition will reside in a certain filegroup, so first you need to create new files+filegroup (unless you want the new partition to reside in an existing one). Second, you need to alter the partitioning scheme and specify which filegroup will be NEXTUSED, meaning the target filegroup for the new partition. Finally, you alter the partitioning function and request a split operation and specify the new boundary point.
Before I show you the T-SQL code that initiates the split, I'll point out an important design consideration that will make splits faster. If the last partition contains data, the split operation will scan all data in the existing partition to determine the target partition for each row. To ensure instant splits you simply make sure that the last partition is always empty, so no I/O will be involved in the split operation. For example, in our case we have data in the Orders table no later then 2003, but I created a partition for data >= 2004. Once you split this last partition, you can load 2004 data to the partition, while the last partition for data >= 2005 will remain empty. So next year, the split will still be instant. The code in Listing 2 adds the new partition. The code adds a filegroup with file(s) for data >= 2005, alters the partition scheme to mark the new filegroup as next used, then applies the split by altering the partition function adding new boundary 20050101.
If you need to load data into a new partition using a bulk load or INSERT operation, the fastest way is to load the data to an empty non-indexed staging table. The staging table must have the exact same structure as the target partitioned table and must reside on the same filegroup as the target partition. Once the data is loaded, you create indexes and a CHECK constraint verifying that only data in the target partition's date range is allowed. Then, you simply switch the staging table and the target partition and get rid of the staging table. The switch operation is instantaneous because it's only a metadata operation. No data physically moves from one place to another. Remember that both the source staging table and target partition must reside on the same filegroup (same physical location). In simple terms, you can think of the switch as exchanging pointers between the source and target. Switching is achieved with ALTER TABLE's new SWITCH option.
In the SWITCH command, you specify the source staging table's name, the target partitioned table's name, and the number of the target partition. If you're not sure about the number of the target partition, use the metadata queries I provided last month to check which partitions belong to the Orders table and what are the partition function's boundaries:
SELECT * FROM sys.partitions WHERE object_id = OBJECT_ID('Orders') AND index_idSELECT * FROM sys.partition_range_values WHERE function_id = (SELECT function_id FROM sys.partition_functions WHERE name = 'PF_RangeByYear')
In our case, the target partition for 2004's data load is partition number 4. Listing 3 loads the 2004 data to partition 4 by applying the steps I described earlier.
When you need to purge historic data, perform a switch operation similar to the one you performed to load data but in the opposite direction. Instead of altering the source staging table and switching it with the target partition, you alter the source partitioned table and switch the partition that has the historic data with the target staging table. Once you apply the switch, the staging table will contain the historic data and the source partition will be empty. You can archive the data and drop the staging table.
Everything I mentioned earlier about switching applies here as well, namely, the source and target partitions must reside in the same filegroup and the switch is an instantaneous operation because it's a metadata operation. Listing 4 purges the 2001 data, creates a staging table, switches Orders partition 1 (oldest) into the staging table Orders2001, then drops the staging table.
After purging historic data from multiple partitions, you end up with empty partitions, which are of no use. You can merge partitions to get rid of empty partitions. Merging partitions is self-explanatory-take two partitions and make them one. Technically, altering the partition function, and specifying which boundary point to get rid of achieve merging-the two partitions surrounding the boundary point become one. However, the merge activity isn't limited to empty partitions. If the two partitions to be merged are empty, the operation is instantaneous (no I/O involved). If the partitions aren't empty, data from one of the partitions is physically moved to the other (remember that each partition resides on a certain filegroup). In our Orders table we got rid of 2001's data, so partition 1 is now empty, but partition 2 contains 2002's data. If you merge partitions 1 and 2 the operation will take time. So it's worthwhile to keep the empty partition and only next year (after getting rid of 2002's data) merge the two empty partitions.
Imagine it's next year. Run Listing 5 to purge 2002's data. You can run the metadata queries I provided earlier to see that the first two partitions are now empty. Then, run the following ALTER PARTITION FUNCTION command to merge the two instantly:ALTER PARTITION FUNCTION PF_RangeByYear() MERGE RANGE ('20020101');
If you run the metadata queries again, you'll notice that you have only one partition instead of the two original ones and that all partition numbers shifted (2 becomes 1, 3 becomes 2, etc.).
Regular partitioning-related maintenance involves loading and purging data, and is achieved by splitting, merging, and switching partitions. As long as you follow the guidelines I described you can maximize the performance of such activities.