Recently, I was tasked with maintaining indexes under a five nines model. I was told by my boss that both the blocking caused by reorganizing indexes and the table locking caused by offline index rebuilds are unacceptable. Because our SQL Server environment is made up of all SQL Server 2008 and 2005 Enterprise Edition systems, I decided to avoid these problems by performing online index rebuilds. I set about researching in hopes of finding an appropriate customizable solution. Ultimately, I wound up creating my own solution, which I’ll share in this article in hopes you find it as useful in your environment as it is in mine. I’ll show you how I created the job, how to implement the job, and the various job parameters.
Creating the Online Index Job
My starting requirements for the job were simple. I first needed to identify which indexes could be rebuilt online. A quick look at MSDN revealed certain guidelines. Large object (LOB) data types (e.g., image, ntext, text, varchar(max), nvarchar(max), varbinary(max), XML) must be created, rebuilt, or dropped offline. Nonunique, nonclustered indexes can be created online when the table contains LOB data types, but none of these columns are used in the index definition as either key or nonkey (included) columns. Nonclustered indexes defined with LOB data type columns must be created or rebuilt offline. It’s important to identify which indexes can be rebuilt online because issuing a rebuild with the online index option on an index that fits the above offline criteria will result in an error, leaving your index unprocessed.
After identifying which indexes could be properly rebuilt online, I added various parameters to different steps of the index job. For indexes that can’t be rebuilt online, you can choose whether to reorganize them, ignore them, or rebuild them offline (if your business model allows). I also added parameters that let you reorganize indexes if they’re under a certain fragmentation percentage and rebuild them if they’re over a certain fragmentation percentage. Then I built in features to track each index’s duration, start time and end time, progress, fragmentation before and after running the job, and any resulting blocking. Lastly, I included an option to exclude databases and tables from index operations.
But didn’t my boss say blocking caused by reorganizing was unacceptable? To address this problem, I tracked the SPID of the index operation and created a sub-job to monitor any blocking that this operation might cause. If blocking exists for more than x minutes in a row and blocks more than x processes, the SPID is killed, a period of time passes, and the index operation is retried. According to Microsoft, reorganizes don’t hold long-term blocking locks; therefore, SQL Server won’t block running queries or updates. However, in high-transaction environments that can exceed 1,800 transactions per second, as in my environment, you have to be very careful with reorganizations, hence my blocking detector.
So what happens if the blocking detector kills a SPID that’s in the process of performing a rebuild? Because rebuild operations occur as single transactions, the rollback incurred from killing an offline rebuild on a large index that’s near completion would be far worse than simply letting the operation continue. For this reason, I don’t target them with the blocking detector. I also don’t target online rebuilds, but for different reasons such as possible performance implications related to reverting mapping indexes. According to "Microsoft SQL Server 2005 Online Index Operations" (http://technet.microsoft.com/en-us/library/cc966402.aspx) "There are only very short periods of time in which concurrent DML and select operations are prevented." Any blocking should be tolerable given the alternatives. A reorganize is a different story. Because a reorganize acts only on a pair of pages at a time, if it’s killed, only the current swap operation is affected. The next time a reorganize is issued, the job picks up where it left off.
Implementing the Index Job
To create the index job, run the executables, which you can download by clicking the Download the Code link at the top of the article page, in the following order:
Note I use a database called "Common" to house common functionality amongst multiple servers. Before you run these executable files, you’ll need to replace Common in each script with the references to suit your environment. Also, in the Index_BuildIndexKey procedure I create tables on a "Tables" file group. You’ll need to change Tables to the name of your file group or the job will fail. When you run job_Indexing_Master.sql, you’ll get four non-existent step warnings. They’re safe to ignore. Each step in a job has to be created sequentially, and some steps are set up to skip to future steps in the event of a step failure. For example, if Step 2 fails, the job goes to Step 8. Well when Step 2 is created, Step 8 obviously doesn’t exist yet, and a warning is issued.
After running the provided scripts, you’ll see two new disabled SQL Server Agent jobs: DB Maint – Blocking Killer and DB Maint—Index Maint. I named the steps of the DB Maint – Index Maint job to be pretty self-explanatory. The following is a quick overview of the steps:
- Step 1 builds a tracking table that’s consumed by all subsequent steps to perform their operations. This step contains both an edition check and version check. If you aren’t using the appropriate edition and version for online rebuilds, all of the indexes targeted for rebuild will be done so offline.
- Step 2 and Step 8 are put in place to adjust the duration of your log backups if necessary. This functionality helps keep your transaction logs from growing too large and/or filling depending on your autogrowth and drive configurations. Keep in mind that your transaction logs won’t clear during a rebuild operation because, as I stated earlier, a rebuild operation occurs as a single transaction.
- Step 3 and Step 7 enable and disable the previously created DB Maint – Blocking Killer job to monitor the amount of blocking caused by a particular index operation. If you don’t want this feature to run, simply delete these steps.
- Step 4 and Step 9 are in place if you can switch your recovery model in your environment. I have included the Microsoft best practice method in each job step regarding switching recovery models on the fly. I included this feature because rebuild is a bulk logged operation, meaning it’s minimally logged when your database is in the bulk-logged recovery model. According to "SQL Server 2005 Online Index Operations," when rebuilding an index in a bulk-logged database, the log-to-data ratio is roughly 10 percent of the full recovery log-to-data ratio. Minimal logging is helpful when you want to keep transaction log sizes manageable, especially in replicated environments.
- Step 5 processes all the indexes in the table created by Step 1. You can change the retry interval and retry attempts on this step to the desired minutes to wait in the event the blocking killer terminates this step.
- Step 6 logs any failed or incomplete operations to the SQL Server error log.
- Step 10 goes back and recalculates fragmentation so that you can see a before and after snapshot.
- Step 10 and Step 1 use the sys.dm_db_index_physical_stats dynamic management view (DMV) to gather information about indexes, which requires only an intent-shared (IS) table lock, regardless of the mode that it runs in.
Figure 1 shows all of the above steps listed in SQL Server Management Studio (SSMS).
You need to understand how all of various input parameters are utilized to custom tailor the index job to your environment. The stored procedure in the first step of the DB Maint – Index Maint job (Build Indexing Key) accepts a few parameters. @minFragPercent is the minimum percentage of fragmentation an index must possess before it’s considered for an index operation. You can set the minimum percentage of fragmentation according to your specific needs. Anything below this threshold will be ignored. @maxAttempts are the number of times an index operation will be retried that was previously killed by the DB Maint – Blocking Killer job for causing excessive blocking. If you want to target only one or two databases, use @databaseIncludeList, which is a comma delimited varchar. If you want to exclude only a few databases, use @databaseExcludeList, which is also a comma delimited varchar. Lastly, if you want to exclude only a particular table, you can do so by using @tableExcludeList.
The stored procedure in Step 5 of the index job also accepts a few parameters, which Figure 2 shows.
@reorgMinFragPercent is the minimum amount of fragmentation that must be present for a reorganization to occur. This value shouldn’t be below the @minFragPercent value in the Build Index Key step. @rebuildMinFragPercent is the minimum fragmentation that must be present for a rebuild to occur. For example, if you set these values to 20 percent and 30 percent, respectively, anything below 20 percent fragmentation will be ignored, anything between 20 and 30 percent fragmentation will be reorganized, and anything greater than or equal to 30 percent will be rebuilt online if possible.
If you set the two percentages to equal each other, only rebuilds will occur. @onlineOnly set to 1 means you’re going to perform only online rebuilds. Setting this value to 0 will result in indexes being built online where possible and offline where it’s not possible. @reorgNonOnlines when set to 1 in conjunction with @onlineOnly set to 1 lets indexes that can’t be rebuilt online be reorganized. I added @globalAllowReorgs to be a final fail safe that lets reorganizations occur on a global scale. In my environment it’s absolutely critical that some indexes not be reorganized. If all of your indexes can be reorganized, @globalAllowReorgs can be set to 1.
Lastly, the DB Maint – Blocking Killer contains one step that includes a stored procedure that accepts two parameters. @blockingMins is the number of minutes a SPID must be detected as blocking. This job runs every minute and performs the blocking check. If it finds the SPID performing an index operation blocking @blockingMins in a row, it will kill the SPID. To define blocking I added the @blockedProcesses parameter. If the blocking SPID is blocking more than @blockedProcesses processes, it’s considered to be an offender and its blocking count is iterated.
For the sake of time and space in this article, I have included extensive documentation in the executables describing what each step performs and the parameters they take. Once you’ve configured DB Maint – Index Maint to your liking, including specifying a schedule, be sure to enable it in SSMS by right-clicking the job and selecting Enable.
Maximize Index Performance
This solution lets you get maximum performance out of your indexes without having to sacrifice uptime. I could have added functionality to this job seemingly forever. These are a few extra jobs that can be utilized, expanded upon, or deleted to tailor this job to your environment. In the code, you’ll notice a few comments hinting at possible expansion. I tried to make it very easy to have a solid foundation to build upon. I realize no job is ever written perfectly or applied to all applications, but I hope my index job helps at least a few of you out there.