Use Checkpoints to Restart Failed SSIS Packages

You’ve developed a SQL Server Integration Services (SSIS) package but a problem occurs during its execution process. What do you do? If the tasks prior to the point of failure take a long time to execute, you hardly want to repeat them each time you execute the package as you’re troubleshooting or even after you’ve found and corrected the problem.

Fortunately, SSIS in SQL Server 2005 and later includes a feature called checkpoints, which lets you restart the package if it fails for any reason. During package execution, the last successfully completed task or container is noted in a checkpoint file, and the checkpoint file is removed if the package completes successfully. But if the package fails before completing, the checkpoint file remains available as a reference to the location from which to restart the package.

In this article, I explain how to configure package and task properties to enable the checkpoints feature. I then walk you through a package failure to demonstrate how the checkpoint file gets created and later used as a restart point. I also show you what happens when a checkpoint occurs after a task has failed inside a container.

Creating a Test Package

To learn how to work with checkpoints, create a package with three Script tasks (Script Task 1, Script Task 2, and Script Task 3) and create Success precedence constraints between them. By using Script tasks, you won’t have to worry about configuring connection managers, setting properties, and so on. You’ll have a working test package up and running quickly, even though it doesn’t do anything interesting.

After you create the Script tasks and Success precedence constraints, click anywhere in the package background. This ensures that the scope for the variable you’re about to add is at the package level. From the SSIS menu, open the Variables window and click the Add Variable button. Change the default name of Variable to Flag and set its value to 0.

The next step is to modify the package to use this variable to toggle between a successful result and an unsuccessful result. Double-click Script Task 2 to open the editor. In the ReadOnlyVariables property, type Flag. If you’re using SQL Server 2005, you must first open the Script page in the editor before you can update the ReadOnlyVariables property.

To add the script to the task in SQL Server 2008 or later, change the ScriptLanguage property to Microsoft Visual Basic 2008 and click the Edit Script button to bring up the Visual Studio Tools for Applications (VSTA) editor. If you’re using SQL Server 2005, simply click the Design Script button to bring up the Visual Studio for Applications (VSA) editor. In the code, locate the comment Add your code here. Beneath that line, replace the existing code with the code shown in Listing 1. Close the VSTA or VSA editor, then close the Script Task Editor. Execute the package to make sure it works (which it should at this point).

It’s now time to break the package. Change the value of the Flag variable to 1. Now the package execution stops at Script Task 2, as Figure 1 shows.

What have you accomplished thus far? You have a package that you know works when the variable’s value is 0 and doesn’t work when the value is 1. If you rerun the package while the value remains 0, the package once again begins execution with Script Task 1 and fails at Script Task 2.

But what if you have a situation in which you don’t want the first task—whatever type of task it is—to execute a second time after you’ve resolved the problem that prevented the remaining tasks from executing? For example, you might have a package that you use for extraction, transformation, and loading (ETL) processing that includes a Data Flow task that extracts data from a source and puts it into a staging table. If a task fails after the Data Flow task completes, you don’t want the package to re-extract the data, especially if that process takes a long time. By configuring the package to use a checkpoint file, you ensure that the package will restart at the point of failure (assuming you fixed the problem).

Configuring a Package for Checkpoints

You now have a test package that fails on demand, so you’re ready to see how checkpoints work. First, you need to set three package properties:

CheckpointFileName. For this property, you need to provide a path and filename for the checkpoint file. If you plan to keep checkpoints implemented when you put a package into production, it’s a good idea to use a Universal Naming Convention (UNC) path. However, as Figure 2 shows, I ignored that best practice in this example and used C:\SSIS\MyCheckpoint.xml because it’s just a test package.
CheckpointUsage. This property has three possible values: Never, Always, and IfExists. The default is Never, which prevents checkpoint creation. When you specify the Always option, the package uses the checkpoint file if it exists. If it doesn’t exist, the package fails. Therefore, the Always option isn’t recommended for a package in production because the package shouldn’t be failing regularly. (A package failure is the only way a checkpoint file gets created. Once the package completes successfully, the checkpoint file is removed.) The best option to use is IfExists. When you select this option, the package uses the checkpoint file if it exists. If it doesn’t exist, the program starts from the beginning of the package.
SaveCheckpoints. This property must be set to True. Otherwise, the previous settings won’t have any effect. By default, it’s set to False.

Figure 2: Checkpoint properties at the package level

After changing the three properties, the next step is to set the FailPackageOnFailure task property. When you set this property to True for a task, SSIS will restart the package if that task causes a failure. You can set this property for every task or for only certain tasks.

The FailPackageOnFailure property isn’t accessible in the task editor. To set it, you must select the task in the package designer, locate it in the Properties window, and select True in the drop-down list. For this example, set the FailPackageOnFailure property to True for Script Task 2.

Generating a Checkpoint File

At this point, your test package is still in a broken state, but no checkpoint file exists because you didn’t have the checkpoint feature enabled when you previously executed the package. Now that it’s enabled, run the package again. Once again, Script Task 1 executes and Script Task 2 fails. Before you press the button to stop debugging, take a look at the Progress tab (or the Output window). When the package begins executing, the following two messages display, confirming that the checkpoint feature is enabled:

The package will be saving checkpoints to file "C:\SSIS\MyCheckpoint.xml" during execution. The package is configured to save checkpoints.
Checkpoint file "C:\SSIS\MyCheckpoint.xml" update starting.

When Script Task 1 completes, SSIS updates the checkpoint file. In the Progress tab, another message appears: Checkpoint file “C:\SSIS\MyCheckpoint.xml” was updated to record completion of this container.

Run the package one more time. You should see the results that Figure 3 shows. Script Task 1 doesn’t execute at all because the package is using the checkpoint file to determine where to begin execution. The package execution begins with Script Task 2, which again fails because you haven’t yet fixed the problem. In the Progress tab, the following messages display:

The package restarted from checkpoint file “C:\SSIS\MyCheckpoint.xml”. The package was configured to restart from checkpoint.
The package will be saving checkpoints to file “C:\SSIS\MyCheckpoint.xml” during execution. The package is configured to save checkpoints.

Figure 3: Results after the checkpoint feature is enabled

Because Script Task 2 doesn’t succeed, the checkpoint file isn’t updated.

Now fix the package by changing the Flag variable’s value back to 0 and execute the package once more. Because the checkpoint file exists and you changed the variable’s value, you probably expected the package execution to start with Script Task 2 and succeed, but it didn’t. What happened? To find out, open the checkpoint file, which should look like that in Figure 4. Most of the lines in this XML document refer to properties for the Flag variable, including its value at the time that the package failed.

Looking at variables’ values in the checkpoint file can be helpful for understanding the conditions in the package when it failed. However, if a variable’s value is causing the task to fail, you’ll never be able to restart the package successfully. That’s the case here. Take a look at the value in the DTS:VariableValue element, which I highlighted in Figure 4. You need to edit the value in the checkpoint file so that it matches the value that’s currently set in the package. That is, edit the element so that it reads

0.

After saving and closing the checkpoint file, execute the package. This time the package succeeds. It correctly begins with Script Task 2 and continues to Script Task 3, as shown in Figure 5. The Progress tab shows that the checkpoint file is updated after Script Task 2 and Script Task 3 complete successfully. Because the package succeeds, SSIS removes the checkpoint file. Thus, the next time you execute the package, execution will begin with Script Task 1.

Figure 5: Results after the checkpoint file is edited

Examining the Checkpoint File

Take a moment to look again at the checkpoint file shown in Figure 4. At the beginning of the file, you can see the DTS:Checkpoint element that contains the PackageID. When a package starts executing, SSIS looks for the checkpoint file specified in the CheckpointFileName property if the CheckpointUsage property is IfExists or Always. If it finds the file, it makes sure that the file’s PackageID property value matches the ID of the package currently executing. If it doesn’t match, an error occurs and the package doesn’t restart.

The checkpoint file also includes a section devoted to variables, including the last value assigned when it failed. The package will restart using this value, unless you change it, as noted in the example just given.

At the end of the file, you’ll find the DTS:Container element, which contains information about containers that succeeded. In this context, containers are either containers or tasks. To confirm which containers are identified, you need to match up the ContID value in the file with the DTSID value for a container or task, which is visible if you right-click the package in Solution Explorer and select View Code. You can use the search feature to quickly locate a particular value.

Using the FailParentOnFailure Property

When you’re working with a task that isn’t inside a container, take care not to confuse the FailPackageOnFailure property with the FailParentOnFailure property. In this situation, the package is the parent for the task. Let’s say that you decide to set FailParentOnFailure to True and set FailPackageOnFailure to False. In that case, when the task fails, you’ll see the following message in the Progress tab: This task or container has failed, but because FailPackageOnFailure property is FALSE, the package will continue. This warning is posted when the SaveCheckpoints property of the package is set to TRUE and the task or container fails. However, the package stops executing at the same place it stopped previously because the package is the parent of the task.

In the Progress tab, you’ll also see a message such as Script Task 2: Checkpoint file

"C:\SSIS\MyCheckpoint.xml" was updated to record completion of this container. This message is a cue that information has been removed from the checkpoint file that would’ve been retained if the FailPackageOnFailure property had been set to True. However, because the FailPackageOnFailure property is set to False, no information exists in the checkpoint file when the package ends in failure and the file gets removed. You have no choice at this point other than to restart the package from the first task.

If you place a task inside a container, configure the task’s FailParentOnFailure property to True, and configure the container’s FailPackageOnFailure property to True, you’ll get different results—and those results are probably not what you want. Before exploring that outcome, though, you need to see what happens when the properties are configured correctly when working with containers.

Hitting a Checkpoint Inside a Container

If you set the FailPackageOnFailure property to True and the FailParentOnFailure property to False for a task and you place that task inside a container, you’ll find that the checkpoint file gets created when the task fails. The package behaves just as it does if no containers exist in the package. That is, the package restarts at the point of the failure when the checkpoint file exists. This behavior is true for any of the three container types—Sequence, Foreach Loop, and For Loop.

To see how this works, add a Sequence container to the sample package and place the three Script tasks inside the container. Then add another Script task to the package and connect it to the Sequence container so that the new Script task executes first. Rather than reset the Flag variable, you can just set the ForceExecutionResult property for Script Task 2 to Failure. This is a handy way to test results without having to figure out a way to generate an error. Also, reset the Flag variable to 0 if you haven’t done this already.

Now execute the package once to create the checkpoint when Script Task 2 fails. Not only does Script Task 2 fail, but the Sequence container also fails. Fix the problem by changing the ForceExecutionResult property to None for Script Task 2 and execute the package a second time. As Figure 6 shows, the execution starts at Script Task 2 and continues as it did in the package without the Sequence container.

Figure 6: Results when using checkpoints inside a container

But what happens when you configure the properties incorrectly? To find out, set the Sequence container’s FailPackageOnFailure property to True. Then, for Script Task 2, set the FailPackageOnFailure property to False, set the FailParentOnFailure property to True, and set the ForceExecutionResult property to False. Execute the package and watch the Progress tab.

Although Script Task 2 fails as expected, the following message displays: Script Task 2: Checkpoint file "C:\SSIS\MyCheckpoint.xml" was updated to record completion of this container. Remember from earlier testing that this message displays when a task completes successfully. Now, reset the ForceExecutionResult property to None on Script Task 2 and execute the package again. The package restarts with Script Task 3, rather than the true point of failure. The bottom line is that you must set the FailPackageOnFailure property to True to get the correct restart point reflected in the checkpoint file. Be sure to reset all the properties correctly now before continuing.

Hitting a Checkpoint Inside a Transaction

When you configure both checkpoint and transaction support in a package, the package might not restart with the failed task. Instead, the start point depends on whether that task is part of the transaction. Typically, you use transaction support in a package that includes Execute SQL tasks, not Script tasks like you’re using in the sample package. Nonetheless, you can still observe the difference in restart behavior by configuring the sample package to support transactions.

To do this, select the Sequence container and set the TransactionOption property value to Required in the Properties window. This value means that the Sequence container starts a new transaction in this package or joins a transaction that was started either at the package level or by a container appearing earlier in the workflow. In the sample package, there are no other containers configured to support transactions, so the Sequence container will start a new transaction when the package executes. Each Script task inside the Sequence container should have its TransactionOption property set to Supported by default, which means it’ll join the transaction started by the container.

Next, break the package by setting the ForceExecutionResult property to Failure for Script Task 2, and execute the package to get the checkpoint file in place. Then execute the package again to see where the package restarts now that you have the container configured to support a transaction. As you can see in Figure 7, this time the package execution starts with Script Task 1, which is the first task inside the container.

Figure 7: Results when using checkpoints inside a container that’s configured to support a transaction

Because the transaction didn’t complete successfully when you first executed the package, the checkpoint file includes a reference to the container to use as the start point. If you open the checkpoint file, you’ll see the DTS:Container element.

The Bottom Line

You now know how to handle a package restart when a problem occurs during execution. However, keep in mind that this technique is only available within the control flow. Consequently, checkpoints aren’t helpful if a package fails during a data transfer process that you’re performing with a Data Flow task.

As you design the control flow of your package, think carefully about strategic points within the workflow that would benefit from checkpoints when you need to restart a package, and configure the package and task properties accordingly. Remember that variables’ values are saved in the checkpoint file to accommodate dependencies between tasks and that the use of containers will affect where the package restarts when containers participate in a transaction. The bottom line is that using checkpoints can save you time by skipping past completed tasks whenever you need to restart a failed package.

Listing 1: Replacement Code

If (Dts.Variables("Flag").Value = 1) Then
Dts.TaskResult = ScriptResults.Failure 

Else

  Dts.TaskResult = ScriptResults.Success

End If

Comments

Plain text