As our series on SQL Server data warehouse & BI for DBAs continues, I want to now switch to data integration, or ETL, with SQL Server Integration Services (SSIS). Previously, I’ve scratched the surface on some things to look for, monitor and links to further reading on Reporting Services and Analysis Services. Now let’s touch on SSIS for Integration Services. I’ll come back around with a second round for some other areas that DBAs need to be aware of in terms of all 3 SQL Server BI/DW tools that are in the SQL Server box (SSIS, SSRS, SSAS) but let’s start here with SSIS today.
I’ll talk about 3 common areas for DBAs with SSIS:
1. Monitoring SSIS
SSIS is not just for ETL for moving data into data warehouses. It’s also not only for data integration, such as from other systems (and other database types) into SQL Server. For that matter, you can even more data into Oracle, Teradata or other databases with SSIS. DBAs also use SSIS frequently for tasks such as exporting and importing data with Excel or text files. It is used to move database logins and other database objects.
But when you are responsible for the daily care & feeding of an SSIS package that was deployed into production and developer by an ETL developer, you are now in charge of monitoring the health and success of that workflow. You can use Perfmon to monitor pipeline buffers (more on this below and from SQL CAT below) from SSIS executions and I recommend capturing those through tools like data collectors because your jobs will likely run overnight and you can then capture history and monitor that over time for spikes and trends. I like this MSSQLTIPS entry on SSIS Perfmon counters for more.
Errors, failures and logs from the package executions will depend on how the SSIS developer configured the package. They can use events in the package to log to a table or file, which is a good practice, or you can view the job success/failure in the activity monitor for the job. Default logging from the SSIS package should be enabled by the developer before deploying into production such that errors are logged to a target like a flat file or table. You can use the BOL entry on this option and make sure that the logging is working in production, not just in development or staging. I’ve seen that not be the case many times. Although this is not a blog post on SSIS development, I do want to point you to the Codeplex community site for SSIS, which includes some example projects that make error logging a bit easier on the developer. Once in production, you’ll be thankful for the additional log messages!
2. Scheduling SSIS
After the SSIS package is developed in BIDS and then deployed, you can store it either in MSDB or on the file system. As the pic below indicates, you can then schedule the job to run on a schedule from SQL Server Agent Jobs. Just as something of a note that I see often, you’ll want to make sure that the jobs have nice long descriptive names so that you can see at a glance from your job activities that they are ETL jobs, data integration, maintenance plans, backup jobs, etc. You can control the job security roles and run package manually by logging into the SQL Server Integration Services service from SSMS.
A common practice that I see a lot and works very well is that when you have a sequence of SSIS jobs that need to run serially, is to create a master “controlling” job to call the subsequent SSIS package jobs as described in the BIDN website which was started up by the folks at Pragmatic Works. With this technique, you won’t need to sequence these from the SQL Agent jobs control and have separate configuration files for multiple packages. That being said, there is much more work that the developer needs to go through that the link I gave you above helps to describe.
3. Changing SSIS Workflows & Configurations
The SSIS developer will hopefully have utilized package variables that give you flexibility to configure file names, locations and other parameters so that the package does not need to be modified in a BIDS development environment. For a DBA, the ability to configure the package at run time is crucial. The developer can enable package configurations from the Control Flow surface in BIDS with right-click selection Package Configurations.
As you can see in this dialog box, the configurations can be stored and configured from a range of choices like XML file, registry entry, SQL table, etc.
I don’t have the space or time in this posting to talk about best practices with SSIS workflows. As always, I’ll point you to the seminal experts on SQL Server best practices, the SQL CAT team here with their top-10 list of SSIS best practices. As a DBA, you may not need to use all of that information. But an important note in there is the point that many SSIS workflows are performed in memory in the SSIS pipeline. So you will see a spike in memory consumption from the SSIS execution. What you’ll really want to look for in most common cases with SSIS, which is used as an ETL tool for loading data warehouses, will be failed executions of the packages, updates and inserts from the log files.