During a very early SQL Server 2005 beta cycle, the SQL Server 2005 Integration Services (SSIS) development team was looking for difficult problems with which to test SSIS. Along with our colleague Erik Veerman, we presented the team with a particularly nasty text file to use as a source. The file, which was about 5MB, represented a sample log from a telephony switch and contained more than five multi-line record formats. In our SQL Server 2000 Data Transformation Services (DTS)-based solution, these log files were imported into one consolidated table that contained more than 100 columns and used a 500-line ActiveX script transformation to parse the records. Three data pumps separated the table into a dedicated table for each needed record type. One file took about 40 seconds to process into the database, and we needed a custom loop to handle the harvest of multiple text files from a source directory. At up to 1,000 files a day, the processing workload was significant.
The SSIS team poured over the telephony switch log format and determined the file wasn’t going to fit into any of the team’s existing source adapters, so we were introduced to the concept of a custom source component. The team was working on this problem before documentation, so one of the core SSIS development team members, Jim Howey, walked through the concepts and details for developing a custom source component. The result was an elegant solution for importing and parsing the log files. Furthermore, SSIS provided a built-in ForEach loop capability to harvest all logs from a directory without cumbersome scripting.
Leveraging the investments made to decode the file, we moved the complicated parsing logic from an ActiveX transformation into the compiled .NET source component. Using the parsing logic, the source component separated the record types and produced three outputs so that the data could be inserted directly into the three destination tables in one step, eliminating the expensive intermediate staging step required in DTS. The overall result was a reduction in import times by 12 times for one file and as much as 17-times reduction when importing files concurrently. This impressive improvement happened before the beta 1 release of SSIS was available and before the team had completed any product performance tuning.
The increase in performance was initially staggering, but taking a step back, it made a lot of sense. The custom source component is a dramatic shift in architecture in which the transformation logic moves into a compiled component that has no additional overhead associated with the ActiveX scripting tasks. In addition, handing the transformations in the source connection reduces the number of times the data is touched. The combination of impressive performance in a portable, elegant solution makes the custom source component a very attractive option.