Leveraging Managed Functions in Computed Columns

CLR integration offers a richer programming model and potentially faster performance than T-SQL

Suppose that the Acme manufacturing company uses a mechanism such as OLE for Process Control (OPC) to store records in a database table. One column in the records contains a set of comma-delimited values that tells us everything we need to know about a production line’s operation during a given hour. We need to split up the column values and place one of them in a separate column so that we can take some further action on it. This capability is built into most modern-day programming languages but not into T-SQL.

The most common solution to this problem is to create a user-defined function (UDF) that parses the individual values in T-SQL’s set-based queryexecution engine. SQL Server 2005 Common Language Runtime (CLR) integration gives us a better alternative: employing the .NET Framework’s Split() method in a UDF as a computed column expression.

Since SQL Server 7.0, computed columns have let us populate a field in a table with the result of an expression. In SQL Server 2000, we gained the ability to create an index based on computed columns. In SQL Server 2005, computed columns fully support creating and updating statistics and can now be based on a managed, scalar-valued function. Let’s see how we can use CLR integration and a UDF to create the computed column that Acme needs.

Why Use CLR?
SQL Server 2005 CLR integration allows developers to create managed database modules, including stored procedures, functions, triggers, aggregates, and types. Without question, the module that stands to benefit the most when implemented as managed code is the UDF because UDFs typically contain some of the most compute-intensive algorithms in the database and because UDFs are executed on a row-by-row basis. UDFs take advantage of the two main benefits afforded by CLR integration: access to the .NET Framework’s rich Base Class Library (BCL) and just-in-time (JIT) compilation.

Access to the .NET Framework BCL gives us access to several of the BCL’s namespaces, including those in Figure 1. These include the root namespace System, which has the base CLR types, including the System.String class and its associated Split() method. JIT-compiled code provides performance benefits over T-SQL code. When you’re writing database routines that are processor-intensive and that perform little or no data access, managed code will clearly perform better than its T-SQL counterpart. For more information about when to use managed code, see “CLR or Not CLR: Is That the Question?” April 2006, InstantDoc ID 49429.

Creating the ProductionHistory Table
Our first step toward demonstrating how managed code can split a column into its separate values is to create the database and table that will contain the column. The ProductionHistory table will store hourly snapshots of Acme’s various productionline information, as mentioned above. For brevity’s sake, we won’t create related tables as we would in the real world. Listing 1 shows the T-SQL script to enter in SQL Server Management Studio to create AcmeProductionDB and ProductionHistory.

We also need to insert a few dummy records that use a fictional convention for the ProductionValues column. This convention includes values for the hour start, hour end, units produced, and defects produced. Based on this standard, a typical entry in this column would look like this: '01,02,100,3'. Listing 2 shows the scripts for populating the table.

Creating the UDF
To create a new managed UDF, we create a new database project in either Visual Basic or Visual C# (Visual C#, in this example). Start up Visual Studio 2005, and select File, New, Project from the editor’s main menu. (You’ll need the Professional, Tools for Office, or Team System edition of Visual Studio 2005 to create CLR integration projects.)

In the Project types pane of the New Project window, expand the Visual C# node, and select the Database project. In the Name field, enter AcmeProductionDB_ ParsingRoutines as the name for our new C# database project.

Once you’ve done this, you’ll see the Add Database Reference dialog box. Click Add New Reference, and you’ll see the New Database Reference dialog box. Enter SQL Server 2005’s instance information, specify our new database (AcmeProductionDB), and click OK. The Add Database Reference dialog box now shows our new database reference. Make sure the new reference is selected, and click OK. Our new CLR Integration project has now been created and set up.

The next step is to use Visual Studio’s Solution Explorer pane to add a new database UDF. Solution Explorer shows you two levels of the AcmeProductionDB_ ParsingRoutines project: the project itself, plus its properties, references, and test scripts. Right-click the project in Solution Explorer and select User- Defined Function from the Add menu. Set the source file’s name in the resulting dialog box to udf_ParseProductionHistoryValues.cs. By default, this name will also be given to the initial function (which will be a public static method).

At this point, you should see the UDF template code in Visual Studio. We need to replace this template code with our own custom code that parses the ProductionValues column. Listing 3 shows this code. As you can see, the UDF has two input parameters, one for the ProductionValues content and a second for specifying the actual value to be returned. The UDF also sets the optional parameter IsDeterministic to true, which affords us the option of creating an index on the computed column that will reference this UDF (the UnitsProduced column).

Deploying the UDF
We must now deploy our new UDF to our target SQL Server instance. The target SQL Server instance is the database reference we added in our Visual Studio project. To deploy the new UDF, right-click the project in Solution Explorer and select Deploy as Figure 2 shows. Note: To change your project’s current database reference (target SQL Server instance), right-click the project in Solution Explorer, and select Properties. Go to the Database tab and click Browse to open the Add Database Reference dialog box I described earlier.

Altering the ProductionHistory Table
We now must alter our table’s schema to set the formula for the UnitsProduced column to reference our new UDF. Before doing so, though, let’s confirm that the new UDF was deployed to the target database. Start SQL Server Management Studio, and select View, Object Explorer. In the Object Explorer pane, navigate to AcmeProductionDB\Programmability Functions\Scalar-valued Functions and look for udf_ParseProductionHistoryValues under that node.

We still need to modify the ProductionHistory table. As with most operations in SQL Server, we can do this via T-SQL or graphically by using Management Studio. We generated the script from Management Studio, so all we need to do is execute the script in Listing 4 by using a query pane in Management Studio.

Confirming UDF Use
To confirm that the UnitsProduced column is using udf_ParseProductionHistoryValues correctly, we need to issue a Select statement to the database engine referencing the ProductionHistory table. Figure 3 shows this Select statement and its result. We have now created and deployed a managed computed column and confirmed that it’s working as it should.

Optionally, you can create an index based on the UnitsProduced column. You must adhere to several "rules” when creating indexes based on computed columns. I won’t go into detail on those rules here; if you wish to learn more about them, please see the topic “Creating Indexes On Computed Columns” in SQL Server 2005 Books Online. Listing 5 shows the T-SQL script for creating a basic nonclustered index on the UnitsProduced column.

By leveraging managed functions in computed columns, you obtain the benefits of CLR integration in a computed column context. Among these benefits are a richer programming model than that offered by T-SQL and the possibility for improved performance over that of T-SQL code. The performance improvement can be significant because computed columns tend to contain complex algorithms and they’re calculated for every row.

I used a simple example in this article so that you could focus on how managed computed columns work without being distracted by external logic. If you wish to learn more about SQL Server 2005’s CLR integration, please see my recent book Professional SQL Server 2005 CLR Programming: with Stored Procedures, Functions, Triggers, Aggregates, and Types (Wrox Press).

Hide comments


  • Allowed HTML tags: <em> <strong> <blockquote> <br> <p>

Plain text

  • No HTML tags allowed.
  • Web page addresses and e-mail addresses turn into links automatically.
  • Lines and paragraphs break automatically.