Beyond T-SQL: Digging into the CLR

No matter what your job, if you work with SQL Server, you need to understand the Common Language Runtime (CLR), the engine that facilitates the execution of all managed code in the .NET Framework. The CLR is one of the most innovative additions to the next SQL Server release, SQL Server 2005, which Microsoft expects to release next year. Regardless of whether you're a DBA, a developer, or a programmer, your role as a SQL Server expert is changing, and you need to understand how the CLR works.

In "Developing CLR-Based Stored Procedures," May 2004, I explained the fundamental architecture behind CLR-based stored procedures and introduced an application to test the procedures. This month, I show you how to code and call CLR procedures and functions. I created this article's application and CLR procedures by using the alpha builds of Visual Studio 2005 (formerly code-named Whidbey) and SQL Server 2005 that Microsoft released at the October 2003 Professional Developers Conference (PDC). (Microsoft planned to release beta 2 of SQL Server 2005 in late June 2004.) Many of the underlying implementation details in the current beta have changed since the alpha release, but the fundamental concepts remain the same. For example, the current SQL Server 2005 beta no longer integrates with Visual Studio, but the two products are expected to converge this summer. As these two tools march arm-in-arm toward their release to manufacturing (RTM), I expect Visual Studio to incorporate the SQL Server 2005 toolset and permit even tighter CLR development than the earlier alphas. SQL Server 2005's new SQL Server Management Studio (formerly called the SQL Server Workbench) uses the Visual Studio 2005 IDE as a foundation to simplify the developer's job of creating data-centric applications and the DBA's job of managing the data that data-centric developers are accessing.

To demonstrate how to code and call CLR procedures, I developed an example application that calls CLR stored procedures to generate an RSA encryption key and encrypt and decrypt credit card numbers and expiration dates. When the user clicks Submit, the code calls a stored procedure that, in turn, calls a CLR assembly. The assembly generates a private RSA encryption key and uses this key to encrypt the string that holds the credit card number. The Encrypt() function then saves the encrypted data (now in the form of a byte array) and the expiration date to the database. The Windows Forms application program (which I wrote in Visual Basic .NET) stores the private encryption key and displays it in the client application to assist in debugging the application. The program doesn't keep the encryption key on the server; this way, only the user can decrypt the server-side value—and only if the client-side application persists the value locally.

When the user wants to retrieve the credit card number so that some server-side function can use it, the client-side application calls the FetchCCNo stored procedure, which uses the encryption key, the client ID, and the expiration date to look up the customer record by ID and expiration date. FetchCCNo then decrypts the value by using the Decrypt CLR-based procedure. Figure 1 shows the interface to the application, complete with sample results. For this example and to help debug the application, the FetchCCNo stored procedure returns the decrypted credit card number to the client application, but I expect that the server-side application would typically use the unlocked value to perform some other operation.

Note that this article's example calls the CLR stored procedures from T-SQL stored procedures. I found this method more convenient than engineering direct calls to the CLR assembly functions, and I expect that's what most developers will end up doing anyway. Another method, which I don't discuss in this article, would be to code the CLR procedures as functions and embed them in T-SQL code.

You can download the scripts I use for this article's example at InstantDoc ID 42838. Many of the example scripts I use point to the Biblio test database; you'll need to alter these scripts to point to your own test database. Let's start by looking at how to create a CLR-based assembly and code CLR functions, then walk through the creation of the project and the scripts I use.

Creating a New CLR-Based Assembly in Visual Studio 2005

To implement the CLR-based stored procedures I use in this example, you start by building the assembly code in Visual Studio. When you start the alpha version of Visual Studio 2005, you have several new Project types to choose from—including one type to create SQL Server 2005-hosted CLR assemblies. Start a new Project by clicking File, New Project, and choose SQL Server Project. Be sure to name your Project so that you'll recognize where to insert its name in the scripts you use to install the resulting assembly in SQL Server. I usually begin each Project name with an identifier; in this case, I used the letters asy to indicate that the Project is an assembly.

When you click OK in the New Project dialog box, you get the Add Database Reference dialog box that Figure 2 shows. This reference links the Visual Studio 2005 IDE to a specific SQL Server 2005 server that will host the CLR assembly you're creating.

Next, you need to choose a Project template. Select Project, Add New Item to open the dialog box that Figure 3 shows. In this dialog box, you can choose one of the new types of assemblies associated with the SQL Server Project type that Visual Studio 2005 can create for you: Stored Procedure, User-Defined Function, User-Defined Type, Trigger, or Aggregate. For this article's example, I selected Stored Procedure. Before you click Open, be sure to set a filename that you can easily recognize later. I prefix my stored-procedure class files with cls to prevent confusion later. (A class is simply a block of code that exposes functions and properties.) After you click Open, you have a new assembly prototype (which Figure 4 shows) that Visual Studio 2005 has preset to import all of the namespaces you're likely to need. A namespace is a unique name that addresses a class in the .NET Framework. The namespaces you import help the compiler identify the classes that your code references. Importing doesn't add code to your assembly—it simply provides a way to address the parts of the .NET Framework you intend to reference in your code.

Coding the CLR Functions

Ideally, you should use CLR classes only for operations such as complex math functions that require lots of CPU cycles. I explained the reasons for this rule in "Developing CLR-Based Stored Procedures." In a nutshell, T-SQL is a better choice for most stored procedures and SQL Server functions unless you need them to perform operations not really suited for T-SQL. The example code you downloaded builds three CLR functions to generate an RSA encryption key. The code then uses the key to encrypt and decrypt a data string—in this case, a credit card number. T-SQL can't perform these encryption operations without access to the RSA encryption functions in the .NET Framework, so these operations are ideal for a CLR implementation. The CLR function can access selected parts of the .NET Framework, including the RSA encryption functions.

With the CLR assembly in place, you're ready to code your CLR assembly's function prototype, which names the function, specifies its input and output parameters, and specifies the data type of the value the function returns. The code at callout A in Listing 1 shows the prototype for the GetKeys() function, the first of three classes (in this case, Visual Basic .NET functions) that the example implements to provide encryption services. The GetKeys() function generates RSA encryption keys and returns them to the calling procedure. This encryption approach lets you use the public key to encrypt a value and the private key to encrypt or decrypt a value—as long as that value was encrypted with the matching public key. The Visual Studio 2005 IDE adds the tag at the beginning of callout A to tell the compiler that SQL Server needs to be able to call this function.

Because SQL Server calls this Visual Basic .NET function the same way it does a typical T-SQL stored procedure or function, the GetKeys() function can have input and output parameters as well as a return value. Your function might not return anything to SQL Server after it executes. But in most cases, you'll want to pass at least a return value (integer) and some sort of exception string so that if the CLR procedure fails to execute, you'll have information about what went wrong. Note that the three parameters in the function prototype at callout A are all coded ByRef. This designation is required for all parameters that the function returns. The GetKeys() function also returns an integer (SqlInt32) to indicate its success (0) or failure (-1). To simplify the debugging process, I developed the function code in a separate assembly that includes a test harness that calls the function and processes the returned values. Once the function code is working, you can simply paste the code into the SQL Server function.

The next block of code in callout A generates the RSA private and public encryption keys. Note that this function returns its values by using the function Return operator and by setting the values of the output arguments.

The Encrypt() function at callout B encrypts the inbound value (the credit card number) and returns an encrypted byte array. Encrypt() has two input arguments (which the application passes into the function by using ByVal), two output arguments (which the function returns by using ByRef), and a return-value integer. Note that the function returns the encrypted value as a byte array by using the SqlTypes.SqlBinary data type. I used this approach to bypass problems I encountered when attempting to pass back the value in a Unicode string.

The code for the Decrypt() function at callout C is similar to the code for the Encrypt() function. Callout C's code simply reverses the process by using the RSA private key to decrypt the byte array. The Decrypt() function passes back to SQL Server a decrypted string or an exception message to indicate that the code couldn't decrypt the value by using the given key.

Building the Project and Creating Scripts

Now that you've coded the assembly's functions, you can right-click the project and choose Build. After the code compiles, you're ready to deploy the CLR assembly. You'll want to test the functions, and I did so by writing another application that exercised them without using SQL Server. However, you can also test and debug the functions after they're installed, as I described in "Developing CLR-Based Stored Procedures." You need to deploy the assembly manually, as I explain in a moment. Don't be tempted to right-click the project in the Visual Studio Solution Explorer and click Deploy, as you might if this were an ordinary Windows Forms or ASP.NET application. If you try to use this automatic deploy method, Visual Studio runs through the build cycle (which creates the DLL), then starts a series of operations on the selected SQL Server that include dropping all stored procedures on the server that have links to the assembly. This means you'll have to go back and recreate all your stored procedures manually.

Instead, you now need to switch to T-SQL editing to create scripts to build the stored procedures you'll use to interface with the CLR assembly's functions. Although you can use the Visual Studio 2005 alpha tools to build the test-harness T-SQL scripts, the alpha release has problems that make building and testing stored procedures difficult. I understand the latest beta builds handle the entire process of building T-SQL stored procedures that call CLR functions (as we're doing here), but I didn't have access to those builds at the time I wrote this article. For this example, Management Studio was easier for me to use than the Visual Studio 2005 tools. In the example files you downloaded, I included a Management Studio project, EncryptDecrypt
.sqlsln, that includes the following scripts:

Create CCard Table—I used this T-SQL script to build the CCards table, which holds the encrypted credit card data.
Build script—As I explain in a moment, this T-SQL script creates the server-side ASSEMBLY object and addresses the CLR assembly DLL I created in the first step. This script also creates three T-SQL stored procedures—GetKeys, Encrypt, and Decrypt—that link to the CLR assembly functions we just looked at.
Create Proc InsertCCard—This T-SQL script creates the InsertCC stored procedure that the Visual Basic .NET front-end application calls so that it can save a credit card number to the database as an encrypted varbinary value.
Create FetchEncryptedCCNo—This T-SQL script creates the FetchCCNo stored procedure, which the Visual Basic .NET front-end application calls when it needs to decrypt an encrypted credit card number.

The next step in creating a callable CLR-based assembly from SQL Server is to create a server-side ASSEMBLY object from the CLR assembly that Visual Studio .NET built.

Invoking a CLR-Based Stored Procedure

To create the CLR assembly, I used the T-SQL CREATE ASSEMBLY statement. Visual Studio 2005's alpha release can do this for you automatically if your CLR-based assembly is simple, but once you start calling out to the .NET Framework's external DLLs, you have to revert to installing your server-side assemblies manually. The alpha release of Visual Studio 2005 doesn't include the mechanisms that Visual Studio needs to pass to SQL Server so that SQL Server knows how to trust the new ASSEMBLY.

The T-SQL script BuildScript installs a new server-side assembly. Because the ALTER ASSEMBLY statement isn't implemented in SQL Server 2005's alpha code, you must drop all dependencies before recreating the assembly. Visual Studio 2005 drops the dependencies for you when you use the Deploy feature on the Project. This means that if your CLR ASSEMBLY gets too complex, you'll need to revert to manually dropping the dependencies, as the code at callout D shows, and hard-coding the CREATE ASSEMBLY code in a T-SQL script, as the code at callout E shows.

The next few lines at callout E show the name and address of the new SQL Server assembly, which I built at the beginning of this process by using Visual Studio 2005 and compiled to a DLL. Note that I specified that this assembly is UNSAFE, which is one of three permission settings that tell SQL Server whether to execute code in safe or unsafe areas outside the sandbox. The term sandbox refers to the "safe" area within reach of SQL Server. If your code attempts to access functions or resources outside of this area (i.e., in areas considered to be unsafe), it's said to be working outside of the sandbox and is prohibited from executing—unless it's trusted and permitted to do so. In this case, I tell SQL Server that this CLR assembly is permitted to make UNSAFE calls to encryption libraries outside SQL Server. For a description of the permission settings, see the sidebar ".NET Safety Settings."

After you create the CLR assembly on the server, you can address it the same way you do any other T-SQL function—from
T-SQL procedures or functions. For this example, I installed three T-SQL procedures (as the code at callout F shows) to call three function classes that the assembly exposes: the GetKeys class, which generates public and private RSA encryption keys; the Encrypt class, which accepts a string and uses a specified RSA public encryption key to return an encrypted byte array; and the Decrypt class, which accepts a private RSA key and a byte array and returns the decrypted value as a string. All three of these classes are exposed as Visual Basic .NET functions within the single CLR assembly we just created on SQL Server.

You can invoke the server-side assembly from a T-SQL stored procedure (as in this example), a user-defined function (UDF), or a trigger. As the code at callout F shows, the T-SQL procedures start out like any typical stored procedure, by defining the input and output parameters and starting the T-SQL code block with an AS statement. The magic occurs as you specify the connection with the EXTERNAL NAME statement to the pre-installed server-side ASSEMBLY, which links to the CLR assembly DLL entry points (the class functions). I named the CLR DLL assembly asyEncryptDecrypt, the class within the assembly is clsEncryptDecrypt, and the method (Visual Basic .NET function) the assembly links to is GetKeys, as the EXTERNAL NAME T-SQL clause shows:

EXTERNAL NAME 
\[asyEncryptDecrypt\]:\[asyEncrypt
Decrypt.clsEncryptDecrypt\]::\[Get
Keys\]

When this T-SQL procedure executes, SQL Server fetches and prepares the CLR assembly for execution and verifies several pieces of information. In the process, SQL Server verifies the existence of the CLR assembly. If the assembly isn't installed at the specified location or you didn't specify the path correctly, the CREATE PROCEDURE statement fails. Then, SQL Server verifies the number of parameters the invoking stored procedure passes to the CLR class function. This number must be the same as the number of parameters you defined for the T-SQL procedure. Next, SQL Server confirms the data type of each of the CLR-based function arguments. In my testing of this early SQL Server 2005 build, CLR-based stored procedures couldn't access all T-SQL data types. For example, if you want to pass a string from your CLR code, you have to declare it as a SqlTypes.SqlString data type and declare the T-SQL parameter as nvarchar because CLR strings are Unicode by default. Finally, SQL Server verifies the way you defined the CLR assembly's function arguments. To get T-SQL to recognize an output parameter, you must define the CLR function arguments as ByRef and the input arguments as ByVal.

One of the mysteries I had to solve was how to pass T-SQL data types to and from CLR assemblies. The sidebar "Mapping T-SQL Data Types to SqlTypes" lists some tips for converting T-SQL data types.

As I noted earlier, I also didn't have much luck using the alpha release of the Management Studio Create Stored Procedure feature to create and edit these stored procedures. I found it easier to use Management Studio's built-in SQL file editor to code and test the T-SQL procedures. The alpha Visual Studio 2005 build wouldn't let me easily create and edit the stored procedures either. Both tools lacked the ability to set permissions, so I hope Microsoft has solved this problem in more recent builds.

A Valuable Process

This encryption project has been valuable as well as challenging. It taught me a lot about the realities of building, testing, and deploying CLR-based procedures as well as what works and what doesn't. I also learned quite a bit about suitable uses for CLR-based procedures, functions, and triggers and for CLR-based user-defined types (UDTs). Some of the marketing I've seen about SQL Server's upcoming support for the CLR is overblown, and I saw some exaggerations of the CLR's usefulness when I was at Microsoft TechEd in May. But I think Microsoft is reducing the hyperbole and doing a better job of informing its marketers about SQL Server 2005 and Visual Studio 2005 feature sets.

I encountered lots of quirks in the alpha versions of SQL Server 2005 and Visual Studio 2005 that I had to write around as I went. I achieved the purpose of the exercise, though—to see how to write and test CLR-based database code. Although you might have only a few uses for this particular encrypt/decrypt application, it demonstrates the ability of the CLR to deal well with specific operations and functionality that T-SQL currently doesn't provide. Developers have struggled for years to implement technology that supplements T-SQL functionality, and I can see that CLR-based code execution will be a welcome innovation.

Comments

Plain text