Ready or Not, Here Comes the CLR

Editor's note: This article is based on the beta 1 release of Yukon. Some items covered in this article might change before the product's release.

One of the most-anticipated yet least-understood features of the upcoming Yukon release of SQL Server (formerly named SQL Server 2005) is its integration of the Common Language Runtime (CLR). This is an extremely rich topic, covering many new features and functions for database administrators and developers. As you probably know by now, Yukon is based on the Microsoft .NET Framework and provides new features that go beyond the capabilities of SQL Server 2000.

CLR Integration

The CLR is the engine of the .NET Framework, facilitating the execution of all managed application code. With Yukon, managed code is part of the core application-development technologies. Managed code is code that's compiled for the .NET environment. It runs completely inside the .NET Framework, without relying on other resources such as COM components (we discuss these later). The CLR translates managed code to executable code, which the CPU then executes. The .NET Framework also provides class libraries—key software building blocks of .NET applications.

All code in .NET belongs to a unit called an assembly. An assembly contains two important types of data: code compiled to Intermediate Language (IL) and metadata. IL is a form of assembly code that the CLR can translate to executable code at runtime. This process is called Just-In-Time (JIT) compilation. The server retains the executed code in memory until the host process terminates. The metadata that the compiler generates describes the assembly's individual classes and the assembly unit itself. Figure 1 outlines this process. By leveraging integration with the CLR, Yukon supports the development of managed code and its deployment to the database.

The integrated CLR will let developers use .NET languages in addition to T-SQL to create database objects including stored procedures, user-defined functions (UDFs), and triggers. CLR integration also lets developers build user-defined data types (UDTs).

Yukon uses an implementation of ADO.NET for the data access that CLR code performs in the database. This implementation's classes, such as SqlDataReader and SqlCommand, provide a familiar programming model for ADO.NET developers. And when you're writing a managed-code application to run in the SQL Server process, you already have an implicit database connection, so the SqlConnection object doesn't need initializing. This implementation makes in-process data access simple, yet it's powerful enough for developing rich applications.

The ability to run managed code in Yukon doesn't mean that T-SQL is going away. In fact, Yukon has significant T-SQL enhancements. (See Itzik Ben-Gan's article "What's New in Yukon T-SQL," page 27, for more about T-SQL enhancements in Yukon.) However, a .NET language can be useful in some situations. For example, CLR-based code might be a better choice for computation and logic-intensive operations because of the optimization you get from running compiled code against the CLR. A .NET language compiler compiles the CLR code into a binary executable format, but the SQL Server database engine always interprets T-SQL at runtime. In addition, .NET languages such as C# and Visual Basic .NET provide better looping and branching constructs than T-SQL does. Writing structured code is easier with these languages than with T-SQL, and having access to the .NET Framework class libraries opens up new possibilities that don't exist with T-SQL, including graphic image manipulation, advanced string processing, and advanced arithmetic computations.

In SQL Server 6.5 and later, developers have two basic options for implementing code in their databases. Some use the OLE automation for stored procedures (sp_OA) series of procedures to create and use COM objects from SQL Server. The Windows OS uses COM objects heavily, leveraging a programming infrastructure that lets you encapsulate and define a piece of code so that many programs can use it. Before .NET, COM was the primary mechanism for developing Windows-based applications that require distributable units of code. For example, a Visual Basic (VB) 6.0 project to build a class module compiles as a COM object, which lets other programs use it as long as they can implement a COM component. The sp_OA feature lets developers make limited use of COM code inside the database. The drawbacks of sp_OA include cumbersome deployment and limited support for debugging these objects while they're running within SQL Server. Yukon's support for managed- code applications addresses some of these limitations.

In addition to sp_OA, SQL Server developers can use extended stored procedures—stored procedures that are written in C or C++ and compiled as DLLs. Developers often use extended stored procedures to perform computation-intensive tasks or to access resources outside the database. But extended stored procedures are also difficult to write and are notorious for causing memory-related problems and unhandled exceptions in code.

With Yukon's CLR integration, developers can use .NET languages to build the functions that they've previously used sp_OA and extended stored procedures for. The CLR has advantages over these other options because of its built-in security model, the stability and safety associated with executing code in a managed environment, and the ease of development, debugging, and deployment. In addition, access to the .NET Framework class libraries can make programming with the CLR far more productive than using sp_OA or extended stored procedures. Let's look at an example that shows the power of using managed code in the database.

Applying Managed Code

The .NET Framework includes a set of class libraries under the System.Text namespace that support the processing of regular expressions. Regular expressions support pattern matching in text, so you can write code to find desired text patterns—for example, to validate user input from a program. Listing 1 contains a method from a C# class that demonstrates the use of a regular expression to validate US phone numbers. For data-processing functions, you could expose the ValidatePhoneNumber method as a UDF or as a stored procedure. This function uses the IsMatch method of the RegEx class to determine whether the phone number value passed to the function matches the regular-expression pattern that defines phone number formats. New DDL statements such as CREATE ASSEMBLY, CREATE FUNCTION, and CREATE TYPE make managing .NET code in the database simple. For example, the code in Listing 2 creates a reference to an assembly in the database, maps one of its methods to an endpoint—in this case, a UDF—and executes that UDF.

Another powerful feature of Yukon's CLR integration is the ability to create real UDTs. SQL Server 2000 and earlier releases support something called user-defined types, but these types are just another way to reference the basic SQL Server data types. Yukon defines a much more powerful UDT system in which developers can create types from CLR-based types or build classes that can have their own properties and methods. SQL Server's data-storage engine stores the UDTs just like any other data type, by using an optimized type of serialization (the process of saving an object as a stream of bytes). You can also define custom aggregation logic, including sum, average, count, and standard deviation. You define UDTs and custom aggregations by creating .NET classes. Listing 3 shows the C# code to build a SQL Server UDT. The ComplexNumber class defines a complex number data type (numbers that have a real and an imaginary component) and handles addition for complex numbers. The class also implements the base requirements for a UDT.

The exact rules and considerations for building a .NET UDT are beyond this article's scope, but at a high level, the class must support the INullable interface (which lets the type support a null value), implement the Parse and ToString methods (so that the type can be parsed from and written to a varchar value), and provide a default constructor. (A constructor is a method call that creates a new instance of that type.) Some SQL Serverdefined attributes are also required to specify details such as how to serialize the object to disk. Listing 4 shows T-SQL code that references this assembly in the database and implements the UDT in a table, then performs some simple tasks with the UDT. Once you declare a UDT, you can use it much like any data type, but it also supports method calls and properties.

Yukon stores compiled .NET code assemblies in the database. Storing them there makes it easy to move databases between servers and eliminates the need to back up program components from the file system when the database is backed up. Yukon will ship with tools to streamline the process of loading .NET components into the server, but as with other SQL Server administrative tasks, you can also use a command-line option for this task. CREATE ASSEMBLY is the T-SQL command for loading a .NET assembly into a database. You must first use Visual Studio or one of the .NET command-line compilers (which ship with the free .NET Framework Software Development Kit—SDK) to compile the assembly because the SQL Server engine doesn't compile the .NET code. The CREATE ASSEMBLY command then takes the location of the assembly DLL (or a bytestream), the name of the assembly to be used in the database (this name must be unique to the database), and optionally a permission set for that assembly (we cover this in a moment). Yukon also provides a DROP ASSEMBLY statement, which removes an assembly from a database, and an ALTER ASSEMBLY statement, which refreshes an assembly with a new version.

But loading the assembly into the database accomplishes nothing on its own. Before you can use the assembly, you must declare the references to the functionality exposed in the assembly. To support the mappings of functionality in the assemblies (i.e., classes and their methods), Yukon has an extended syntax for existing T-SQL DDL statements, including CREATE PROCEDURE, CREATE FUNCTION, and CREATE TRIGGER. (We cover the new CREATE TYPE statement later.) When creating a T-SQL stored procedure, you type the CREATE PROCEDURE declaration followed by the T-SQL code. Creating a .NET stored procedure involves the same declaration, but you use the EXTERNAL NAME argument to tell SQL Server that the implementation of this procedure is contained in a loaded assembly under a specific name. Note that you must already have loaded the assembly by using the CREATE ASSEMBLY statement and that the stored procedure declaration, including parameters, must be compatible with the .NET method in the assembly. If the declaration includes nonSQL Server data types (for example, using String instead of the varchar type) or other incompatibilities, the CREATE PROCEDURE statement returns an error message. The CREATE FUNCTION and CREATE TRIGGER statements work much the same as CREATE PROCEDURE, with the same rules. CLR triggers have access to the inserted and deleted tables just as T-SQL triggers always have. CLR functions work like CLR stored procedures except that they can return a value or a table.

The management of assemblies and declarations is a lot of work if you do it manually. But Microsoft is working to make Yukon easier to use and administer than its predecessors. You can bet that Microsoft will streamline this process with updated management tools and some handy enhancements in the next Visual Studio release, code-named Whidbey.

Security with the CLR

Introducing user-defined executable code into SQL Server raises questions about how the code might endanger your server's security and stability. Anyone who has debugged extended stored procedures or misbehaving COM components in SQL Server knows how easy it is to make a mistake that brings the server down. To minimize such problems, Yukon provides support for securing assembly code within the database. The .NET Framework relies on a mechanism called code-access security that controls an application's access to protected operations, functions, and resources. The CLR's security system performs the necessary access checks for code through a technique known as "walking the call stack." The call stack of a program keeps track of the order and hierarchy of method calls the program uses. In turn, each method can serve as a caller of other methods. If any caller in the stack doesn't have required permissions, the CLR throws a security exception and access to the caller is refused. This stack-walking technique prevents security breaches that can happen when less-trusted code calls highly trusted code to perform unauthorized operations. For example, assembly A might have permissions that enable its code to access a file on a network share. If assembly B doesn't have this permission but attempts to call assembly A to access the file, this request is denied. Walking the stack incurs some additional overhead, but it's essential to maintaining a secured application environment with .NET.

With the introduction of managed code in the database, SQL Server has a new security mechanism. Before Yukon, you applied security to database users, meaning that you granted a user or role privileges to perform certain functions. The level of permissions granted, such as database owner (dbo) or systems administrator (sa), defines the functions users can perform. But now that managed code can be used in the database, you can also apply security levels to units of code. The three security levels for managed code in the database are SAFE, EXTERNAL, and UNSAFE.

The SAFE designation applies to implementations that rely solely on managed code that accesses no resources outside the database. The EXTERNAL designation, which supports read and write access, applies to code that accesses external resources in addition to database resources. Examples of external resources include file systems and network resources. The UNSAFE designation applies to code that can access all resources, including unmanaged code, and is limited to use by systems administrators. These security levels will help DBAs manage and secure at the assembly level all managed code that runs inside the database.

Ready, Set—Yukon

Yukon's .NET integration features are already sparking interest, excitement, and concern in the SQL Server community. DBAs are concerned about the use of managed code in their databases and the potential security and performance implications. And developers are anxious to learn how using managed code to build database applications will improve their productivity.

To prepare for the new release, SQL Server professionals should investigate the .NET Framework and begin to understand how the CLR languages work and what kind of functionality the .NET Framework class libraries support. We advise DBAs to also begin learning a .NET language so that they can effectively support their databases as managed code is deployed in their environments. In addition, developers and DBAs should discuss potential applications of the new technology and try to understand and address each other's concerns before the product's release. For those with access to the beta, this kind of feedback will also help Microsoft improve the product.

As a major new release of SQL Server, Yukon presents an opportunity for developers and DBAs to use their existing skills while taking advantage of powerful new capabilities such as CLR-based code, rich UDTs, and security enhancements. As with any new release, SQL Server implementers will have to define their solution architecture and decide which features to use and how to leverage them. At press time, Microsoft hadn't identified a release date for Yukon, except to say that it will happen in the second half of 2004. If you're planning projects for the next year, Yukon likely won't be a factor in your decisions about systems to deploy. But for those looking further out, Yukon includes features and functionality that developers, DBAs, and architects need to study and plan for.

Comments

Plain text