5 Things You Probably Didn't Know About T-SQL

No one can know everything about T-SQL. No matter how well versed you are in its intricacies, you can always identify a few knowledge gaps that you need to fill. In this article, I help you plug a few of those holes. For starters, here's a fact you probably do know: T-SQL supports several extensions to the ANSI SQL-92 standard. But are you up-to-date on all the extensions? Let's examine four extensions that you might not be familiar with and correct inaccurate documentation for CHECK constraints in SQL Server Books Online (BOL).

1. Computed Columns

Computed columns, which Microsoft introduced in SQL Server 7.0, aren't well documented in BOL. In both SQL Server 2000 and 7.0, you declare computed columns in the CREATE TABLE statement, as in the CREATE TABLE statement for table Test that Listing 1 shows. Figure 1 shows the output of SELECT * FROM Test.

Because column c3 depends completely on the Test table's other columns, SQL Server can compute its value on the fly. Therefore, column c3, except in the following situation, takes no physical space in the Test table's rows.

Only SQL Server 2000 supports the creation of indexes that contain computed columns. However, when a computed column resides in a clustered index, SQL Server materializes the column. That is, the computed column takes physical space in each of the table's rows. Only the creation of a clustered index that contains a computed column causes the materialization of the computed column.

In addition to materialization, a couple of other points about computed columns are worth noting. SQL Server automatically assigns a computed column's data type based on the other column data types and the calculation that follows the AS clause. Also, because computed columns are virtual, you can't issue an explicit UPDATE against them.

Keeping the preceding background in mind, let's look at one practical application for computed columns. When an application requires an alphanumeric column and a sequential numeric prefix, you can often use a computed column. For example, suppose a company has multiple regional offices, all of which accept sales orders. Each office numbers its sales orders sequentially using an IDENTITY column. The home office uses transactional replication to collect the regional sales orders into a corporate table. To distinguish the sales orders of each region, the application requires each office to number its sales orders with a two-character alphabetic prefix followed by a numeric suffix. Accordingly, the New York sales office must create sales orders in the format NY00001, NY00002, and so on, and the Los Angeles sales office must generate sales orders in the format LA00001, LA00002, and so on. Consequently, the New York application might create an alphanumeric column in the Orders table, as Listing 2 shows.

You can populate the Orders table with sample data by running the code that Listing 3 shows. By executing a SELECT statement against the table, you can get a better idea of how the table works. Table 1 shows the populated table. The definition of the Orders table for the Los Angeles regional office is the same as the definition for New York except for the default value assigned to the Sales_Office column. When the New York and Los Angeles offices send data to the home office, each office's orders are easily identified. The beauty of computed columns in this case is that SQL Server automatically assigns location-specific order numbers completely independent of application code.

2. Primary Keys on Computed Columns

As I mentioned earlier, only SQL Server 2000 supports creating indexes that include computed columns. Because SQL Server implements both unique and primary key constraints as indexes, only in SQL Server 2000 can you include computed columns in these types of constraints. However, SQL Server doesn't easily accept a primary key constraint on a computed column. To demonstrate how you can get SQL Server to accept such a key, let's start by defining a unique constraint. The difference between the two constraint types is that SQL Server doesn't permit NULL values in any column on which a primary key is defined, whereas a unique constraint does support NULL values.

You can modify the Orders table in Listing 2 by adding a unique constraint to the Order_Number computed column, as Listing 4 shows. Then, you can execute the INSERT statements that Listing 3 shows and retrieve the same results. Because you're declaring both the Sales_Office and Order_Seq columns as NOT NULL, you can expect the Order_Number computed column to also be implicitly NOT NULL. Therefore, the unique constraint effectively becomes a primary key constraint. However, if you change the constraint to be explicitly a primary key, as Listing 5 shows, SQL Server returns the following error message:

Cannot define PRIMARY KEY constraint on nullable column in table 'Orders'.

Clearly, something is amiss. This behavior appears to represent a bug in SQL Server 2000. However, SQL Server is correct in not letting you place a primary key on the computed column. The SQL Server parser knows that calculations involving non-nullable columns can produce a NULL value under certain circumstances. For example, the following computation--a division by zero--uses non-NULL values, yet returns a NULL:

SET ANSI_WARNINGS OFF
SET ARITHABORT OFF
GO

SELECT 1 / 0

Because the SQL Server parser knows that computations involving non-nullable columns can yield a NULL, the parser disallows primary key constraints on computed columns unless you take additional precautions. You need to add either the ISNULL() or the COALESCE() function to the computation if the computation might yield a NULL. Therefore, if you modify the definition of the Order_Number computed column, as Listing 6 shows, to

Order_Number AS ISNULL (Sales_Office +
  RIGHT('00000' +
  CAST (Order_Seq AS varchar (5)), 5), '')

SQL Server 2000 lets you add a primary key constraint on the computed column.

3. The CONVERT() Function with Char and Varchar Data Types

You probably learned quickly that in T-SQL, when you declare a variable's data type as either char or varchar without an explicit length specification, a default length of 1 results. Accordingly, when you invoke the stored procedure

CREATE PROCEDURE procTest @x char
AS
  SELECT @x

with a multiple-character string, such as

EXEC procTest 'abcde'

the stored procedure executes without generating warning or error messages, even though the output that the stored procedure returns is the single character a. However, hidden within BOL is an exception to this rule. Documentation in the "Conversion Functions" section of both the SQL Server 2000 and 7.0 BOLs states that SQL Server assumes a default length of 30 for a char or varchar data type without a length specification in the CONVERT() function.

Consequently, the statement

SELECT CONVERT (varchar, 1000)

returns the character string 1000 because SQL Server internally converts the statement to

SELECT CONVERT (varchar(30), 1000)

SQL Server acts the same with the CAST() function as well. The SELECT statement

SELECT CAST(1000 AS varchar)

also returns the character string 1000.

4. T-SQL Extensions in the ORDER BY Clause

T-SQL supports extensive additions to the ANSI SQL-92 ORDER BY clause. According to the ANSI SQL-92 standard, the ORDER BY clause can reference only columns listed in the SELECT clause. This limitation makes sense because the ANSI SQL-92 ORDER BY is an operation that occurs on the resultset that the SELECT query returns. And because the resultset contains only the columns listed in the SELECT clause, the ANSI SQL-92 ORDER BY can sort only by the data in the resultset and can't refer back to the original tables that the resultset comes from.

T-SQL, however, interprets the ORDER BY clause more liberally and supports referencing columns in the original tables, even if the columns don't appear in the SELECT clause. Hence, T-SQL permits the following statement:

USE pubs
SELECT au_fname, au_lname
FROM authors AS a
ORDER BY au_id

In this example, SQL Server sequences the resultset by the au_id column even though the au_id column isn't part of the SELECT clause.

T-SQL also supports the use of expressions in the ORDER BY clause, thereby providing a richer assortment of options than the ANSI SQL-92 standard provides. For example, if you work for an entity that's organized into multiple departments, you might have to present the departments in a particular nonalphabetic, nonalgorithmic sequence. Although creating a table that contains the organization's display sequence might work best, you can also use the CASE expression that Listing 7 shows.

T-SQL also supports correlated subqueries in the ORDER BY clause, which introduces a whole new level of power. For example, let's suppose that your human resources (HR) department asks you to generate an alphabetical list of department managers and a sublist of employees who report to them directly, also in alphabetical sequence, under each manager's name. Listing 8 shows the code to create an employee-manager reporting table.

In the final column of figures, which corresponds to management status, 1 represents a manager and 0 represents a direct report. Therefore, Mary, George, and Tom are managers. The next-to-last column stands for the department numbers, so Mary manages Department 1, George heads up Department 2, and Tom is the boss of Department 3. John reports to Mary because they're both in Department 1 and John's manager flag is set to 0. Similarly, Dick reports to George, and Harry and Betty report to Tom. Listing 9 shows the query, including a correlated subquery in the ORDER BY clause, that creates HR's requested report. Figure 2 shows the result of Listing 9's query. In Listing 9, you use the CASE expression in the SELECT clause to indent the names of the nonmanagement employees. This expression has no bearing on the name sequencing but serves as a visual aid in determining rank within the organization.

The ORDER BY clause contains the most interesting piece of the code. Because the HR department requires that the overall list be in alphabetic sequence by manager's name, you need to first associate the manager's name with each employee. The correlated subquery identifies the name of each employee's department manager. Because the manager's name is the first item in the ORDER BY clause, it becomes the major sort, achieving the goal of sequencing the report alphabetically by the manager's name and grouping the employees who report to each manager with their manager.

The second item in the ORDER BY clause fulfills the requirement that for each department, you need to list the manager first. Because a 1 in the Manager column identifies a manager and 0 represents a nonmanagement employee, you can use the Manager column to place the department manager's name at the top of each department's list. Finally, to satisfy the request that the individual employees within each department be listed alphabetically, the employee's name is included in the ORDER BY clause.

5. Successful CHECK Constraints

Application code is always the first line of defense against the entry of invalid data into a database, but in many situations, users can access databases through multiple avenues. By ensuring that users enter only valid values into the database, CHECK constraints provide an extra safeguard against users who find a way to manipulate the database data by using tools such as Microsoft Access.

BOL's guidance for applying CHECK constraints might mislead you. BOL for SQL Server 2000 and 7.0 states that CHECK constraints evaluate to either TRUE or FALSE and that "All values that do not evaluate to TRUE are rejected." If those BOL statements were correct, the CHECK constraint in the following code example would guarantee that the retail price of an item is greater than zero:

CREATE TABLE Products (Retail_Price money NULL CHECK (Retail_Price > 0))

However, the BOL statements are incorrect: CHECK constraints can yield a TRUE, FALSE, or UNKNOWN value, and SQL Server complies with the ANSI SQL-92 specification that says for the value to be rejected, the result must be FALSE. Therefore, conditions that evaluate to either TRUE or UNKNOWN are accepted. For the preceding Products table, values of $1, $2, and $3 are acceptable, whereas SQL Server rejects values of $0, -$1, -$2, and -$3. However, a NULL assignment will succeed because the column accepts NULL values, and because when Retail_Price is NULL, Retail_Price > 0 evaluates neither to TRUE nor to FALSE, but to UNKNOWN. Therefore, the CHECK constraint in the earlier statement is equivalent to

CHECK (Retail_Price IS NULL  OR  Retail_Price > 0)

Comments

Plain text