Downloads |
---|

128810.zip |

The ISO SQL Standard defines a feature called *row value constructor,* which in essence is a value or a list of values that construct a row. This feature is also referred to as *vector expression*. An example of a row value constructor is

ROW(101, '20100212', 10.10)

The use of the ROW keyword is optional. In database platforms that support this feature, developers often omit it, like so:

(101, '20100212', 10.10)

The ordinal positions of the elements in this construct are significant, as they are in other SQL column lists (e.g., SELECT, ORDER BY). Hence this construct can’t be considered a true equivalent of a *tuple* from the relational model. Some relational model purists advise against its use for this reason. I find this feature useful; it makes the code less verbose, more readable, and easier to maintain.

Standard SQL supports the use of row value constructors in many language elements. SQL Server 2008 introduced very minimal support for the feature in the form of an enhanced VALUES clause that allows defining multiple rows. The enhanced VALUES clause can be used in an INSERT statement and to define a derived table. (For details, see “SQL Server 2008’s T-SQL Features.”) But a lot of functionality related to row value constructors is still missing as of SQL Server 2008 (including R2). In this article I introduce the missing functionality. If you think this functionality is important, you can vote for Hugo Kornelis’s request to add the feature to SQL Server. Go to the Microsoft Connect page at connect.microsoft.com/SQLServer/feedback/details/299231.

Most of the code that I present in this article won’t run on SQL Server because the features aren’t yet supported. However, you can run logically equivalent code that’s currently supported; use Listing 1 to create the sample data used by those code samples.

Some of the samples refer to an argument called @key, as well as arguments called @c1, @c2, and @c3. In those examples you can use the following code to declare and initialize those variables:

DECLARE @key AS INT = 2;

DECLARE

@p1 AS INT = 201,

@p2 AS DATETIME = '20100212',

@p3 AS NUMERIC(12, 2) = 10.10;

###

Assignment

The SQL standard supports assignments in an UPDATE statement using row value constructors, like so (don’t try to run this code, because it’s not supported as of SQL Server 2008 R2):

UPDATE dbo.T1

SET (c1, c2, c3) = (@p1, @p2, @p3)

WHERE keycol = @key;

The logically equivalent form supported in SQL Server is

UPDATE dbo.T1

SET c1 = @p1,

c2 = @p2,

c3 = @p3

WHERE keycol = @key;

This example isn’t very exciting and doesn’t clearly demonstrate the advantage of row value constructors; the true benefit becomes apparent when used with a subquery:

UPDATE dbo.T1

SET (c1, c2, c3) = (SELECT T2.c1, T2.c2, T2.c3

FROM T2

WHERE T2.keycol = T1.keycol)

WHERE keycol = @key;

This is logically equivalent to

UPDATE dbo.T1

SET c1 = (SELECT T2.c1

FROM T2

WHERE T2.keycol = T1.keycol),

c2 = (SELECT T2.c2

FROM T2

WHERE T2.keycol = T1.keycol),

c3 = (SELECT T2.c3

FROM T2

WHERE T2.keycol = T1.keycol)

WHERE keycol = @key;

The form that uses a row value constructor is less verbose and lends itself to better optimization. Figure 1 shows the plan I got for the form that’s currently supported by SQL Server (with multiple subqueries). Note that the plan shows a separate visit to the row in T2 for each subquery.

There are existing supported alternatives in SQL Server that do optimize well, including the nonstandard join-based UPDATE statement and the standard MERGE statement. But the suggested UPDATE syntax that uses row value constructors, in addition to being standard, is also clearer and more elegant than the supported alternatives.

** **

### Comparison Predicates

Standard SQL also supports comparison predicates that use row value constructors. The general form of such a predicate is

<RVC1> <comparison_operator> <RVC2>

The operators equals (=) and not equal to (<>) follow the rules for *equality operations* in the standard. An example for use of the equals operator is

SELECT keycol, c1, c2, c3

FROM dbo.T1

WHERE (c1, c2, c3) = (@p1, @p2, @p3);

This is logically equivalent to

SELECT keycol, c1, c2, c3

FROM dbo.T1

WHERE c1 = @p1

AND c2 = @p2

AND c3 = @p3;

Regarding treatment of NULLs, remember that SQL follows three-valued predicate logic, meaning that a comparison of two NULLs or a non-NULL value with a NULL yields UNKNOWN. In predicate-based query filters such as WHERE, UNKNOWN is filtered out. Standard SQL supports an interesting predicate called *distinct predicate* that has the syntax IS \\[NOT\\] DISTINCT FROM. It can be used in two ways—IS NOT DISTINCT FROM yields TRUE when both sides are known to be equal or when both are NULLs, and FALSE otherwise. IS DISTINCT FROM yields TRUE when the two sides are known to be different or when one side is NULL and the other isn’t, and FALSE otherwise. SQL Server currently doesn’t support this predicate (as of SQL Server 2008 R2), but if you find it useful, you can vote for Steve Kass’s recommendation to add it. Go to the Microsoft Connect page at connect.microsoft.com/SQLServer/feedback/details/286422.

In standard SQL you can compare two row value constructors and consider two NULLs as equal for this purpose (or more accurately, as not being distinct from each other), like so:

WHERE (c1, c2, c3) IS NOT DISTINCT FROM (@p1, @p2, @p3);

The equals operator for row value constructor can be very handy when expressing a composite join:

SELECT T1.keycol AS T1_key, T2.keycol AS T2_key

FROM dbo.T1 JOIN dbo.T2

ON (T1.c1, T1.c2, T1.c3) = (T2.c1, T2.c2, T2.c3);

Similarly, the SQL standard supports the not equal to operator with vector expressions:

SELECT keycol, c1, c2, c3

FROM dbo.T1

WHERE (c1, c2, c3) <> (@p1, @p2, @p3);

This is logically equivalent to the currently supported form:

SELECT keycol, c1, c2, c3

FROM dbo.T1

WHERE c1 <> @p1

OR c2 <> @p2

OR c3 <> @p3;

And again, in standard SQL you can compare two row value constructors and consider a non-NULL value versus a NULL as being distinct from each other, like so:

WHERE (c1, c2, c3) IS DISTINCT FROM (@p1, @p2, @p3);

The operators less than (<), greater than (>), less than or equal to (<=), and greater than or equal to (>=) follow the rules for *ordering operations* in the standard. Use of these operators with vector expressions can significantly reduce the length and complexity of predicates, as well as lends itself to good optimization. Following are examples of predicates and their logical equivalents that SQL Server does support (as of SQL Server 2008 R2).

Less than operator:

SELECT keycol, c1, c2, c3

FROM dbo.T1

WHERE (c1, c2, c3) < (@p1, @p2, @p3);

Logically equivalent to:

SELECT keycol, c1, c2, c3

FROM dbo.T1

WHERE (c1 < @p1)

OR (c1 = @p1 AND c2 < @p2)

OR (c1 = @p1 AND c2 = @p2 AND c3 < @p3);

Greater than operator:

SELECT keycol, c1, c2, c3

FROM dbo.T1

WHERE (c1, c2, c3) > (@p1, @p2, @p3);

Logically equivalent to:

SELECT keycol, c1, c2, c3

FROM dbo.T1

WHERE (c1 > @p1)

OR (c1 = @p1 AND c2 > @p2)

OR (c1 = @p1 AND c2 = @p2 AND c3 > @p3);

Less than or equal to operator:

SELECT keycol, c1, c2, c3

FROM dbo.T1

WHERE (c1, c2, c3) <= (@p1, @p2, @p3);

Logically equivalent to:

SELECT keycol, c1, c2, c3

FROM dbo.T1

WHERE (c1 = @p1 AND c2 = @p2 AND c3 <= @p3)

OR (c1 = @p1 AND c2 < @p2)

OR (c1 < @p1);

Greater than or equal to operator:

SELECT keycol, c1, c2, c3

FROM dbo.T1

WHERE (c1, c2, c3) >= (@p1, @p2, @p3);

Logically equivalent to:

SELECT keycol, c1, c2, c3

FROM dbo.T1

WHERE (c1 = @p1 AND c2 = @p2 AND c3 >= @p3)

OR (c1 = @p1 AND c2 > @p2)

OR (c1 > @p1);

You can clearly see the benefits in the support for row value constructors by these operators in terms of reducing the length of code and complexity compared with the logically equivalent options. The benefits of the equals and not equal to operators are easy to see in business cases. As for the other** **operators, such as greater than, an example for a business case is paging. Suppose that you’re implementing a paging solution that involves multiple sort columns (e.g., c1, c2, c3). For the paging solution to be deterministic, you need to ensure that together, the ordering elements uniquely identify a row. If not, you can add the primary key as a tiebreaker. At any rate, whenever the user asks for the next page, the application invokes a procedure, passing the values of the ordering elements from the last row in the last page retrieved. The procedure can then invoke a query that filters all rows greater than the passed anchor row, like so:

SELECT TOP(@pagesize) keycol, c1, c2, c3

FROM dbo.T1

WHERE (c1, c2, c3) > (@p1, @p2, @p3)

ORDER BY c1, c2, c3;

This code is clear and concise, and it lends itself to good optimization provided an index is defined on (c1, c2, c3).

It’s important to note that row value constructors don’t address a couple of needs related to this paging idea. First, the ordering direction of all elements is the same (ascending when using > and descending when using <). Second, treatment of three-valued logic in terms of NULLs is different than with ordering in the sense that a comparison with a NULL yields UNKNOWN and the row is filtered out. But this feature could be very useful as long as all ordering elements are defined as NOT NULL and you do need them all to be in the same direction.

###

IN and BETWEEN

After considering the operators equals, not equal to, less than or equal to, and greater than or equal to, it’s easy to see that row value constructors can also be used with IN and BETWEEN predicates. Here’s an example using IN:

SELECT keycol, c1, c2, c3

FROM dbo.T1

WHERE (c1, c2, c3) IN (SELECT c1, c2, c3

FROM T2);

Assuming the subquery returns the rows R1, R2, …, Rn, this query is then equivalent to

SELECT keycol, c1, c2, c3

FROM dbo.T1

WHERE (c1, c2, c3) = R1

OR (c1, c2, c3) = R2

OR ...

OR (c1, c2, c3) = Rn;

Here’s an example with BETWEEN:

SELECT keycol, c1, c2, c3

FROM dbo.T1

WHERE (c1, c2, c3) BETWEEN (@pl1, @pl2, @pl3)

AND (@pr1, @pr2, @pr3) ;

This is logically equivalent to

SELECT keycol, c1, c2, c3

FROM dbo.T1

WHERE (c1, c2, c3) >= (@pl1, @pl2, @pl3)

AND (c1, c2, c3) <= (@pr1, @pr2, @pr3) ;

**MIN and MAX Aggregates**

Earlier I showed how the SQL standard supports comparison predicates with row value constructors. The MIN and MAX aggregates also perform comparisons, only across rows. So the idea of row value constructor comparisons can therefore be extended to aggregates as well. Here’s an example using MAX:

SELECT grpcol, MAX(c1, c2, c3)

FROM dbo.T1

GROUP BY grp;

Note that the goal here is to return the maximum c1 value in each group, the maximum c2 value out of the rows with the maximum c1 value, and the maximum c3 value out of the rows with the maximum c1 and maximum c2 values. This doesn’t mean** **return the maximum of c1, c2, and c3. Some platforms support functions called GREATEST and LEAST that return the maximum and minimum, respectively, out of the given list of values in the same row. Although it would be nice for a future version of SQL Server to include such support, that’s not the focus of this discussion. Logically, the proposed MAX of a row value constructor is similar to what you can currently achieve in SQL Server, like so:

WITH C AS

(

SELECT grpcol, c1, c2, c3,

ROW_NUMBER() OVER(PARTITION BY grpcol ORDER BY c1 DESC, c2 DESC, c3 DESC) AS rownum

FROM dbo.T1

)

SELECT grpcol, c1, c2, c3

FROM C

WHERE rownum = 1;

One of the ways to avoid the confusion with the logic of the GREATEST and LEAST functions is to explicitly use the ROW keyword, like so:

SELECT grpcol, MAX(ROW(c1, c2, c3))

FROM dbo.T1

GROUP BY grpcol;

Adding such support in a future version of SQL Server would be valuable. Such an aggregate could help solve business problems such as the TOP *N* Per Group problem. For details about this problem, including a performance discussion about currently available solutions, see “Optimizing TOP *N* Per Group Queries.” In “Optimizing TOP *N* Per Group Queries,” I cover three possible solutions:

- A solution based on APPLY that I recommend using with dense partitions when a good index is available.
- A solution based on ROW_NUMBER that I recommend using with nondense partitions when a good index is available.
- A solution based on a carry-along-sort concept (concatenation technique) that I recommend using regardless of density when there’s no good index in place and you can’t or don’t want to create one.

Listing 2 contains the query I used to implement the third approach. With support for row value constructors in MIN and MAX aggregates, you can implement a similar idea only with far clearer code, like so:

SELECT custid, MAX(orderdate, orderid, filler)

FROM dbo.Orders

GROUP BY custid;

The benefit of such support for row value constructors becomes clear when you compare this query with the one in Listing 2.

###

Cast Your Vote

In this article I introduced the concept of row value constructors from standard SQL and explained the different language constructs that can benefit from using it. This construct would add a lot of value if a future version of SQL Server included support for it. To vote for Hugo Kornelis’s request to add the feature to SQL Server, go to the Microsoft Connect page at connect.microsoft.com/SQLServer/feedback/details/299231.