SQL Server 2005 introduces a new OVER clause, which enables window-based calculations. In "Ranking Functions," May 2004, and "Calculating Row Numbers in SQL Server 2005," April 2004, I discuss the OVER clause with respect to the four new analytical ranking functions (ROW_NUMBER, RANK, DENSE_RANK, and NTILE). This month, I’ll explain how you use the OVER clause with aggregate window functions.
OVER Clause Preliminaries
The idea behind using the OVER clause with aggregate functions is to enable window-based calculations. That is, SQL Server calculates the aggregate on an entire window of values, without requiring you to use a GROUP BY clause. This means that you can integrate both base attributes and aggregates in the same row.
In SQL Server 2005, the only clause that the OVER clause supports is the PARTITION BY clause, which is optional. If you don't use a PARTITION BY clause, T-SQL calculates the aggregate over the entire set returned from the query processing (e.g., joining, filtering). When you specify a PARTITION BY clause, T-SQL performs the calculation against a window containing the rows that have the same values in the PARTITION BY column list. Unlike the GROUP BY clause, the PARTITION BY clause doesn't limit the query to a single-result row per group. Later, I'll provide an example of using a PARTITION BY clause in an OVER clause.
Be aware that you can use the OVER clause only in expressions in SELECT or ORDER BY clauses. If you want to use an OVER clause in other types of clauses, you’ll have to first logically materialize the result of the expression in the SELECT list of a derived table or common table expression (CTE), then in an outer query refer to the result column returned from the table expression. Later, I’ll also provide examples for encapsulating the calculations that use the OVER clause in a table expression.
Techniques for Window-Based Aggregate Calculations
Let's look at some examples that will give you a feel for performing window-based aggregate calculations; first, two examples that don't use the OVER clause, then an example that uses the OVER clause. As the sample data for my queries, I’ll use the order details that Table 1 shows. The following query against the SalesOrderDetail table in the AdventureWorks database returns the results in Figure 1:
USE AdventureWorks; SELECT SalesOrderID AS orderid, SalesOrderDetailID AS line, ProductID AS productid, OrderQty AS qty, LineTotal AS val FROM Sales.SalesOrderDetail WHERE SalesOrderID IN(43659, 43660);
(You don't need to run this query; I provide it merely as background information.) Say you need to perform calculations involving both base attributes and aggregates. For example, for each order line that appears in Table 1, return the base attributes. Also return the word first if it’s the first line in the order (i.e., minimum line number), last if it’s the last (i.e., maximum line number), and mid if it’s neither. Finally, return the percentage of the quantity from the total order quantity and percentage of the values from the total order value. Figure 2 shows the desired results.
T-SQL programmers use two common solutions to perform such a task. One solution uses subqueries; the other uses joins. Listing 1 shows the solution that uses subqueries. The query generating the derived table D basically issues a correlated subquery for each aggregate that you need—SUM(OrderQty), SUM(LineTotal), MIN(SalesOrderDetailID), and MAX(SalesOrderDetailID). The outer query against the derived table D can now perform the calculations involving the base attributes and the aggregates.
This solution has two main disadvantages. First, it’s lengthy because it uses correlated subqueries. Second, each subquery involves an independent scan of the base data, so that the solution's performance is problematic.
Listing 2 shows the solution that's based on joins. The query generating the derived table Agg groups the data by order ID and for each order calculates all required aggregates. You join the base table with the aggregates table based on order ID match and this way gain access in the SELECT list to both base attributes and aggregates in your calculations. This method performs better than the subquery solution because you calculate all aggregates based on one scan of the data. However, you still have two separate steps here: accessing the base table and calculating the aggregates.
OVER Clause in Action
The third solution, which Listing 3 shows, uses the OVER clause. You simply embed the aggregate functions with the OVER clause in the SELECT list, along with the base attributes. You specify PARTITION BY SalesOrderID because you want the window of values to be the window of all order lines that have the same SalesOrderID as in the current base row.
This solution offers two advantages over the previous two. First, it calculates all aggregates that logically share the same window (order lines that have the same SalesOrderID), based on the same scan of the data, which provides better performance than the first method. You can easily see this efficiency if you examine the query's execution plan: one scan to grab the window, and a single aggregate operator calculating all aggregates. But you could also use the second solution to achieve good performance by accessing all aggregates through the same scan of the data. However, when you use the OVER clause, you don’t need two separate steps—one to grab the base data, and another to calculate the aggregates. Both base attributes and aggregates are simply mentioned inline in the same expressions in the SELECT list. The OVER clause gives you both good performance and simplicity.
Note that the OVER clause is defined by ANSI SQL 99, and that the ANSI OVER clause contains additional elements that aren't implemented in SQL Server 2005. One is an ORDER BY clause that allows running aggregates; the other is a ROWS clause that allows sliding aggregates.
OVER and Out
The new OVER clause lets you request window-based calculations. It improves both the performance and readability of your solutions and provides an alternative to using subqueries and joins, which typically result in lengthy queries. With any luck, future versions of SQL Server will include the missing elements of window aggregate functions: ORDER BY and ROWS clauses.