As you've no doubt surmised from my previous articles in this series ("TOP Troubles," January 2006, InstantDoc ID 48616 and "Making TOP Better," February 2006, InstantDoc ID 48937), Microsoft's implementation of the TOP option in SQL Server 2005 has some significant shortcomings that hinder its usefulness to DBAs. For example, when using TOP with an ORDER BY clause, ORDER BY serves two different sorting functions—determining the order of calculation and determining the order of records in the output cursor—but you can't separate the two functions. In "Making TOP Better," I describe alternatives to TOP that give you full control over the different sorting functions and even let you request the TOP n rows for each group. I'll conclude the series here by describing some additional problems that TOP poses when it's used with data-modification statements (i.e., INSERT, UPDATE, and DELETE) and alternative techniques that avoid these problems.
TOP and Data Modifications
In SQL Server 2005, Microsoft has added support for using the TOP option with data-modification statements. (For more information about using TOP with modification statements and the applications that might benefit from it, see "A More Versatile TOP," September 2005, InstantDoc ID 47690.) But using TOP with modification statements is even more ambiguous than using TOP with SELECT queries because with modifications, you can't specify an ORDER BY clause at all. Essentially you end up modifying the TOP (n) rows that SQL Server happens to access first.
To demonstrate using TOP with modifications, first run the following code, which copies the contents of the Orders table from the Northwind database to the MyOrders table:
SET NOCOUNT ON; USE Northwind; SELECT * INTO dbo.MyOrders FROM dbo.Orders ORDER BY RAND(CHECKSUM(NEWID()));
Now, consider the following DELETE statement:
DELETE TOP (3) FROM dbo.MyOrders;
Which three rows will be deleted? You can’t specify an ORDER BY clause that would tell SQL Server, for example, to delete the three orders with the minimum order dates, using the order ID as a tiebreaker. This DELETE statement simply deletes the first three orders that SQL Server finds. By the way, if you’re curious about which rows were actually deleted, remember that you can now specify an OUTPUT clause in modification statements to request the affected rows as output, as this sample statement does:
DELETE TOP (3) FROM dbo.MyOrders OUTPUT deleted.OrderID, deleted.OrderDate;
Similarly, you can’t dictate the order in which the following UPDATE statement will modify rows:
UPDATE TOP (3) dbo.MyOrders SET Freight = 123.456;
Again, SQL Server will modify the first three rows that it finds first.
If you want to see which rows were modified, use the OUTPUT clause like this:
UPDATE TOP (3) dbo.MyOrders SET Freight = 123.456 OUTPUT inserted.*;
As I describe in “A More Versatile TOP,” certain applications can benefit from the ability to use the TOP option with modifications, when you don’t care about the order in which rows will be chosen—for example, deleting or updating large volumes of data in batches, to avoid log-explosion and lock-escalation problems. However, obviously the use of the term TOP in such cases is a misnomer.
The Ideal TOP with Modifications
Imagine how nice it would have been if TOP supported an OVER clause with modification statements, where you could specify the order in which rows were chosen. For example, you could delete the three orders with the minimum order dates, by using the order ID as the tiebreaker, as this example shows:
DELETE TOP (3) OVER(ORDER BY OrderDate, OrderID) FROM dbo.MyOrders;
Or, you could even use the PARTITION BY clause to choose rows per each group. For example, you could use a statement like this to delete the three oldest orders for each customer:
DELETE TOP (3) OVER(PARTITION BY CustomerID ORDER BY OrderDate, OrderID) FROM dbo.MyOrders;
Alas, T-SQL 2005 doesn't support this syntax. Fortunately, by using common table expressions (CTEs) and the ROW_NUMBER function, you can gain full control of the order in which SQL Server will choose rows for modification purposes and still be able to limit the number of rows. For example, the following code deletes the three orders with the minimum order dates, using the order ID as the tiebreaker:
WITH OrdersRN AS ( SELECT *, ROW_NUMBER() OVER(ORDER BY OrderDate, OrderID) AS RowNum FROM dbo.MyOrders ) DELETE FROM OrdersRN WHERE RowNum <= 3;
If you want to see the OrderID and OrderDate values of the deleted rows, specify OUTPUT deleted.OrderID, deleted.OrderDate right after OrdersRN in the outer DELETE statement (…DELETE FROM OrdersRN OUTPUT deleted.OrderID, deleted.OrderDate WHERE…). The OrdersRN CTE query expression generates row numbers based on OrderDate, OrderID order. The outer DELETE simply filters only those rows that have a row number that's less than or equal to 3.
Additionally, the ROW_NUMBER function supports a PARTITION BY clause, so that you can request to delete TOP (n) rows for each group. For example, the following code deletes the three oldest orders for each customer:
WITH OrdersRN AS ( SELECT *, ROW_NUMBER() OVER(PARTITION BY CustomerID ORDER BY OrderDate, OrderID) AS RowNum FROM dbo.MyOrders ) DELETE FROM OrdersRN WHERE RowNum <= 3;
Similarly, you can use the CTE and ROW_NUMBER technique to achieve ordered updates. For example, the following code updates the Freight values of the three most recent orders:
WITH OrdersRN AS ( SELECT *, ROW_NUMBER() OVER(ORDER BY OrderDate DESC, OrderID DESC) AS RowNum FROM dbo.MyOrders ) UPDATE OrdersRN SET Freight = 123.456 WHERE RowNum <= 3;
The following query updates the Freight values of the three most recent orders for each customer:
WITH OrdersRN AS ( SELECT *, ROW_NUMBER() OVER(PARTITION BY CustomerID ORDER BY OrderDate DESC, OrderID DESC) AS RowNum FROM dbo.MyOrders ) UPDATE OrdersRN SET Freight = 123.456 WHERE RowNum <= 3;
In short, you have full control over the order in which rows are selected for modification.
TOP Without TOP
We've finally reached the end of our discussion about the SQL Server 2005 TOP option, its ambiguous nature, and alternatives for overcoming its limitations. You've seen how using CTEs and the ROW_NUMBER function can give you full control of the two different sorting functions that you might need with TOP queries as well as provide partitioning capabilities and even let you perform ordered data modifications. Ironically, you've gained all the missing functionality in TOP by not using TOP at all!
Learn more at "Jump Start: Using Order By."