I'm writing a Web-based application that will retrieve data from a Microsoft SQL Server 2005 database. The application includes a simple report that provides the total annual sales per district, according to the year that the user enters. I want to rank the data by sales amount as the application retrieves the rows. For example, if the Southwest district has the most sales for a particular year, I want that district to be ranked number one and appear in the first row returned from the database. How do I include in a SQL Server stored procedure the logic necessary to support this functionality?
You're in luck! SQL Server 2005 includes a set of functions that analyze and rank values returned by a query. What takes several lines of code to achieve in earlier SQL Server versions can now be condensed into a single function and its arguments. Simply include the function as a column definition in the SELECT clause of your query, and SQL Server analyzes the specified values and automatically assigns a ranking value to each row, based on that analysis.
SQL Server includes four ranking functions: ROW_NUMBER, RANK, DENSE_RANK, and NTILE. Of these functions, you'll probably want to use one of the first three, but we'll give you a quick rundown of all of them so that you can determine which one best suits your needs. To demonstrate each of these functions, we'll use the AnnualSales table definition and INSERT statements that Listing 4 shows. The user-defined Sales schema owns this table. In SQL Server 2005, each database object is owned by a schema that acts as a namespace for organizing the objects. (As you can see, this is a basic table definition. Your source table might include additional constraints, or you might need to join tables to retrieve the necessary data.) To demonstrate the functions, we populate the AnnualSales table with a small sampling of data to provide sales figures for various districts, such as Northwest, Southwest, and Midwest.
Now take a look at Listing 5. This listing contains a stored procedure (sp_SalesRanking) that queries the AnnualSales table. The SELECT statement includes the four ranking functions as column definitions. Notice that each definition contains the function-name and an OVER clause, which specifies the order in which to rank the data. The OVER clause includes an ORDER BY subclause that works as it does in a typical SELECT statement: You can specify one or more columns, and you can specify whether the values in those columns are sorted in ascending (ASC) or descending (DESC) order. If you specify neither ASC nor DESC, the database engine assumes ASC.
ROW_NUMBER. The first ranking function in the stored procedure is ROW_NUMBER, as callout A in Listing 5 shows. This function sequentially numbers each row according to the values retrieved from the Amount column. Because the values are ranked in descending order, the highest value in the Amount column is ranked 1.
Now refer to Table 1, which provides the result set returned by executing sp_SalesRanking, as Listing 5 shows. (The assumption is that a year value is passed from the application code to the @year parameter defined in the stored procedure.) The RowNumber column in Table 1 contains the values that the ROW_NUMBER function generates. The database engine assigns these numbers based on how it ranks the rows. For example, the Northwest district was one of the two districts that had the highest sales for that year, so it's ranked number 1; the Northeast had the lowest sales for that year, so it's ranked number 6.
RANK. In Table 1, notice that some rows (e.g., the first two rows) contain identical Amount values. Yet the rows are ranked sequentially, even though they have the same rank. In some cases, you can add columns to the ORDER BY subclause to help further distinguish rows, but doing so isn't always practical or possible. For this reason, SQL Server 2005 also supports the RANK function, which callout B in Listing 5 shows. This function takes into account identical ranked values. As you can see in Table 1's Rank column, which contains the output from the RANK function, rows with identical rank values are assigned the same rank. So the first two rows are ranked 1, the second two rows are ranked 3, and the last two rows are ranked 5 and 6. Because the second row contains the same value as the first row, rank 2 is skipped; the same is true of rank 4.
DENSE_RANK. You might decide that you want your rows to be ranked sequentially without skipping rank values. To achieve this goal, you can use the DENSE_RANK function, which callout C in Listing 5 shows. As with the preceding two functions, DENSE_RANK ranks the values in the Amount column, but this function takes into account identically ranked values and unified sequential numbering. The results appear in Table 1's DenseRank column. Identical values in the Amount column are ranked at the same level, and no ranking values are skipped.
NTILE. The final ranking function is NTILE, which callout D in Listing 5 shows. This function is different from the others in that it sorts rows into tiles—groups of rows based on the values in the column being ranked (e.g., the Amount column). The integer assigned to the NTILE function determines the number of tiles, which in this case is three. SQL Server arrives at the number of rows in each tile by dividing the total number of rows that the query returns by the integer assigned to the function. Because the query returns six rows, each of the three tiles contains two rows (6 / 3 = 2). The NTILE function ranks the rows according to the values in the Amount column, as the other functions do. If you refer to the NTile column in Table 1, you can see that three tiles have been created and that the Amount values have been ranked in descending order.
Another aspect of the stored procedure's SELECT statement is that the statement includes an ORDER BY clause that sorts the query results by the Amount values. Keep in mind that although the ranking functions can analyze and rank values (achieved in part by using an ORDER BY subclause), you still need to sort the result set according to how you want to display the final results. Otherwise, your ranking values won't appear in sequential order. You can use ranking functions in the ORDER BY clause; doing so will sort the result set by rank but won't assign a ranking value to each row. Therefore, you must include the ranking function as a column definition, as Listing 5 shows.
The ranking functions can be a valuable asset for ranking the results that your queries return. And the functions support additional capabilities beyond what we've shown you. Refer to SQL Server 2005 Books Online (BOL) to learn more about how to use these functions to analyze and rank your data.