On September 27, 2007 I provided a puzzle where you were asked to

calculate the Mode of the distribution (the value that occurs most frequently).

You can find the details of the puzzle
here.

I got solutions to the puzzle from Alejandro Mesa, Steve Kass,

Umachandar Jayachandran, Michael Mac, Dieter Noeth, paPai and rlahoty.

Here I’ll describe a sample of the solutions which I find most elegant.

Most of the solutions would benefit from an index on the columns

CustomerID, EmployeeID:

USE Northwind; CREATE INDEX idx_cid_eid ON dbo.Orders(CustomerID, EmployeeID);Several people solved the two variations of the puzzle by using ranking

calculations and a table expression. This was also the approach I used in my

first attempt at solving the puzzle:

**Solution based on ranking, Variation 1**

WITH C AS ( SELECT CustomerID, EmployeeID, COUNT(*) AS Cnt, RANK() OVER( PARTITION BY CustomerID ORDER BY COUNT(*) DESC) AS Rnk FROM dbo.Orders GROUP BY CustomerID, EmployeeID ) SELECT CustomerID, EmployeeID, Cnt FROM C WHERE Rnk = 1;The code groups the orders by customer and employee calculating the count

of orders for each group. The RANK function is used to rank the

employees based on their order count in descending order within the

customer (PARTITION BY CustomerID). Remember that the RANK

function will assign the same rank value to all employees with the same order

count, and the highest will be assigned with a rank of 1. The rest is just a

matter of defining a table expression based on the query that calculates the

ranking values and filtering only the rows with a rank value of 1.

**Solution based on ranking, Variation 2**

WITH C AS ( SELECT CustomerID, EmployeeID, COUNT(*) AS Cnt, ROW_NUMBER() OVER( PARTITION BY CustomerID ORDER BY COUNT(*) DESC, EmployeeID DESC) AS RowNum FROM dbo.Orders GROUP BY CustomerID, EmployeeID ) SELECT CustomerID, EmployeeID, Cnt FROM C WHERE RowNum = 1;To solve Variation 2 of the puzzle you need to make a minor change to the

previous solution—use the ROW_NUMBER function instead of RANK

and add EmployeeID DESC to the ranking calculation’s ORDER BY

clause. In case of ties, the employee with the highest EmployeeID will be

chosen. BTW, it is not necessary to use ROW_NUMBER instead of

RANK since both are logically equivalent when the ORDER BY list is

unique. I just think that it is more natural to use ROW_NUMBER in case

I’m after only one row per customer.

Steve Kass also provided very elegant solutions that avoid the need to use a

table expression. He achieved this by using TOP (1) WITH TIES and

specifying the ranking calculation in the query’s ORDER BY clause. Here

are his solutions to the two variations of the puzzle:

**Solution based on ranking and TOP 1 WITH TIES, Variation 1**

SELECT TOP (1) WITH TIES CustomerID, EmployeeID, COUNT(*) AS Cnt FROM dbo.Orders GROUP BY CustomerID, EmployeeID ORDER BY RANK() OVER( PARTITION BY CustomerID ORDER BY COUNT(EmployeeID) DESC);

**Solution based on ranking and TOP 1 WITH TIES, Variation 2**

SELECT TOP (1) WITH TIES CustomerID, EmployeeID, COUNT(*) AS Cnt FROM dbo.Orders GROUP BY CustomerID, EmployeeID ORDER BY ROW_NUMBER() OVER( PARTITION BY CustomerID ORDER BY COUNT(EmployeeID) DESC, EmployeeID DESC);

Another approach that several people used in their solutions was to query

the Customers table and use the APPLY operator to apply a table

expression to each customer. The table expression uses a TOP (1) query to

pull the employee that occurs most frequently. Here are the solutions to the

two variations of the puzzle based on the APPLY operator:

**Solution based on APPLY, Variation 1**

SELECT C.CustomerID, D.EmployeeID, D.Cnt FROM dbo.Customers AS C CROSS APPLY (SELECT TOP (1) WITH TIES EmployeeID, COUNT(*) AS Cnt FROM dbo.Orders AS O WHERE O.CustomerID = C.CustomerID GROUP BY EmployeeID ORDER BY Cnt DESC) AS D;

**Solution based on APPLY, Variation 2**

SELECT C.CustomerID, D.EmployeeID, D.Cnt FROM dbo.Customers AS C CROSS APPLY (SELECT TOP (1) EmployeeID, COUNT(*) AS Cnt FROM dbo.Orders AS O WHERE O.CustomerID = C.CustomerID GROUP BY EmployeeID ORDER BY Cnt DESC, EmployeeID DESC) AS D;As you can see, the first solution specifies only the order count in the

ORDER BY clause and uses the WITH TIES option in order to return

multiple employees in case of ties. The second solution specifies both the

order count and employee id in the ORDER BY clause and does not use

the WITH TIES option since only one row is required per customer.

The previous solutions are elegant and fairly simple in their logic. But if

performance is of the highest priority, there is a solution that is a bit more

complicated, but is faster than all others. This solution will only work for the

second variation of the puzzle where you are after only one employee per

customer.

The solution first groups the data by customer and employee and calculates

the order count for each such group. The solution then groups the result by

customer, concatenates the order count and the employee id (call it

CntEmpID), and calculates the MAX CntEmpID value. You can use

various techniques to concatenate the values. The technique I used is to

multiply the order count by 2147483648 (max supported four-byte integer

plus 1) and adding the employee id. Once the max value is calculated (call it

MX), you can extract the order count by dividing MX by 2147483648

using integer division, and the employee id with a modulo (MX %

2147483648). Here’s the complete solution:

**Solution based on concatenation, Variation 2**

SELECT CustomerID, MX % CAST(2147483648 AS BIGINT) AS EmployeeID, MX / CAST(2147483648 AS BIGINT) AS Cnt FROM (SELECT CustomerID, MAX(CntEmpID) AS MX FROM (SELECT CustomerID, CAST(2147483648 AS BIGINT) * COUNT(*) + EmployeeID AS CntEmpID FROM dbo.Orders GROUP BY CustomerID, EmployeeID) AS D1 GROUP BY CustomerID) AS D2;

Cheers,

--

BG

0 comments

Hide comments