Sorting IP Addresses

Downloads
44942.zip

I often emphasize the importance of logic in T-SQL problem solving. In recent columns, I've given you a T-SQL puzzle and a purely logical puzzle to help you improve your logic skills. This article's T-SQL puzzle will stretch your T-SQL skills as you try to derive the best solution. In the sidebar "The Logical Puzzle," page 30, I present a new logic problem to try. Do your best to solve the puzzles yourself before looking at my solutions. If the trivial and intuitive approaches you come up with seem cumbersome, try to be creative and think outside the box.

Sorting IP Addresses

This month's puzzle involves sorting IP addresses. Your task is to write a query that sorts the IP addresses that are stored in the IPs table. Note that a correct IP sort logically means that you should consider each octet separately and as a number, as opposed to the way SQL Server stores it—as a character string. Run Listing 1's code to create the IPs table and populate it with sample data. Figure 1 shows the desired result. The first sort value is the first octet, so IP addresses with 3 in the first octet should be sorted first, then addresses with 22 in the first octet, then those with 192. The second sort value is the second octet—for example, among IP addresses that have 3 in the first octet, the one with 8 in the second octet should be sorted first, then those with 77 in the second octet. Similarly, the third octet is the third sort value, and the fourth octet is the fourth sort value.

If you attempt to simply sort the IP addresses by the ip column as follows

SELECT * FROM IPs ORDER BY ip;

you get the incorrect IP address sort that Figure 2 shows: the IP addresses sorted by their character-string representation. SQL Server compares characters in corresponding positions from left to right. Thus, IP addresses that start with the number 1 (e.g., 192.*) are listed before IP addresses starting with the number 2 (e.g., 22.*), which come before addresses starting with 3 (e.g., 3.*). Also, SQL Server doesn't recognize that an IP address is made up of four separate parts; it treats each IP address as a character string.

A different table design could have made sorting the IP addresses simple. However, for the sake of this exercise, assume that you have to face the task with the given table design.

You now have all the information you need to start working on the problem. I suggest you examine my solutions after trying to solve the problem yourself.

Trivial Solution: Using Lengthy Expressions

An intuitive approach to solving the problem is to specify four expressions in the ORDER BY clause, breaking the ip column into the four octets by using the SUBSTRING() and the CHARINDEX() functions, then converting them to integers. The following pseudo code shows how the solution query would look:

SELECT ip
FROM IPs
ORDER BY
  CAST(SUBSTRING(ip, 1,      p1 - 1     ) AS tinyint),
  CAST(SUBSTRING(ip, p1 + 1, p2 - p1 - 1) AS tinyint),
  CAST(SUBSTRING(ip, p2 + 1, p3 - p2 - 1) AS tinyint),
  CAST(SUBSTRING(ip, p3 + 1, 3          ) AS tinyint);

P1, p2, and p3 respectively represent the positions of the first, second, and third dots; you need to replace them with expressions that actually calculate each dot's position.

You calculate the position of the first dot by using the following expression:

CHARINDEX('.', ip) AS p1

To calculate the position of the second dot, you use a similar expression, but add a third argument to CHARINDEX() telling the function where to start looking for the dot. This third argument is p1+1 (the position number of the first dot plus 1):

CHARINDEX('.', ip, CHARINDEX('.', ip) + 1) AS p2

Similarly, to calculate the position of the third dot, you need to provide p2+1 as the third argument to CHARINDEX(), resulting in the following expression:

CHARINDEX('.', ip, CHARINDEX('.', ip, CHARINDEX('.', ip) 
  + 1) + 1) AS p3

Replacing p1, p2, and p3 in the pseudo code with the previous expressions gives you the solution query that Listing 2 shows.

Obviously, this solution query is long and hard to follow. If you later need to revise the query, you're bound to introduce bugs, so I don't recommend this solution.

Simplified Solution: Using Derived Tables

The first solution contained lengthy expressions, mainly because of the nesting required for the CHARINDEX() function's third argument. By using derived tables, you can significantly simplify the solution. Derived tables let you reuse the aliases you assign to expressions in the SELECT list, as Listing 3 shows.

In the innermost derived table (D1), you calculate the position of the first dot by using this expression:

CHARINDEX('.', ip) AS p1

In the next level's derived table (D2), you reuse the p1 alias to calculate the position of the second dot:

CHARINDEX('.', ip, p1+1) AS p2

Similarly, in the next level's derived table (D3), you reuse the p2 alias to calculate the position of the third dot:

CHARINDEX('.', ip, p2+1) AS p3

Now that you've calculated all dot positions and given them aliases, the outermost query against the derived table D3 simply uses those aliases in the ORDER BY clause's SUBSTRING() functions.

Outside the Box: Calculating IP Patterns

In an effort to further simplify the solution, I came up with a less-intuitive approach that you might call outside-the-box thinking. Create a table with all possible patterns of IP addresses, each with the four pairs of arguments required for the SUBSTRING() function (the start of each octet and the length of each octet).

Essentially, any IP address follows the pattern <1–3 characters>.<1-3 characters>.<1-3 characters>.<1-3 characters> . You'll use the LIKE predicate to match an IP address to a pattern that contains underscores to represent the digits. For example, the IP address 192.168.11.10 follows the pattern '___.___.__.__'. You can calculate the starting positions and lengths of all octets in the pattern by counting the number of characters preceding each octet and the number of characters in each octet. In the previous IP address, if we use sn to represent the starting position of nth octet and ln to represent the length of nth octet, the starting position and length values are s1=1, l1=3, s2=5, l2=3, s3=9, l3=2, s4=12, l4=2. After you match the IP address with its pattern, you can use the start position and length values accompanying the pattern as arguments for the SUBSTRING() functions.

You can manually populate such a table with all possible patterns and SUBSTRING() arguments. In total, you have 3⁴ patterns, resulting in 81 rows. Alternatively, you can create a view that returns all possible patterns by cross-joining four instances of an auxiliary table of numbers, filtering for only the numbers 1 through 3 in each instance to represent the possible octet lengths.

First, use the following code to create the auxiliary table of numbers and populate it with at least three values:

CREATE TABLE Nums(n int NOT NULL PRIMARY KEY);
INSERT INTO Nums VALUES(1);
INSERT INTO Nums VALUES(2);
INSERT INTO Nums VALUES(3);

Run the code that Listing 4 shows to create the IPPatterns view, which returns all 81 possible IP patterns along with the start and length values for all four octets for each pattern. The view's query cross-joins four instances of the Nums table (N1, N2, N3, and N4), filtering in each the n values that are less than or equal to 3. In the result of the cross join, the four n values for the four instances of Nums represent all possible combinations of octet lengths.

The SELECT list constructs the actual IP address pattern by replicating underscores for each octet and concatenating the underscores with dots between the octets. The SELECT list also calculates the start positions of the octets by summarizing the lengths of the preceding octets and the number of preceding dots plus 1. The lengths of the octets are simply the n values from the corresponding Nums instance.

Run the following query to see the patterns and arguments that the IPPatterns view returned:

SELECT * FROM IPPatterns;

Table 1 shows the abbreviated result. Having the IPPatterns view (or table) in place lets you write an extremely simple query to get the desired result, as Listing 5 shows. The query joins the base table to the IPPatterns view based on a match between the IP address and the pattern that it follows. The query's ORDER BY clause has four SUBSTRING() functions, each of which extracts an octet according to the start and length arguments that the IPPatterns view provides. The code converts each octet string to a tinyint data type to obtain the correct sort.

Solution That Uses PARSENAME()

All the previous solutions use standard SQL constructs, so they're ANSI-compliant. But if you don't mind proprietary T-SQL solutions, you can use the PARSENAME() function, which Microsoft designed to return a requested part of an object name. Because IP addresses are very similar to object names—four parts separated by dots—the PARSENAME() function fits the octet extraction from an IP address like a glove.

Here's an example of how to use the function to create a solution for this problem:

SELECT ip
FROM IPs
ORDER BY
  CAST(PARSENAME(ip, 4) 
AS tinyint),
  CAST(PARSENAME(ip, 3) 
AS tinyint),
  CAST(PARSENAME(ip, 2) 
AS tinyint),
  CAST(PARSENAME(ip, 1) 
AS tinyint);

This query invokes the PARSENAME() function in the ORDER BY clause once for each octet and converts the resulting octet string to tinyint to get the correct result.

All the solutions work similarly because they each perform a scan of the base table, then sort by four expressions. The main difference between the solutions is their complexity. This solution is very simple, but it's a proprietary solution because it uses the nonstandard PARSENAME() function. Also, it's not generic like the other solutions, which you can adapt for other scenarios that follow patterns, because the PARSNAME() function works only with exactly four elements.

It's always good when a solution to a T-SQL problem is both intuitive and simple. When the intuitive solution is complex, you should keep looking for other solutions. Apply logic and try different solutions until you come up with one you're satisfied with. And of course, keep practicing both T-SQL and pure logic.

Comments

Plain text