Reader to Reader: Storing and Sorting IP Addresses

In Web-based database applications, you often need to process IP addresses. For example, you can track Web site customers' activities by counting hits to your site from a specific IP address or addresses. Also, your application's security subsystem might use IP addresses to validate and permit or reject user access to an application.

When stored inefficiently, IP addresses in large databases can occupy too much space and slow database queries. You can store IP addresses in a database to occupy less space, but the storage format must also let you quickly order IP addresses for efficiency. For example, a query such as

SELECT *
FROM WebStats
ORDER BY

will sort the result set much faster than the query

SELECT *
FROM WebStats
ORDER BY

because comparing integers is much faster than comparing character strings.

Though by definition an IP address is a 32-bit integer number, generally, an IP address is represented as a character string in the form of A1.A2.A3.A4 where A1, A2, A3, and A4 are integers from 1 to 255. A string in the form 204.56.55.28 is easier to work with than in the form CC38371C₁₆ or 11001100001110000011011100011 100₂. Therefore, an IP address's storage format needs to provide an easy and fast method to display stored data as a character string.

The character string form A1.A2.A3.A4 is inefficient to use in a query because it's redundant and requires about 15 bytes of storage space, whereas only 4 bytes are necessary to store an IP address. A1, A2, A3, and A4 are 4 bytes total (A1 is 1 byte, etc.) and the entire string A1.A2.A3.A4 consists of 7 to 15 characters (read bytes, because each ANSI character is stored in 1 byte). Furthermore, if you store IP addresses as strings, you can't sort them properly because the data will be sorted as character strings, not as numbers, as Table 1 shows.

What you want is the result Table 2 shows. To get this result, you need to sort the data first by A1, then by A2, A3, and A4. Therefore, you first need to extract each A1, A2, A3, A4 from string form, convert them into unsigned integers, then sort them by using the integer data type instead of varchar(15), as Listing 1 shows.

The best way to satisfy space and processing requirements (i.e., data sorting, displaying, and querying) is to store A1, A2, A3, and A4 separately as four tinyint columns in a database. Then you can use the SQL Server query that Listing 2 shows to sort, query, and display the IP addresses.

However, you can use the query in Listing 1 when you don't want to store A1, A2, A3, and A4 separately as tinyint columns in a database or when your database is so large that you can't quickly convert its data into another format. You can also use this query when you don't care about performance—for example, when querying a small table with few records in it.

The column IP is varchar(15) and stores IP addresses in the form A1.A2.A3.A4 as you receive it from the Internet Information Server (IIS). To sort the IP address data, the SQL Server query needs to ORDER BY A1, A2, A3, A4. To do so, you need to extract A1, A2, A3, A4 values from column IP. Note that sometimes after you call a SUBSTRING() function, a dot might appear in the result. SQL Server 7.0 considers the dot as a decimal point and prevents you from directly converting the result into an integer; therefore, we use intermediate conversion to a float value.

Using these effective ways to store and sort IP addresses in a database can save disk space. In addition, these queries can accelerate your database performance when you're working with IP addresses.

—Alexey Ruban
[email protected]

Comments

Plain text