Importing IP Information

Congratulations to Muhammad Nadeem Akhter, Solution Architect at AAJ Technologies in Fort Lauderdale, Florida, who won first prize of $100 for the best solution to the October Reader Challenge, "Importing IP Information". Honorable mention goes to Byron Hynes, first-place winner in our September Reader Challenge. Here’s a recap of the problem and the solution to the October Reader Challenge.

Problem

For reporting purposes, Martin uses bulk copy program (bcp) to import Web data about IP networks into a SQL Server 2000 database. Each row in the data file contains a range of IP addresses that defines a network and some of the network's characteristics such as the network type (e.g., Class A, Class B). The bcp format-file information below includes the data file's relevant columns:

8.0
2
1   SQLCHAR   0   15   "\t"      1   StartIP
          SQL_Latin1_General_CP1_CI_AS
2   SQLCHAR   0   15   "\r\n"   2   EndIP
          SQL_Latin1_General_CP1_CI_AS

Martin also needs to define indexes that will let SQL Server perform range searches efficiently on these networks or IP address ranges. Let’s assume the following table schema:

CREATE TABLE IPs ( StartIP varchar(15) NOT NULL,
        EndIP varchar(15) )

Help Martin design an efficient solution that can perform the data import with no modifications to the input file and optimize the table for queries.

Solution

Martin realizes that SQL Server can perform efficient range searches and lookups if he converts each IP address to a binary format instead of a string. He can accomplish this conversion by performing the following calculations on each octet:

Numeric Value = CAST(Octet1 AS binary) + CAST(Octet2 AS binary)
             + CAST(Octet3 AS binary) + CAST(Octet4 AS binary)

By converting each octet to a byte and concatenating the values, Martin can obtain a compact representation of the IP address in binary format. The binary representation occupies 4 bytes and can order the IP addresses or perform range searches. Now Martin can code this logic in a user-defined function (UDF) and use the function to define a computed column on the table. Additionally, he can index this computed column—a new feature in SQL Server 2000. This method lets him efficiently import the data file without modifying the input file and automatically update the index created for the searches.

Martin knows that indexing on computed columns carries several restrictions because the computed column has to be deterministic and precise. (For more information about deterministic and nondeterministic functions, see these topics in SQL Server "Books Online"—"BOL.") Therefore, he needs to implement the UDF so that it is deterministic. The code below shows one way to implement a UDF that fulfills the requirement:

CREATE FUNCTION ipval
( @ip varchar(15) )
RETURNS binary(4)
WITH schemabinding
AS
BEGIN
   DECLARE @chars varchar(3), @char char(1), @charcnt tinyint, 
              @val bigint, @oct tinyint
   SET @chars = ''
   SET @charcnt = 1
   SET @oct = 4
   SET @val = 0
   WHILE( @oct > 0 )
   BEGIN
      SET @char = substring( @ip + '.', @charcnt, 1 )
      IF @char = '.'
      BEGIN
            IF @val IS NULL
                SET @val = CAST( CAST( @chars AS int ) AS binary(1) )
            ELSE
                SET @val = @val + CAST( CAST( @chars AS int ) 
                         AS binary(1) )
            SET @chars = ''
            SET @oct = @oct - 1
      END
      ELSE
            SET @chars = @chars + @char
      SET @charcnt = @charcnt + 1
   END
   RETURN @val
END

This implementation avoids system functions such as PATINDEX() or CHARINDEX() (because they're nondeterministic) and math functions such as POWER() (because they're imprecise and can't be indexed).

Martin can use this UDF in the computed column as follows:

ALTER TABLE IPs ADD StartIP_ AS
      ISNULL(dbo.ipval( StartIP ), 0x0 ),
        EndIP_ AS ISNULL(dbo.ipval( EndIP ), 0x0 ),
      CONSTRAINT pk_ips PRIMARY KEY
                 CLUSTERED( StartIP_, EndIP_ )

The ISNULL() check for the UDF's return value in the computed-column expression ensures that the result of the expression is always a non-NULL value. This result lets Martin use the columns as part of the primary key constraint. By implementing the index on the server side, Martin achieves both his goals. Additionally, Martin can eliminate the UDF by specifying an inline expression that converts the IP address into binary format to achieve better performance while inserting rows.

The script below shows the complete implementation of the table. It also shows how to perform the BULK INSERT operation into the table and compares the script’s performance when you import the file into the table without indexes to its performance after you define the index on the computed column. Note that Nick can save the format file to a specified location, then use that path in the following script:

-- Create the ipval UDF. USE tempdb GO CREATE FUNCTION ipval ( @ip varchar(15) ) RETURNS binary(4) WITH schemabinding AS BEGIN DECLARE @chars varchar(3), @char char(1), @charcnt tinyint, @val varbinary(4), @oct tinyint SET @chars = '' SET @charcnt = 1 SET @oct = 4 WHILE( @oct > 0 ) BEGIN SET @char = substring( @ip + '.', @charcnt, 1 ) IF @char = '.' BEGIN IF @val IS NULL SET @val = CAST( CAST( @chars AS int ) AS binary(1) ) ELSE SET @val = @val + CAST( CAST( @chars AS int ) AS binary(1) ) SET @chars = '' SET @oct = @oct - 1 END ELSE SET @chars = @chars + @char SET @charcnt = @charcnt + 1 END RETURN @val END GO -- Create the IPs table to store the IP address data. CREATE TABLE IPs ( StartIP varchar(15) NOT NULL, EndIP varchar(15) NOT NULL ) -- Load random IP data into the table. DECLARE @r int, @sd int, @o1 int, @o2 int, @o3 int, @o4 int, @o3e int, @o4e int SET @sd = rand() SET @r = 0 WHILE( @r

Now, test your SQL Server savvy in the November Reader Challenge, "Collation Conflict" (below). Submit your solution in an email message to [email protected] by October 16. SQL Server MVP Umachandar Jayachandran, a SQL Server Magazine technical editor, will evaluate the responses. We’ll announce the winner in an upcoming SQL Server Magazine UPDATE. The first-place winner will receive $100, and the second-place winner will receive $50.

Here’s the challenge: Thomas administers databases that are hosted on several SQL Server 7.0 servers with different collation settings. As part of an upgrade, he’s trying to consolidate these databases into one powerful SQL Server 2000 server. He can easily make this move because SQL Server 2000 supports collation at different levels (i.e., server, database, column). The new SQL Server 2000 server is installed with the default settings for each SQL Server 7.0 collation. But during his upgrade testing, Thomas notices that some of the existing stored procedures fail with the error message:

Server: Msg 446, Level 16, State 9, Line 3
Cannot resolve collation conflict for equal to operation.

Help Thomas determine the cause of this problem and the possible solution.

Comments

Plain text