Data Export

Congratulations to Anith Sen, an independent database consultant in Cordova, Tennessee, and Ken Haley, manager of development for SPECTRUM Human Resource Systems Corporation in Denver. Anith won first prize of $100 for the best solution to the April Reader Challenge, "Data Export." Ken won second prize of $50. Here’s a recap of the problem and the solution to the April Reader Challenge.

Problem:

Stella is a database programmer for a company specializing in hosting publishing information. The publisher database contains tables that store information about authors, publishers, and titles. On its Web site, the company displays a table containing information about publishers. Assume that the table schema is the same as the schema for the pub_info table in the Pubs database. The table includes the pr_info column, a text column containing detailed information about each publisher. Stella is trying to export data from this table for a proprietary application that can’t handle text data. She needs to export the data so that the value from pr_info is split across multiple lines, each having no more than 8000 characters. The following example shows how the exported data should be formatted:

Column #1:	pub_id
Column #2:	line_num (from 1 through n)
Column #3: 	line_text
Order of rows:	pub_id, line_num

Help Stella write the code that will export the publisher information in the desired format efficiently and with minimal coding. She wants a set-based solution to her problem, not a user-defined function (UDF).

Solution:

Stella can use the built-in SUBSTRING() function to retrieve a portion of the value from the pr_info column. For example, the following query returns the first 500 characters and, if available, the next 500 characters from each row of the Pubs database’s pub_info table:

SELECT p..pub_id, SUBSTRING( p.pr_info, 1, 500 )
AS pr_info_1, SUBSTRING( p.pr_info, 501, 500 ) AS pr_info_2
FROM pubs..pub_info AS p

Next, Stella can use the built-in SUBSTRING() function again to extract a portion of text for each pr_info value, then split the value based on its length. She can split the value by using a Numbers table to generate the starting-position parameter for the SUBSTRING() function based on the length of the data. The following code shows the set-based approach of using a Numbers table to split the text column into several lines, each with no more than 8000 characters:

SELECT p..pub_id,
       n.Number + 1 AS line_num,
       SUBSTRING( p.pr_info, (n.Number*8000)
	    + 1, 8000 ) AS line_text
       FROM pubs..pub_info AS p
    JOIN Numbers AS n
    ON n.Number <= DATALENGTH( p.pr_info )/8000
ORDER BY p..pub_id, line_num

The Numbers table contains numbers from 1 through some maximum value, n, depending on the supported length of the text value. For simplicity, assume the table contains numbers from 0 through 100. The code to generate the Numbers table is:

SELECT TOP 100 IDENTITY(smallint,0,1) AS Number
INTO Numbers
FROM sysobjects AS o1
CROSS JOIN sysobjects AS o2

The query logic works by determining the length of the text data in the pr_info column for each row, then dividing that number by 8000 to get the number of exported data rows. Stella can use the built-in DATALENGTH() function to get the length of data in the text value. Then by dividing the value for each row by 8000, she can determine the number of rows required for the text value. The Numbers table provides an easy mechanism for generating the starting position for SUBSTRING() and, thus, the line numbers for the text value.

May Reader Challenge:

Now, test your SQL Server savvy in the May Reader Challenge, "Query Strings" (below). Submit your solution in an email message to [email protected] by April 17. SQL Server MVP Umachandar Jayachandran, a SQL Server Magazine technical editor, will evaluate the responses. We’ll announce the winner in an upcoming SQL Server Magazine UPDATE. The first-place winner will receive $100, and the second-place winner will receive $50.

Maria is the database developer for a company that analyzes Web data and provides reports based on the traffic. One such report provides a list of the most popular URLs, based on number of hits. The company’s databases run on SQL Server 2000. Maria has a QueryStrings table that contains query strings from various URLs and a Urls table that maps the query strings to specific URLs. The schema for the tables and some sample data follows:

CREATE TABLE QueryStrings (
QSID int NOT NULL IDENTITY PRIMARY KEY,
QSVal varchar(8000) NOT NULL
)
INSERT INTO QueryStrings VALUES( 'a=1&b=2&c=3' )
INSERT INTO QueryStrings VALUES( 'a=1&b=4' )
INSERT INTO QueryStrings VALUES( 'b=2&a=1&c=3' )
INSERT INTO QueryStrings VALUES( 'a=1&b=24' )
INSERT INTO QueryStrings VALUES( 'a=11&b=24' )

CREATE TABLE Urls (
   UrlID int NOT NULL IDENTITY PRIMARY KEY,
   QSID int NOT NULL REFERENCES QueryStrings( QSID )
)
INSERT INTO Urls VALUES( 1 )
INSERT INTO Urls VALUES( 1 )
INSERT INTO Urls VALUES( 2 )
INSERT INTO Urls VALUES( 2 )
INSERT INTO Urls VALUES( 3 )
INSERT INTO Urls VALUES( 4 )
INSERT INTO Urls VALUES( 5 )
INSERT INTO Urls VALUES( 5 )

Maria uses the QueryStrings table in several queries and JOIN conditions to filter the URLs based on the search string specified by the user requesting the report. The following sample query joins the QueryStrings table with the Urls table:

DECLARE @SearchString varchar(8000)
SET @SearchString = 'a=1&b=2'
SELECT u.UrlID, u.QSID, q.QSVal
     FROM Urls AS u
     JOIN QueryStrings AS q
      ON q.QSID = u.QSID
   WHERE '&' + q.QSVal + '&' LIKE '%&' 
   + @SearchString + '&%'

In the sample query, Maria searches for URLs that contain only certain query-string parameters and values such as a=1&b=2. Given the sample data, the query will return two query-strings with QSID=1 and QSID=3. Now she wants to modify the query so that it returns all query strings that contain the parameters being searched for irrespective of their order. Help Maria find a set-based solution for her new requirement.

Comments

Plain text