A Bulk-Copy Procedure

Congratulations to Alejandro Mesa, a database programmer for Simplex Medical in Fort Myers, Florida, and Marcos Kirchner, a student at the University of Blumenau (FURB) in Santa Catarina, Brazil. Alenjandro won first prize of $100 for the best solution to the August Reader Challenge, "A Bulk-Copy Procedure." Marcos won second prize of $50. Here's a recap of the problem and the solution to the August Reader Challenge.

Problem:

Larry is a database analyst who maintains a reporting database for the company's data warehouse. He must periodically synchronize the reporting database with the data warehouse and other external data sources. The reporting database contains a table that stores a list of companies that bulk copy program (bcp) loads from a data file. The data file always contains the complete list of companies. The following sample schema for the Companies table is created from the Suppliers table in the Northwind database:

USE northwind
GO
SELECT SupplierId AS CompanyId, CompanyName
  INTO Companies
  FROM Suppliers
ALTER TABLE Companies ADD CONSTRAINT pk_companies_id
  PRIMARY KEY CLUSTERED(CompanyId)
GO

Other tables in the reporting database also reference the Companies table. When Larry tries using bcp to load the data file containing the list of companies into the Companies table, he gets a primary key violation because of the unique constraint on the CompanyId column. Larry needs to load only new companies into the table from the data file-he can ignore updates to existing companies. What should Larry do to import the data file with the least amount of coding and redesigning of the schema? The bcp commands must use the same data file for import and insert only the new rows into the Companies table.

Solution:

Larry can create an additional unique index on the Companies table that has the IGNORE_DUP_KEY option enabled. The following statement creates the new index:

CREATE UNIQUE INDEX uq_idx_companies_id
   ON Companies(CompanyId)
   WITH ignore_dup_key

The IGNORE_DUP_KEY option on the unique index lets a bulk insert or an INSERT statement insert rows that have the duplicate key values. SQL Server will ignore the rows with duplicate key values and issue a warning message. By adding the index with the IGNORE_DUP_KEY option enabled, Larry can use bcp to copy the data file containing the list of companies into the Companies table without any modification. The duplicate rows in the file will be ignored because of the new index, and the primary key violation won't occur.

Larry can also drop the primary key on the Companies table, drop the foreign key constraints that reference the primary key, create the index with the IGNORE_DUP_KEY option enabled, and recreate the foreign key constraints. This approach works because SQL Server lets you define foreign key constraints on columns that are part of a unique index. The following script demonstrates the technique:

USE tempdb
GO
CREATE TABLE pt ( i int NOT NULL )
CREATE UNIQUE CLUSTERED INDEX _ ON pt(i)
CREATE TABLE ft ( i int NOT NULL REFERENCES pt(i))
GO
INSERT INTO pt VALUES(1)
INSERT INTO ft VALUES(1)
GO
-- Will raise FK violation
INSERT INTO ft VALUES(2)
GO
DROP TABLE ft, pt
GO

SEPTEMBER READER CHALLENGE:

Now, test your SQL Server savvy in the September Reader Challenge, "Foreign Punctuation" (below). Submit your solution in an email message to [email protected] by August 19. Umachandar Jayachandran, a SQL Server Magazine technical editor, will evaluate the responses. We'll announce the winner in an upcoming SQL Server Magazine UPDATE. The first-place winner will receive $100, and the second-place winner will receive $50.

Problem:

David is a database programmer for an advertising company that deals with international data. The database that holds data for each customer contains a table that stores names in Unicode format. The following script contains the table and sample data:

CREATE TABLE #Temp ( nc nvarchar(30) COLLATE Latin1_General_CI_AS )
CREATE CLUSTERED INDEX Idx_Temp_nc on #Temp( nc )
INSERT INTO #Temp VALUES( N'MRKT' + nchar(0x3000) + N'Name')
INSERT INTO #Temp VALUES( N'TEST String' )
INSERT INTO #Temp VALUES( N'MRKT' + nchar(0x3000) + N'Name' + nchar(0x3000) )

In the sample data, the Unicode character 0x3000 represents a punctuation symbol in different languages, such as Chinese. When David writes a query using an equality search condition, he notices that the query doesn't work as expected. For example, the following query returns two rows instead of one:

SELECT * FROM #Temp WHERE nc = N'MRKT' + nchar(0x3000) + N'Name' + nchar(0x3000)

What is incorrect in the query? And how can David modify the query to return the correct data without ignoring punctuation or white-space characters in any language?

Comments

Plain text