Boost Support for Large Object Data Types

Take advantage of SQL Server 2005's variable-length data types

SQL Server 2005 enhances support for large object (LOB) data types in ways that are sure to please T-SQL enthusiasts and DBAs. The enhancements provide a unified programming model for regular data types and LOBs and let you manipulate LOBs more easily and flexibly than ever. In this article, I'll discuss some enhancements related to LOBs and demonstrate their use. First I'll describe the new MAX specifier that you can use for variable-length data types. Then I'll discuss the WRITE method that you can use with LOBs to modify their value. Finally, I'll introduce the BULK rowset provider, which lets you load a file into a scalar LOB value and query the file's content as a rowset.

MAX Specifier
You can use the MAX specifier when you define a column, variable, or parameter of a variable-length data type (VARCHAR, NVARCHAR, and VARBINARY). You use the MAX specifier in the format col data type(MAX)—for example, col1 VARCHAR(MAX). SQL Server 2005 lets you use these data types with the MAX specifier as alternatives to the TEXT, NTEXT, and IMAGE LOB data types, which are more difficult to work with.

Using MAX lets you give SQL Server control of the value's management, whereas in earlier SQL Server versions regular data types were stored inline within the record and TEXT, NTEXT, and IMAGE data types were stored external to the record. A minor exception in SQL Server 2000 was support for the text-in-row option that allowed storing small amounts of TEXT, NTEXT, and IMAGE data inline. SQL Server 2005 uses its own algorithms to determine whether to keep the value inline within the physical record or store the value external to the base record and to keep track of the value by using a pointer. No more awkward manipulation of large objects that require your code to first grab the pointer to the value, then use specialized commands (READTEXT, WRITETEXT, UPDATETEXT) to manipulate it.

In SQL Server 2005, the programming model for regular types and LOBs is unified. Most of the functions that work with regular data types also work with those data types when you use the MAX specifier. SQL Server 2005 lets you use the MAX-specifier data types wherever you'd use regular data types—such as in column definitions, input and output arguments of routines, and local variables. Furthermore, rows that have regular variable-length columns (as opposed to variable-length columns defined with MAX) are no longer limited to a single page; instead, they can now span pages. As in SQL Server 2000, each individual value can't exceed 8000 bytes (unless, of course, you used the MAX specifier). This type of storage is called row-overflow data.

To get a feel for working with LOBs in SQL Server 2005, first run the code in Listing 1 to create the BigData table. The table has an integer key column called keycol and a VARCHAR(MAX) column called vcmcol.

Next, run this code to load a new row to the BigData table:

  dbo.BigData(keycol, vcmcol)
'This is a very long string ('
     (CAST('x' AS 
     + ')';

(Some lines of code in this and other statements wrap to multiple lines because of space constraints.) Notice two things: First, the insert is similar to any other insert dealing with regular data types. Second, I created a character string much longer than 8000 characters by using regular concatenation.

To check the actual length of the new value, run the LEN() function as follows:

SELECT LEN(vcmcol)
  FROM dbo.BigData
  WHERE keycol = 1;

You'll get the output 100029, which indicates the number of characters in the new vcmcol column value.

WRITE Method
If you want to substitute the current value of a LOB column, you use a plain UPDATE statement. However, to let you efficiently substitute portions of the value (as opposed to overriding the entire value by using a function such as STUFF), SQL Server provides a WRITE method that you can use with data types with the MAX specifier. The WRITE method's format is SET lob_value.WRITE(@Expression, @Offset, @Length).

If you're familiar with the STUFF function, which was available for regular characters prior to SQL Server 2005, its logic and arguments are similar to those of the WRITE method. When all arguments are specified, the WRITE method removes @Length characters starting from @Offset position (zero-based offset) and inserts @Expression in that position.

For example, the following UPDATE statement removes 100,000 characters starting at position 28 (character 29). In other words, it removes all 100,000 x characters from the value and replaces them with 10 y characters:

UPDATE dbo.BigData
  SET vcmcol.WRITE(
   (CAST('y' AS 
   10), 28, 100000);

Similarly, the following UPDATE statement removes the word long and replaces it with the word short:

UPDATE dbo.BigData
  SET vcmcol.WRITE('short', 15, 4);
After you run the previous two UPDATEs, query the value to see the result, like this:
SELECT vcmcol
 FROM dbo.BigData 
 WHERE keycol = 1;

You'll get the output This is a very short string (yyyyyyyyyy).

When @Expression is NULL, SQL Server ignores @Length and truncates the string at @Offset position. For example, the following code truncates the string right after the word string:

UPDATE dbo.BigData
SET vcmcol.WRITE(NULL, 27, NULL);

SELECT vcmcol
 FROM dbo.BigData 
 WHERE keycol = 1;

You'll get the output This is a very short string.

When @Offset is NULL, @Length is ignored and @Expression is appended at the end of the string. For example, the following code adds a closing period to the sentence:

UPDATE dbo.BigData
SET vcmcol.WRITE('.', NULL, NULL);

SELECT vcmcol
 FROM dbo.BigData 
 WHERE keycol = 1;

You'll get the output: This is a very short string.

Finally, when @Length is NULL, @Expression is truncated at @Offset position. For example, the following code truncates the string after the word This:

UPDATE dbo.BigData
SET vcmcol.WRITE(NULL, 4, NULL);

SELECT vcmcol
 FROM dbo.BigData 
 WHERE keycol = 1;

Your output is simply This.

BULK Rowset Provider
Another LOB-related enhancement is the new BULK rowset provider, which is available when you use the OPENROWSET function. This provider lets you use the bulk engine to efficiently return a files contents as a scalar value. Now loading files into column values is easier then ever. Along with the BULK keyword, you specify the filename and one of three possible data type choices: SINGLE_BLOB (for binary data), SINGLE_CLOB (for character data), or SINGLE_NCLOB (for Unicode data). The OPENROWSET function will return a table with a single row and a single column called BulkColumn that contains the file's contents.

For example, I created two text files called C:\temp\textfile1.txt and C:\temp\textfile2.txt. Each file contains the text: This is the content of text file textfilen.txt, where n represents the file number. To load the file's content and a new row with key 2 into the table, use the following INSERT statement:

  dbo.BigData(keycol, vcmcol)
  SELECT 2, BulkColumn

To update the value of an existing row (the one with key 1), use an UPDATE statement like this:

UPDATE dbo.BigData
  SET vcmcol = 
    (SELECT BulkColumn
    WHERE keycol = 1;

When you query the contents of the BigData table by using the following SELECT statement, you'll get the output that Table 1 shows.

SELECT * FROM dbo.BigData;

The BULK rowset provider also lets you return a file's contents as a rowset using the BULK engine by simply querying the file instead of having to load the result set into a table. This capability gives you more flexibility than the BULK INSERT statement, which uses the same engine but forces you to load the result set into a table.

Assume that you have a text file called C:\temp\mytextfile.txt that contains the data that Figure 1 shows and a format file called C:\temp\myfmtfile.fmt that contains the data that Figure 2 shows. To return the contents of mytextfile.txt applying the format file myfmtfile.fmt using the BULK engine, run the following code, which returns the output that Table 2 shows:

  (BULK 'c:\temp\mytextfile.txt',
     AS D;

The BULK rowset provider has many additional options both for returning a file as a scalar value and as a rowset. You can find the technical details about the options that you can specify in SQL Server Books Online (BOL).

MAX Makes Handling LOBs Easier
With the introduction of the MAX specifier for variable-length datatypes, the programming model for regular data types and LOBs is at last unified. Treatment of LOBs is efficient and easier than ever. You can use the WRITE method to update portions of a LOB efficiently, and you can use the BULK rowset provider to return the contents of a value as a scalar LOB or as a rowset by efficiently using the BULK engine.

Hide comments


  • Allowed HTML tags: <em> <strong> <blockquote> <br> <p>

Plain text

  • No HTML tags allowed.
  • Web page addresses and e-mail addresses turn into links automatically.
  • Lines and paragraphs break automatically.