Like the jelly-like Blob monster in those old Steve McQueen horror movies, mixing binary large objects (BLOBs) with relational database data has traditionally been unnatural and difficult to manage. BLOBs and relational database data are very different entities. Relational data usually consists of text or numbers and tends to be small. In contrast, BLOB data is most often pictures in .jpg, .tiff, or .bmp format—such as product images on a Web site—which can be quite large.
As the sidebar "Storing BLOBs in the Database or the File System?" discusses, integrating BLOB data and relational database data has its pros and cons. And until recently, SQL Server’s limited data types for storing BLOBs made working with them more difficult than working with traditional relational database data. But with the introduction of three new data types, SQL Server 2005 treats all data the same—whether your BLOB contains images, large amounts of text data, audio files such as MP3s, or even program executables (.exes). In this article, you’ll see how easy it is to use one of the new data types, VARBINARY(MAX), to import BLOB data into and retrieve it from a SQL Server 2005 database. You can download a simple example project that demonstrates this functionality. But before jumping into the code, let’s review the key elements of SQL Server’s BLOB storage.
What about BLOB?
SQL Server stores standard INT, CHAR, and VARCHAR data directly within a row. However, this approach limits the maximum capacity of each data type to 8000 bytes, slightly less than SQL Server’s maximum row size of 8060 bytes. (Note that with SQL Server 2005’s row-overflow feature, the maximum row size can exceed 8060 in certain cases; for information about this feature, see Kalen Delaney's "Stretching the 8K Row Limit.")
Although more than adequate for most data types, the 8KB limit is a problem for most BLOB data. To accommodate the needs of larger BLOB data, Microsoft designed SQL Server to handle BLOB storage differently than it handles storage for more common data types. Figure 1 shows an overview of how SQL Server 2005 and earlier releases store the IMAGE and VARBINARY(MAX) BLOB data types.
As you can see in Figure 1, SQL Server doesn’t store large BLOB data on the same data page as the data for the rest of the row. Instead, it stores BLOB data as a collection of 8KB pages organized in a B-tree structure. Each row’s BLOB column contains a 16-byte pointer to the root B-tree structure, which tracks the blocks of data that comprise the BLOB. If the data is less than 64 bytes, SQL Server stores it as part of the root structure. Otherwise, the root structure contains a series of pointers to the data blocks that comprise the binary object.
For BLOBs smaller than SQL Server’s 8KB page size, you have a couple of options for storing the BLOB data inline, as you would standard text and numeric data. For the old TEXT, NTEXT, and IMAGE data types, which SQL Server 2005 continues to support, you can use the text-in-row feature to store the data inline. And for the new VARCHAR(MAX), NVARCHAR(MAX), and VARBINARY(MAX) data types, you can use the backward-sounding large value types out of row option. Storing smaller BLOB data inline improves performance, avoiding the extra I/O needed to read the BLOB data record. (For more information about SQL Server’s text-in-row feature, see Kalen Delaney’s "Text in Row Internals.")
Note that BLOBs are sometimes called large objects (LOBs) or character large objects (CLOBs). The LOB designation can refer to both CLOBs and BLOBs. However, technically, CLOBs can contain only text data such as XML, whereas BLOBs can contain any type of data, including binary data. SQL Server 2005 can store LOB data by using the older TEXT, NTEXT, and IMAGE data types or the new VARCHAR(MAX), NVARCHAR(MAX), and VARBINARY(MAX) data types. Table 1 lists the attributes for SQL Server’s LOB data types.
In pre–SQL Server 2005 releases, IMAGE is the primary data type for BLOB storage and can contain 2GB of virtually any type of binary data. Although similar to IMAGE, TEXT and NTEXT data types can store only text data, not binary data. TEXT can accommodate up to 2GB of non-Unicode text data, and NTEXT can accommodate up to 1GB of Unicode text data.
However, these earlier BLOB data types have some frustrating limitations. You can’t use them as local variables in stored procedures or T-SQL batches, as part of an index, or in WHERE, ORDER BY, COMPUTE, or GROUP BY clauses. (The exception: You can use TEXT and NTEXT with the LIKE keyword.) However, the biggest problem with IMAGE, TEXT, and NTEXT data types is that to access them, you have to use a different programming model than you use to access other SQL Server data types.
SQL Server 2005’s VARCHAR(MAX), NVARCHAR(MAX), and VARBINARY(MAX) data types eliminate these limitations while still letting you store large amounts of data. Unlike with earlier BLOB data types, you access the various (MAX) data types the same way as other SQL Server data types. In addition, you can declare the new data types as local variables. And you can use SQL Server string-handling functions such as SUBSTRING() on VARCHAR(MAX) and NVARCHAR(MAX) columns. VARCHAR(MAX) and VARBINARY(MAX) both provide up to 2GB of storage, whereas NVARCHAR(MAX) provides 1GB of character storage. Now, let’s see how you can use the new VARBINARY(MAX) data type with T-SQL and ADO.NET to import BLOB data into a SQL Server 2005 database, then use ADO.NET to retrieve it.
Using T-SQL to Import BLOBs
The first step in working with BLOBs and the VARBINARY(MAX) data type is create a table containing a VARBINARY(MAX) column and use the column as target to import BLOB data. Use the following code to create the sample table MyBLOBTable:
CREATE TABLE MyBLOBTable (blob_id int IDENTITY(1,1), blob_description varchar(256), blob_data varbinary(max))
Because of T-SQL’s limited ability to address the IMAGE, TEXT, and NTEXT data types, these older BLOB types almost forced you to use ADO or ADO.NET to load BLOB data. But because you can use the new (MAX) data types as local variables, you can either use T-SQL to load them directly or use ADO or ADO.NET to load them.
Listing 1’s code uses T-SQL and the OPENROWSET function to load data into the VARBINARY(MAX) blob_data column. The INSERT statement inserts the results of the SELECT statement, which uses the OPENROWSET function to read the contents of the bitmap file win2003.bmp from the file system. Notice that OPENROWSET uses the new BULK rowset provider, which reads data from the file system, to return a file’s contents as a scalar value. The SINGLE_BLOB keyword tells the BULK provider that it will be reading binary data.
Using ADO.NET 2.0 to Import BLOBs
You can also use ADO.NET to import BLOB data into VARBINARY(MAX) columns. As I noted earlier, for the IMAGE data type, ADO.NET is the only effective way to load BLOB data. However, earlier versions of ADO.NET require a different programming model to work with BLOBs than to work with other character and numeric data types, which makes BLOB processing more difficult. In ADO.NET versions before 2.0, you essentially have to use either chucking techniques or Stream objects to access BLOB data. (See "BLOB Further Reading" for articles that discuss these techniques.) However, Microsoft has enhanced ADO.NET 2.0 to support the new VARBINARY(MAX) data type, greatly simplifying BLOB handling.
The code in Listing 2 uses ADO.NET 2.0 to load data into the blob_data column. First, the code reads the BLOB data from the file system into a variable. As callout A in Listing 2 shows, the code creates a new FileStream object named fs to read the file C:\tem\win2003.bmp". The fs FileStream object then reads the BLOB file’s contents into a byte array named bBLOBStorage.
After the code reads the BLOB data from the file system and assigns it to a variable, it writes the variable’s contents to the SQL Server database. Because the code requires I/O to the file system and access to the database, you need to add the following namespaces to your project:
- Imports System.IO
- Imports System.Data
- Imports System.Data.Sqlclient
At callout B in Listing 2, the code creates a new ADO.NET SqlConnection object named cn and a new ADO.NET SqlCommand object named cmd. As their names suggest, the cn SqlConnection object creates a connection to the SQL Server database, whereas the cmd SqlCommand object executes a T-SQL INSERT command that adds the BLOB data to the database. In this example, the INSERT statement uses two parameters: @blob_description to add a description of the BLOB to the database and @blob_data to add the BLOB itself to the database.
The code sets the Direction property of both parameters to input and sets the data type of the @blob_description parameter to SqlDbType.VarChar and the data type of @blob_data to SqlDbType.Image. Don’t be misled by the use of SqlDbType.Image for @blob_data; this data type isn’t restricted to just the IMAGE data type; it also works with VARBINARY(MAX). Next, the code assigns values to the parameters’ Value properties, assigning a string to @blob_description’s property and assigning the contents of the bBLOBStorage byte array, which was filled earlier, to @blob_data’s property. The code opens the cn SqlConnection object. Then, it uses the cmd SqlCommand object’s ExecuteNonQuery method to execute the T-SQL INSERT statement, which imports the BLOB data to MyBLOBTable’s blob_data column.
Retrieving Data from a VARBINARY(MAX) Column
After you’ve loaded BLOB data into the database, you need to be able to retrieve and display the data. The code in Listing 3 uses ADO.NET 2.0 to retrieve the BLOB data from MyBLOBTable’s VARBINARY(MAX) blob_data column and display it in a picture box. In this example, the code reads the BLOB into a MemoryStream, then assigns the MemoryStream to a picture box control’s Image property. This technique is fast and doesn’t require any intermediate files, eliminating the I/O required to first write the binary data to disk, then read it.
Listing 3’s RetrieveBLOB subroutine begins at callout A by creating a new SqlConnection object named cn and a new SqlDataAdapter object named da. The code constructs the SqlDataAdapter by using a T-SQL SELECT statement that retrieves the blob_data column from MyBLOBTable, where the value of the blob_id column equals 1. The code then creates a SqlCommandBuilder object and an empty DataSet. A Try block opens the connection and uses the SqlDataAdapter’s Fill method to populate the DataSet. The subroutine then evaluates the MyBLOBTable DataTable’s Rows.Count property to make sure some data was retrieved.
The real action begins at callout B with the creation of a byte array called bBLOBStorage, which is assigned the contents of the binary image in the DataSet’s blob_data column. The code then creates a new MemoryStream object named ms and assigns it the contents of the bBLOBStorage byte array. Finally, the code uses the picture box control’s FromStream method to assign the binary image data from the ms Memory-Stream object to the picture box control’s Image property. Figure 2 shows the results of the RetrieveBLOB subroutine.
BLOB with Ease
Although SQL Server 2005 still supports the TEXT, NTEXT,and IMAGE data types for backward compatibility, the VARCHAR(MAX), NVARCHAR(MAX), and VARBINARY(MAX) data types give you important capabilities for working with BLOBs. As you’ve seen in these examples using VARBINARY(MAX), you can now use these data types as local variables and in T-SQL parameters, so you can easily work with BLOB data T-SQL and ADO.NET 2.0.
BLOB FURTHER READING
MICHAEL OTEY"A BLOB of a Different Color,"
October 2003, InstantDoc ID 39867
"Using the SqlCommand Object to Import BLOBs,"
"Using WinForm Data Binding with BLOBs,"
"BLOB Access Technologies,"
"Return of the BLOB,"
"Who’s Afraid of the Big, Bad BLOB?"