Tame the BLOB

Storing files as Binary Large OBjects (BLOBs) in the database can improve consistency and reliability.

DataStream

LANGUAGES: C#

ASP.NET VERSIONS: 1.0 | 1.1

 


 

By Brian Noyes

 

A common requirement in applications is loading and saving data from files, and .NET makes file I/O a piece of cake. However, managing the storage of those files can get tricky, especially when dealing with volatile or automatically created files. You have to settle on a folder structure, avoid naming conflicts, make sure you have appropriate access to the files, remove files when they are no longer in use, and so on. In short, there is a lot to get straight and a lot of places for things to go wrong with file I/O. It is also common to need to store metadata about the files you are creating and accessing in a database, so that you can query that metadata or produce summary reports of the files that exist in your system and the data they contain.

 

Separating the data contained in the files from the metadata stored in the database leads to all kinds of synchronization problems as well. What happens if the file gets stored, but the metadata for the file does not get created properly? What if someone deletes the file from the file system, but fails to clean up the corresponding records in the database? By separating the files from their associated data, updates and additions involve making changes to both a transactional resource manager (the database) and a non-transactional resource manager (the file system). Trying to ensure consistency with this separation of file data and metadata can be a real pain.

 

The answer to this dilemma lies in the ability of most modern databases to store binary data in fields within a table. Under the covers, how the data is stored and accessed varies widely from one database implementation to the next. But at the logical level, the binary data appears to be just another field in a table where you can store other information as well.

 

For this article, I am going to focus on working with SQL Server, but the code and concepts apply to other databases as well, as long as they have corresponding database types.

 

SQL Server Binary Types Pave the Way

SQL Server defines several types that translate to binary storage under the covers. There are the binary and varbinary types, which store up to 8KB of binary data. The binary type has a fixed allocation size that you specify when creating the column, while varbinary fields only require storage based on the actual size of the data put into each row. More commonly, though, you will want the image, text, and ntext types, each of which allows variable-length storage of data from 0 to 2GB. The image type is for storing raw bytes, whether they correspond to a graphic image, some bits from a file, character data, or whatever you want. The text and ntext types are optimized for storage and retrieval of large blocks of text. The text type stores the text in a format determined by the code page used by the database in question; the ntext type always stores characters in Unicode format.
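
To make the mapping concrete, here is a minimal sketch of how these types surface when you declare ADO.NET parameters with the SqlDbType enumeration. The table and column names are hypothetical stand-ins for whatever your schema uses:

// Hypothetical parameterized INSERT illustrating how the SQL Server
// binary types map to SqlDbType values in ADO.NET
// (requires System.Data and System.Data.SqlClient).
SqlCommand cmd = new SqlCommand(
   "INSERT INTO BlobTable (SmallBits, BigBits, Notes) " +
   "VALUES (@SmallBits, @BigBits, @Notes)", conn);
cmd.Parameters.Add("@SmallBits", SqlDbType.VarBinary, 8000); // up to 8KB
cmd.Parameters.Add("@BigBits", SqlDbType.Image);             // raw bytes, up to 2GB
cmd.Parameters.Add("@Notes", SqlDbType.NText);               // Unicode text, up to 2GB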

 

ADO.NET eases the process of getting binary data into and out of the database. An ADO.NET managed provider is responsible for defining a mapping between the underlying database type system and the .NET type system. For SQL Server, the text and ntext types simply map to a string. This is a natural fit, since .NET strings are Unicode under the covers and are limited in size only by the amount of memory your process can allocate. The image type in SQL Server maps to a byte array, again making it easy to work with, since most binary operations in .NET deal with byte arrays as well. In particular, whenever you are dealing with streams, it is easy to push a byte array into a stream or read it back out. The code in Figure 1 shows how to read a graphic image file out of a database field and into a Bitmap. This code calls a stored procedure that takes the primary key for the table and returns the image field containing the raw bytes read in from the image file when the table was populated.

 

private Bitmap LoadBitmap(int id)
{
   // Create the connection
   SqlConnection conn = new SqlConnection(m_connectionString);
   // Create a command for the GetImage proc
   SqlCommand cmdGetImage = new SqlCommand("GetImage", conn);
   cmdGetImage.CommandType = CommandType.StoredProcedure;
   // Populate the id param
   cmdGetImage.Parameters.Add("@ImageID", id);
   SqlDataReader reader = null;
   byte[] bits;
   try
   {
      conn.Open();
      // Execute a reader to get the data
      reader = cmdGetImage.ExecuteReader();
      reader.Read(); // Position on first and only record
      // Extract the bits
      bits = reader["ImageData"] as byte[];
   }
   finally
   {
      // Close the reader for this image
      if (reader != null)
         reader.Close();
      // Close the connection
      conn.Close();
   }
   // Load the bits into a Bitmap using a MemoryStream
   MemoryStream ms = new MemoryStream(bits);
   Bitmap bmp = new Bitmap(ms);
   return bmp;
}

Figure 1. Reading an image field in from a database returns a byte array. Depending on the expected contents of the field, those bytes can then be used to re-hydrate an object of the appropriate type. In this case, the bytes are placed in a MemoryStream, which is then used to create a Bitmap object. This is all done without any direct file I/O.

 

The code in Figure 1 takes the simple approach of using a reader with the default command behavior because the query is expected to return only a single row. If you are using a DataReader with a query that will return multiple rows containing image data, you will want to use the CommandBehavior.SequentialAccess flag so the reader streams large values as you read them instead of buffering each entire row in memory. You pass this enumeration value to the ExecuteReader method, then access the data by calling the GetBytes method on the reader. The code below shows this approach:

 

SqlDataReader reader = cmd.ExecuteReader(CommandBehavior.SequentialAccess);
while (reader.Read())
{
   // Get the byte count by passing null for the buffer
   int byteCount = (int)reader.GetBytes(0, 0, null, 0, 0);
   byte[] imgData = new byte[byteCount];
   // Read in the whole block for this row at once
   reader.GetBytes(0, 0, imgData, 0, byteCount);
}

 

If you expect the image field for a single row to be very large, you can read the bytes in chunks with the GetBytes method by adjusting the data offset and buffer arguments on each call. See the online docs for details of each parameter.
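
Here is a minimal sketch of that chunked approach, assuming the image column is at ordinal 0 and the reader was opened with CommandBehavior.SequentialAccess. The 8KB chunk size is arbitrary:

// Read the BLOB in fixed-size chunks to avoid allocating one huge buffer.
const int chunkSize = 8192; // arbitrary chunk size for this sketch
byte[] buffer = new byte[chunkSize];
MemoryStream ms = new MemoryStream();
long offset = 0;
long bytesRead;
// GetBytes returns the number of bytes copied; 0 means end of field
while ((bytesRead = reader.GetBytes(0, offset, buffer, 0, chunkSize)) > 0)
{
   ms.Write(buffer, 0, (int)bytesRead);
   offset += bytesRead;
}
byte[] imgData = ms.ToArray();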

 

The download code for this article includes a sample application that allows you to read and write both graphic image files (bmp, jpg, and gif) and XML documents to some tables in a database. The app lets you populate the database by pointing it to a folder that contains image files or XML documents. It also lets you read those files in from disk using code equivalent to the code that reads them in from the database, allowing you to experiment with the relative performance of loading files from disk versus from the database. The graphic images in the ImageCollection table are passed into a Bitmap constructor, and the XML file contents are used to load an XmlDocument instance, representing the kinds of objects you might create with the underlying binary data. The app code includes some simple profiling timing to see the differences between reading BLOBs from the database and reading the same data from disk.
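
The write side follows the same pattern in reverse: read the file's bytes, then push them into an image parameter on a parameterized command. The sketch below shows the general approach; the AddImage stored procedure and its parameters are hypothetical stand-ins for whatever your schema uses:

// Sketch: read a file's bytes and store them in an image column.
byte[] bits;
using (FileStream fs = File.OpenRead(path))
{
   bits = new byte[fs.Length];
   int read = 0;
   // Loop because Read is not guaranteed to fill the buffer in one call
   while (read < bits.Length)
      read += fs.Read(bits, read, bits.Length - read);
}
SqlConnection conn = new SqlConnection(m_connectionString);
SqlCommand cmdAdd = new SqlCommand("AddImage", conn); // hypothetical proc
cmdAdd.CommandType = CommandType.StoredProcedure;
cmdAdd.Parameters.Add("@FileName", Path.GetFileName(path));
cmdAdd.Parameters.Add("@ImageData", SqlDbType.Image).Value = bits;
try
{
   conn.Open();
   cmdAdd.ExecuteNonQuery();
}
finally
{
   conn.Close();
}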

 

Why (and When to) Bother?

First, you need an application where it makes sense to use this technique. One example would be if you have pictures that are associated with other data presented by your app, such as product photos or thumbnails presented alongside product data in an online store Web site. Another would be if you are using XML as a structured storage mechanism for data that you don't want to define a rigid relational schema for, but you need to store and retrieve that data along with other metadata about that structured storage (i.e., date/time of creation, number of data items contained, etc.).

 

But don't get carried away with using this technique. The file system still has a place in many applications. If the files are products of the application, as in end-user applications, and if the user might expect to be able to find those files outside the running application itself, then storing files as BLOBs in the database is not the right way to go.

 

What about performance? If you dealt with binary data in SQL Server prior to version 7.0, you may have found that the performance left something to be desired. In version 7.0 and later, the way SQL Server stores and retrieves binary data is vastly improved. Profiling done with the download code sample app, using ADO.NET against SQL Server 2000 with a collection of images and documents on my machine, showed little to no performance penalty for retrieving files stored in a database compared with loading them from disk. Storing and retrieving text files was actually faster from the database than from disk. Your mileage may vary depending on the connection between your database and your executing code, network traffic, and other factors, so as always with data access and performance concerns - profile!
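
If you want to run a similar comparison yourself, a crude timing harness along these lines is enough. Environment.TickCount only has roughly 10-16ms resolution, so time many iterations rather than a single load (the LoadBitmap call and row id here are just the Figure 1 method with a hypothetical key):

// Crude timing sketch: loop many times because TickCount resolution is coarse.
int start = Environment.TickCount;
for (int i = 0; i < 100; i++)
{
   Bitmap bmp = LoadBitmap(1); // hypothetical row id
   bmp.Dispose();
}
int elapsed = Environment.TickCount - start;
Console.WriteLine("100 loads took {0} ms", elapsed);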

 

By storing your file content in a database, you benefit from co-locating it with any associated metadata and from accessing both within the context of a transactional resource manager. You can back up all the data atomically using replication or other database backup mechanisms. You don't have to worry about file access permissions, name collisions on files or folders, or cleaning up obsolete files. If you are storing the content of text files, you can also include them in the Full-Text Search capabilities of SQL Server to perform deep searches of not only the metadata for files, but also their content.
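
That transactional benefit is easy to see in code. The sketch below, using hypothetical table and column names, writes a file's metadata and its content under a single SqlTransaction, so both rows are committed or neither is - the metadata can never orphan the content:

// Sketch: insert metadata and BLOB content atomically.
// The SQL text and column names are hypothetical.
SqlConnection conn = new SqlConnection(m_connectionString);
conn.Open();
SqlTransaction tx = conn.BeginTransaction();
try
{
   SqlCommand cmdMeta = new SqlCommand(
      "INSERT INTO FileInfo (FileName, Created) VALUES (@Name, @Created)",
      conn, tx);
   cmdMeta.Parameters.Add("@Name", fileName);
   cmdMeta.Parameters.Add("@Created", DateTime.Now);
   cmdMeta.ExecuteNonQuery();

   SqlCommand cmdData = new SqlCommand(
      "INSERT INTO FileData (FileName, Content) VALUES (@Name, @Content)",
      conn, tx);
   cmdData.Parameters.Add("@Name", fileName);
   cmdData.Parameters.Add("@Content", SqlDbType.Image).Value = bits;
   cmdData.ExecuteNonQuery();

   tx.Commit(); // both rows, or neither
}
catch
{
   tx.Rollback();
   throw;
}
finally
{
   conn.Close();
}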

 

The sample code in this article is available for download.

 

Brian Noyes is a consultant, trainer, speaker, and writer with IDesign, Inc. (http://www.idesign.net), a .NET focused architecture and design consulting firm. Brian specializes in designing and building data-driven distributed applications. He has over 12 years experience in programming, engineering, and project management, and is a contributing editor for C#PRO, asp.netPRO, and other publications. Contact him at [email protected].

 

 

 

 
