Reconceptualizing Storage Strategies

The dazzling array of new storage technologies such as Serial ATA (SATA), Serial Attached SCSI, and IP Storage Area Networks (SANs) that have been or will soon be introduced presents storage administrators with an increasingly complicated set of choices for developing and enhancing their storage infrastructures. But building or augmenting an infrastructure from new, lower-cost options isn't just a matter of evaluating technologies. Rather, as Jerry Hoetger, senior manager of product marketing at VERITAS Software, contends, companies large and small have to reconceptualize the way they develop their storage infrastructure.

Long gone are the days of seeing more capacity as the solution to every storage problem. Yet some storage administrators continue to approach storage from a macro perspective, focusing primarily on the overall amount of data that they need to manage. The new storage options available mean that companies should instead categorize data, then construct a tiered storage infrastructure that is robust, safe, and appropriate for data of varying characteristics.

Data can be categorized in several ways. The most fundamental way is by type. Most companies have three types of data: structured transactional data, typically stored in databases; semistructured data, such as email; and unstructured data, such as documents. Within every company, each data category can be assigned a different importance, and thus a different value.

Typically, structured data is seen as the most valuable corporate asset and requires the most significant investment in tier-one storage. As we all know, the amount of structured data is growing rapidly, but in many cases the amount of semistructured and unstructured data is growing just as fast, or even faster. Although theoretically less valuable, that semistructured and unstructured data might still require the same high-performance, high-cost storage infrastructure as transactional data. The entertainment industry, for example, has seen a dramatic explosion in storage needs for image and audio files. And email, of course, is now essential in businesses of all sizes.

The question, then, becomes this: When is it prudent to store less-valuable data on lower-cost, lower-performance, less-reliable storage devices? For example, can a Hollywood production studio safely store movie files on a less expensive storage device than it uses for, say, its online order-entry data? As another example, how should audio files be treated? Many universities now assign storage space to students; should students be allowed to fill that space with MP3 files? Should the university regularly back up MP3 files? Should those files be purged at the end of each semester, even if the student hasn't graduated? Should MP3 data be stored on near-online, instead of online, devices?

Data type is only one criterion by which data can be categorized. Data can also be categorized by its timeliness or its access patterns. The concept of assigning a value to data according to its age is well known, for example. Studies have shown that the longer an email message goes unanswered, the less likely that it will ever be answered. Yet people who have access to sufficient storage space routinely keep thousands of email messages, although many of those saved messages will never be accessed again.
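To make the age criterion concrete, an administrator could total the stored bytes by time since last access. The following Python sketch is only an illustration: the bucket boundaries (30 days and one year), the labels, and the function names are assumptions, not figures or tools from the article, and a real inventory would come from file-system or mail-server metadata.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical age buckets: (upper bound in days, label). Not from the article.
AGE_BUCKETS = [(30, "under 30 days"), (365, "30 days to 1 year")]
OLDEST_LABEL = "over 1 year"


def age_label(last_access, now):
    """Return the bucket label for an item last accessed at `last_access`."""
    age_days = (now - last_access).days
    for limit, label in AGE_BUCKETS:
        if age_days < limit:
            return label
    return OLDEST_LABEL


def bytes_by_age(items, now):
    """Sum sizes per age bucket.

    `items` is an iterable of (size_in_bytes, last_access_datetime) pairs.
    """
    totals = Counter()
    for size, last_access in items:
        totals[age_label(last_access, now)] += size
    return dict(totals)


if __name__ == "__main__":
    now = datetime(2024, 1, 1)
    inventory = [
        (100, now - timedelta(days=1)),
        (200, now - timedelta(days=90)),
        (300, now - timedelta(days=800)),
    ]
    print(bytes_by_age(inventory, now))
```

A large total in the oldest bucket suggests data that might be a candidate for cheaper storage, subject to the caveats about access requirements.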

It seems intuitive that the newest data and data that's frequently accessed should be stored on the most robust, high-performance storage devices. But decisions to categorize data along those lines aren't always clear-cut. In large laboratories, for example, data generated by automated instruments might never be analyzed. But if a scientist does conceive of a question that she could use the collected data to answer, that data had better be available in a relatively timely fashion. Moreover, regulations in the financial, health-care, and other industries now require companies to be able to access archived data quickly.
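One way to picture how type, age, access pattern, and regulatory status might combine is a small rule-based tier policy. The Python sketch below is a hypothetical illustration under stated assumptions: the thresholds (100 reads per month, 90 days), the tier names, and the DataSet fields are invented for the example, and any real policy would come from analysis of actual access patterns and consultation with end users.

```python
from dataclasses import dataclass


@dataclass
class DataSet:
    name: str
    kind: str               # "structured", "semistructured", or "unstructured"
    days_since_access: int
    reads_per_month: int
    regulated: bool = False  # subject to quick-retrieval rules


def choose_tier(ds):
    """Pick a storage tier using illustrative, assumed thresholds."""
    # Transactional data and heavily read data go to the fastest tier.
    if ds.kind == "structured" or ds.reads_per_month >= 100:
        return "tier-one"
    # Recently used data, and regulated data that must remain quickly
    # retrievable, stays on near-online storage.
    if ds.days_since_access <= 90 or ds.regulated:
        return "near-online"
    # Everything else is a candidate for the cheapest tier.
    return "archive"


if __name__ == "__main__":
    examples = [
        DataSet("order-entry database", "structured", 0, 5000),
        DataSet("old instrument readings", "unstructured", 400, 0),
        DataSet("archived trade records", "semistructured", 400, 0, regulated=True),
    ]
    for ds in examples:
        print(ds.name, "->", choose_tier(ds))
```

Note that the regulated flag keeps old, rarely read data off the archive tier, which mirrors the point that age and access frequency alone do not settle the question.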

Interestingly, although categorizing data according to its type, age, and access pattern and building a storage infrastructure that reflects those categories seem like a logical approach, many companies don't have the internal organization to effectively carry out that kind of analysis. In many shops, a storage administrator or other IT professional has overall responsibility for the storage infrastructure and the most in-depth knowledge of possible storage solutions. But determining effective storage policies can require a deep analysis of data-access patterns and extensive consultation with end users. And, as Art Tolsma, CEO of Luminex Software, points out, many end users aren't interested in storing their data on less expensive, less robust, less available devices. His company markets a controller that lets mainframe computer users store data on open-systems storage devices. But, he says, many customers buy tier-one storage because that's what users want.

Despite the challenges, developing a more nuanced assessment of data storage needs can lay the groundwork for more effective storage policies and a more cost-effective and efficient storage infrastructure. This exercise can be worthwhile for companies of all sizes--even those that store all their data on a 40GB hard disk.
