ILM Puts Data in Its Place

Classifying data by its business value could help contain storage costs

The data center at Wake Forest University Baptist Medical Center isn't remarkably different from those at other large institutions. It houses an assortment of mainframes and servers that provide services such as email and patient-record management to users in numerous departments. In 2000, the medical center—led by Bob Massengill, manager of technical services—implemented a storage area network (SAN) to store data generated by the medical center's applications. The move to a SAN had dramatic consequences. When the SAN was first implemented, it held "2TB of storage," says Massengill. "A year later, we were up to 18TB. Today, we're up to roughly between 60 and 70TB in the SAN." The explosion in the amount of data in the SAN led Massengill to explore a new concept in managing the medical center's storage needs: Information Lifecycle Management (ILM). "To me," says Massengill, "ILM is having the right data in the right format on the right storage platform at the appropriate time."

What's ILM?
ILM is one of the hottest concepts in storage today. ILM, which originated about 2 years ago, doesn't refer to a single product, product category, or family of disparate products. Instead, it's an approach to managing storage more efficiently and cost-effectively through a strategy that ensures that an organization's storage infrastructure aligns with business objectives. In a proposal to the Storage Networking Industry Association (SNIA), the SNIA Data Management Forum defined ILM as "a new set of management practices based on aligning the business value of information to the most appropriate and cost-effective infrastructure."

The core idea behind ILM is this: The value that specific pieces of information have for an organization changes over time. As the cost and complexity of the IT infrastructure in general and the storage infrastructure in particular grow, lower-value information should be stored on less-expensive storage devices.

This notion isn't entirely new. Records management is a well-developed corporate discipline, and records managers have long analyzed the time-value of records. ILM is also somewhat similar to the concept of hierarchical storage management (HSM). HSM, however, focuses primarily on data-migration policies. If a piece of data isn't accessed for a specific period of time, it's automatically moved from one storage platform to another. "HSM isn't value-based," says James Lee, vice president of product management at Princeton Softech, a vendor of database-archiving solutions. "It's time-of-access based."
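To make Lee's distinction concrete, here is a minimal sketch of a pure time-of-access rule of the kind HSM relies on. The tier names, the 90-day idle limit, and the function name are illustrative assumptions, not taken from any HSM product. Note that the rule never asks what the data is worth, only when it was last touched.

```python
from datetime import datetime, timedelta

# Hypothetical tier names, ordered from most to least expensive.
TIERS = ["production_disk", "nearline_disk", "tape"]


def hsm_next_tier(current_tier, last_accessed, now, idle_limit=timedelta(days=90)):
    """Return the tier data should occupy under a time-of-access rule.

    The 90-day default is an assumed value chosen for illustration.
    """
    idle = now - last_accessed
    position = TIERS.index(current_tier)
    # Demote one tier if the data has been idle long enough and a lower tier exists.
    if idle >= idle_limit and position < len(TIERS) - 1:
        return TIERS[position + 1]
    return current_tier


now = datetime(2005, 1, 1)
print(hsm_next_tier("production_disk", datetime(2004, 6, 1), now))    # idle about 7 months
print(hsm_next_tier("production_disk", datetime(2004, 12, 15), now))  # idle about 2 weeks
```

Under this rule, a record that regulators might request next year is treated exactly like a scratch file, provided both have been idle for the same length of time.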

ILM takes a much more sophisticated look at data and data movement. In many ways, it represents the marriage of records-management practices with the core concept of HSM and solves the problem of costly, inefficient data storage. "When you think about customer pain," says Todd Rief, senior director of product strategy at StorageTek, "90 percent of the world's data in data rooms is replicated information, and 80-plus [percent of that] information is never accessed again. But a large portion of that data is stored on the most expensive storage inefficiently for a long period of time."

Although the need to find ways to manage huge amounts of data is most pressing for large corporations, the idea of ILM is relevant for small-to-midsized businesses (SMBs), too. Organizations of all sizes can benefit from the ILM approach to classifying and storing data according to its business value.

Why ILM?
Four key factors have focused storage professionals' attention on the need for ILM. Perhaps the most obvious is the ongoing growth of data. Data growth has several significant implications. Too much data can slow the performance of essential applications. Archiving aging data potentially can extend applications' life and eliminate the need for costly hardware upgrades. Heavy data access can also hurt application performance, making it difficult to achieve desired service levels. In addition, enabling widespread access to data raises a myriad of security issues that companies must address.

Data is growing not only in quantity but in type. Email has increasingly become a significant driver of storage growth. The need to manage email archiving has become acute. Simply restricting the amount of storage space allocated to users is no longer sufficient to determine what data should be saved or discarded.

But email is only the tip of the iceberg for new data types that require a new storage management strategy. Image data, Web-based data, and blogs are all consuming storage space. And as VoIP becomes more common, it's only a matter of time before voice data will have to be stored somewhere, putting more pressure on storage infrastructures.

The third primary development driving ILM is regulatory-compliance issues. A host of recent regulations mandate that information be stored for a fixed period of time and that it be easily and readily accessible. The old notion that data can be migrated to a lower-cost platform, then tucked away in an offsite vault no longer works. Regulators can request data that could be years old, and companies must be able to produce it.

The need to retrieve data whose value is not determined merely by how frequently it's accessed is seen most clearly in archiving email. As Andrew Barnes, director of marketing at KVS, a business unit of VERITAS, notes, companies not only must be able to retrieve individual email messages, they must be able to produce entire email threads and multiple attachments.

The final factor that's spurring ILM is the innovation in storage technology itself. As SANs and Network Attached Storage (NAS) become more prevalent, storage has become more of a shared resource—some even argue that storage is becoming a service—instead of being tied to individual information silos. Furthermore, low-cost disk technology, such as ATA drives, has led to new storage hierarchies. Instead of two tiers—disk and tape—many storage infrastructures now have three or more tiers, whereby data is moved from high-performance production-oriented disk drives to less-expensive, lower-performance near-online (nearline) disk drives for backup and recovery operations, then to tape for archiving. The Wake Forest medical center uses a multitiered arrangement, similar to that shown in Figure 1.

The pace of storage innovation should continue to make ILM strategies easier to adopt. For example, storage virtualization, content-addressable storage, and new tape technology developed specifically with the idea that backup can ultimately be divorced from archiving applications should all find a place in storage infrastructures developed around ILM.

Elements of ILM Solutions
As its name implies, ILM addresses information storage from the moment data is created until it's no longer useful and is deleted. Consequently, ILM solutions start with the primary production storage infrastructure, then migrate data to less-expensive storage tiers. But data migration is only one piece of the puzzle. ILM strategies must also address business-continuity issues, including data backup and restore and disaster recovery, as well as archiving and retrieval needs.

The first and perhaps key step in crafting an ILM strategy, according to Wake Forest's Massengill, is to enlist the support of the data owners. ILM is not strictly a storage issue but rather a business-process issue, and all the stakeholders in the specific business process must be involved and understand the business benefits at stake.

The second step in ILM is classifying data according to its business value. Information should be classified according to specific goals and using an agreed-upon methodology. In general, data can be classified according to several criteria. Perhaps the most obvious factor to consider is business criticality. Under what circumstances do users need specific information and within what time frame? What data is essential for business operations, and what are the interdependencies between critical and noncritical data? Other questions to ask in the data-classification process are what storage resources do specific data consume, and what management costs are involved? Where is the data physically located? And finally, how do the values associated with these criteria change over time?

The next step in forming an ILM solution is to develop what the SNIA Data Management Forum calls service-level objectives and policies that dictate the migration of data through the different storage tiers. Under what conditions should data be moved from tier one—high-end production systems—to nearline or offline systems? To succeed, the data-classification and policy-development processes must be a collaboration between storage administrators and data owners.
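A migration policy of this kind can be sketched as a small rule table. The sketch below is only an illustration: the field names, the 180-day threshold, and the tier labels are assumptions, and a real policy would come out of the negotiation between storage administrators and data owners described above. Unlike a time-of-access rule, the decision starts from the classification the data owner agreed to.

```python
from dataclasses import dataclass


@dataclass
class DataSet:
    name: str
    business_critical: bool   # classification agreed with the data owner
    age_days: int             # time since the data was created
    retention_required: bool  # e.g., subject to a regulatory retention rule


def choose_tier(d: DataSet) -> str:
    """Pick a storage tier from business value first, age second.

    Thresholds and tier names are hypothetical.
    """
    # Data the business depends on stays on tier one regardless of age.
    if d.business_critical:
        return "tier1_production"
    # Recent noncritical data goes to less-expensive nearline disk.
    if d.age_days <= 180:
        return "nearline"
    # Old noncritical data is archived if it must be kept,
    # otherwise it is flagged for the owner to review for deletion.
    return "tape_archive" if d.retention_required else "delete_review"


for d in [
    DataSet("order-entry", True, 900, True),
    DataSet("project-files", False, 60, False),
    DataSet("old-email", False, 700, True),
]:
    print(d.name, "->", choose_tier(d))
```

Keeping the policy in one small function makes it easy to show data owners exactly which rule placed their data on a given tier.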

Wake Forest represents a model ILM scenario. Say a patient comes to the center with a broken arm. The break is captured in a digital image that might be accessed several times over the next 4 to 6 weeks as the patient visits the doctor. After the cast is removed, the image might be accessed only once or twice in the next 6 months. After that, the images must remain associated with the patient's medical record for 7 years after the patient's 18th birthday. So, if a child breaks her arm at age 4, for example, the medical center must keep a record of the image for 21 years, although it might never be accessed. Thus, during the first 4 to 6 weeks after an injury, the image must be readily accessible and probably stored on a high-performance storage subsystem. But where should it be located 4 months after the event—or 10 years later? And how should the migration process be managed? An ILM strategy must answer these sorts of questions. At Wake Forest, the IT staff concluded that it couldn't classify data because the process was too complex and had too many intervals. Instead, IT lets each department determine what data should reside on what level of storage. IT's role is to clearly explain to the departments the costs associated with their decisions.
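The retention arithmetic in the broken-arm example can be checked with a short sketch. The function and constant names are illustrative assumptions. The rule is coded only for patients under 18, because the article does not say what applies to adults.

```python
ADULT_AGE = 18             # retention clock starts at the 18th birthday
YEARS_AFTER_ADULTHOOD = 7  # retention period stated in the example


def retention_years(age_at_imaging: int) -> int:
    """Years an image must be kept for a patient who was a minor when imaged."""
    if not 0 <= age_at_imaging < ADULT_AGE:
        # The source gives no rule for adult patients, so none is assumed here.
        raise ValueError("rule is only defined here for patients under 18")
    years_until_adult = ADULT_AGE - age_at_imaging
    return years_until_adult + YEARS_AFTER_ADULTHOOD


print(retention_years(4))  # 14 years to reach 18, plus 7 more: 21
```

The gap between a 6-week active life and a 21-year retention period is what makes a single storage tier so costly for this kind of data.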

The State of ILM
In an ideal world, ILM would be straightforward to implement and assess. Companies would have a standard methodology for classifying data and a set of resources to automate data migration according to specific policies. Moreover, vendors would offer integrated ILM solution sets to manage data across the enterprise. In reality, enterprises use ILM concepts inconsistently. For example, data migration from one storage tier to another is often still done manually. And ILM concepts are often used to address specific, immediate problems instead of functioning as an overall strategy. "There are three primary pain points," says StorageTek's Rief: the misuse of primary disk storage space by keeping old data on it for too long, inadequate backup and recovery infrastructures, and archival applications that don't comply with government regulations.

Today, enterprise storage vendors provide best-of-breed solutions to address each of these pain points. As yet, no integrated, total ILM solution is available from any one vendor. Companies must integrate different ILM products from different vendors. A number of storage vendors have formed alliances to provide ILM solutions, including BMC Software and Princeton Softech, EMC and Outerbay Technologies, VERITAS and Network Appliance, and StorageTek and various partners. Nevertheless, using best-of-breed solutions lets an organization try out ILM and see its benefits firsthand. For example, a best-of-breed approach might include a storage resource-management solution from one vendor, an email archiving application from another, and database-archiving technology from a third. Lee suggests that after an organization has seen the advantages of ILM demonstrated in one area, it's more willing to begin developing an overall ILM strategy. Often, says Lee, a company first applies ILM concepts either to email or database archiving, then moves forward with other applications.

Beyond the Hype
Although some pundits have dismissed ILM as simply the latest industry buzzword created by vendors to help sell their products, it's more than just a marketing strategy. ILM isn't really about technology at all. Instead, it's a framework in which storage administrators, working in conjunction with data owners, can more precisely align the business value of information with the cost of storing that information.

ILM is still a relatively nascent idea, but point ILM solutions are available—virtually every major storage vendor has an ILM strategy in place—and can yield a measurable ROI. For example, Wake Forest's Massengill used to upgrade servers every 18 months but now upgrades them only every 3 to 4 years. "We don't upgrade just to get more storage. We can extend the life of the servers," he says.

Moreover, ILM techniques can help companies consolidate both their servers and their storage infrastructure. In fact, because Massengill can demonstrate the cost savings that departments will realize by moving to an ILM strategy and migrating data to less-expensive storage platforms, the departments are more willing to move to a centralized infrastructure. Essentially, says StorageTek's Rief, ILM represents a key step in the development of the storage infrastructure. "It's a stepping stone to utility computing," he says. "It makes a lot of sense."

4 Steps to Building an ILM Strategy
  1. Enlist the support of the data owners (e.g., departments).
  2. Classify data according to its business value.
  3. Classify data according to additional criteria, such as where it's located, what storage resources it consumes, and its management costs.
  4. Develop service-level objectives and policies for migrating data through the different storage tiers.
