The advent of storage networks and Storage Resource Management (SRM) software, together with the move to consolidate IT facilities and cut costs, has compelled many companies to reshape their storage infrastructures. But an emerging management discipline might have a long-term impact on the way administrators plan and implement storage capacity. The new discipline examines the data life cycle and is aimed at effectively managing the aging of data.
The concept behind the data life cycle is that data has different values at different moments in time. For example, when an email message first arrives, the information it contains might be urgent. With each day that the email stays in an Inbox, its urgency potentially diminishes. After 3 or 4 days, an email message might have no value at all to the user. However, as the Microsoft antitrust case and the recent cases involving the recommendations of analysts at New York stock brokerages demonstrate, email messages might retain value to other parties even if they're no longer useful to the recipient.
Administrators can use a data life-cycle management approach to assemble the proper combination of storage devices, media types, and network infrastructure to create an appropriate balance of performance, data accessibility, easy retrieval, and data reliability based on the relative value of the data. The data life-cycle management approach examines data capture, transfer, processing, analysis, storage, backup, retrieval, archiving, and deletion. In using this approach, you determine whether you need to store data online, near-online, or offline and when data should be deleted.
In many ways, data life-cycle management represents the evolution of Hierarchical Storage Management (HSM) techniques. Vendors first developed HSM products for the mainframe environment and, in the mid-1990s, brought them to distributed computing implementations. HSM offers several benefits. It reduces the total amount of expensive RAID disk capacity an enterprise needs. Using storage more efficiently can improve performance. Moreover, you can perform some routine storage housekeeping tasks more easily with HSM products.
In HSM implementations, data automatically moves from expensive hard disks to less expensive optical media or to tape according to specific policies. Users don't have to know that their data has migrated to a less costly storage medium because HSM products track data movement and create paths for data retrieval. When an HSM product moves data, it creates a pointer to the file's new location. When a user or application retrieves a file that has moved down the storage hierarchy, the HSM product automatically returns the data to the top level of the storage infrastructure.
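The pointer-and-recall behavior described above can be illustrated with a minimal Python sketch. It is not modeled on any particular HSM product: the tier names, the HsmCatalog class, and its methods are all hypothetical, and in-memory dictionaries stand in for real storage devices.

```python
from dataclasses import dataclass, field

# Hypothetical tier names, most expensive (fastest) first.
TIERS = ["disk", "optical", "tape"]


@dataclass
class HsmCatalog:
    # Maps file name -> tier currently holding the data (the "pointer").
    location: dict = field(default_factory=dict)
    # One dictionary per tier stands in for the physical storage.
    stores: dict = field(default_factory=lambda: {t: {} for t in TIERS})

    def write(self, name, data):
        """New data always lands on the top tier."""
        self.stores["disk"][name] = data
        self.location[name] = "disk"

    def migrate(self, name, target):
        """Move the data to another tier and update the pointer."""
        source = self.location[name]
        self.stores[target][name] = self.stores[source].pop(name)
        self.location[name] = target

    def read(self, name):
        """Follow the pointer; recall the file to disk if it has moved down."""
        if self.location[name] != "disk":
            self.migrate(name, "disk")
        return self.stores["disk"][name]


catalog = HsmCatalog()
catalog.write("q3-report.doc", b"quarterly figures")
catalog.migrate("q3-report.doc", "tape")   # policy moved the file down
data = catalog.read("q3-report.doc")       # caller never names a tier
```

The caller of read never specifies a tier, which mirrors the point that users don't have to know where their data currently resides.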
Companies that use an HSM approach typically use two triggers to move data. The most common trigger is time. Data that workers haven't used within a specific time period moves to a less expensive storage device. The second trigger is capacity. As disks fill, data can move down the hierarchy.
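As a rough illustration, the two triggers might be combined as in the Python sketch below. Everything specific here is an assumption made for the example rather than a detail from any HSM product: the FileRecord type, the 90-day idle limit, the 80 percent high-water mark, and the choice to evict the least recently used files first when capacity runs short.

```python
from dataclasses import dataclass


@dataclass
class FileRecord:
    name: str
    size_mb: int
    days_since_access: int


def select_for_migration(files, capacity_mb, max_idle_days=90, high_water=0.8):
    """Return the names of files to move down one tier.

    max_idle_days and high_water are illustrative defaults, not values
    taken from a real product.
    """
    # Time trigger: anything idle longer than the limit is migrated.
    chosen = [f for f in files if f.days_since_access > max_idle_days]
    remaining = [f for f in files if f.days_since_access <= max_idle_days]

    # Capacity trigger: if the disk is still above the high-water mark,
    # keep migrating the least recently used files until it is not.
    used = sum(f.size_mb for f in remaining)
    for f in sorted(remaining, key=lambda f: f.days_since_access, reverse=True):
        if used <= capacity_mb * high_water:
            break
        chosen.append(f)
        used -= f.size_mb

    return [f.name for f in chosen]


files = [
    FileRecord("old.log", 100, 120),
    FileRecord("report.xls", 500, 30),
    FileRecord("mail.pst", 400, 5),
]
to_move = select_for_migration(files, capacity_mb=1000)
```

With a 1000MB disk, old.log is selected by the time trigger and report.xls by the capacity trigger, because the two remaining files would otherwise occupy 900MB against an 800MB high-water mark.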
But several trends are forcing companies to rethink their migration strategies, because they fear HSM might be too simplistic an approach. For example, many companies now want realtime access to their data for longer time periods. Consider how realtime access affects credit-card transactions. In the past, credit-card transactions were generally completed within a 120-day cycle. In the first 30 days, a transaction occurs and the customer is billed. In the second 30 days, the customer pays the bill. The last 60 days cover late payments, billing disputes, and other anomalies. By the end of 120 days, most transactions are closed.
But are the transactions closed? Because customers now have Web access to their credit-card accounts, they want the ability to review their transactions for the past year, or perhaps longer. Even if customers don't use that data, it must be readily accessible or the value of the service is lost.
Data life-cycle management gives administrators a framework to understand the value of different records and helps them build storage infrastructures that reflect those determinations. Data life-cycle management is still an underdeveloped discipline; data-intensive government research laboratories are pioneering its use. But data life-cycle management might become an important skill for storage administrators.