One of the biggest problems storage administrators face is the amount of storage and backup space consumed by duplicate files. A 1MB document distributed widely across a large corporate environment can easily require hundreds of megabytes of storage as individual users make their own copies, which then end up on local computers or network shares that are backed up. Even setting aside such obvious cases, most corporate networks hold hundreds, if not thousands, of duplicated files with more limited distribution, all of which add overhead to storage and backup requirements.
Storage technology is addressing the duplicated-data problem with a technique known as data deduplication (dedupe). With this technique, duplicate copies of data are removed, leaving only a single physical copy, and every link or reference to a removed copy is redirected to the remaining instance. Only that single copy needs to be backed up and archived, which makes much more efficient use of backup and storage media. Not only is less storage required for backup, but because deduplication is done at the front end of the backup process, the time required for backup can be significantly reduced, an additional benefit for organizations with a limited backup window.
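To make the mechanism concrete, here is a minimal sketch of file-level dedupe in Python. It is an illustration only, not how any particular product works: the `DedupeStore` class, its method names, and the file paths are all hypothetical, and it assumes that identical content can be recognized by its SHA-256 hash. Commercial products typically work on blocks or variable-size segments rather than whole files, and they also reclaim space when references disappear, which this sketch does not attempt.

```python
import hashlib


class DedupeStore:
    """Toy file-level dedupe store: one physical copy per unique content.

    Hypothetical illustration; the names here are not from any real product.
    """

    def __init__(self):
        self.blobs = {}    # content hash -> the single stored copy of the bytes
        self.catalog = {}  # logical path -> content hash (the redirected reference)

    def put(self, path, data):
        """Record a file; store its bytes only if this content is new."""
        digest = hashlib.sha256(data).hexdigest()
        is_new = digest not in self.blobs
        if is_new:
            self.blobs[digest] = data
        # Every path, duplicate or not, simply points at the shared copy.
        self.catalog[path] = digest
        return is_new

    def get(self, path):
        """Resolve a logical path to the shared physical copy."""
        return self.blobs[self.catalog[path]]

    def logical_bytes(self):
        """Space the files would need if every copy were stored separately."""
        return sum(len(self.blobs[d]) for d in self.catalog.values())

    def physical_bytes(self):
        """Space actually used after dedupe."""
        return sum(len(b) for b in self.blobs.values())


store = DedupeStore()
document = b"x" * 1_000_000  # stand-in for a 1MB document
for user in range(200):
    store.put(f"/home/user{user}/report.doc", document)
store.put("/home/user0/notes.txt", b"unique content")

print(store.logical_bytes())   # size if every copy were kept
print(store.physical_bytes())  # size with one copy per unique content
```

In this toy run, 200 users each hold a copy of the same 1MB document, yet only one copy is physically stored; each user's path still resolves to the full content through the catalog.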
Data dedupe is such a high priority with customers and vendors that it has taken a key position in many new products. These range from high-end, cross-platform, dedicated-hardware virtual tape libraries (VTLs) designed to protect enterprise networks and SAN environments to products designed to protect a single server or a smaller corporate environment. The bigger the enterprise, the larger the impact that dedupe can have on the bottom line.
An example of the former type of product is the newly released Quantum DXi7500 Enterprise backup system. The DXi7500 is an enterprise-class hardware backup and data protection appliance that's focused on data deduplication and remote data protection. It's a good example of a dedicated appliance that uses dedupe as a key component of its backup architecture. The DXi7500 provides a raw capacity of up to 240TB and lets you back up as much as 8TB an hour while deduplicating the backup data on the fly.
An example of a more broadly targeted product that uses data dedupe is Microsoft System Center Data Protection Manager (DPM) 2007, currently in beta. With this newest release of its server-class data backup and protection application, Microsoft includes on-the-fly data dedupe as one of the major improvements. You can install DPM 2007 on a Windows Server 2003 system or buy it as a network appliance server that uses Windows Unified Data Storage Server as a base platform.
Regardless of the size of your enterprise, considering a data dedupe solution as part of your data backup and protection scheme makes good sense from both a technical and an economic point of view. Improved backup times and reduced storage requirements have far-reaching consequences for your computing environment and should be key considerations when you evaluate data protection solutions.