For many years, RAID (redundant array of independent disks) was the primary tool for combining multiple hard disks into a structure that improved performance and/or guarded against data loss due to disk failure. More recently, however, a RAID alternative called erasure coding has begun to gain traction (although erasure coding has actually been around for a long time). So how do RAID and erasure coding differ from one another, and which should you be using?
Before I try to answer that question, let me quickly review some of the more commonly used RAID levels. These include:
- RAID 0 – Data is striped across multiple hard disks. The benefit to this approach is that it greatly enhances performance over that of a single disk. The disadvantage is that if even one disk fails then the entire volume is lost.
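- RAID 1 – Data is mirrored to a secondary hard drive. This method provides protection against hard disk failure, but does nothing to improve write performance (reads can be somewhat faster on implementations that serve them from either copy). Because every block is stored twice, only half of the raw capacity is usable.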
- RAID 5 – RAID 5 stripes data across multiple hard disks, much like RAID 0. The difference, however, is that parity information is also calculated for each stripe and distributed across the disks. The advantage to this approach is that the storage volume can continue to function, even if a hard disk fails. The disadvantage is that one disk’s worth of the array’s capacity is consumed by parity data, and writes are slower because the parity has to be updated along with the data. As a result, RAID 0 generally offers better write performance than RAID 5.
- RAID 10 – RAID 10 (which is sometimes called RAID 1+0, and which should not be confused with the related but differently arranged RAID 0+1) is a combination of striping and mirroring: data is striped across a set of mirrored disk pairs. It delivers the performance of a RAID 0 stripe set and the fault tolerance of mirroring. The disadvantage to this approach is that because each disk has to be mirrored, only half of the array’s total capacity is available for use.
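To make the parity idea behind RAID 5 a little more concrete, here is a minimal sketch of XOR parity for a single stripe. The block contents and the `xor_blocks` helper are hypothetical names of my own, and a real controller works on much larger blocks and rotates parity across the disks, so treat this as an illustration rather than an implementation.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data blocks in one stripe (hypothetical contents), one per data disk.
data_blocks = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data_blocks)

# Simulate losing the second disk, then rebuild its block from the survivors.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)  # b'BBBB'
```

Because XOR is its own inverse, combining the surviving data blocks with the parity block yields the missing block. The same property explains why a single parity block can only cover one missing disk per stripe.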
There are several other RAID levels that have been defined, but the four that I have described are among the most commonly used. As you can see, with RAID there is always a tradeoff among performance, fault tolerance and cost.
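The cost side of that tradeoff can be put into numbers. The sketch below estimates usable capacity for the four levels described above, assuming identical disks and treating RAID 1 as a set in which every disk holds the same copy. The function name `usable_tb` and the four-disk, 8 TB example are my own assumptions.

```python
def usable_tb(level, disks, disk_tb):
    """Return usable capacity in TB for a few common RAID levels."""
    if level == "RAID 0":
        return disks * disk_tb          # pure striping, nothing held back
    if level == "RAID 1":
        return disk_tb                  # every disk holds the same copy
    if level == "RAID 5":
        return (disks - 1) * disk_tb    # one disk's worth goes to parity
    if level == "RAID 10":
        return disks * disk_tb // 2     # half of the disks are mirrors
    raise ValueError(f"unsupported level: {level}")

for level in ("RAID 0", "RAID 1", "RAID 5", "RAID 10"):
    print(level, usable_tb(level, disks=4, disk_tb=8), "TB usable of 32 TB raw")
```

With four 8 TB disks this works out to 32, 8, 24 and 16 TB respectively, which lines up with the fault tolerance each level gives up or gains.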
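Erasure coding has some similarities to RAID storage, in that both technologies are designed to guard against data loss. RAID storage protects against data loss through the use of mirroring or parity data. Erasure coding, by contrast, uses a technique that is more similar to what is used by error correcting memory (strictly speaking, RAID parity is itself a very simple erasure code, but the term usually refers to more general schemes). Erasure coding works by breaking data into fragments, and then mathematically encoding those fragments so that some number of redundant fragments is produced alongside them. The fragments are stored in different locations, such as separate disks or nodes, and as long as enough of them survive, the redundant fragments can be used to reconstruct any data that is lost.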
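The fragment-and-encode idea can be sketched with a toy code based on polynomial interpolation. This is a simplified relative of the Reed-Solomon codes that production systems commonly use, not what any particular product implements. It works modulo the prime 257 rather than in the byte-sized field real systems prefer, so a redundant fragment may hold the value 256. The names `encode`, `decode` and `lagrange_eval` are my own, and the code needs Python 3.8 or later for the modular inverse.

```python
P = 257  # smallest prime above 255, so every byte value fits in the field

def lagrange_eval(points, x):
    """Evaluate at x the unique polynomial through `points`, modulo P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        term = yi
        for j, (xj, _) in enumerate(points):
            if i != j:
                inverse = pow((xi - xj) % P, -1, P)  # modular inverse
                term = term * ((x - xj) % P) % P * inverse % P
        total = (total + term) % P
    return total

def encode(data, m):
    """Turn k data bytes into k + m fragments of the form (index, value)."""
    k = len(data)
    points = list(enumerate(data))
    extra = [(x, lagrange_eval(points, x)) for x in range(k, k + m)]
    return points + extra

def decode(fragments, k):
    """Rebuild the k original bytes from any k surviving fragments."""
    chosen = fragments[:k]
    return bytes(lagrange_eval(chosen, x) for x in range(k))

data = b"RAID"
fragments = encode(data, m=2)          # 4 data + 2 redundant fragments

# Lose any two fragments, here one data fragment and one redundant one.
survivors = [f for f in fragments if f[0] not in (1, 4)]
print(decode(survivors, k=4))          # b'RAID'
```

The property to notice is that any four of the six fragments are enough to get the data back. It does not matter whether the missing fragments held original data or redundancy.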
Erasure coding has a number of benefits over RAID storage. For one, depending on how it is configured, erasure coding can have a much faster rebuild time for failed disks.
Suppose for a moment that a storage array is made up of several 8 TB disks that are configured as a RAID 5 array. If one of those disks were to fail, then the array would continue to function. When the failed disk was replaced, the array would gradually provision the new disk with all of the data that had been on the old disk. This is known as the rebuild process.
The problem with this is that as hard disks get larger, the rebuild times get longer, because the contents of the replacement disk have to be recalculated from the data and parity on every surviving disk. It can take days for a RAID 5 array to rebuild an 8 TB disk, and the process can take even longer if the array is under a heavy load. Furthermore, the array is vulnerable during the rebuild, since a RAID 5 array cannot sustain another disk failure until the rebuild is complete.
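A rough back-of-the-envelope calculation shows where those rebuild times come from. The sustained rebuild rates below are assumptions I picked for illustration (a lightly loaded array versus a busy one), not measurements, and `rebuild_hours` is a hypothetical helper.

```python
def rebuild_hours(disk_tb, rate_mb_per_s):
    """Estimate hours needed to rewrite a whole disk at a sustained rate."""
    total_bytes = disk_tb * 10**12               # decimal terabytes
    seconds = total_bytes / (rate_mb_per_s * 10**6)
    return seconds / 3600

# Assumed rebuild rates in MB/s: idle array, moderate load, heavy load.
for rate in (150, 50, 20):
    hours = rebuild_hours(8, rate)
    print(f"{rate} MB/s: {hours:.1f} hours ({hours / 24:.1f} days)")
```

Under these assumptions an 8 TB disk takes about 15 hours at 150 MB/s, but more than four days if production traffic squeezes the rebuild down to 20 MB/s.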
Depending on how the system is configured, erasure coding has the potential to reduce rebuild times, since the rebuild data can be retrieved from multiple sources simultaneously. More importantly, an erasure-coded system can be configured to protect against multiple, simultaneous disk failures.
In all fairness, some RAID levels can protect against multiple disk failures. RAID 6, for example, can protect against two simultaneous disk failures. RAID 10 can also sustain multiple failures so long as a disk and its mirror do not fail at the same time. Erasure coding tends to be more flexible, though, because the ratio of data fragments to redundant fragments can be chosen to match the number of simultaneous failures the system needs to survive. Configured appropriately, it can offer better overall protection.
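The difference in failure tolerance can be expressed as two small checks. This sketch assumes a six-disk RAID 10 array built from three mirror pairs, and an erasure-coded layout with four data and two redundant fragments, one fragment per disk, using a code in which any four fragments suffice. The function names and the disk numbering are hypothetical.

```python
def raid10_survives(mirror_pairs, failed):
    """RAID 10 survives if no mirror pair has lost both of its members."""
    return all(not (a in failed and b in failed) for a, b in mirror_pairs)

def erasure_coded_survives(m, failed):
    """A k + m layout survives while at most m fragments are missing."""
    return len(failed) <= m

pairs = [(0, 1), (2, 3), (4, 5)]       # six disks, three mirror pairs

print(raid10_survives(pairs, {0, 2}))            # True: different pairs
print(raid10_survives(pairs, {0, 1}))            # False: a whole pair is gone
print(erasure_coded_survives(2, {0, 1}))         # True: any two disks may fail
print(erasure_coded_survives(2, {0, 1, 2}))      # False: three exceeds m = 2
```

With RAID 10, whether the array survives depends on which disks fail. With the erasure-coded layout only the number of failures matters, and that number can be raised by adding more redundant fragments.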