One of the key features that is built into Microsoft’s ReFS file system is block cloning. On the surface, block cloning looks a lot like deduplication. However, the two technologies are different from one another and have different use cases.
Like block cloning, most of the deduplication solutions that are available today are designed to work at the block level. Although the specifics vary by vendor, deduplication generally works by creating a hash of each storage block. These hashes are then compared with one another in an effort to locate duplicate storage blocks. If duplicate blocks are found, then only a single copy of the block is retained. Any files that use the information within that block are restructured so that they use the shared copy of the block rather than using a dedicated block copy. At that point, the duplicate blocks that had previously been associated with the files can be safely removed since they are no longer in use.
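To make the hash-and-share idea concrete, here is a minimal sketch of block-level deduplication. It is not any vendor's implementation: the 4 KB block size, the use of SHA-256, and the names `deduplicate` and `read_file` are all assumptions chosen for illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def split_blocks(data, size=BLOCK_SIZE):
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def deduplicate(files):
    """Return (store, layout).

    store  maps a block hash to the single retained copy of that block.
    layout maps each file name to the ordered list of block hashes it uses.
    """
    store = {}
    layout = {}
    for name, data in files.items():
        refs = []
        for block in split_blocks(data):
            digest = hashlib.sha256(block).hexdigest()
            # Keep only the first copy of any block with this hash.
            store.setdefault(digest, block)
            refs.append(digest)
        layout[name] = refs
    return store, layout


def read_file(store, layout, name):
    """Rebuild a file's contents from the shared block store."""
    return b"".join(store[digest] for digest in layout[name])


# Two hypothetical files whose first block is identical.
files = {
    "a.bin": b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE,
    "b.bin": b"A" * BLOCK_SIZE + b"C" * BLOCK_SIZE,
}
store, layout = deduplicate(files)
```

In this sketch the two files contain four blocks between them, but only three are kept because the shared block is stored once and referenced by both files.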
Again, the specific techniques used within the deduplication process tend to vary (sometimes widely) by vendor. Regardless, block-level deduplication is designed to reduce the amount of physical storage space consumed by data. The amount of space that can be reclaimed depends on factors such as the deduplication algorithm and the type of data being deduplicated, but deduplication can often reclaim a substantial amount of storage space.
The ReFS file system’s block cloning feature has some similarities to block-level deduplication, but, unlike deduplication, it is not designed to reclaim storage space. Instead, block cloning is designed to decrease the amount of time it takes to perform copy operations, especially those related to synthetic full backups. Block cloning can also dramatically decrease the amount of time that it takes to merge Hyper-V virtual machine checkpoints (snapshots).
As we all know, copy operations generate storage IO. Read operations are generated as the file that is being copied is initially read, and write operations are generated as the file copy is created. The storage subsystem is capable of handling only a certain number of IO operations per second (IOPS). As such, the file copy process must compete with other processes for the available IOPS.
The ReFS file system’s block cloning process is based on the idea that there are certain types of copy and data transfer operations that can be completed without generating large numbers of storage IO requests.
Imagine for a moment that you need to copy a file to another folder on the same disk on which the file currently resides. A traditional file copy operation would require that the file be read into memory, then written to disk. This process creates a secondary copy of the file, but it generates read and write IO in proportion to the size of the file. Storage consumption must also be considered, since the file copy uses just as much storage space as the original file does.
Now think back to what I said earlier about how storage deduplication works. Deduplication is based on the idea that if two or more files contain identical storage blocks, then those files can share a storage block rather than each file requiring its own distinct copy. Block cloning is based on a somewhat similar concept. Since files can share logical clusters (which are physical locations within a volume), there is no need to physically copy the file's data. Instead, the file system can perform a metadata-level remapping operation so that the new file points to the clusters the original already uses. This allows the copy process to be completed nearly instantaneously, and the copy consumes no additional physical storage space.
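The difference between a traditional copy and a metadata-level remapping can be sketched with a toy model. This is not the ReFS implementation or its API: the `ToyVolume` class, its counters, and the file names are hypothetical, and the rule that a write to a shared cluster goes to a freshly allocated cluster is a modeling assumption that goes beyond what this article covers.

```python
class ToyVolume:
    """Toy volume: files are lists of cluster numbers; clusters can be shared."""

    def __init__(self):
        self.clusters = {}   # cluster number -> data stored in that cluster
        self.refcount = {}   # cluster number -> how many file regions use it
        self.files = {}      # file name -> ordered list of cluster numbers
        self.next_cluster = 0
        self.reads = 0       # simulated read IO operations
        self.writes = 0      # simulated write IO operations

    def _allocate(self, data):
        number = self.next_cluster
        self.next_cluster += 1
        self.clusters[number] = data
        self.refcount[number] = 1
        self.writes += 1
        return number

    def create(self, name, blocks):
        self.files[name] = [self._allocate(block) for block in blocks]

    def copy(self, src, dst):
        """Traditional copy: read every cluster, then write a new one."""
        new_map = []
        for number in self.files[src]:
            data = self.clusters[number]
            self.reads += 1
            new_map.append(self._allocate(data))
        self.files[dst] = new_map

    def clone(self, src, dst):
        """Block clone: copy only the cluster map; no data is read or written."""
        self.files[dst] = list(self.files[src])
        for number in self.files[dst]:
            self.refcount[number] += 1

    def write_block(self, name, index, data):
        """Assumption: writes to a shared cluster go to a newly allocated one."""
        number = self.files[name][index]
        if self.refcount[number] > 1:
            self.refcount[number] -= 1
            self.files[name][index] = self._allocate(data)
        else:
            self.clusters[number] = data
            self.writes += 1

    def read(self, name):
        return [self.clusters[number] for number in self.files[name]]


vol = ToyVolume()
vol.create("vm.vhdx", [b"x", b"y", b"z"])

r, w = vol.reads, vol.writes
vol.copy("vm.vhdx", "copy.vhdx")
copy_io = (vol.reads - r, vol.writes - w)    # one read and one write per cluster

r, w = vol.reads, vol.writes
vol.clone("vm.vhdx", "clone.vhdx")
clone_io = (vol.reads - r, vol.writes - w)   # metadata only, no data IO

clusters_after_clone = len(vol.clusters)     # the clone added no clusters
vol.write_block("clone.vhdx", 0, b"changed") # the original stays untouched
```

In the model, the traditional copy costs one read and one write per cluster and doubles the space used, while the clone touches only the cluster map. A real file system still has to write its own metadata, which the counters here ignore.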
So, with this in mind, consider how block cloning can expedite Hyper-V checkpoint management. Normally, a checkpoint merge causes checkpoint contents to be physically ingested into a virtual hard disk file. This is a very IO-intensive process. If block cloning is used, however, the merge process could be performed almost entirely at the metadata level, greatly reducing the required storage IO and the time it takes to complete the operation.
Likewise, a backup application that uses an ReFS volume as a backup target could be designed to create a synthetic full backup by performing metadata operations. This approach isn’t all that different from the way that copy data management solutions work.
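As a rough illustration of a synthetic full backup built from metadata alone, the sketch below treats each backup as a list of cluster references. Everything in it is hypothetical: real backup products use their own formats, and the `synthesize_full` function, the cluster numbers, and the assumption that the incremental only replaces existing blocks are simplifications of mine.

```python
def synthesize_full(previous_full, incremental):
    """Build a new full backup map without copying any block data.

    previous_full is the ordered list of cluster numbers in the last full.
    incremental   maps a block index to the cluster holding its newer version.
    Assumption: every index in the incremental already exists in the full.
    """
    new_full = list(previous_full)      # copy the references, not the data
    for index, cluster in incremental.items():
        new_full[index] = cluster       # point at the newer version instead
    return new_full


# Hypothetical backup target: cluster number -> stored data.
clusters = {0: b"boot", 1: b"data-v1", 2: b"logs-v1", 3: b"data-v2"}

full_monday = [0, 1, 2]     # full backup: three blocks
incr_tuesday = {1: 3}       # incremental: block 1 changed, stored in cluster 3

full_tuesday = synthesize_full(full_monday, incr_tuesday)
restored = [clusters[number] for number in full_tuesday]
```

The new full backup reuses the unchanged clusters from the earlier full and the changed cluster from the incremental, so creating it moves no block data, and the earlier full remains restorable as it was.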
This article is meant to be a high-level overview of what ReFS block cloning technology is and what it does. For more on block cloning on the ReFS file system, check out Microsoft’s documentation.