Despite the media hype surrounding public cloud services, cloud storage doesn’t always mesh cleanly with an organization’s on-premises storage. This can be true whether the cloud storage is being used as a backup target, a repository for unstructured data or for a variety of other purposes. The key to blending cloud storage more seamlessly with on-premises storage is to take advantage of a caching gateway. A variety of caching gateways are on the market, sold under various names. Although feature sets vary among vendors, caching gateways are designed to solve some very fundamental challenges related to the use of cloud storage.
One of the big challenges that caching gateways solve is that of making cloud storage usable. Most of the file servers and NAS appliances used for on-premises storage rely on block storage. In other words, the physical storage is carved up into volumes, each of which is formatted with a file system such as NTFS, ext4 or ReFS. The file system maps files to blocks of storage.
Cloud storage, on the other hand, is often based on an object-storage architecture. Cloud providers use object storage because it supports massive scalability. Unlike block storage, however, object storage does not use a file system. Instead, object storage systems use an index to keep track of where data is stored. Another important difference is that object storage does not support the use of storage protocols such as SMB. Object storage calls are usually made through a REST API.
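The flat, index-based design can be illustrated with a short sketch. This is not any vendor's API; the class and its methods are purely illustrative, showing how an object store addresses data by key through an index rather than through a hierarchical file system, with each operation mapping to a REST-style call.

```python
# Illustrative sketch only: objects live in a flat pool and are
# found through an index, not a file system's directory tree.

class ObjectStore:
    def __init__(self):
        self._index = {}   # object key -> location of the stored bytes
        self._slabs = []   # flat storage pool

    def put(self, key, data):
        # REST analogue: PUT /bucket/key
        self._slabs.append(data)
        self._index[key] = len(self._slabs) - 1

    def get(self, key):
        # REST analogue: GET /bucket/key
        return self._slabs[self._index[key]]

store = ObjectStore()
# The "/" characters are part of the key, not real directories.
store.put("reports/2023/q1.pdf", b"%PDF...")
print(store.get("reports/2023/q1.pdf"))  # b'%PDF...'
```

Note that the slash-separated key only looks like a path; to the object store it is an opaque name looked up in the index.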
The fact that block storage and object storage differ so fundamentally can pose a problem for an organization that wants to augment its existing resources with cloud storage. In these situations, a caching gateway can provide translation services between the two storage types. A workload communicates with the gateway using a storage protocol such as SMB or iSCSI. The gateway then translates that call into something appropriate for use with object storage and completes the request.
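The translation step can be sketched in a few lines. This is a simplified, hypothetical example (the endpoint URL and path-mapping rule are assumptions, not any product's behavior): a file-protocol write arrives with a UNC-style path, and the gateway turns it into an object-storage PUT over HTTP.

```python
import urllib.request

def smb_write_to_object_put(share_path: str, data: bytes,
                            endpoint: str = "https://objects.example.com"):
    """Translate a file-protocol write into a REST-style object PUT.

    Hypothetical sketch: map the UNC path (\\server\share\file) to a
    flat object key, then build the HTTP request a gateway might send.
    """
    key = share_path.strip("\\").replace("\\", "/")
    req = urllib.request.Request(
        url=f"{endpoint}/{key}",
        data=data,
        method="PUT",
        headers={"Content-Type": "application/octet-stream"},
    )
    # A real gateway would now send this with urllib.request.urlopen(req)
    # and handle authentication, retries and multipart uploads.
    return req

req = smb_write_to_object_put(r"\\nas01\projects\plan.docx", b"hello")
print(req.get_method(), req.full_url)
```

The point of the sketch is the asymmetry: the workload speaks a stateful file protocol, while the cloud side sees only stateless, whole-object HTTP requests.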
This functionality is often leveraged for disaster recovery purposes. In the past, an organization that wanted to replicate its unstructured data to a remote location had to incur the expense of leasing space in a remote datacenter and purchasing NAS hardware. Today, however, an organization can use a gateway to replicate on-premises storage to the cloud, thereby avoiding most of the hassle and expense commonly associated with the remote replication of data.
Another challenge that a caching gateway helps with is that of bandwidth limitations. Imagine for a moment that an organization has decided to store all of its unstructured data in the cloud. One of the implications of doing so is increased latency, because every file access operation has to traverse a WAN connection.
The key differentiator between a caching gateway and a cloud storage gateway is that a caching gateway has its own internal storage. When a user writes a file, that file is not immediately written to cloud storage. Instead, the file is written to local storage within the gateway appliance. The data is then replicated to the cloud whenever sufficient bandwidth becomes available. The gateway may also use data deduplication to further reduce bandwidth consumption.
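The write path just described (acknowledge locally, upload later, deduplicate) can be sketched as follows. The class and its names are hypothetical; real gateways deduplicate at the chunk level and persist their state, but the flow is the same.

```python
import hashlib

class WriteBackCache:
    """Illustrative write-back cache: writes land on the gateway's
    own storage first and are uploaded later, with duplicates skipped."""

    def __init__(self):
        self.local = {}               # path -> data (the gateway's disk)
        self.pending = []             # paths queued for upload
        self.uploaded_digests = set() # content hashes already in the cloud

    def write(self, path, data):
        # Acknowledged immediately from local storage; no WAN round trip.
        self.local[path] = data
        self.pending.append(path)

    def flush(self, upload):
        # Called when bandwidth becomes available; dedup skips content
        # the cloud already holds.
        sent = 0
        for path in self.pending:
            digest = hashlib.sha256(self.local[path]).hexdigest()
            if digest not in self.uploaded_digests:
                upload(path, self.local[path])
                self.uploaded_digests.add(digest)
                sent += 1
        self.pending.clear()
        return sent

cache = WriteBackCache()
cache.write("a.txt", b"same bytes")
cache.write("b.txt", b"same bytes")   # duplicate content
uploads = []
sent = cache.flush(lambda path, data: uploads.append(path))
print(sent)  # 1 -- only one copy of the duplicate content crosses the WAN
```

Two files with identical content produce a single upload, which is the bandwidth saving that deduplication delivers.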
Similarly, when a file is read from the cloud, that file is written to the caching gateway’s internal storage, where it is kept for a period of time. The idea behind this is that if the user needs to open the file again (or if someone else needs to access the file), then a WAN traversal will not be required because the data already exists on premises within the gateway’s cache. Most caching gateways are designed to keep hot data (data that has been accessed recently) in a cache so that the data can be accessed more quickly than would be possible if the data only existed in the cloud.
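The read path amounts to a least-recently-used cache in front of a slow WAN fetch. The sketch below is illustrative (the class name and counter are assumptions, not any product's design): repeat reads of hot data are served from the gateway's local cache, and only the first read crosses the WAN.

```python
from collections import OrderedDict

class ReadCache:
    """Illustrative hot-data cache: keep recently read files locally
    so repeat reads avoid a WAN round trip."""

    def __init__(self, capacity, fetch_from_cloud):
        self.capacity = capacity
        self.fetch = fetch_from_cloud   # the slow WAN path
        self.cache = OrderedDict()      # path -> data, in LRU order
        self.wan_fetches = 0            # counts trips across the WAN

    def read(self, path):
        if path in self.cache:
            self.cache.move_to_end(path)    # cache hit: served locally
            return self.cache[path]
        self.wan_fetches += 1
        data = self.fetch(path)             # cache miss: go to the cloud
        self.cache[path] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the coldest entry
        return data

cloud = {"q1.pdf": b"report data"}          # stand-in for cloud storage
rc = ReadCache(capacity=2, fetch_from_cloud=cloud.__getitem__)
rc.read("q1.pdf")
rc.read("q1.pdf")   # second read is a local cache hit
print(rc.wan_fetches)  # 1
```

Eviction of the least-recently-used entry is what keeps the cache holding "hot" data: files nobody has touched recently fall out, and only their cloud copy remains.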
One more interesting use case for caching gateways is that some organizations have abandoned their branch office file storage infrastructures in favor of using cloud storage instead. Historically, file servers or NAS devices were placed in branch offices so that the data would be located in close proximity to the users. The problem with this approach is that the servers are expensive to maintain and difficult to back up.
The modern alternative is to replace these branch office file servers with caching gateways. This allows the organization to keep all of its unstructured data in one place (in the cloud) and to eliminate the maintenance challenges associated with storing data in branch offices. A low-maintenance caching gateway provides branch office users access to their data in the cloud, while the onboard cache helps to reduce latency for those users.
As previously noted, caching gateways vary widely in scope and capability. Even so, they should be considered a must-have for any organization that wants to use a mixture of cloud storage and on-premises resources.