Much has been written about where to store cold data, meaning data that is rarely if ever accessed. Far less has been said about how and where to store frequently accessed data, otherwise known as hot data. ITPro Today spoke with Laz Vekiarides, CTO and founder of ClearSky, about the unique needs of storing and accessing hot data.
ITPro Today: What is your definition of hot data?
Vekiarides: Essentially, it’s any data an organization would access over the course of a week. Hot data tends to be anywhere from 7% to 12% of a company’s provisioned primary storage for a particular workload. We’ve actually measured it. I’ll never forget the first customer we ran the analytics on when we were first designing our system. We had half a petabyte of storage and found that, on average, they were accessing 5% to 7%.
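To make that definition concrete, here is a minimal Python sketch of how the hot share of a workload could be estimated. It assumes a last-access timestamp is available for each stored block. The function name `hot_fraction`, the one-week window, and the sample sizes are illustrative assumptions, not ClearSky's analytics.

```python
from datetime import datetime, timedelta

def hot_fraction(blocks, now, window=timedelta(days=7)):
    """Return the share of provisioned bytes accessed within the window.

    blocks is a list of (size, last_access) pairs. The record layout is a
    hypothetical one chosen for this sketch.
    """
    total = sum(size for size, _ in blocks)
    if total == 0:
        return 0.0
    cutoff = now - window
    hot = sum(size for size, last_access in blocks if last_access >= cutoff)
    return hot / total

# Made-up sample: sizes in GB, 1,000 GB provisioned in total.
now = datetime(2024, 1, 15)
blocks = [
    (70, now - timedelta(days=1)),     # touched yesterday: hot
    (30, now - timedelta(days=3)),     # touched this week: hot
    (400, now - timedelta(days=30)),   # untouched for a month
    (500, now - timedelta(days=200)),  # untouched for most of a year
]
fraction = hot_fraction(blocks, now)
print(f"{fraction:.0%} of provisioned storage is hot")
```

With these made-up numbers, 100 GB of the 1,000 GB provisioned was touched in the past week, so the hot share is 10%, which falls inside the 7% to 12% range Vekiarides describes.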
ITPro Today: Why is figuring out how to store hot data a challenge?
Vekiarides: The challenge is that hot data has to be available at the edge to provide performance. You can’t break the laws of physics, and there’s not enough bandwidth in the world to eliminate latency if it has to be pulled down from a data center hundreds of miles away. It’s more important now than ever to make sure hot data is stored near users and apps so analysis can happen instantaneously, enabling users to make real-time decisions.
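The physics point can be checked with back-of-the-envelope arithmetic. The sketch below assumes light travels through optical fiber at roughly 200,000 km per second (about two-thirds of its speed in a vacuum) and counts only propagation delay on a straight-line path. Real networks add routing, queuing and storage service time, so these figures are lower bounds. The function name `round_trip_ms` and the sample distances are illustrative.

```python
# Assumption: approximate speed of light in glass fiber, about 2/3 of c.
FIBER_KM_PER_S = 200_000
KM_PER_MILE = 1.609344

def round_trip_ms(miles):
    """Best-case round-trip propagation delay in milliseconds."""
    km = miles * KM_PER_MILE
    return 2 * km / FIBER_KM_PER_S * 1000

for miles in (100, 500, 2000):
    print(f"{miles:>5} miles: at least {round_trip_ms(miles):.1f} ms per round trip")
```

No amount of added bandwidth lowers these numbers, because they depend only on distance. An application that issues many small, sequential reads pays the round trip on every one of them.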
ITPro Today: It’s still fairly common to store hot data locally instead of in the cloud. What’s wrong with doing it that way?
Vekiarides: One of the biggest reasons is access. Not only do people in different locations need access to the same data; developers and apps working in different cloud environments also need access. Moving data from one place to another is also difficult. Then there is latency, which happens when multiple users need to access data at the same time. And protecting all of that data at the edge is a huge, expensive undertaking that requires a separate system for backup and a secondary data center for disaster recovery.
ITPro Today: If companies were to move to a cloud-based storage model for hot data, would that solve all of these problems?
Vekiarides: It’s definitely part of the solution, but it doesn’t solve every problem. And remember: Even basic cloud storage is object storage. Applications don’t talk object; it’s a different protocol.
ITPro Today: So what is the solution? If cloud-based storage isn’t the answer and on-premise storage isn’t the right way to go, what’s the best way to store and access hot data?
Vekiarides: A hybrid approach—one that incorporates the cloud, on-premise storage and storage within 100 miles of your location. Our approach starts by ensuring that all data is stored and protected as a single copy in the cloud, with hot data cached at the edge—on premise, in applications, in private clouds, etc. Hot and “warm” data is also stored within about 100 miles of the customer’s location in one of our PoP [point of presence] sites. We do that to reduce latency to 1 or 2 milliseconds for data that’s needed immediately.
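The read path through a three-tier arrangement like the one described here can be sketched in a few lines of Python. This is a simplified illustration and not ClearSky's implementation: the class name `TieredStore`, the use of plain dictionaries for each tier, and the absence of cache eviction, write handling and failure handling are all assumptions made to keep the example short.

```python
class TieredStore:
    """Toy read path: edge cache, then regional PoP, then cloud copy."""

    def __init__(self, cloud):
        self.edge = {}      # hot data, closest to users and apps
        self.pop = {}       # hot and warm data at a nearby point of presence
        self.cloud = cloud  # the single authoritative copy of all data

    def read(self, key):
        """Return (value, tier that served the read)."""
        if key in self.edge:
            return self.edge[key], "edge"
        if key in self.pop:
            value = self.pop[key]
            self.edge[key] = value  # promote to the edge for next time
            return value, "pop"
        value = self.cloud[key]     # raises KeyError if the data does not exist
        self.pop[key] = value
        self.edge[key] = value
        return value, "cloud"

store = TieredStore(cloud={"report.csv": b"q3 numbers"})
first = store.read("report.csv")   # cold read, served from the cloud copy
second = store.read("report.csv")  # now cached, served from the edge
print(first[1], second[1])
```

The first read has to travel to the cloud copy, while the repeat read is served from the edge. A production system would also need an eviction policy so that only recently used data occupies the smaller, faster tiers.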
ITPro Today: It seems to me like it’s a more personal or hands-on approach to storage that straddles the lines.
Vekiarides: I think so. It’s more interactive with the workload and more tailored to how users are actually using the data. Here is an example: One of our customers, Nuance, whose technology underlies Apple Siri, was running very old storage that was provisioned as primary, with no backups. They were concerned that if any of their systems were infected with malware, they wouldn’t be able to recover their developers’ work. Just for the price of adding backup to this obsolete storage array, they got a much higher-performance system because we use flash. They also were able to cover all of their backup and DR needs from the cost of a practical durable copy. They saw TCO savings of more than 60%.
ITPro Today: Is this hybrid approach to hot data storage more cost-effective than either on-premise or cloud-based methods?
Vekiarides: They are getting rid of secondary storage and don’t need multiple copies of their data, which saves money. Typically for a protected terabyte of data, backed up with a DR site, we can save a customer at least 50%. But, more importantly, they can get rid of a bunch of headaches around managing backups and everything that has to happen to keep machines running.