Managing data can be challenging in any environment. But data management in the cloud is especially difficult, given the unique security, cost and performance issues at play. With that reality in mind, here are some tips to help IT teams optimize cloud data management and strike the right balance among the various competing priorities that shape data in public, private or hybrid cloud environments.
Why Is Cloud Data Management Challenging?
Before delving into best practices for cloud data management, let’s briefly discuss why managing data in the cloud can be particularly challenging. The main reasons include:
- Security: Data stored in the cloud may be easier to breach because the cloud is, by its very nature, connected to the internet. You typically can’t “air gap” cloud data in the way you can on-premises data.
- Cost: The way you manage cloud data can have major implications for your overall cloud computing bill.
- Complexity: Public and private clouds offer multiple data storage options--object storage, block storage, conventional databases and even blockchain-based storage--and choosing the right type for a given data workload can be tricky.
- Performance: Moving data into, out of or between clouds is typically much slower than moving it across an on-premises network, because the networks that connect cloud resources (often the public internet) have more limited bandwidth. Optimizing for data performance in the cloud is therefore a special priority.
Optimizing Cloud Data Management
Those are the problems. Now, let’s look at five ways to tackle them.
1. Data storage tiering
A basic best practice for striking the right balance between cloud storage costs and performance is to use data storage tiers. Most public cloud providers offer different storage tiers (or classes, as they are called on some clouds) for at least their object storage services.
The higher-cost tiers offer instant access to data. With the lowest-cost tiers, which are usually meant for archiving, you may have to wait anywhere from minutes to hours to access your data. Data that doesn’t require frequent or quick access, then, can be stored much more cheaply using lower-cost tiers.
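As a rough illustration, tiering is often automated with a lifecycle policy that moves objects to cheaper classes as they age. The sketch below builds such a policy in the shape that Amazon S3 lifecycle configurations use. The `logs/` prefix, the rule ID and the 30- and 90-day thresholds are assumptions chosen for illustration, not recommendations.

```python
import json


def build_tiering_policy(prefix, warm_after_days, cold_after_days):
    """Return a lifecycle policy dict in the shape S3 lifecycle rules use.

    All argument values are illustrative assumptions supplied by the caller.
    """
    if not 0 < warm_after_days < cold_after_days:
        raise ValueError("expected 0 < warm_after_days < cold_after_days")
    return {
        "Rules": [
            {
                "ID": f"tier-{prefix.strip('/')}",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    # Infrequent-access class: cheaper storage, still immediate reads.
                    {"Days": warm_after_days, "StorageClass": "STANDARD_IA"},
                    # Archive class: cheapest storage, slower retrieval.
                    {"Days": cold_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


policy = build_tiering_policy("logs/", warm_after_days=30, cold_after_days=90)
print(json.dumps(policy, indent=2))
```

With the AWS SDK for Python (boto3), a dictionary of this shape is what the S3 client's `put_bucket_lifecycle_configuration` method accepts as its `LifecycleConfiguration` argument. Other providers use their own lifecycle formats.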
2. Object storage (sometimes)
For many teams, object storage services like Amazon S3 or Azure Blob Storage are the default solution for storing data in the cloud. These services let you upload data in any form and retrieve it quickly. You don’t have to worry about structuring the data in a particular way or configuring a database.
The downside of cloud object storage is that you usually have to pay fees to interact with the data. For instance, if you want to list the contents of your storage bucket or copy a file, you’ll pay a fee for each request. The request fees are very small--fractions of a penny--but they can add up if you are constantly accessing or modifying object storage data.
You don’t typically have to pay special request fees to perform data operations on other types of cloud storage services, like block storage or cloud databases. Thus, from a cost optimization perspective, it may be worth forgoing the convenience of object storage in order to save money.
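To see how small per-request fees add up, here is a back-of-the-envelope sketch. The price of $0.005 per 1,000 requests is a hypothetical figure used only to make the arithmetic concrete, and the function name is made up for this example. Actual prices vary by provider, region and request type.

```python
def monthly_request_cost(requests_per_second, price_per_1000_requests):
    """Estimate a month of request fees for a steady request rate."""
    seconds_per_month = 30 * 24 * 60 * 60  # assume a 30-day month for simplicity
    total_requests = requests_per_second * seconds_per_month
    return total_requests / 1000 * price_per_1000_requests


# Hypothetical price; check your provider's current price list.
ASSUMED_PRICE_PER_1000 = 0.005

for rate in (1, 50, 500):
    cost = monthly_request_cost(rate, ASSUMED_PRICE_PER_1000)
    print(f"{rate:>4} requests/sec -> ${cost:,.2f} per month")
```

At the assumed price, a single request per second costs about $12.96 a month, which is negligible. At 500 requests per second the same arithmetic gives about $6,480 a month, which is the point at which an alternative storage type may be worth a look.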
3. Cloud data loss prevention
One of the key security challenges that teams face when managing cloud data is the risk that they don’t actually know where all of their sensitive data is within cloud environments. It can be easy to upload files containing personally identifiable information or other types of private data into the cloud and lose track of them (especially if your cloud environment is shared by a number of users within your organization, each doing their own thing with few governance policies to manage operations).
Cloud data loss prevention (DLP) tools address this problem by automatically scanning cloud storage for sensitive data. Public cloud vendors offer their own tools of this kind, such as Google Cloud DLP (now part of Google's Sensitive Data Protection) and Amazon Macie. There are also third-party DLP tools, like Open Raven, that can work within public cloud environments.
Cloud DLP won’t guarantee that your cloud data is stored securely--DLP tools can overlook sensitive information--but it goes a long way toward helping you find data that is stored in an insecure way before the bad guys discover it.
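To make the idea of scanning concrete, here is a toy sketch of pattern-based detection. It is not how any particular DLP product works; real tools use far richer detection than two regular expressions. The object names and contents are made up, and the text is held in memory rather than read from real cloud storage.

```python
import re

# Deliberately simple patterns, for illustration only.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn_like": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def scan_text(text):
    """Return a dict mapping each pattern name to the matches found in text."""
    findings = {}
    for name, pattern in PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            findings[name] = matches
    return findings


# Hypothetical object contents, keyed by made-up object names.
objects = {
    "exports/customers.csv": "id,contact\n1,jane@example.com\n2,123-45-6789\n",
    "docs/readme.txt": "Nothing sensitive here.\n",
}

report = {key: scan_text(body) for key, body in objects.items()}
flagged = {key: hits for key, hits in report.items() if hits}
print(flagged)
```

Even this toy version shows why DLP tools can overlook things: anything that doesn't match a known pattern, such as a name in free text, passes through unflagged.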
4. Anti-egress cloud architectures
Data egress--which means the movement of data out of a public cloud environment--is the bane of cloud data cost and performance optimization. The more egress you have, the more you’ll pay, because cloud providers generally bill for each gigabyte of data that moves out of their clouds. Egress also leads to poorer performance due to the time it takes to move data out of the cloud via the internet.
To address these issues, make minimizing data egress a key priority when designing your cloud architecture. Don’t treat egress costs and performance degradations as inevitable; instead, figure out how to store data as “close” as possible to the applications that process it or the users who consume it.
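A quick comparison shows why placement matters. The sketch below contrasts two designs: shipping raw data out of the cloud for processing, and processing next to the data so that only the results leave. The egress rate, data volume and summary ratio are all illustrative assumptions, not real prices or measurements.

```python
def egress_cost(gb_out, price_per_gb):
    """Egress charge for moving gb_out gigabytes out of the cloud."""
    return gb_out * price_per_gb


# All figures below are illustrative assumptions.
ASSUMED_PRICE_PER_GB = 0.09  # hypothetical egress rate, in dollars
RAW_DATA_GB = 2_000          # raw data produced in the cloud each month
SUMMARY_RATIO = 0.02         # processed results are 2% of the raw size

# Design A: ship the raw data out and process it elsewhere.
cost_ship_raw = egress_cost(RAW_DATA_GB, ASSUMED_PRICE_PER_GB)

# Design B: process next to the data and ship only the results.
cost_ship_results = egress_cost(RAW_DATA_GB * SUMMARY_RATIO, ASSUMED_PRICE_PER_GB)

print(f"Ship raw data:     ${cost_ship_raw:,.2f} per month")
print(f"Ship results only: ${cost_ship_results:,.2f} per month")
```

Under these assumptions, the egress bill shrinks in proportion to the summary ratio, and less data crossing the internet means less waiting as well.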
5. Cloud-based data analytics
In addition to allowing you to store data, all of the major clouds now also let you process it using a variety of managed data analytics services, such as Amazon OpenSearch Service and Azure Data Lake Analytics.
If you want to analyze your data without having to move it out of the cloud (and pay those nasty egress fees), these services may come in handy. However, you’ll typically have to pay for the services themselves, which can cost a lot depending on how much data you process. There may also be data privacy issues to consider when analyzing sensitive cloud data using a third-party service.
As an alternative, you can consider installing your own, self-managed data analytics platform in a public cloud, using open source tools like the ELK Stack. That way, you can avoid egress by keeping data in the cloud, without having to pay for a third-party managed service. (You’ll pay for the cloud infrastructure that hosts the service, but that is likely to cost much less than a managed data analytics service.)
The bottom line here: Managed cloud data analytics services can be useful tools, but deploy them wisely.
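One way to weigh the two options is a simple break-even estimate. The sketch below assumes the managed service charges per gigabyte processed while a self-managed platform costs a flat monthly amount for infrastructure. Both prices are hypothetical, and the model ignores the staff time needed to run your own platform, which can be significant.

```python
def managed_cost(gb_processed, price_per_gb):
    """Monthly cost of a managed service billed per gigabyte processed."""
    return gb_processed * price_per_gb


def break_even_gb(monthly_infrastructure_cost, price_per_gb):
    """Monthly volume above which the flat-cost self-managed option is cheaper."""
    return monthly_infrastructure_cost / price_per_gb


# Hypothetical figures, for illustration only.
ASSUMED_PRICE_PER_GB = 0.25   # managed service, dollars per GB processed
ASSUMED_INFRA_COST = 400.0    # self-managed infrastructure, dollars per month

threshold = break_even_gb(ASSUMED_INFRA_COST, ASSUMED_PRICE_PER_GB)
print(f"Break-even volume: {threshold:,.0f} GB per month")

for volume_gb in (500, 5_000):
    managed = managed_cost(volume_gb, ASSUMED_PRICE_PER_GB)
    cheaper = "managed" if managed < ASSUMED_INFRA_COST else "self-managed"
    print(f"{volume_gb:>5} GB: managed ${managed:,.2f} vs "
          f"self-managed ${ASSUMED_INFRA_COST:,.2f} -> {cheaper}")
```

With these made-up numbers, the managed service is cheaper at low volumes and the self-managed platform wins once processing exceeds the break-even volume.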
Conclusion: Smart Cloud Data Management
Like many other things, data management is just harder when you have to do it in the cloud. The good news is that, by being strategic about which cloud storage services you use, how you manage data in the cloud and how you factor data management into your cloud architecture, you can avoid the cost, performance and security pitfalls of cloud data.