The vast majority of enterprise data is unstructured — think audio and video files, medical images, genomics research data, telemetry from electric cars, and the digital exhaust of IoT products. With storage accounting for more than 30% of IT budgets in most organizations, the cloud has become a cheaper, simpler alternative for unstructured file and object data storage. A survey of U.S. and U.K. IT managers and directors found that more than half (56%) say that moving more data to the cloud is their top priority for unstructured data.
Yet these cloud data migrations are fraught with complexity and risk. Moving large volumes of data to the cloud can result in errors and data loss. These migrations can also take an inordinately long time to complete — sometimes months — and may not deliver the predicted cost savings. For these reasons, enterprise IT teams may delay or forgo cloud file migrations altogether.
The leading cloud file data migration issues and decisions include:
- Deciding which unstructured data should move to cloud storage;
- Understanding the different storage tiers: when a lower-cost object storage tier such as Amazon S3 Glacier Instant Retrieval or Azure Blob Storage makes sense, when a higher-performing file storage option such as Azure Files or Amazon FSx for NetApp ONTAP is the better fit, and how to move data between storage classes once it is in the cloud;
- Security — the configuration of cloud storage presents new challenges and complications, especially in hybrid environments;
- If multi-cloud architecture is in place, deciding which cloud to use for which data and workloads;
- Understanding the potential uses of cloud-native services for machine learning and AI projects and considerations for successfully moving data into those services.
Opportunity abounds: which cloud, and which file and object storage?
As demand for cloud file storage has accelerated, the options for customers are changing continually. While this is great news, it's also confusing. The major cloud vendors have dozens of classes of file and object storage from which to choose, each with tradeoffs on cost and performance. Plus, there's always the risk of getting burned on cloud egress fees if users wind up needing to bring that data back out of the cloud more frequently than expected.
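The egress-fee risk can be reasoned about with simple break-even arithmetic: the monthly savings from cheaper cloud storage shrink as more data has to be pulled back out. The sketch below illustrates the calculation; all per-GB prices are assumptions for illustration, not current vendor rates.

```python
# Back-of-the-envelope comparison: monthly savings from cheaper cloud
# storage vs. egress fees if data must be pulled back out.
# All prices below are illustrative assumptions, not vendor quotes.

def monthly_net_savings(tb_stored, on_prem_per_gb, cloud_per_gb,
                        egress_per_gb, tb_egressed_per_month):
    """Net monthly savings (USD) after accounting for egress fees."""
    gb_stored = tb_stored * 1024
    storage_savings = gb_stored * (on_prem_per_gb - cloud_per_gb)
    egress_cost = tb_egressed_per_month * 1024 * egress_per_gb
    return storage_savings - egress_cost

# 100 TB moved to a low-cost tier; 5 TB read back per month.
savings = monthly_net_savings(
    tb_stored=100,
    on_prem_per_gb=0.03,    # assumed fully loaded on-prem cost per GB
    cloud_per_gb=0.004,     # assumed archive-tier price per GB
    egress_per_gb=0.09,     # assumed egress rate per GB
    tb_egressed_per_month=5,
)
print(f"Net monthly savings: ${savings:,.2f}")
```

If the egressed volume grows enough, the net savings go negative — which is exactly the "getting burned" scenario.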
The analysis-first file data migration strategy
Typically, cloud file migrations are executed as lift-and-shift programs: IT organizations migrate entire file shares and directories to the cloud. This one-size-fits-all approach rarely captures the cloud's full cost advantage, and lift-and-shift migrations are often "set and forget". These moves don't account for long-term plans for unstructured data — such as making data in the cloud available for cloud-based machine learning and AI.
With so much emphasis on data as a strategic lever for competitive advantage and operational efficiencies, it makes sense to institute an analysis-first approach to migrations. Start by getting visibility into data usage and growth — across on-premises, edge and clouds — to understand not only your overall data profile but the requirements of different data sets.
Strive to answer questions such as:
- What data do I have and where is it stored?
- What data sets are accessed most frequently (so-called hot data)?
- What data sets are rarely accessed (so-called cold data)?
- Who uses the data currently and is there value in enabling collaboration outside of your organization?
- What data/files haven't been accessed for more than 3-5 years and should be considered for deep archival storage or confinement and deletion?
- What types of files do we have and which consume the most storage (e.g., image files, video or audio files, sensor data, text data)?
- What is the cost of storing these different file types?
- Which types of files should be stored at a higher security level (e.g., those containing PII or IP, or belonging to mission-critical projects)?
- Are we complying with regulations and internal policies with our data management practices?
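Several of the questions above — what data exists, where it lives, what is hot or cold, and which file types dominate — can be answered in rough form with a filesystem scan. The sketch below buckets files by last-access time and tallies bytes per extension. The hot/cold thresholds are assumptions to tune per policy, and note that access times can be unreliable on volumes mounted with noatime or relatime.

```python
import os
import time
from collections import defaultdict

HOT_DAYS, COLD_DAYS = 30, 365  # assumed thresholds; tune per policy

def profile_tree(root):
    """Bucket files under `root` into hot/warm/cold by last-access
    time, and tally bytes stored per file extension."""
    now = time.time()
    buckets = defaultdict(lambda: {"files": 0, "bytes": 0})
    by_ext = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # broken symlink, permission error, etc.
            age_days = (now - st.st_atime) / 86400
            tier = ("hot" if age_days <= HOT_DAYS
                    else "warm" if age_days <= COLD_DAYS
                    else "cold")
            buckets[tier]["files"] += 1
            buckets[tier]["bytes"] += st.st_size
            by_ext[os.path.splitext(name)[1].lower()] += st.st_size
    return dict(buckets), dict(by_ext)
```

A production-grade assessment tool would also capture ownership, permissions and sensitivity signals, but even this minimal profile is enough to start separating archive candidates from active data.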
The benefits of an analysis-first approach are clear:
- Cost savings: Based on analysis of unstructured data before you migrate or back up, you may decide to first tier 60% of the data to archive storage in the cloud (such as Amazon S3 Glacier), and then migrate the remaining 40% to cloud file storage. This can cut your cloud storage bill significantly.
- Faster migrations with lower risks: By first analyzing data and then migrating by workload, data type or another key attribute, you can also be more agile: you break a massive, disruptive task into smaller pieces, which is faster and less risky. An added benefit of granular data sets is the ability to easily pivot to new cloud resources as they become available.
- Comprehensive data lifecycle management: With regular analysis running on your data assets, you can continually optimize data over its lifecycle — from expensive hot storage to lower-priced warm storage to cold (rarely if ever accessed) storage and then eventually, deletion.
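The 60/40 split in the cost-savings point above reduces to simple arithmetic. The sketch below compares an all-file-storage bill against a tiered one; the per-GB monthly prices are illustrative assumptions, not quotes.

```python
# Sketch of the tiering math behind the cost-savings point above.
# Per-GB monthly prices are illustrative assumptions only.

FILE_TIER_PER_GB = 0.16   # assumed managed cloud file storage price
ARCHIVE_PER_GB = 0.004    # assumed archive object storage price

def monthly_cost(total_gb, archive_fraction):
    """Monthly storage cost when `archive_fraction` of the data is
    tiered to archive and the rest stays on cloud file storage."""
    archive_gb = total_gb * archive_fraction
    file_gb = total_gb - archive_gb
    return archive_gb * ARCHIVE_PER_GB + file_gb * FILE_TIER_PER_GB

total = 100 * 1024  # 100 TB expressed in GB
all_file = monthly_cost(total, 0.0)
tiered = monthly_cost(total, 0.6)  # 60% archived, 40% on file storage
print(f"All file storage: ${all_file:,.0f}/mo")
print(f"60/40 tiered:     ${tiered:,.0f}/mo")
```

Under these assumed prices the tiered layout costs well under half as much per month, which is why the analysis step that identifies the archive-eligible 60% pays for itself.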
Storage architects still need regular interaction with users, data owners and applications teams to understand their concerns and priorities about migration. A collaborative approach leads to better decisions, faster migrations, and fewer surprises.
After you have designed the migration strategy, don't forget to undertake performance, security and data integrity planning and testing to ensure that all the data moves properly, is adequately protected at the target location and that users can access files without issues after they have been moved. This again comes back to understanding data profiles and requirements, so that you aren't moving warm or hot data to a high-latency, deep-archive storage target, or compromising sensitive data by migrating it to cloud storage without adequate encryption or other protections.
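One concrete form of data integrity testing is comparing checksums between source and target after the move. The sketch below does this for two local file trees; against an object store you would instead compare against the store's own checksums or ETags, and the function names here are hypothetical, not part of any migration tool.

```python
import hashlib
import os

def tree_checksums(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    sums = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            h = hashlib.sha256()
            with open(path, "rb") as f:
                # Hash in 1 MB chunks so large files don't need to
                # fit in memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            sums[os.path.relpath(path, root)] = h.hexdigest()
    return sums

def verify_migration(source_root, target_root):
    """Return (missing, mismatched): files absent from the target,
    and files whose contents differ between source and target."""
    src = tree_checksums(source_root)
    dst = tree_checksums(target_root)
    missing = sorted(set(src) - set(dst))
    mismatched = sorted(p for p in src
                        if p in dst and src[p] != dst[p])
    return missing, mismatched
```

Running a check like this before cutting users over to the new location catches silent corruption and partial copies while the source still exists to re-copy from.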
Hybrid and multi-cloud considerations
Beyond migrating data from an on-prem NAS to secondary storage or to the cloud, IT organizations also may need or wish to move data from one cloud to another. In a hybrid or multi-cloud environment, data visibility is more important than ever. Building a data index that identifies all data across your hybrid environment delivers intelligence on where data resides at any moment. This global data index should be flexible enough to support new data formats, storage locations and protocols as requirements change.
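At its core, such a global data index is a set of records keyed by logical path, storage location and access protocol. The sketch below shows a minimal in-memory version; the field names and location strings are illustrative assumptions, and a real index would persist its entries and scale across billions of files.

```python
from dataclasses import dataclass, field

@dataclass
class IndexEntry:
    """One file's record in a global data index (simplified sketch)."""
    path: str       # logical path or object key
    location: str   # e.g. "onprem-nas1", "s3://bucket" (illustrative)
    protocol: str   # e.g. "nfs", "smb", "s3" (illustrative)
    size_bytes: int
    tags: dict = field(default_factory=dict)  # e.g. {"project": "genomics"}

class DataIndex:
    """In-memory index over hybrid storage locations."""
    def __init__(self):
        self._entries = []

    def add(self, entry):
        self._entries.append(entry)

    def where(self, **criteria):
        """Return entries matching all given attribute values."""
        return [e for e in self._entries
                if all(getattr(e, k, None) == v
                       for k, v in criteria.items())]
```

Because entries carry location and protocol as plain attributes, adding a new storage target or protocol later means adding new values, not changing the index's structure — the flexibility requirement noted above.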
When moving data from one cloud to another, you can incur costs if your data was tiered or migrated using a storage vendor's tools: you will first need to rehydrate the data back to the original storage device before moving it to the new cloud service, and you may also have to pay egress fees on the way out. Therefore, consider the use cases for cloud-to-cloud migration carefully so the fees don't catch you by surprise.
Migrating data for the endgame: native cloud analytics
The major cloud providers now have dozens of services which go far beyond hosting and storage into IoT, DevOps and data lakes. Cloud providers are investing billions into quantum computing, AI and ML to give customers powerful analytics capabilities they'd otherwise need to build and support internally at a high price.
IT organizations will need to carefully consider the data management tools and platforms they use to tier and migrate data into the cloud, so that they can easily access and move data elsewhere as needed to leverage cloud-native analytics tools. Storage vendors may implement proprietary data formats that prevent direct access to data tiered or migrated to the cloud outside of their own appliance. This approach locks customers into a static storage strategy and blocks access by cutting-edge analytics and AI/ML services that consume data via open APIs.
Accelerating file data migrations to the cloud can bring a host of benefits, from cost savings and automation to using cloud tools to uncover hidden insights from massive volumes of unstructured data. Maximizing ROI from these migrations requires a nuanced, analysis-based approach using open unstructured data management tools and processes. This will right-place data into the appropriate storage class based on age, usage, compliance needs and/or business priority, and allow IT teams to easily move the data again and again as new enterprise storage innovations emerge.
Darren Cunningham is a vice president at Komprise.