Migrating data from on-premises locations to the cloud isn't without its challenges, but those challenges increase drastically when you're talking about hundreds of terabytes, or even petabytes, of data. For one thing, migrations involving that much data can take a long time, making it difficult to ensure that the migrated data volumes are completely transferred and kept in sync. Large data migrations can also cause bouts of downtime for on-premises applications during the process.
"Even if you have great system that works fast and has low latency, moving petabytes of data just takes time," said Merv Adrian, a Gartner analyst. "If the system is in use during the migration, you can't just shut it down. Changes are being made to the data, and you will lose those changes or you'll have to build a very complex system of capturing and applying changes that are made while the transition goes on."
The larger the scale of the data migration, the more opportunity for issues along the way. Daud Khan, a vice president at WANdisco, puts it this way: At gigabyte scale, typical outages can range from minutes to a few days, but at larger scales, outages could reach weeks or months.
In addition, on-premises unstructured data is mainly stored in Hadoop, but moving that data to the cloud means moving it into either an object store such as AWS S3 or a hierarchical storage technology. That shift in storage model can cause all kinds of complications.
WANdisco says it has a better way. The company's new LiveData Migrator takes an automated approach using a Hadoop-to-Hadoop migration method. This allows large amounts of data to be moved to the cloud without disrupting business operations or risking data loss. It also allows applications to continue modifying a source system's data without causing issues between the source and the target.
To ensure that data stays in sync, LiveData Migrator uses single-scan technology that reads the source data only once. From then on, it relies on change notifications from the Hadoop NameNode to learn of any modifications in the cluster, which LiveData Migrator then pushes to the destination zone. This allows applications to keep running during migrations without disruption.
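The pattern described above can be illustrated with a short sketch. This is not WANdisco's implementation: the dictionaries below are hypothetical stand-ins for an HDFS source and a cloud object store, and `ChangeEvent` is a stand-in for the NameNode's change notifications. The point is the two-phase shape: one full scan, then a stream of deltas, with no second scan needed.

```python
# Minimal simulation of a single-scan migration with live change capture.
# Illustrative only -- source/target dicts stand in for HDFS and a cloud
# object store; ChangeEvent stands in for Hadoop NameNode notifications.

from dataclasses import dataclass
from typing import Optional

@dataclass
class ChangeEvent:
    op: str                       # "put" or "delete"
    path: str
    data: Optional[bytes] = None  # new contents for "put" events

def migrate(source: dict, target: dict, events: list) -> None:
    # Phase 1: single scan -- copy every file currently in the source.
    for path, data in list(source.items()):
        target[path] = data
    # Phase 2: apply changes made while the scan ran, so the target
    # converges on the live source without rescanning everything.
    for ev in events:
        if ev.op == "put":
            target[ev.path] = ev.data
        elif ev.op == "delete":
            target.pop(ev.path, None)

source = {"/data/a": b"1", "/data/b": b"2"}
target = {}
# Changes applications made during the migration:
events = [ChangeEvent("put", "/data/c", b"3"),
          ChangeEvent("delete", "/data/a")]
migrate(source, target, events)
# target now mirrors the post-change source:
# {"/data/b": b"2", "/data/c": b"3"}
```

In a real Hadoop deployment, phase 2 would consume the NameNode's edit stream (for example, via HDFS's inotify interface) rather than an in-memory list, but the convergence logic is the same.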
While this capability isn't necessary for smaller data migrations, Adrian said it makes sense for very large migrations.
"It takes a lot of time, effort, management and disruption to try to manage all of it yourself," he said. "You would have to develop, tune and specialize new scripts, which is risky, and you're only likely to use it once or twice. That doesn't make a lot of sense."