Originally published on InfoWorld
Moving petabytes of production data is a trick best done with mirrors. Follow these steps to minimize risk and cost and maximize flexibility.
Enterprises that are embracing a cloud deployment need cost-effective and practical ways to migrate their corporate data into the cloud. This is sometimes referred to as “hydrating the cloud.” Given the challenge of moving massive enterprise data sets anywhere non-disruptively and accurately, the task can be a lengthy, complicated, and risky process.
Not every organization has enough dedicated bandwidth to transfer multiple petabytes without causing performance degradation to the core business, or enough spare hardware to migrate to the cloud. In some cases, organizations in physically isolated locations, or without cost-effective high-speed Internet connections, face an impediment to getting onto a target cloud. Data must be secured, backed up, and, in the case of production environments, migrated without missing a beat.
AWS made hydration cool, so to speak. By fall 2016, AWS had branded such offerings as Snowball, a petabyte-scale data transfer service using one or more AWS-supplied appliances, and Snowmobile, an exabyte-scale transport service using an 18-wheeler truck that carries data point to point. These vehicles make it easy to buy and deploy migration services for data destined for the AWS cloud. Even at full line rate, migrating 100TB of data over a dedicated 100Mbps connection takes more than 90 days, and closer to 120 days once protocol overhead and link contention are factored in. The same transfer using multiple Snowballs would require about a week.
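The network-transfer figure above is simple arithmetic, and it can be worth rerunning for your own data volume and link speed. The following is a minimal sketch, not a sizing tool: the function name `transfer_days` and the `utilization` parameter are illustrative assumptions, and terabytes are treated as decimal (10^12 bytes).

```python
def transfer_days(data_tb: float, link_mbps: float, utilization: float = 1.0) -> float:
    """Days needed to send data_tb decimal terabytes over a link_mbps link.

    utilization is the assumed fraction of the link actually available
    for the migration (1.0 means the full line rate).
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    bits = data_tb * 1e12 * 8                   # decimal terabytes -> bits
    usable_bps = link_mbps * 1e6 * utilization  # megabits/s -> usable bits/s
    return bits / usable_bps / 86_400           # seconds -> days


if __name__ == "__main__":
    for utilization in (1.0, 0.8):
        days = transfer_days(100, 100, utilization)
        print(f"100 TB over 100 Mbps at {utilization:.0%} utilization: {days:.1f} days")
```

At full line rate the result is about 93 days, and at 80 percent utilization about 116 days, so a figure of 120 days corresponds to a link that delivers roughly three-quarters of its nominal bandwidth to the migration.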
Yet for the remaining 55 percent of the public cloud market that is not using AWS – or those enterprises with private, hybrid, or multi-cloud deployments that want more flexibility – other cloud migration options may be more appealing than AWS’s native offerings. This may be especially true when moving production data, where uploading static data onto appliances leaves the IT team with a partial copy during the transfer. They need a way to resynchronize the data.
The following is a guide to cloud hydration best practices, which differ depending on whether your data is static, and thus resources are offline, or in production. I will also offer helpful tips for integrating with the new datacenter resources, and accommodating hybrid or multicloud architectures.
Static data
Unless data volumes are under 1TB, you’ll want to leverage physical media such as an appliance to expedite the hydration process for file, block, or object storage. This works elegantly in environments where the data does not need to be continuously online, or the transfer requires the use of a slow, unreliable, or expensive Internet connection.
1. Copy the static data to a local hydration appliance. Use a small, portable, easily shipped NAS appliance, configured with RAID for durability while shipping between sites. The appliance should include encryption – either 128-bit AES or, preferably, 256-bit AES – to protect against unauthorized access after the NAS leaves the client facility.
Using a very fast 10G connection, teams can upload 100MB to 200MB of data per second onto a NAS appliance. The appliance should support the target environment (Windows, Linux, etc.) and file access mechanism (NFS, CIFS, Fibre Channel, etc.). One appliance is usually sufficient to transfer up to 30TB of data. For larger data volumes, teams can use multiple appliances or repeat the process several times to move data in logical chunks or segments.
2. Ship the appliance to the cloud environment. The shipping destination could be a co-location facility near the target cloud or the cloud datacenter itself.
3. Copy the data to a storage target in the cloud. The storage target should be connected to the AWS, Azure, Google, or other target cloud infrastructure using VPN access via high-speed fiber.
For example, law firms routinely need to source all emails from a client site for e-discovery purposes during litigation. Typically, the email capture spans a static, defined date range from months or years prior. The law firm will have its cloud hydration vendor ship an appliance to the litigant’s site, direct them to copy all emails as needed, then ship the appliance to the cloud hydration vendor for processing.
While some providers require the purchase of the appliance, others allow for one-time use of the appliance during migration, after which it is returned and the IT team is charged on a per-terabyte basis. No capital expenditure or long-term commitment required.
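The sizing figures quoted in step 1 (up to 30TB per appliance, loaded at 100MB to 200MB per second) lend themselves to a quick planning estimate. The sketch below is illustrative only: the constant and function names are assumptions, units are decimal, and shipping time and the copy at the cloud end are not modeled.

```python
import math

APPLIANCE_CAPACITY_TB = 30  # per-appliance figure quoted in the article


def appliances_needed(data_tb: float, capacity_tb: float = APPLIANCE_CAPACITY_TB) -> int:
    """Number of appliance loads (or appliances) needed to carry data_tb."""
    return math.ceil(data_tb / capacity_tb)


def load_hours(data_tb: float, rate_mb_per_s: float) -> float:
    """Hours to copy data_tb decimal terabytes at rate_mb_per_s megabytes/second."""
    seconds = data_tb * 1e12 / (rate_mb_per_s * 1e6)
    return seconds / 3600


if __name__ == "__main__":
    fastest = load_hours(APPLIANCE_CAPACITY_TB, 200)
    slowest = load_hours(APPLIANCE_CAPACITY_TB, 100)
    print(f"Loading one full appliance: {fastest:.0f} to {slowest:.0f} hours")
    print(f"Appliance loads for 100 TB: {appliances_needed(100)}")
```

Under these assumptions, filling a single 30TB appliance takes roughly two to three and a half days of sustained copying, which is worth budgeting for before the shipping clock even starts.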
Production data
This process requires some method of moving the data and resynchronizing once the data is moved to the cloud. Mirroring represents an elegant answer to migrating production data.
Cloud hydration using mirroring requires two local on-premises appliances that have the capability to keep track of incremental changes to the production environment while data is being moved to the new cloud target.
1. Production data is mirrored to the first appliance, creating an online copy of the data set. Then a second mirror is created from the first mirror, creating a second online copy.
2. The second mirror is “broken” and the appliance is shipped to the cloud environment.
3. The mirror is then reconnected between the on-premises copy and the remote copy and data synchronization is re-established.
4. An online copy of the data is now in the cloud and the servers can fail over to the cloud.
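The key to the steps above is that changes made while the mirror is broken are tracked, so only those changes need to cross the network when the mirror is reconnected. The toy model below sketches that idea with an in-memory block map. It is not any vendor's implementation: the class `MirroredVolume` and its methods are hypothetical names chosen for illustration.

```python
from typing import Dict, Set


class MirroredVolume:
    """Toy block volume that tracks writes made while its mirror is detached."""

    def __init__(self, blocks: Dict[int, bytes]) -> None:
        self.primary: Dict[int, bytes] = dict(blocks)
        self.mirror: Dict[int, bytes] = dict(blocks)  # initial full copy
        self.attached = True
        self.dirty: Set[int] = set()  # blocks changed while the mirror is broken

    def write(self, block: int, data: bytes) -> None:
        """Production write: always lands on the primary copy."""
        self.primary[block] = data
        if self.attached:
            self.mirror[block] = data  # mirror stays in lockstep
        else:
            self.dirty.add(block)      # remember the change for the resync

    def break_mirror(self) -> None:
        """Detach the mirror, as when the appliance is shipped."""
        self.attached = False

    def resync(self) -> int:
        """Reconnect the mirror, copying only the blocks that changed."""
        copied = len(self.dirty)
        for block in self.dirty:
            self.mirror[block] = self.primary[block]
        self.dirty.clear()
        self.attached = True
        return copied


if __name__ == "__main__":
    vol = MirroredVolume({i: b"original" for i in range(1000)})
    vol.break_mirror()                 # appliance leaves the building
    vol.write(7, b"changed")
    vol.write(42, b"changed")
    vol.write(7, b"changed again")     # same block twice counts once
    print("blocks copied on resync:", vol.resync())
    print("mirror matches primary:", vol.mirror == vol.primary)
```

In this example only 2 of the 1,000 blocks are copied after reconnection, which is why the resynchronization over the network can be far smaller than the original data set.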
For example, a federal agency had 2PB of on-premises data that it wanted to deploy in a private cloud. The agency’s IT team set up two on-premises storage resources adjacent to each other in one datacenter, moved production data onto one mirror, then set up a second mirror so that everything was copied. Then the team broke the mirror and shipped the entire rack to a second datacenter several thousand miles away, where its cloud hydration vendor (Zadara Storage) re-established the mirrors.
When reconnected, data were synchronized to represent a full, up-to-date mirror copy. Once the process was complete, the hardware that was used during the data migration process was sent to a remote location to serve as a second disaster recovery copy.
In another example, a global management consulting firm used 10G links to move smaller sets of data from its datacenter to the target storage cloud, and hydration appliances to move petabytes of critical data. Once the 10G link data uploads were copied to the storage resource, the cloud hydration provider used an AWS Direct Connect link to AWS. In this way the resources were isolated from the public cloud, yet made readily available to it. Other static data were copied onto the NAS appliances and shipped to locations that are available to the AWS cloud.
Features for easy integration
Regardless of whether the target is a public cloud or a hybrid or multicloud setting, three other factors distinguish the smooth and easy migrations from the more difficult and protracted ones.
– Format preservation. It’s ideal when the data migration process retains the desired data format, so that IT teams can copy the data into the cloud and instantly make use of it, versus converting copied data into a native format that is used locally but is not accessible from within the cloud itself. IT managers need to be able to get at the data right away, without the extra step of having to create volumes to access it. With terabytes of data, the extra few hours of delay may not seem like a big deal, but at petabyte scale, the delay can become insufferable.
– Enterprise format support. Traditional storage device formats such as CIFS and NFS are either minimally supported by public cloud providers or not supported at all. Yet the applications these file systems serve often yield the most savings, in terms of management time and expense, when moved to the cloud. Having the ability to copy CIFS, NFS, or other legacy file types and retain the same format for use in the cloud saves time, potential errors, and hassle from the conversion, and helps assure the hydration timeline.
– Efficient export. No vendor wants to see a customer decommission its cloud, but when needs change, bidirectional data migration or exporting of cloud data for use elsewhere needs to proceed just as efficiently – through the same static and production approaches as described above.
Hybrid cloud or multicloud support
A final consideration with any cloud hydration is making sure it’s seeded to last. With 85 percent of enterprises having a strategy to use multiple clouds, and 20 percent of enterprises planning to use multiple public clouds (RightScale State of the Cloud Report 2017), IT teams are revising their architectures with hybrid or multicloud capabilities in mind. No company wants to be locked into any one cloud provider, with no escape from the impact of the inevitable outage or disruption.
Cloud hydration approaches that allow asynchronous replication between cloud platforms make it a no-brainer for IT teams to optimize their cloud infrastructures for both performance and cost. Organizations can migrate specific workloads to one cloud platform or another (e.g., Windows applications on Azure, open source on AWS) or move them to where they can leverage the best negotiated prices and terms for given requirements. A cloud migration approach that enables concurrent access to other clouds also enables ready transfer and almost instant fail-over between clouds, in the event of an outage on one provider.
Experts have called 2017 the year of the “great migration.” Projections by Cisco and 451 Research suggest that by 2020, 83 percent of all datacenter traffic and 60 percent of enterprise workloads will be based in the cloud. New data migration options enable IT teams to “hydrate” their clouds in ways that minimize risk, cost, and hassle, and that maximize agility.
Howard Young is a solutions architect at Zadara Storage, an enterprise Storage-as-a-Service (STaaS) provider for on-premises, public, hybrid, and multicloud settings that performs cloud hydrations as one of its services. Howard has personally assisted in dozens of cloud hydrations covering billions of bits of data.