Summary: Spatial data is inherently storage-intensive. Companies deploying ArcGIS applications in the cloud need robust block and file storage with large shared volumes, high availability and large cache pools, and public cloud services often fall short. Learn how one of North America’s largest geospatial solutions firms met its storage demands by deploying Zadara Storage Virtual Private Storage Arrays, a SAN and NAS Enterprise Storage as a Service solution, in the AWS cloud.
Deploying ArcGIS Applications in the AWS Cloud
Spatial data is inherently storage-intensive, which complicates meeting client requests to deliver data over the Web instead of on hard drives.
Five years ago, one of North America’s largest full-service geospatial solutions companies began deploying customer-facing applications based on ArcGIS in the Amazon Web Services (AWS) cloud. The company specializes in multi-disciplinary geospatial data generation, integration, enablement, and analytics, delivered through a unique cloud-based infrastructure. It uses ArcGIS desktop to build maps and merge imagery and vector line work, then uses ArcGIS to publish the results to ArcGIS Online as well as to AWS.
Spatial Data is Too Storage Intensive for AWS Alone
However, the storage component of the application at AWS quickly proved problematic. Within a short time, the company’s AWS application had grown to over 100 TB. Because the company ran the Windows version of ArcGIS, it needed file storage attached to a Windows server rather than Amazon’s object-based S3 offering. Amazon’s other storage product, Elastic Block Store (EBS), was Windows compatible, but its 1 TB volume limit and its restriction to a single EC2 instance meant the company had to use software RAID on the EC2 server to combine 1 TB volumes into volumes large enough for its data sets. This was time-consuming and did not provide the elasticity the company was seeking from the cloud.
To get around the EBS restrictions, the company briefly tried an open source-based storage approach, which only added management complexity and required additional capacity and AWS instances, and hence additional cost. In exploring options, the company also learned that using products like Gluster with EBS storage did not remove the single-EC2-instance restriction either.
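To give a sense of the scale involved, the sketch below estimates how many size-limited volumes a data set would need if the volumes were simply striped together with no redundancy. The function name and the plain-striping assumption are illustrative only; the source does not say which RAID level the company used.

```python
import math


def ebs_volumes_needed(dataset_tb: float, volume_limit_tb: float = 1.0) -> int:
    """Minimum number of volumes needed to hold dataset_tb.

    Assumes plain striping (no mirroring or parity), which is an
    assumption made for illustration, not a detail from the case study.
    """
    if dataset_tb < 0 or volume_limit_tb <= 0:
        raise ValueError("sizes must be non-negative, and the limit positive")
    return math.ceil(dataset_tb / volume_limit_tb)


# A 100 TB data set with a 1 TB per-volume limit
print(ebs_volumes_needed(100))  # 100
```

Any redundancy on top of striping would raise the volume count further, and every one of those volumes would still have to be attached to, and managed from, a single EC2 instance.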
Deploying AWS Compute with Zadara Storage
In September 2013 our customer began using Zadara Storage’s Virtual Private Storage Array (VPSA). With Zadara, the company was able to design an architecture that effectively attached one large storage device directly to multiple Windows machines and had it behave just like the standard NAS device that ArcGIS expects.
The Nitty-Gritty of the Storage Architecture
The architecture shares Zadara’s storage across three servers at the AWS East 1 data center, using over 100 TB of RAID 10 storage on over 70 3 TB disk drives, with an 8 vCore controller providing a 32 GB cache. It also uses an additional 20 TB of EBS storage for the native ArcGIS application and some Oracle databases, and separately archives about 100 TB of infrequently updated files in Amazon S3 as a backup.
With this approach, the company had the best of all worlds: highly reliable enterprise-grade storage with features not available from AWS, the full geographic data management power of ArcGIS, and a storage architecture that was far more flexible and scalable.
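The capacity figures above are consistent with how RAID 10 works: every block is mirrored, so usable capacity is half of raw capacity. The sketch below checks the arithmetic under stated assumptions: two-way mirroring, decimal terabytes, and no allowance for hot spares or file system overhead. The function name is hypothetical, and 70 is used as a lower bound because the source says "over 70" drives.

```python
def raid10_usable_tb(disk_count: int, disk_size_tb: float) -> float:
    """Usable capacity of a RAID 10 set built from identical disks.

    Assumes two-way mirroring, so usable capacity is half of raw capacity.
    Hot spares and file system overhead are ignored.
    """
    if disk_count < 4 or disk_count % 2 != 0:
        raise ValueError("RAID 10 needs an even number of disks, at least 4")
    return disk_count * disk_size_tb / 2


# 70 drives of 3 TB each: 210 TB raw, half of it usable
print(raid10_usable_tb(70, 3.0))  # 105.0
```

Seventy 3 TB drives give 210 TB raw and 105 TB usable, which matches the "over 100 TB" described in the architecture.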
Meeting Client Demands with Disaster Recovery and High Availability
Zadara storage at AWS also enables architectures that eliminate potential single points of failure, because volumes are shared and support the standard file access protocols used by Linux and Windows systems (NFS and CIFS, respectively). Volumes can also be replicated globally for disaster recovery and multi-site collaboration use cases. Even though provisioning takes place over the Web, the firm had ready access to knowledgeable customer support to help explain the many options available for increasing capacity and performance.
Conclusion
From start to finish, deploying with Zadara took only a matter of weeks. Because storage volumes can be shared and have no size limit, it was easy to grow the AWS application in minutes. The ArcGIS administrator simply logs in to the firm’s own management portal (not shared with other Zadara customers) and adjusts the amount or type of underlying storage (disks of different sizes and types, and SSDs) or the controllers.
Most technologies come with all kinds of gotchas, yet the team found that our approach genuinely worked as advertised: a flexible, scalable way to run a client-facing ArcGIS data storage application in the public cloud. It reduced time, management complexity and costs, while meeting client needs for Web access and providing a framework for further growth in client-facing offerings.
More importantly, because the deployment was painless, the team could stay focused on its core mission of serving smart geospatial information to clients, and spend less time on the details of maintaining an enterprise cloud solution.