Storage Data Protection Options and Best Practices

There are many horror stories told about the risks of not implementing an adequate data protection regime. Here are just a few to set the scene:

  • 30% of all businesses that have a major fire go out of business within a year, and 70% fail within five years (Home Office Computing Magazine).
  • 31% of all PC users have lost all their files due to events beyond their control.
  • 34% of companies fail to test their tape backups and of those that do, 77% have found tape backup failures.
  • 60% of companies that lose their data will shut down within 6 months of the disaster.

Whether we like it or not, in today’s IT world, data protection is a must-have feature that only the most reckless CIO would avoid implementing. For firms under regulatory control, demonstrating that a data protection plan is in place (and being able to prove it works) is part of the day job. So what are our choices and what should we be thinking of as best practice?


Step 1 – Establish Your Requirements

Without a clear view of what your business and applications need in terms of availability, it’s impossible to come up with a comprehensive BC/DR (business continuity/disaster recovery) plan. BC/DR requirements are based on metrics such as RPO (Recovery Point Objective) – how old the recovered data can be – and RTO (Recovery Time Objective) – how long it can take to return to normal operations. Availability requirements should be applied from a practical standpoint; a line of business may request that all of its applications have an RTO and RPO of zero, but this is rarely a financially viable solution. Some applications (such as back-end report generation) could tolerate longer recovery times and even accept data that’s up to 24 hours old, for example.
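
To make this concrete, here’s a minimal sketch (in Python, with purely illustrative tier names, applications, and targets – real values come from the business) of capturing requirements as a small set of service tiers rather than negotiating bespoke targets per application:

```python
from dataclasses import dataclass
from datetime import timedelta

@dataclass
class ServiceTier:
    name: str
    rpo: timedelta  # maximum tolerable data loss (how old recovered data can be)
    rto: timedelta  # maximum tolerable time to return to normal operations

# Illustrative tiers only -- the thresholds are assumptions for this example.
TIERS = {
    "mission-critical":   ServiceTier("mission-critical",   rpo=timedelta(0),        rto=timedelta(minutes=15)),
    "business-important": ServiceTier("business-important", rpo=timedelta(hours=1),  rto=timedelta(hours=4)),
    "batch-reporting":    ServiceTier("batch-reporting",    rpo=timedelta(hours=24), rto=timedelta(hours=48)),
}

# Each application is assigned a tier; names here are invented examples.
APP_TIERS = {
    "order-processing": "mission-critical",
    "crm":              "business-important",
    "nightly-reports":  "batch-reporting",  # can accept data up to 24 hours old
}
```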


Step 2 – Evaluate Recovery Strategies

There are many ways to ensure data is available and consistent; which to use depends on the recovery requirements of the application. Techniques include:

  • Replication – this can be synchronous or asynchronous, the difference being how up to date the data at the secondary location is. Synchronous replication guarantees the primary and secondary copies are kept identical, because writes are only acknowledged once committed at both sites; asynchronous means the remote copy could be seconds or minutes out of date, depending on the volume of data being moved and the replication technique (streamed or snapshots) used.
  • Application Replication – data can be replicated at the application layer, for example through database log shipping or creating mirrors of the data on the host. Application-based DR may require more intensive recovery efforts as each host/application typically needs manual recovery steps.
  • Hypervisor Backup – systems like VMware’s vSphere and Microsoft Hyper-V provide interfaces that allow traditional backup systems to copy data out with little or no host impact. These solutions also integrate with storage arrays to offload the backup process while maintaining data integrity.
  • Backup to Disk/Tape – this is simply taking a point-in-time copy of the data to another system using a backup tool and is the way backup has traditionally been done for more than 30 years. Unfortunately, backup and restore to tape (or even disk) is time-consuming and doesn’t take advantage of underlying storage hardware that can offload the backup task more efficiently.
  • Snapshots – although strictly not a backup (the data doesn’t get moved to another piece of hardware), snapshots can be an effective way of recovering from minor issues like deleted files or corrupted data; a minimal example follows this list. However, this technique shouldn’t be used as a replacement for a proper BC/DR strategy.
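
As an illustration of the snapshot technique, here’s a minimal sketch using the ZFS command line via Python (assuming a ZFS pool and suitable privileges; other platforms have equivalent commands). Note that the snapshot lives on the same hardware as the data, which is exactly why it isn’t a backup:

```python
import subprocess
from datetime import datetime, timezone

def take_snapshot(dataset: str) -> str:
    """Create a timestamped ZFS snapshot, e.g. tank/data@2024-01-01T00-00-00."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H-%M-%S")
    snap = f"{dataset}@{stamp}"
    subprocess.run(["zfs", "snapshot", snap], check=True)
    return snap

def rollback(snapshot: str) -> None:
    """Roll the dataset back to a snapshot -- discards changes made since."""
    subprocess.run(["zfs", "rollback", snapshot], check=True)

# Usage (dataset name is illustrative):
# snap = take_snapshot("tank/projects")
# ... a file gets deleted or corrupted ...
# rollback(snap)
```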

The protection method can then be mapped to the application, providing recovery that is most appropriate to the service levels (RTO/RPO) required. In many cases, multiple layers of protection provide a “belt and braces” approach. Replication, for example, is great at protecting against hardware failure but will faithfully replicate data corruption, so it makes sense to also use snapshots and/or backups as a secondary recovery option.
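
As a rough illustration of that mapping, the sketch below picks a layered protection set from RPO/RTO targets. The thresholds and method names are assumptions for the example, not prescriptions:

```python
from datetime import timedelta

def protection_methods(rpo: timedelta, rto: timedelta) -> list:
    """Suggest a layered set of protection methods from RPO/RTO targets.

    Thresholds are illustrative only; real ones depend on cost and platform.
    """
    methods = []
    if rpo == timedelta(0):
        methods.append("synchronous replication")
    elif rpo <= timedelta(minutes=15):
        methods.append("asynchronous replication")
    # Replication faithfully copies corruption, so always layer an
    # independent point-in-time copy on top ("belt and braces").
    methods.append("snapshots")
    methods.append("backup to disk" if rto <= timedelta(hours=4) else "backup to tape")
    return methods

print(protection_methods(rpo=timedelta(0), rto=timedelta(minutes=30)))
# ['synchronous replication', 'snapshots', 'backup to disk']
```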


Step 3 – Data Protection Implementation

Implementing BC/DR may seem the obvious next step, but there are some non-obvious points to consider. Data protection needs to be monitored and reported on. This includes the standard checks, like ensuring replication links are in place and working, and confirming that backups complete every day. Repeatedly failing backups need to be flagged for attention.
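
A minimal sketch of that “flag repeated failures” check might look like the following; the job history shown is invented, and in practice it would come from the backup tool’s own reporting:

```python
# Per-job backup results, most recent first; True = success.
# In practice this would be pulled from the backup tool's reports or logs.
history = {
    "vm-web01": [True, True, True],
    "vm-db02":  [False, False, False, True],
}

FAIL_THRESHOLD = 3  # consecutive failures before escalating

def consecutive_failures(results) -> int:
    """Count failures since the last successful backup."""
    count = 0
    for ok in results:
        if ok:
            break
        count += 1
    return count

for job, results in history.items():
    if consecutive_failures(results) >= FAIL_THRESHOLD:
        print(f"ATTENTION: {job} has failed its last {FAIL_THRESHOLD}+ backups")
```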

There are also a few other points that many organisations fail to consider or implement. Firstly, backups need to be tested. As one of the statistics quoted above shows, 34% of companies never check whether the backups their business could one day depend on will actually work! Testing can be invasive and needs to be carefully managed to ensure that a recovered system can be tested without affecting the production application. Fortunately, with server virtualisation, application recovery testing has become much easier and in many cases can be almost completely automated.
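
In a virtualised environment, an automated restore test often reduces to a simple loop: restore into an isolated sandbox, run a health check, tear the copy down. The sketch below uses placeholder functions – restore_to_sandbox, health_check, and destroy are hypothetical stand-ins for your backup tool’s and hypervisor’s APIs:

```python
def restore_to_sandbox(backup_id: str) -> str:
    """Placeholder: restore the backup into an isolated test network.
    A real version would call the backup tool's and hypervisor's APIs."""
    return f"sandbox-{backup_id}"

def health_check(vm: str) -> bool:
    """Placeholder: boot the recovered VM and run an application smoke test."""
    return True  # stub always passes; a real check would probe the application

def destroy(vm: str) -> None:
    """Placeholder: tear down the test copy so it never touches production."""

def test_restore(backup_id: str) -> bool:
    vm = restore_to_sandbox(backup_id)
    try:
        return health_check(vm)
    finally:
        destroy(vm)  # clean up even if the health check raises

for backup_id in ["vm-web01-20240101", "vm-db02-20240101"]:
    print(backup_id, "OK" if test_restore(backup_id) else "FAILED - investigate")
```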

Secondly, testing that backups work is only one part: the DR plan also needs rules that define application recovery order, depending on the severity of the incident. For example, which services (DNS, DHCP, Active Directory) need to be available before any applications can be brought back? In what order do applications need to be restored (and what are their dependencies)? A bank, for instance, would want core applications recovered first, with ancillary ones (like mortgage applications) deferred until later.
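
Recovery order is essentially a dependency graph, so a topological sort gives a valid bring-up sequence. Here’s a small sketch using Python’s standard graphlib module, with an invented dependency map along the lines of the banking example:

```python
from graphlib import TopologicalSorter

# Illustrative dependency map: each key depends on the services it lists.
deps = {
    "dns":              [],
    "active-directory": ["dns"],
    "core-banking":     ["dns", "active-directory"],
    "mortgage-portal":  ["core-banking"],  # ancillary: recovered last
}

# static_order() yields a sequence in which every service appears
# after all of its dependencies -- infrastructure first, apps after.
order = list(TopologicalSorter(deps).static_order())
print(order)
# e.g. ['dns', 'active-directory', 'core-banking', 'mortgage-portal']
```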

Third is the question of how backups will be tracked over time. Today, server virtualisation allows VMs to be created and destroyed at will, in many cases between backup cycles, and this will become even more common as we move to containers. VMs are mobile entities that can move between physical hosts, so some way of tracking the backups of these transient systems needs to be in place. Imagine having to locate a virtual server deleted six months ago that contained some critical data – how will you know which process backed it up? It won’t appear in the hypervisor inventory.
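
One pragmatic answer is a backup catalogue that outlives the hypervisor inventory, keyed on the VM’s immutable UUID rather than its current name or host. A minimal sketch (using SQLite; the schema and field names are invented for the example):

```python
import sqlite3
from datetime import datetime, timezone

# A catalogue that persists after VMs are deleted from the hypervisor.
db = sqlite3.connect("backup_catalog.db")
db.execute("""CREATE TABLE IF NOT EXISTS backups (
    vm_uuid TEXT, vm_name TEXT, host TEXT, backup_ref TEXT, taken_at TEXT)""")

def record_backup(vm_uuid, vm_name, host, backup_ref):
    """Log every backup against the VM's immutable UUID, not just its name."""
    db.execute("INSERT INTO backups VALUES (?, ?, ?, ?, ?)",
               (vm_uuid, vm_name, host, backup_ref,
                datetime.now(timezone.utc).isoformat()))
    db.commit()

def find_backups(vm_name):
    """Six months after deletion, still answer: which process backed it up,
    and where is the copy?"""
    return db.execute(
        "SELECT vm_uuid, backup_ref, taken_at FROM backups WHERE vm_name = ?",
        (vm_name,)).fetchall()
```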

One side issue to the last point is ensuring that all applications and data come under some backup regime. I’ve seen scenarios where data (individual volumes) or entire servers were never added to the backup process, only for the problem to be identified some six months later, at which point the missing data is totally lost. In virtual environments, one solution to this problem is to trigger a backup as soon as a VM is built, or to build backup enrolment into the VM build process itself.
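
A sketch of that idea: make backup enrolment an inseparable step of provisioning, so no VM can be handed over without a protection policy. build_vm and enrol_backup below are hypothetical stand-ins for real provisioning and backup APIs:

```python
def enrol_backup(vm_uuid: str, tier: str) -> None:
    """Placeholder: register the new VM with the backup system."""
    print(f"registered {vm_uuid} with backup policy '{tier}'")

def build_vm(name: str, tier: str = "business-important") -> str:
    """Placeholder provisioning step: no VM exists without a backup policy."""
    vm_uuid = f"uuid-{name}"      # in reality the platform assigns this
    # ... create the VM here ...
    enrol_backup(vm_uuid, tier)   # runs before the VM is handed to its owner
    return vm_uuid

build_vm("web03")
```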


Step 4 – Review

Of course, systems change over time. Applications grow or their requirements change. Servers (both virtual and physical) get decommissioned. So it’s worth having a process that re-evaluates backup requirements approximately every 12 months. This recertification may be as simple as having the application owner confirm that the backups in place are still fit for purpose.
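
Even the recertification step can be reduced to a simple report that flags anything not reviewed within the interval. A minimal sketch, with invented applications and dates:

```python
from datetime import date, timedelta

# Last backup-requirements review per application (illustrative dates).
last_review = {
    "order-processing": date(2024, 11, 1),
    "crm":              date(2023, 6, 15),
}

REVIEW_INTERVAL = timedelta(days=365)  # roughly every 12 months

for app, reviewed in last_review.items():
    if date.today() - reviewed > REVIEW_INTERVAL:
        print(f"{app}: backup requirements due for recertification")
```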

One final thought. Like everything in IT, moving to a cloud-based model means implementing self-service. End users can auto-build virtual machines and applications, and backup/restore should fall into the same category. Where possible, end users should be empowered to perform their own data recovery, as long as it doesn’t impact production operations. Something as simple as providing access to snapshots is a great way to start this process off.
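
As one concrete starting point: many filers expose read-only snapshots under a hidden directory (.zfs/snapshot on ZFS, .snapshot on NetApp), so simply granting users read access lets them recover their own files. A small sketch assuming the ZFS layout, with invented share and file names:

```python
from pathlib import Path

def list_snapshot_copies(share: str, relative_file: str):
    """Return every read-only snapshot copy of a file that a user
    could restore for themselves, newest layout assumptions: ZFS."""
    snapdir = Path(share) / ".zfs" / "snapshot"  # adjust per platform
    if not snapdir.is_dir():
        return []
    return [snap / relative_file
            for snap in sorted(snapdir.iterdir())
            if (snap / relative_file).exists()]

# Usage: a user browses the available copies and picks one to restore.
for copy in list_snapshot_copies("/mnt/projects", "report.xlsx"):
    print(copy)
```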

To learn more about data protection, download our Data Protection Tip Sheet.


