Hardware Will Fail, Storage Will Work

There’s a great truism in the IT industry that says over time, all hardware will fail and all software will work (eventually). Let’s think about that for a moment.

Take any hardware device (in this instance, storage) and we expect it to work reliably from day one. However, we also expect that at some stage the device will fail. From an individual hard drive's perspective, the head could crash, the controller board could fail, or some other part of the rotational mechanics could go wrong. Solid-state disks have limited write endurance and are bound to fail at some point in the future, setting aside other issues with the controller components.

Within a storage array, we cater for failure by using multiple redundant drives (protected with RAID or erasure coding), plus additional power supplies, controllers, backplanes and so on. If we spend enough time, money and effort, we can eliminate almost all of the potential failure scenarios, including keeping duplicate equipment in another location to cater for failures outside the array.
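To make the idea of parity protection concrete, here is a minimal Python sketch of single-parity (RAID 5-style) protection, where one lost block is rebuilt by XOR-ing the parity block with the surviving data blocks. The function names and tiny four-byte "drives" are purely illustrative; real arrays stripe, checksum and rebuild with far more sophistication.

```python
# Minimal sketch of single-parity (RAID 5-style) protection: the parity block
# is the XOR of the data blocks, so any one lost block can be rebuilt from the
# survivors. Illustrative only -- real arrays do much more.

def make_parity(blocks: list[bytes]) -> bytes:
    """XOR all data blocks together to produce the parity block."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            parity[i] ^= b
    return bytes(parity)

def rebuild(surviving_blocks: list[bytes], parity: bytes) -> bytes:
    """Recover the single missing block by XOR-ing parity with the survivors."""
    return make_parity(surviving_blocks + [parity])

drives = [b"AAAA", b"BBBB", b"CCCC"]    # pretend each is one drive's chunk
parity = make_parity(drives)

lost = drives.pop(1)                    # simulate a failed drive
assert rebuild(drives, parity) == lost  # the data comes back from parity
```

The same principle extends to erasure coding, which tolerates multiple simultaneous failures by storing more than one coded block.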

Now let's look at software. With the exception of the most basic "Hello World" program, every piece of software has bugs. In fact, one of the simplest programs ever written, IBM's mainframe utility IEFBR14 – a one-instruction program – had a bug. IEFBR14 "does nothing", allowing programmers to use JCL to allocate and delete files. The name comes from the single instruction the program originally contained – BR 14, branch to the address in register 14 – which in mainframe terms means return to the caller. IEFBR14 had to be amended to add a second instruction that clears register 15, the register that holds a program's return code. Without this, IEFBR14 returned whatever value happened to be left in register 15, producing spurious completion codes.

Over time, software becomes more reliable as bugs are identified and eliminated (with the caveat that new features also introduce new potential bugs). Some of today's oldest code is the most reliable, because it has been tested against almost every conceivable failure scenario.

Software-defined storage is based on the above assumptions. Hardware is assumed to be inherently unreliable and prone to failure. The software is designed to accept and manage those failures by implementing protection schemes that automatically handle the recovery of failed components, whether a single hard drive or an entire node in a scale-out storage solution. This is the so-called pets-versus-cattle scenario. Legacy architectures, including storage, have treated the application hardware as a "pet" that has to be nurtured and cared for over time. This is in contrast to modern software-defined storage solutions that treat individual hardware nodes as "cattle": one of many expendable resources; lose a server or node and you simply replace it with another.
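As a rough illustration of the "cattle" posture, the following Python sketch keeps every object replicated across several nodes and, when a node fails, simply re-creates the missing copies on the remaining healthy nodes rather than trying to nurse the failed node back to health. All names here (Cluster, REPLICAS, heal and so on) are hypothetical; no real product works exactly like this.

```python
# Toy sketch of scale-out replica healing: data lives as N copies across
# nodes, and a dead node is replaced, not repaired.
import random

REPLICAS = 3

class Cluster:
    def __init__(self, nodes):
        self.nodes = {n: set() for n in nodes}   # node -> set of object ids

    def write(self, obj_id):
        # place copies on REPLICAS distinct nodes
        for node in random.sample(list(self.nodes), REPLICAS):
            self.nodes[node].add(obj_id)

    def fail_node(self, node):
        lost = self.nodes.pop(node)               # the node is gone -- don't nurse it
        for obj_id in lost:
            self.heal(obj_id)

    def heal(self, obj_id):
        # re-replicate onto healthy nodes that don't already hold a copy
        holders = [n for n, objs in self.nodes.items() if obj_id in objs]
        candidates = [n for n in self.nodes if obj_id not in self.nodes[n]]
        for node in candidates[: REPLICAS - len(holders)]:
            self.nodes[node].add(obj_id)

cluster = Cluster(["node1", "node2", "node3", "node4"])
cluster.write("block-42")
cluster.fail_node("node1")                        # replace, don't repair
```

The point is not the data structures but the posture: failure is an expected, routine event that the software handles automatically, not an emergency the hardware must prevent.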

So what does this mean for ongoing storage design? It highlights the fact that the majority of today’s development and innovation is being achieved in software. Vendors are able to take commodity components and use them to build storage solutions at a much lower cost than was achievable 20 years ago.

If we look at the industry, we can see this trend playing out. Two decades ago, storage vendors built proprietary hardware solutions that were expensive to manage and maintain. The storage hardware had to be over-engineered and then carefully managed and cared for. Over time, those proprietary components have been gradually eliminated, with hardware functionality replaced in software as hardware has commoditized and become more powerful (almost all storage solutions today are based on the standard Intel x86 architecture). Whilst there is still a place for bespoke storage hardware (increasingly confined to super-high-performance, low-latency requirements), the vast majority of customer needs can now be met with commodity devices and advanced software features. Most importantly, this move to commodity hardware with software-defined storage has allowed prices to fall significantly, and incumbent vendors will find it increasingly hard to justify the markup they place on hardware.

Learn how software-defined storage differs from traditional SAN and NAS by downloading this white paper.
