For IT departments, managing data storage can feel like a never-ending task.
Organisations capture and manage more data than ever, keep it for longer, and are completely dependent on it to run the business.
Firms can adopt two approaches when designing storage to match the need for increased capacity – they can scale up or scale out.
So it pays to know the difference. In this article, we look at horizontal vs vertical scaling in storage, the pros and cons of each, and the scenarios in which you might carry out each of them – whether for SAN, NAS, hyper-converged, object or cloud storage.
Horizontal vs vertical scaling
IT systems can scale vertically, horizontally, and sometimes both.
In broad terms, vertical scaling, or scale-up, entails installing more powerful systems or upgrading to more powerful components.
This could mean putting a new processor in an x86 system, deploying a more powerful server, or even moving workloads to a new hardware platform entirely. In the cloud, it can be done via upgrades to processors and memory.
Meanwhile, horizontal scaling adds resources by expanding the number of servers or other processing units. Rather than rely on a single more powerful system, scale-out upgrades work by splitting the workload across a large number of lower-cost, commodity units.
Google’s search system is one example of a massively parallel, scaled-out system. In fact, Google holds some of the key patents for MapReduce systems that allow tasks to be split into massive, parallel processing clusters.
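The split-and-combine idea behind MapReduce can be illustrated in a few lines of Python. This is a hypothetical word-count example, not Google's implementation – in a real cluster, each "shard" would live on its own worker node and the map phase would run in parallel across machines:

```python
from collections import Counter
from functools import reduce

# Hypothetical documents to index; in a real cluster each worker
# node would hold its own shard of the data.
shards = [
    "storage scales up or out",
    "scale out adds nodes",
    "scale up adds bigger drives",
]

# Map phase: each worker counts words in its own shard independently.
def map_shard(text):
    return Counter(text.split())

# Reduce phase: the partial counts are merged into one result.
partials = [map_shard(s) for s in shards]
totals = reduce(lambda a, b: a + b, partials)

print(totals["scale"])  # 2 occurrences found across all shards
```

Because no shard depends on another, adding more commodity machines simply shortens the map phase – the essence of horizontal scaling.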
Horizontal vs vertical scaling in storage
Scaling for storage follows a similar approach. IT departments can scale capacity via bigger or more drives in the storage subsystem, or by spreading workloads across more devices.
Scaling up – with bigger drives in servers and hyper-converged infrastructure (HCI), or increased capacity in NAS and SAN systems – is technically relatively straightforward. However, even with the larger-capacity NVMe, SSD and conventional drives available today, larger systems can still hit bottlenecks.
Either the system will not scale well as it nears capacity limits, or other bottlenecks will appear. Typically, bottlenecks in vertically scaled storage arise from throughput limits in storage controllers because most storage subsystems can only accommodate two controllers. Some systems do, of course, allow the controllers themselves to be upgraded. On network storage, the network interface can also become a bottleneck.
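The bottleneck effect is easy to model: delivered throughput from a scale-up array is capped by its slowest shared component, however much drive capacity is added. A minimal sketch, using made-up figures:

```python
# Hypothetical throughput limits (MB/s) for a scale-up array.
drive_throughput_mb_s = 500      # per drive
num_drives = 24
controller_limit_mb_s = 6_000    # two controllers at 3,000 MB/s each
network_limit_mb_s = 5_000       # e.g. a 40GbE front end

# Aggregate drive bandwidth grows with each drive added...
raw_drive_bandwidth = drive_throughput_mb_s * num_drives  # 12,000 MB/s

# ...but delivered throughput is capped by the slowest shared component.
delivered = min(raw_drive_bandwidth, controller_limit_mb_s, network_limit_mb_s)

print(delivered)  # 5000 - the network, not the drives, is the limit
```

Adding more or faster drives changes nothing here; only upgrading the controllers or the network interface moves the ceiling.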
The alternative is to scale storage out by adding more nodes to work in parallel. Here, storage nodes operate together in clusters, but present their capacity as a “pool” to the application.
Adding nodes removes controller and network interface bottlenecks, because each node has its own resources. HCI and computational storage take the idea a step further. HCI combines storage, networking and compute in one unit, whereas computational storage allows the storage subsystem itself to take on some processing tasks, such as encryption or compression, close to the storage.
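The "pool" model can be sketched with a simple hash-based placement scheme. This is an illustrative toy, not a production algorithm – real scale-out systems use more sophisticated schemes such as consistent hashing or CRUSH – but it shows why each added node brings its own controller and network resources:

```python
import hashlib

def place(object_name, nodes):
    """Pick a node for an object by hashing its name."""
    digest = hashlib.sha256(object_name.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]

nodes = ["node-1", "node-2", "node-3"]
print(place("backup-2024.tar", nodes))

# Scaling out is just growing the list: objects now spread across
# four nodes' controllers and network interfaces instead of three.
nodes.append("node-4")
print(place("backup-2025.tar", nodes))
```

Because the application only sees the pool, capacity and bandwidth grow together as nodes are added, with no single controller pair in the data path.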
“Hyper-converged infrastructure brought this horizontal scaling model to the limelight,” says Naveen Chhabra, an analyst at Forrester. “This concept of horizontal scaling was introduced by the hyperscalers and is being used for the storage services they offer to the market.”
Scaling storage on-premise
To scale up storage in an on-premise environment can be relatively simple. At the most basic level, IT teams can simply add more or higher-capacity drives. This applies to internal storage, direct-attached storage, and storage within HCI systems.
For networked storage, adding or swapping out drives is also the simplest option. Hardware suppliers largely support tool-free upgrades, and storage management software can reconfigure RAID arrays automatically in NAS and SAN systems.
Changing or upgrading controllers or network interfaces will be more labour-intensive, and is likely to require powering down the array.
In both cases, downtime will be an issue. Hardware upgrades mean taking systems offline, and RAID groups will need to be rebuilt. Also, systems can only be upgraded if they are provisioned for additional capacity – such as with spare drive bays or swappable controllers – up front. This can mean buying a larger array than is initially needed.
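The capacity trade-off behind up-front provisioning can be put in numbers. A minimal sketch, with illustrative figures, of usable capacity when a RAID 6 group is expanded into spare drive bays:

```python
def raid6_usable_tb(num_drives, drive_size_tb):
    """RAID 6 reserves two drives' worth of capacity for parity,
    so usable capacity is (n - 2) x drive size."""
    return (num_drives - 2) * drive_size_tb

# A hypothetical array bought with headroom: 8 drives fitted, 12 bays.
print(raid6_usable_tb(8, 8))   # 48 TB usable today
print(raid6_usable_tb(12, 8))  # 80 TB after filling the spare bays
```

The four empty bays cost money on day one, but they are what makes a later scale-up possible without replacing the array.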
The alternative – swapping over to a newer, larger system – can minimise downtime, but firms need to allow for the time needed to transfer data, and the risk of data loss during migration.
Scale-out systems might, therefore, seem easier. Modern NAS and SAN systems, and HCI, are designed to scale out (as well as up, to some extent). Adding further nodes or arrays expands the storage pool and should be possible with little or limited downtime. There is no need to touch existing hardware, and software will add the new capacity to the storage pool.
Sometimes, scaling out is the only way to handle rapid growth in demand for storage, especially of unstructured data – but it has its limitations. Scale-out systems are less suited to applications such as transactional databases, for example.
Scaling storage in the cloud
Cloud storage is built on scale-out architectures. The building blocks – parallel commodity hardware and object storage – were designed from the outset to accommodate ever-larger datasets.
Public cloud systems are, therefore, largely scale-out systems. This works well for elastic workloads, where organisations want to start small and build, and where applications can operate on horizontally scaled systems, such as scale-out databases.
Scale-out cloud systems are usually built from x86 servers with direct-attached storage, acting as nodes or HCI clusters, each running object storage software and using erasure coding to create the equivalent of RAID protection. All this allows cloud users to add capacity quickly, or even automatically.
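Erasure coding's appeal over straight replication is its efficiency. A quick sketch using a hypothetical 8+4 scheme – eight data fragments plus four parity fragments, so the data survives the loss of any four – compared with three-way replication:

```python
def raw_needed_tb(logical_tb, data_frags, parity_frags):
    """Raw capacity needed to store logical_tb under k+m erasure coding."""
    return logical_tb * (data_frags + parity_frags) / data_frags

logical = 100  # TB of user data

# 8+4 erasure coding: tolerates loss of any 4 of 12 fragments.
print(raw_needed_tb(logical, 8, 4))  # 150.0 TB raw

# Three-way replication tolerates two losses but costs far more.
print(logical * 3)                   # 300 TB raw
```

That 1.5x overhead versus 3x is one reason object stores built on commodity nodes can offer resilient capacity so cheaply.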
But this does not mean that the only way to scale in a public cloud environment is to add capacity. IT architects can specify different performance tiers from the main cloud providers.
Amazon Web Services, Google Cloud Platform and Microsoft Azure each provide a range of storage performance, based on their SSD (and spinning disk) systems.
AWS, for example, has IOPS options that run from 16,000 to 64,000 per volume via EBS. Azure Managed Disks reach up to 160,000 IOPS and Azure Files up to 100,000 IOPS.
GCP’s Persistent Disk runs up to 100,000 read IOPS and its Local SSD up to 2,400,000 read IOPS. On all platforms, writes are generally slower.
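When comparing tiers, it helps to translate a headline IOPS figure into throughput at a given I/O size. A back-of-envelope sketch, with illustrative numbers rather than any provider's guaranteed figures:

```python
def throughput_mb_s(iops, block_size_kb):
    """Approximate throughput implied by an IOPS figure at a block size."""
    return iops * block_size_kb / 1024

# The same 64,000 IOPS volume delivers very different bandwidth
# depending on the I/O size the workload issues.
print(throughput_mb_s(64_000, 16))  # 1000.0 MB/s at 16 KB blocks
print(throughput_mb_s(64_000, 4))   # 250.0 MB/s at 4 KB blocks
```

In practice, providers also cap per-volume throughput separately, so the lower of the two limits applies.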
Up or out?
Of course, costs increase with the higher-performance tiers, so CIOs will need to balance capacity and performance across their cloud estate.
Increasingly, hybrid architectures support the best of both worlds. Organisations can scale up their on-premise hardware, but use the public cloud to scale out with easy-to-deploy additional capacity.
Nor do processing and storage have to move in lock-step. It is quite possible, and increasingly common, to scale up computing for performance, and scale out storage, on-premise or via the cloud, to make use of the capacity and resilience of object technology.