Storage virtualization is the pooling of physical storage from multiple storage devices into what appears to be a single storage device or pool of available storage capacity. A central console is used to manage the storage.
What is meant by storage virtualization?
Storage virtualization or virtualized storage aims to abstract physical storage systems and drives in order to present them as a single pool of storage capacity. The capacity of this single virtual device can be centrally managed, thus simplifying storage allocation, maintenance and overall management.
Storage virtualization disguises the actual complexity of a storage system, such as a storage area network (SAN), which helps a storage administrator perform backup, data archiving and recovery more easily and in less time. The virtualization software intercepts input/output (I/O) requests from physical or virtual machines (VMs) and sends those requests to the appropriate physical location within the overall pool of storage in the virtualized environment. To a user, however, the various storage resources that make up the virtualized pool are invisible, so the virtual storage appears as a single physical drive, share or logical unit number (LUN) that can accept standard reads and writes.
The virtualization and centralization capabilities make the overall approach different from bare-metal storage systems, where physical storage devices must be addressed directly. This is also why virtualization offers significant operational efficiencies over bare-metal provisioning of storage. Additionally, by allowing IT teams to address a single virtual device rather than many physical ones, storage virtualization improves the performance of storage environments and minimizes compatibility and security issues.
How storage virtualization works
To build a virtualized storage environment, multiple physical storage devices are grouped behind a single server or appliance, which is assigned virtual storage blocks and redirects the I/O traffic. The pooled capacity is carved into logical units (LUNs) that are presented to remote servers as virtual disks, which the servers treat as if they were physical disks. A software virtualization layer separates the storage hardware from the virtual volumes, making it possible for operating systems (OSes) and applications to access and use the storage without knowing where it physically resides.
Storage virtualization technology relies on software to identify available storage capacity from physical devices, to create an abstraction layer between the physical and virtual storage, and to aggregate that capacity into a pool that can be used by traditional servers or by VMs in a virtual environment. In addition to identifying and compiling the available capacity, the software makes it available for various applications to use.
To provide access to the data stored on the physical devices, the virtualization software either builds a metadata map of where the data resides or uses an algorithm to locate it dynamically, on the fly. The software intercepts read and write requests from applications and, using the map it maintains, finds or saves the data on the appropriate physical device. This process is similar to the way an OS retrieves or saves application data.
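To make the mapping idea concrete, here is a minimal Python sketch of the kind of metadata map a virtualization layer might keep and how it could redirect intercepted reads and writes. The device names, block size and dict-based map are illustrative assumptions, not any vendor's actual implementation.

```python
# Minimal sketch of a virtual-to-physical block map; names and sizes are
# illustrative assumptions, not a real product's design.

BLOCK_SIZE = 4096  # bytes per virtual block (assumed)

class PhysicalDevice:
    """Stand-in for a disk or array; stores blocks in memory for the demo."""
    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def read_block(self, n):
        return self.blocks.get(n, b"\x00" * BLOCK_SIZE)

    def write_block(self, n, data):
        self.blocks[n] = data

class VirtualVolume:
    def __init__(self, name):
        self.name = name
        # Metadata map: virtual block number -> (physical device, physical block)
        self.block_map = {}

    def map_block(self, virtual_block, device, physical_block):
        """Record where a virtual block actually lives."""
        self.block_map[virtual_block] = (device, physical_block)

    def read(self, virtual_block):
        """Intercept a read and redirect it to the mapped physical location."""
        device, physical_block = self.block_map[virtual_block]
        return device.read_block(physical_block)

    def write(self, virtual_block, data):
        """Intercept a write and send it to the mapped physical block."""
        device, physical_block = self.block_map[virtual_block]
        device.write_block(physical_block, data)

# The host only ever talks to the virtual volume; the map decides which
# physical device actually services the I/O.
array_a = PhysicalDevice("array-A")
array_b = PhysicalDevice("array-B")
vol = VirtualVolume("vdisk0")
vol.map_block(0, array_a, 120)   # virtual block 0 lives on array-A, block 120
vol.map_block(1, array_b, 7)     # virtual block 1 lives on array-B, block 7
vol.write(0, b"hello".ljust(BLOCK_SIZE, b"\x00"))
print(vol.read(0)[:5])           # b'hello', served transparently from array-A
```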
A redundant array of independent disks (RAID) can sometimes be considered a type of storage virtualization. Multiple physical drives in the array are presented to the user as a single storage device that, in the background, stripes and replicates data across multiple disks to improve I/O performance and protect data if a single drive fails.
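The striping and mirroring math behind RAID is straightforward, as the short sketch below shows. It uses the standard RAID 0 and RAID 1 ideas for illustration and does not reflect any particular controller's implementation.

```python
# Illustrative sketch of how a RAID layer maps a logical block to physical
# disks; the math is the textbook RAID-0/RAID-1 idea, not a specific product.

def raid0_location(logical_block, num_disks):
    """RAID 0: stripe blocks round-robin across disks for I/O parallelism."""
    disk = logical_block % num_disks          # which disk holds the block
    offset = logical_block // num_disks       # block position on that disk
    return disk, offset

def raid1_targets(logical_block, num_disks):
    """RAID 1: every write is mirrored to all disks, so a single drive
    failure does not lose data."""
    return [(disk, logical_block) for disk in range(num_disks)]

print(raid0_location(10, 4))   # (2, 2): logical block 10 -> disk 2, offset 2
print(raid1_targets(10, 2))    # [(0, 10), (1, 10)]: same block on both disks
```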
Benefits and uses of storage virtualization
Some of the benefits and uses of storage virtualization include the following:
- Easier management. A single management console — often called a single pane of glass — to monitor and maintain multiple virtualized storage arrays cuts down on the time and effort necessary to manage the physical systems individually. This is particularly beneficial when a large number of storage systems or storage systems from multiple vendors are in the virtualization pool.
- Better storage utilization. Pooling storage capacity across multiple systems makes it easier to allocate and use the available capacity. In contrast, with unconnected, disparate systems, some systems might end up operating at or near capacity, while others are barely used, thus adversely affecting storage capacity utilization and efficiency.
- Lower cost. Virtual storage requires fewer hardware devices and software licenses than traditional enterprise storage architectures. This can save organizations a significant amount of money. Furthermore, virtualization supports dynamic storage provisioning, offering a more scalable and cost-effective way to add storage as the organization’s needs change.
- Less downtime risk. Virtualized environments provide fault tolerance, allowing data and applications to be migrated from one server to another with minimal downtime. Built-in redundancy also reduces the risk of disruption and failure while increasing storage flexibility and performance.
- High availability. The use of physical SANs and network-attached storage (NAS) devices in a virtualized manner creates an environment that’s not only easy to deploy and manage, but also highly available and capable of delivering very high uptime.
- Extended life of older storage systems. Virtualization is a good way to extend the usefulness of older storage systems by including them in the pool as a tier for archival or less critical data.
- Universal advanced features. Enterprises can implement advanced storage features like tiering, caching and replication at the virtualization level, which standardizes these practices across all member systems and further simplifies storage management and maintenance; a simplified replication sketch follows this list.
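Because the virtualization layer already sits in the I/O path, a feature such as replication can be implemented once and applied to every member system. The Python sketch below is a hypothetical, simplified write fan-out; the class names and two-copy policy are assumptions for illustration, not a specific product's behavior.

```python
# Hypothetical sketch of replication done once at the virtualization layer
# rather than per array; class names and the two-copy policy are assumptions.

class Backend:
    """Stand-in for a member storage system; stores blocks in a dict."""
    def __init__(self):
        self.blocks = {}
    def write_block(self, n, data):
        self.blocks[n] = data
    def read_block(self, n):
        return self.blocks.get(n)

class ReplicatedVolume:
    def __init__(self, primary, replica):
        self.primary = primary    # backend holding the primary copy
        self.replica = replica    # backend holding the mirror copy

    def write(self, block, data):
        """Synchronously fan out each write, so every member system gets the
        same protection without its own replication feature or license."""
        self.primary.write_block(block, data)
        self.replica.write_block(block, data)

    def read(self, block):
        # Reads are served from the primary; the replica exists for recovery.
        return self.primary.read_block(block)

vol = ReplicatedVolume(Backend(), Backend())
vol.write(7, b"invoice-data")
print(vol.read(7), vol.replica.read_block(7))   # both copies hold the data
```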
Disadvantages of storage virtualization
When first introduced more than two decades ago, storage virtualization tended to be difficult to implement and had limited applicability. Because it was originally host-based, virtualization software had to be installed and maintained on every server that needed access to the pooled storage resources.
Storage virtualization could also create compatibility and interoperability issues. For example, the virtualization environment might not be fully compatible with protocols like Network File System (NFS), or it might not integrate with the automation tools, OSes or hypervisors used by an organization. This could lead to operational disruptions. It could also necessitate additional purchases to facilitate integration, orchestration and interoperability between the virtualization environment and the existing IT infrastructure.
Another potential issue is performance. Some virtual environments add latency and therefore cannot meet the performance requirements of certain applications. Admins need to weigh many factors, including storage controller capabilities and caching mechanisms, to minimize the impact of virtualization on performance.
Data security was another concern that hindered the adoption of storage virtualization. If the virtualized environment does not support data encryption or provide strong authentication and access controls, it puts the security and integrity of data at rest and in transit at risk. To protect their data in a virtualized storage environment, organizations need to implement these measures, as well as effective data backup procedures.
Fortunately, many of these drawbacks have already been addressed or minimized. As virtualization technology has matured, organizations are able to implement it for many different use cases. Also, they can choose from multiple virtualization methods and select the method that makes the most operational and financial sense for their existing infrastructure and IT requirements.
Developments in virtualization software have also made it easier to deploy storage virtualization in different environments, and the emergence of standards such as the Storage Management Initiative Specification enables virtualization products to work with a wider variety of storage systems. For these reasons, virtualization is an attractive option for enterprises looking to increase storage capacity and simplify storage management while controlling storage costs.
Types of storage virtualization: Block vs. file
There are two basic methods of virtualizing storage: file-based and block-based.
File-based storage virtualization. File-based storage virtualization is applied to NAS systems. Using Server Message Block (SMB) in Windows server environments or NFS in Linux environments, file-based storage virtualization breaks the dependency in a normal NAS array between the data being accessed and its physical location.
The pooling of NAS resources makes it easier to handle file migrations in the background, which helps improve performance. NAS systems are typically not that complex to manage, but storage virtualization further simplifies their management through a single management console.
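The core idea is that the path a client sees is decoupled from the NAS device that currently stores the file, so a background migration only has to update a mapping. The sketch below illustrates this with made-up paths and device names; it is a conceptual model, not how any particular NAS virtualization product stores its namespace.

```python
# Minimal sketch of a file-virtualization namespace: client-visible paths are
# mapped to backend locations, so files can move without clients noticing.
# All paths and device names are invented for illustration.

namespace = {
    # client-visible path      -> (backend NAS device, internal path)
    "/projects/report.docx":      ("nas-01", "/vol3/a1f9/report.docx"),
    "/projects/budget.xlsx":      ("nas-02", "/vol1/77c2/budget.xlsx"),
}

def resolve(client_path):
    """Return the backend location currently serving a client-visible path."""
    return namespace[client_path]

def migrate(client_path, new_device, new_internal_path):
    """Move a file in the background; clients keep using the same path."""
    namespace[client_path] = (new_device, new_internal_path)

print(resolve("/projects/report.docx"))          # served from nas-01
migrate("/projects/report.docx", "nas-03", "/vol2/a1f9/report.docx")
print(resolve("/projects/report.docx"))          # now served from nas-03
```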
Block-based storage virtualization. In block-based storage virtualization, the virtualization management software collects the capacity of the available blocks of storage space across all virtualized arrays. It pools them into a shared resource to be assigned to any number of VMs, bare-metal servers or containers.
The storage resources are typically accessed over a Fibre Channel (FC) or Internet Small Computer System Interface (iSCSI) SAN. Block-based systems abstract the logical storage, such as a drive partition, from the physical memory blocks in a storage device, such as a hard disk drive (HDD) or solid-state drive (SSD). Because this operates in a similar fashion to the native drive software, there is less overhead for reads and writes, so block storage systems generally perform better than file-based systems.
Notwithstanding their benefits, SANs can be time-consuming to manage. Consolidating multiple block storage systems under a single management interface that shields users from tedious steps such as LUN configuration can be a significant timesaver. Block-based virtualization is also known as block access storage.
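To illustrate what block-level pooling buys an administrator, the sketch below carves a single LUN from free capacity spread across several arrays. The array names, sizes and first-fit allocation policy are assumptions made for the example, not a description of any real array's provisioning logic.

```python
# Hypothetical sketch of block-level pooling: free capacity from several
# arrays is treated as one pool, and a LUN is carved from whichever arrays
# have space. Array names, sizes and the allocation policy are illustrative.

arrays = {"array-A": 500, "array-B": 200, "array-C": 800}  # free GB per array

def provision_lun(size_gb):
    """Allocate a LUN of the requested size from the pooled free capacity,
    spanning arrays if no single array can satisfy it on its own."""
    allocation, remaining = [], size_gb
    for name, free in arrays.items():
        if remaining == 0:
            break
        take = min(free, remaining)
        if take:
            allocation.append((name, take))
            arrays[name] -= take
            remaining -= take
    if remaining:
        raise RuntimeError("pool has insufficient free capacity")
    return allocation  # the consuming server just sees one LUN of size_gb

# A 600 GB LUN that no single array could hold by itself:
print(provision_lun(600))   # [('array-A', 500), ('array-B', 100)]
```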
In-band vs. out-of-band virtualization
There are generally two types of virtualization that can be applied to a storage infrastructure:
- In-band virtualization. Also called symmetric virtualization, in-band virtualization handles the data that’s being read or saved and the control information, such as I/O instructions and metadata, in the same channel or layer. A single virtualization device sits between the host systems and storage devices to process data and control the data paths. This setup enables the storage virtualization to provide more advanced operational and management functions, such as data caching, backup and replication services. However, it can also create performance bottlenecks and is not very scalable. For these reasons, it is more suitable for smaller environments where data storage demands are unlikely to substantially increase over time.
- Out-of-band virtualization. This storage virtualization approach, also known as asymmetric virtualization, splits the data and control paths. The virtualization facility sees only the control instructions, so it handles only management tasks, while data transfers happen directly between the host systems and the storage devices, minimizing the potential for bottlenecks. The approach suits large organizations with growing data storage needs and high performance requirements. That said, separating the data and control paths can add complexity to the virtualization environment, and because the virtualizer no longer sits in the data path, advanced features such as caching are usually unavailable. A sketch contrasting the two designs follows this list.
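The contrast between the two designs comes down to whether the data itself flows through the virtualization device. The Python classes below are a conceptual sketch only; the class and method names are invented for illustration and do not reflect any specific product architecture.

```python
# Conceptual contrast between in-band and out-of-band designs; all names
# here are invented for illustration.

class Backend:
    """Stand-in for a physical storage device."""
    def __init__(self):
        self.blocks = {}
    def read_block(self, n):
        return self.blocks.get(n)
    def write_block(self, n, data):
        self.blocks[n] = data

class InBandVirtualizer:
    """In-band (symmetric): data and control both pass through the
    virtualization device, which is why it can add services such as caching,
    and also why it can become a bottleneck."""
    def __init__(self, backend):
        self.backend = backend
        self.cache = {}

    def read(self, block):
        if block in self.cache:                 # value-added service: caching
            return self.cache[block]
        data = self.backend.read_block(block)   # data flows through this box
        self.cache[block] = data
        return data

class OutOfBandVirtualizer:
    """Out-of-band (asymmetric): the virtualizer only answers 'where is the
    data?'; the host then moves the data directly to and from the backend."""
    def __init__(self, block_map):
        self.block_map = block_map              # control path / metadata only

    def locate(self, block):
        return self.block_map[block]            # the host performs the I/O

backend = Backend()
backend.write_block(42, b"payload")
print(InBandVirtualizer(backend).read(42))                  # b'payload'
print(OutOfBandVirtualizer({42: ("backend-1", 42)}).locate(42))
```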
Tape media and storage virtualization
Although waning as a backup target media, tape storage is still widely used for archiving infrequently accessed data. Archival data tends to be voluminous; tape media can employ storage virtualization to make it easier to manage large data stores.
Linear Tape File System (LTFS) is a form of tape virtualization that makes a tape look like a typical NAS file storage device. It makes it much easier to find and restore data from tape by using a file-level directory of the tape's contents.
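The sketch below is a toy model of the idea behind such a file-level directory: an index maps file paths to the region of tape holding their data, so the contents can be browsed without winding through the media. It is not the actual LTFS on-media format, and the paths and block numbers are made up.

```python
# Toy model of a tape file-level index; not the actual LTFS format.

tape_index = {
    # path on the "file system"  -> (start block on tape, length in blocks)
    "/archive/2019/scans.tar":      (1200, 830),
    "/archive/2020/logs.tar.gz":    (2030, 415),
}

def list_files():
    """Browse the tape's contents without reading the data itself."""
    return sorted(tape_index)

def locate(path):
    """Find where on tape a file's data starts so the drive can seek to it."""
    return tape_index[path]

print(list_files())
print(locate("/archive/2019/scans.tar"))   # (1200, 830)
```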
Virtualization methods
There are multiple approaches to storage virtualization:
- Host-based storage virtualization. This approach is software-based and most often seen in hyper-converged infrastructure (HCI) systems and cloud storage. In this type of virtualization, the host, or a hyper-converged system made up of multiple hosts, presents virtual drives of varying capacity to the guest machines, whether they are VMs in an enterprise environment, physical servers or PCs accessing file shares or cloud storage. All of the virtualization and management are done at the host level through software, and the physical storage can be almost any device or array. Some server OSes have virtualization capabilities built in, such as Windows Storage Spaces.
- Array-based storage virtualization. This approach is built around a storage array that acts as the primary storage controller and runs virtualization software, enabling it to pool the storage resources of other arrays and to present different types of physical storage for use as storage tiers (a simple tiering sketch follows this list). A storage tier may comprise SSDs or HDDs on the various virtualized storage arrays; the physical location and specific array are hidden from the servers or users accessing the storage.
- Network-based storage virtualization. This is the most common form of storage virtualization that enterprises use. A network device, such as a smart switch or purpose-built server, connects to all the storage devices in a Fibre Channel (FC) or iSCSI SAN and presents the storage on the network as a single virtual pool. The network device virtualizes and redirects I/O requests to the physical storage, so the server consuming the storage doesn't need to know the underlying storage architecture.
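As a concrete example of the tiering decision mentioned for array-based virtualization, the sketch below promotes busy extents to an SSD tier and demotes cold ones to HDD. The threshold, extent names and tier labels are illustrative assumptions, not a real array's tiering algorithm.

```python
# Hypothetical sketch of a tiering decision at the virtualization layer:
# hot extents go to SSD, cold extents to HDD. Threshold and names are
# illustrative assumptions.

HOT_THRESHOLD = 100   # accesses per day that qualify an extent as "hot"

def choose_tier(accesses_per_day):
    """Pick a tier for an extent based on how often it is read or written."""
    return "ssd-tier" if accesses_per_day >= HOT_THRESHOLD else "hdd-tier"

extent_activity = {"extent-17": 420, "extent-18": 3, "extent-19": 150}
placement = {ext: choose_tier(rate) for ext, rate in extent_activity.items()}
print(placement)
# {'extent-17': 'ssd-tier', 'extent-18': 'hdd-tier', 'extent-19': 'ssd-tier'}
```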
In addition to the above, storage can also be virtualized through OS-level or file-system virtualization. With the former, the OS includes features that allow for the creation of tiered storage. The latter refers to technologies that give users a consolidated view of file data even though those files might be scattered across many different file servers. Users might also be able to access the files remotely thanks to the file replication capability provided by the file-system virtualization technology.

History of storage virtualization
In the late 1960s and early 1970s, IBM developed the concept of virtualization in the context of time-sharing for mainframe computers — the idea that multiple users could share the usage of expensive mainframe devices without having to purchase or lease them. This approach helped to reduce the cost of providing computing capabilities, and allowed more users and organizations to use those capabilities in a cost-effective manner. Similar potential benefits drove the development of storage virtualization technology and solutions.
IBM SAN Volume Controller was an early block-based virtualization appliance. Now called IBM Spectrum Virtualize, it supports large-scale workloads and enables hybrid cloud storage deployments across 500-plus supported storage systems. The Spectrum Virtualize software provides insulation from physical storage and can be used in the appliance along with other server virtualization and containerization technologies.
Another early storage virtualization product was Hitachi Data Systems' TagmaStore Universal Storage Platform. That product evolved into Hitachi Vantara's Virtual Storage Platform One (VSP One), which offers virtualization and aggregation so organizations can create large-scale storage pools and then logically partition them to optimize application quality of service. The platform also reduces storage-management complexity and offers high configuration flexibility.
In the late 1990s, VMware released VMware Workstation, a virtualization product that included a hypervisor to help IT admins set up VMs on a single x86 machine running either Linux or Windows. The hypervisor enables organizations to run multiple applications simultaneously on a single piece of hardware, simplifying hardware management and reducing costs. Advanced hypervisors include features like fault tolerance and high availability to reduce the likelihood of downtime events and minimize their impact on business continuity and productivity.
From the 2000s onwards, many more companies entered the virtualization space, including Microsoft, Red Hat and Citrix Systems. Today, many enterprise data centers use the virtualization techniques and solutions developed by these organizations to create large aggregated pools of storage and other resources and offer those resources to the organization as agile and scalable VMs.
Storage virtualization today usually refers to capacity that is accumulated from multiple physical devices and then made available to be reallocated in a virtualized environment. Modern IT methodologies, such as hyperconverged infrastructure and containerization, take advantage of virtual storage, in addition to virtual compute power and often virtual network capacity.
Edge computing also relies on storage virtualization. Virtualization allows organizations to meet their storage requirements and simplify storage management and maintenance in edge computing environments. Also, virtualized storage environments are more compact than physical environments and require less hardware and management resources. All of this can deliver big cost efficiencies and also benefit organizations with limited space and smaller IT teams.
Although storage virtualization is by no means extinct, it is largely overshadowed by cloud computing. In this new computing paradigm, organizations determine the amount and type of storage they need. The cloud service provider (CSP) then configures and provisions this storage from their virtualized storage pools and makes it available to the organization on-demand. With cloud-based virtualized storage, organizations can access the storage resources they need without having to worry about various storage management tasks. Furthermore, since the CSP provides the resources on a “pay as you go” basis, the business can control its costs, and, in many cases, achieve faster time to value.
Full virtualization and paravirtualization take different approaches: the former virtualizes the hardware completely, while the latter involves only partial virtualization. Learn the differences between virtualization and paravirtualization, and explore their advantages and disadvantages. Also, read more about the history and development of virtualization technology.