The goal of a computational storage architecture is either to reduce the need to move large datasets around or to alleviate constraints on existing compute or storage resources, as in an edge deployment.
One factor driving the development of computational storage is data: more precisely, the growing volumes of data that organisations increasingly have to contend with. Organisations are turning to data science, data analytics and machine learning to glean insights from all this data, but these workloads are very data-intensive and tend to be either bound by input/output (I/O) speeds or sensitive to latency. It makes more sense, therefore, to process the data as close as possible to where it is stored, rather than shuffling gigabytes or terabytes into memory and back again.
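To make the data-movement argument concrete, the following rough Python sketch compares how many bytes cross the host interface when a filter runs on the host CPU with how many cross it when the filter runs inside the drive. The record count, record size and match rate are illustrative assumptions rather than measurements, and the function names are hypothetical.

```python
def bytes_moved_host_side(record_count: int, record_size: int) -> int:
    """Host-side filtering: every record crosses the bus before it can be examined."""
    return record_count * record_size


def bytes_moved_in_storage(record_count: int, record_size: int, match_rate: float) -> int:
    """In-storage filtering: only the records that match are sent to the host."""
    matching_records = round(record_count * match_rate)
    return matching_records * record_size


# Illustrative assumptions: one billion records of 1,000 bytes each, 1% of which match.
records = 1_000_000_000
size = 1_000
host = bytes_moved_host_side(records, size)
device = bytes_moved_in_storage(records, size, match_rate=0.01)

print(f"host-side filter:  {host / 1e12:.2f} TB moved")
print(f"in-storage filter: {device / 1e12:.2f} TB moved")
```

Under these assumptions the traffic shrinks in proportion to the match rate, so the benefit is largest for highly selective queries.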
The companies that are developing computational storage products have taken differing architectural approaches, from integrating processors into drives to building accelerators that plug into a PCIe slot and access existing data stores via NVMe.
To avoid a balkanisation of the nascent computational storage ecosystem into mutually incompatible product lines, the Storage Networking Industry Association (SNIA) formed a Computational Storage Technical Work Group (TWG). The group is working to define standards and develop a common programming model that will allow applications to discover and use any computational storage resources that may be attached to a computer system.
SNIA has split the definition of computational storage devices into computational storage processors (CSPs), computational storage drives (CSDs) and computational storage arrays (CSAs). A CSP contains a compute engine, but does not actually contain any storage itself. A CSD (typically a solid-state drive/SSD) contains both compute and storage. A CSA contains one or more compute engines and storage devices.
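The three categories can be expressed as a small classification routine. This Python sketch is a simplification and not part of the SNIA model: it assumes that a device with exactly one storage device is a drive and that anything with more than one is an array, and the Device type and classify function are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum


class DeviceClass(Enum):
    CSP = "computational storage processor"
    CSD = "computational storage drive"
    CSA = "computational storage array"


@dataclass(frozen=True)
class Device:
    name: str
    compute_engines: int
    storage_devices: int


def classify(device: Device) -> DeviceClass:
    """Map a device onto the three categories described above."""
    if device.compute_engines < 1:
        raise ValueError(f"{device.name} has no compute engine")
    if device.storage_devices == 0:
        return DeviceClass.CSP  # compute only, no storage of its own
    if device.storage_devices == 1:
        return DeviceClass.CSD  # assumption: a single storage device means a drive
    return DeviceClass.CSA      # assumption: several storage devices mean an array


for example in (
    Device("accelerator card", compute_engines=1, storage_devices=0),
    Device("smart SSD", compute_engines=1, storage_devices=1),
    Device("storage shelf", compute_engines=2, storage_devices=24),
):
    print(f"{example.name}: {classify(example).name}")
```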
The SNIA model includes a list of computational storage functions that might be performed by computational storage devices, such as compression and decompression. Some computational storage products have been designed to carry out specific functions, such as video encoding or decoding, while others have been designed to be user-programmable.
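The idea of an application discovering a device and then invoking one of its functions might look something like the Python sketch below. This is not the SNIA programming model or any supplier's API: the class, the discover function and the device names are hypothetical, and zlib running on the host stands in for a compression function that a real device would run in its own hardware.

```python
import zlib
from typing import Callable, Dict, List, Optional


class ComputationalStorageDevice:
    """Hypothetical device that advertises the functions it can run."""

    def __init__(self, name: str, functions: Dict[str, Callable[[bytes], bytes]]) -> None:
        self.name = name
        self._functions = functions

    def supports(self, function_name: str) -> bool:
        return function_name in self._functions

    def execute(self, function_name: str, data: bytes) -> bytes:
        # A real device would run this next to the stored data; here it runs on the host.
        return self._functions[function_name](data)


def discover(
    devices: List[ComputationalStorageDevice], function_name: str
) -> Optional[ComputationalStorageDevice]:
    """Return the first attached device that advertises the requested function."""
    for device in devices:
        if device.supports(function_name):
            return device
    return None


# Two hypothetical devices: one that only decompresses, one that does both.
devices = [
    ComputationalStorageDevice("fixed-function-0", {"decompress": zlib.decompress}),
    ComputationalStorageDevice(
        "programmable-0", {"compress": zlib.compress, "decompress": zlib.decompress}
    ),
]

payload = b"computational storage " * 100
chosen = discover(devices, "compress")
if chosen is not None:
    packed = chosen.execute("compress", payload)
    print(f"{chosen.name} compressed {len(payload)} bytes to {len(packed)} bytes")
```

Keeping discovery separate from execution means the application asks for a function by name and does not need to know which supplier's device ends up providing it.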
Notable providers
NGD Systems is one of the more prominent computational storage suppliers. Its products are CSDs under the SNIA definition, integrating compute processing into an NVMe SSD. This is achieved by the use of a custom application-specific integrated circuit (ASIC) that incorporates both the SSD controller functions and a quad-core Arm Cortex-A53 CPU block.
There are several advantages to this architecture. The ASIC has direct access to the Nand flash chips in the drive via common flash interface (CFI) channels, which provide higher-bandwidth and lower-latency access to the data than transferring it into memory for the host CPU to process.
Thanks to the embedded Arm cores, NGD’s devices can run a version of Ubuntu Linux, which simplifies the development and deployment of applications, and can also run Microsoft’s Azure IoT Edge. The drive itself can also be accessed simply as a standard SSD.
This type of architecture is well suited to edge deployments, where there may only be enough space or sufficient power for a single edge server, but with demanding requirements to analyse data in real time, such as a video feed from a security camera. NGD has a Solution Brief on its website that describes how a MongoDB database can be sharded across multiple CSD SSDs inside a single server instead of across multiple server nodes, reducing the datacentre footprint and the overall cost while delivering lower latency when replicating data.
NGD also cites as use cases automotive artificial intelligence (AI), content delivery networks and hyperscale datacentres, and offers a fully integrated In-Situ Processing Development System (ISDP) that enables developers and integrators to build and deploy applications.
Samsung has a similar CSD product, but its SmartSSD integrates a Xilinx field-programmable gate array (FPGA) and Samsung NVMe SSD controller inside a standard 2.5in (U.2) form factor SSD with a capacity of up to 4TB. The resulting product is marketed by Xilinx.
Xilinx provides a development platform, Vitis, which allows development in C, C++, or OpenCL. It also enables organisations to build accelerated applications via a set of open source libraries optimised for the Xilinx FPGA in the SmartSSD. There are Vitis libraries for accelerating AI inferencing, data analytics, quantitative finance, and others. Xilinx claims that using Bigstream’s hyper-acceleration layer, SmartSSD can make Apache Spark analytics 10 times faster.
Meanwhile, the NoLoad products from Eideticom are CSPs, in that they contain an accelerator engine but no storage. Instead, they connect with storage and the host CPU via NVMe, which allows compute and storage to be scaled independently. In fact, with support for NVMe-oF, the data could equally be held in external storage arrays.
The NoLoad devices use an FPGA as the accelerator, and are available as a PCIe card, in a U.2 form factor resembling a drive enclosure, or in the EDSFF format, which is based on Intel’s Ruler SSD format. NoLoad can support a range of functions, such as compression, encryption, erasure coding, deduplication, data analytics and machine learning (ML).
NoLoad devices have already been deployed at the Los Alamos National Laboratory (LANL) as part of a next-generation storage system for high-performance computing (HPC). This has seen NoLoad devices used to offload key storage tasks in a Lustre/ZFS file system, leading to improved performance and reduced costs for the storage system.
Also targeting storage is Pliops, which uses a PCIe card with an FPGA to accelerate key-value operations that are used in applications such as databases. The Pliops Storage Processor (PSP) implements an optimised data structure for database-related storage operations, such as indexing, searching or sorting, and accelerates them without requiring any software changes to the application. It does this by replacing the underlying key-value storage engine, such as InnoDB, the default option for MySQL, with its hardware accelerator. Pliops claims that this implementation can deliver 10 times the number of queries per second while making more efficient use of SSD storage space.
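The reason an engine swap needs no application changes is that the application only ever talks to a storage-engine interface. The Python sketch below illustrates that general pattern and is not Pliops code: every name in it is hypothetical, and the accelerator is simulated with an in-memory dictionary.

```python
from typing import Dict, Optional, Protocol


class KeyValueEngine(Protocol):
    """The only interface the application depends on."""

    def put(self, key: bytes, value: bytes) -> None: ...

    def get(self, key: bytes) -> Optional[bytes]: ...


class SoftwareEngine:
    """Stand-in for a storage engine that runs entirely on the host CPU."""

    def __init__(self) -> None:
        self._rows: Dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        self._rows[key] = value

    def get(self, key: bytes) -> Optional[bytes]:
        return self._rows.get(key)


class OffloadedEngine:
    """Stand-in for an engine that forwards each operation to an accelerator card."""

    def __init__(self) -> None:
        self._device_store: Dict[bytes, bytes] = {}  # simulated accelerator state
        self.operations_offloaded = 0

    def put(self, key: bytes, value: bytes) -> None:
        self.operations_offloaded += 1
        self._device_store[key] = value

    def get(self, key: bytes) -> Optional[bytes]:
        self.operations_offloaded += 1
        return self._device_store.get(key)


def application(engine: KeyValueEngine) -> Optional[bytes]:
    """Application logic: identical whichever engine is plugged in underneath."""
    engine.put(b"user:42", b"alice")
    return engine.get(b"user:42")


print(application(SoftwareEngine()))
print(application(OffloadedEngine()))
```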
GPUs can do computational storage too
Perhaps the most extreme example of a computational storage accelerator comes from Nyriad. The firm has developed a software-defined storage platform called Nsulate that uses an Nvidia GPU to accelerate erasure coding functions. It is intended as an alternative to RAID for high-performance scale-out storage deployments requiring a high level of reliability.
In fact, it is claimed to be able to cope with dozens of simultaneous device failures in real time, with no performance degradation, as Nsulate can rebuild any missing data faster than the data can be fetched from storage. This means that replacing a failed drive does not need to be a high priority for the IT team. Nyriad claims that the GPU can simultaneously be used for other workloads such as machine learning.
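The principle of rebuilding missing data by computation can be shown with the simplest possible erasure code. The Python sketch below uses single XOR parity, which can recover only one lost block and runs on the CPU. It is not Nsulate's algorithm: tolerating many simultaneous failures requires a stronger code, such as Reed-Solomon, and the function names here are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks: List[bytes]) -> bytes:
    """Compute a single parity block over the data blocks."""
    return xor_blocks(data_blocks)


def rebuild(blocks: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Recompute at most one missing block (marked None) from the survivors and parity."""
    missing = [index for index, block in enumerate(blocks) if block is None]
    if len(missing) > 1:
        raise ValueError("single parity can recover at most one missing block")
    if not missing:
        return list(blocks)
    survivors = [block for block in blocks if block is not None]
    repaired = list(blocks)
    repaired[missing[0]] = xor_blocks(survivors + [parity])
    return repaired


data = [b"AAAA", b"BBBB", b"CCCC"]
parity = encode(data)

damaged = [data[0], None, data[2]]  # simulate one failed drive
restored = rebuild(damaged, parity)
print(restored == data)
```

The missing block is never read from a replacement drive; it is recomputed from the surviving blocks, which is the property that lets a fast enough accelerator serve reads while drives are still failed.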
Nsulate is currently available as part of pre-built systems by partners such as Boston Limited, which offers a Supermicro-based Nsulate storage server.
Computational storage is still at an early stage of development, although some suppliers have been offering deployable products for several years. Organisations evaluating it for their datacentre therefore need to exercise caution, but there are already benefits to be had from using computational storage products in certain applications. They can lower overall power consumption and reduce the number of CPU cores needed per server node, for example, as well as delivering a significant boost in performance in many cases.