The IT industry is always striving to overcome bottlenecks, and one of the biggest is input/output (I/O). Whether it is memory bandwidth, network bandwidth or how quickly a high-resolution screen can be refreshed, the speed at which data can be copied by the CPU (central processing unit or microprocessor) to an external device limits overall performance.
New generations of dynamic RAM (random access memory) improve the I/O between the CPU and the computer’s main memory. GPUs (graphics processing units) take over graphics processing, reducing the I/O needed for rendering graphics while also boosting performance significantly, especially in computer games.
But the GPU’s immense power has also led to new application areas where highly parallelised computations are required. A GPU will accelerate machine learning and inference engines for artificial intelligence (AI)-powered decision-making.
Is there a case for in-storage data processing?
Software runs on data and data is often regarded as the “new oil”. So it makes sense to put data as close as possible to where it is being processed, in order to reduce latency for performance-hungry processing tasks. Some architectures call for big chunks of memory-like storage located near the compute function, while, conversely, in some cases it makes more sense to move the compute nearer to the bulk storage.
The growth of data has led to some in the industry asking whether storage devices can be used in a way analogous to the GPU, to accelerate data-processing tasks. This is the realm of computational storage, a term used to describe a combination of software and hardware to offload and alleviate constraints on existing compute, memory and storage, in a bid to improve application performance and/or infrastructure efficiency.
Earlier this year, Antonio Barbalace, a senior lecturer at the University of Edinburgh’s Institute for Computing Systems Architecture, published a paper co-written with Microsoft Research, “Computational storage: where are we today?”, which looks at the current state of computational storage.
“Can we do something with storage?” he says, pointing out that organisations are ingesting large quantities of data, which then needs to be processed. “For example, databases are extremely large,” he adds. “They copy data from storage devices to process in RAM. It takes a lot of time to move a database into memory.”
There is, therefore, a valid case to run database querying on the storage device, to avoid the I/O bottleneck when data is copied back and forth from the storage device to the computer’s RAM.
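The saving can be illustrated with a toy model. The sketch below is not a real device interface: `SimulatedDrive`, its `read_all` and `query` methods, and the fixed 64-byte row size are all assumptions made for illustration. It simply counts how many bytes cross the bus when a filter runs on the host compared with when it runs on the drive.

```python
from dataclasses import dataclass

ROW_SIZE_BYTES = 64  # assumed fixed size per stored row, for illustration only


@dataclass
class Row:
    order_id: int
    country: str
    amount: float


class SimulatedDrive:
    """Hypothetical drive model; it does not correspond to any real device API."""

    def __init__(self, rows):
        self._rows = list(rows)
        self.bytes_transferred = 0

    def read_all(self):
        """Conventional path: every row is copied to host RAM before filtering."""
        self.bytes_transferred += len(self._rows) * ROW_SIZE_BYTES
        return list(self._rows)

    def query(self, predicate):
        """Computational storage path: filter on the drive, return only matches."""
        matches = [row for row in self._rows if predicate(row)]
        self.bytes_transferred += len(matches) * ROW_SIZE_BYTES
        return matches


def is_large_uk_order(row):
    return row.country == "UK" and row.amount > 900


rows = [Row(i, "UK" if i % 10 == 0 else "DE", float(i % 1000)) for i in range(10_000)]

# Host-side filtering: the whole table crosses the bus first.
host_drive = SimulatedDrive(rows)
host_result = [row for row in host_drive.read_all() if is_large_uk_order(row)]

# In-storage filtering: only the matching rows cross the bus.
smart_drive = SimulatedDrive(rows)
device_result = smart_drive.query(is_large_uk_order)

print(host_drive.bytes_transferred, smart_drive.bytes_transferred)
```

Both paths return the same rows. In this model, the only difference is how much data has to be moved to produce them.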
Some tasks are already being run on the storage controllers used to manage physical devices such as disk arrays, says Matt Armstrong-Barnes, CTO at HPE. “Deduplication, compression and decompression are already handled by storage arrays,” he says. Such uses are not classed as computational storage, but they illustrate how storage controllers are getting smarter.
Hardware acceleration
But for Barbalace, computational storage has higher aspirations. He says a computational storage device could run simple operations on the data to reduce the amount of data that needs to be sent to the CPU. Data processing at the edge, such as on an internet of things (IoT) device, is one of the possible application areas, where sensor data is streamed directly to a storage device. The CPU on the edge device would then be alerted as and when there is an anomaly or at a regular time interval, to upload the sensor data to the cloud.
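A minimal sketch of that edge scenario follows, assuming a hypothetical `SensorStore` class that stands in for the logic running on the storage device. The anomaly rule (a reading that differs from the mean of the buffered readings by more than a threshold) and the count-based upload interval are illustrative choices, not taken from Barbalace’s paper.

```python
class SensorStore:
    """Hypothetical on-device logic: buffer readings, wake the host CPU rarely."""

    def __init__(self, threshold, flush_every):
        self.threshold = threshold      # allowed deviation from the buffered mean
        self.flush_every = flush_every  # number of readings per scheduled upload
        self.buffer = []
        self.alerts = []                # events that would wake the host CPU

    def write(self, value):
        # Compare the new reading with the mean of what is already buffered.
        if self.buffer:
            baseline = sum(self.buffer) / len(self.buffer)
            if abs(value - baseline) > self.threshold:
                self.alerts.append(("anomaly", value))
        self.buffer.append(value)
        # At a regular interval, hand the batch to the host for upload.
        if len(self.buffer) == self.flush_every:
            self.alerts.append(("upload", list(self.buffer)))
            self.buffer.clear()


store = SensorStore(threshold=5.0, flush_every=4)
for reading in [20.0, 20.5, 19.5, 35.0, 20.0]:
    store.write(reading)

print(store.alerts)
```

Here the host is interrupted twice for five writes: once for the outlying reading and once for the scheduled batch upload.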
Some manufacturers have developed smart SSD devices based on application-specific integrated circuits (ASICs) to accelerate fixed functions, such as video transcoding algorithms, that run directly on the devices.
Another option is the use of field programmable gate arrays (FPGAs) for accelerating fixed functions. Xilinx has developed an FPGA-based platform, which is used in Samsung’s SmartSSD computational storage device.
The company reported a 20% increase in its datacentre business for the fourth quarter of 2021 and storage has been one of the growth areas. Xilinx’s fourth-quarter 2021 filing shows that annual revenue growth is being driven by adoption among its hyperscale customers across compute, networking and storage workloads.
“Xilinx maintains strong engagements with hyperscalers to deliver solutions for AI compute, video acceleration, composable networking and computational storage,” the company said in its financial statement for Q4 2021.
One of its partners, Lewis Rhodes Labs, offers what it describes as a cyber forensics search-in-storage appliance. This is a regular expression search engine appliance, which the company says has been optimised for anomaly detection. According to Lewis Rhodes Labs, the FPGA-accelerated appliance, equipped with 24 SmartSSDs, can search 96Tbytes of storage at a rate of 60Gbps, delivering results in less than 25 minutes.
NGD Systems is another company that is often mentioned in conversations about computational storage. It offers a smart SSD based on the ARM processor, which means its products can use the Linux operating system on which more general-purpose algorithms can then be run.
In February 2020, NGD Systems announced a $20m Series C funding round to support and accelerate the production and deployment of what it claims is the world’s first NVMe (non-volatile memory express) computational storage drive. Application areas include providing a way to run AI and machine learning within the device where the data resides.
Booking.com has been using this technology in its own datacentres, where power and write latency are key metrics for the travel website.
Peter Buschman, product owner, storage at Booking.com, says: “We found the NGD Systems drives to be best in class with respect to this combination of characteristics. The latency, in particular, was consistently low for a device with such a small power draw. With power, not space, being our greatest constraint, and environmental impact a growing concern, this technology holds great promise for use in next-generation datacentre environments.”
Computational storage is not limited to adding smart functionality directly to an SSD. Just as graphics cards equipped with GPUs are used to accelerate applications optimised for parallel computing, a computational storage expansion card could be plugged into a PC motherboard to accelerate certain data-processing functions.
Programming computational storage
In the paper he co-authored with Microsoft Research, Barbalace looked at how applications can be adapted to take advantage of computational storage. He says there are many algorithms that can be classified as dataflows. One example is AWS Lambda, which is used to process data streams. “An application can break down data to flow to multiple parts,” he says. “One of these could be assigned to computational storage.”
For instance, an AI workload can be split so that some parts run directly on computational storage, while other parts use the CPU. Highly distributed high-performance computing workloads, such as weather forecasting, may also be able to take advantage of computational storage. “The question is whether data can be processed more efficiently on a computational storage device,” says Barbalace.
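The dataflow idea can be sketched as a pipeline whose stages carry a placement label. Everything here is hypothetical: the `Stage` type, the `"storage"` and `"cpu"` labels and the log-processing example are illustrative, and the sketch runs every stage in one process, recording only where each stage would be placed.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable


@dataclass
class Stage:
    name: str
    run: Callable[[list], list]
    placement: str  # hypothetical label: "storage" or "cpu"


def run_pipeline(stages, data):
    """Run the stages in order, recording placement and output size for each."""
    trace = []
    for stage in stages:
        data = stage.run(data)
        trace.append((stage.name, stage.placement, len(data)))
    return data, trace


def filter_errors(lines):
    # Cheap, data-reducing step: a natural candidate for the storage device.
    return [line for line in lines if line.startswith("ERROR")]


def count_by_service(lines):
    # Aggregation over the reduced data set: left on the host CPU.
    return sorted(Counter(line.split()[1] for line in lines).items())


log_lines = [
    "INFO api ok",
    "ERROR db timeout",
    "INFO db ok",
    "ERROR api 500",
    "ERROR db deadlock",
]

pipeline = [
    Stage("filter_errors", filter_errors, placement="storage"),
    Stage("count_by_service", count_by_service, placement="cpu"),
]

result, trace = run_pipeline(pipeline, log_lines)
print(result)
print(trace)
```

The trace shows the point Barbalace raises: the stage assigned to storage shrinks five lines to three before anything reaches the CPU. Whether that is more efficient on a real device depends on the workload.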
This applies both to on-premises and cloud-hosted data. A recent example from Amazon Web Services (AWS) illustrates how data processing can be moved closer to where it is stored to gain efficiency. Although not strictly computational storage, in a blog post published in March 2020, AWS architects David Green and Mustafa Rahimi discussed how a feature of S3 cloud storage called S3 Select could be used to execute SQL queries directly on data stored in the Amazon cloud.
They wrote: “Customers could upload data directly to S3 using AWS SFTP [secure shell file transfer protocol] and then query the data using S3 Select. This work can be automatically triggered by an AWS Lambda execution after a new CSV [comma separated value] object is uploaded to S3 with S3 Event Notifications. Searching through your data using S3 Select can potentially save you time and money spent on combing through data in other ways.”
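As a rough sketch of what such a query might look like from Python, the function below calls the `select_object_content` operation of the boto3 S3 client and collects the records from the returned event stream. The bucket name, object key, column names and the helper name `query_csv_in_s3` are hypothetical, and the client is passed in as an argument so the function itself does not need AWS credentials to be imported.

```python
def query_csv_in_s3(s3_client, bucket, key, expression):
    """Run an SQL expression against a CSV object and return the matching rows.

    `s3_client` is expected to behave like boto3.client("s3").
    """
    response = s3_client.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType="SQL",
        Expression=expression,
        # Treat the first CSV line as a header so columns can be named in SQL.
        InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
        OutputSerialization={"CSV": {}},
    )
    chunks = []
    # The payload is a stream of events; only "Records" events carry row data.
    for event in response["Payload"]:
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
    return b"".join(chunks).decode("utf-8")


# Example call (hypothetical bucket, key and columns; needs AWS credentials):
#
#   import boto3
#   rows = query_csv_in_s3(
#       boto3.client("s3"),
#       "example-bucket",
#       "uploads/bookings.csv",
#       "SELECT s.city FROM S3Object s WHERE s.country = 'NL'",
#   )
```

Only the rows that satisfy the expression are returned to the caller, which is the same data-reduction principle that computational storage applies at the device level.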
The paper from Barbalace and Microsoft also covers two other options for programming computational storage. Shared memory is a technique often used in multiprocessor hardware to enable different CPUs to work on the same set of data. This technique can also be applied to computational storage, if system software is modified accordingly.
Client/server computing is the third category of computational storage that Barbalace identifies in his research. A paper from NGD Systems and researchers from the University of California and the University of Tehran, published in the Journal of Big Data in 2019, discussed how computational storage could build on the highly distributed approach to data storage and processing that Hadoop MapReduce uses with its DataNodes, which are used to store and process data.
“Hadoop-enabled computational storage devices can play both roles of fast storage units for conventional Hadoop DataNodes and in-storage processing-enabled DataNodes simultaneously, resulting in augmentation of processing horsepower,” the report’s authors wrote.
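The division of labour the authors describe can be sketched as a word count in the MapReduce style. This is plain Python, not Hadoop code: `map_on_device` stands in for work imagined to run on each drive, `reduce_on_host` for the merge on the host, and the sample blocks are made up.

```python
from collections import Counter


def map_on_device(block):
    """Map step, imagined to run on the drive that holds `block`."""
    return Counter(block.split())


def reduce_on_host(partial_counts):
    """Reduce step: the host merges the small per-drive summaries."""
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


# One text block per drive (made-up sample data).
blocks_per_drive = ["disk error disk", "error timeout", "disk ok"]

partials = [map_on_device(block) for block in blocks_per_drive]
totals = reduce_on_host(partials)
print(totals)
```

Each drive sends back a compact summary rather than its raw block, so the host’s share of the work is limited to combining the partial results.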
Challenges and future direction
It is still early days for computational storage. CCS Insight principal analyst Bola Rotibi believes that one of the challenges is how storage managers evolve into programmers. “Storage people do not do a lot of programming,” she says.
HPE’s Armstrong-Barnes is not convinced that smart SSDs and computational storage will achieve the same success as GPUs in mainstream computing. “Oil doesn’t mix very well and this is the challenge when adding data science workloads from different places,” he says.
For Barbalace, one area that remains unsolved is multi-tenancy, as and when computational storage is provided on demand by public cloud providers. Because data is stored in the public cloud across multiple storage pools, computational storage may need to run on a specific subset of data that may be split across different physical servers.
Despite these challenges, the reason people are thinking about computational storage is the exponential growth in data volumes. “Today, data is stored in certain ways purely because of the way CPU architectures have evolved,” says Adrian Fern, founder and CTO at Prizsm Technologies. “But it is not fit for purpose when it comes to accessing the volumes of data available now and the exponential growth we will experience as we approach the quantum age.”
So while it is still early days for computational storage, mainstream quantum computing is also at an early stage of development. However, as these two areas of computing evolve, computational storage may be necessary to keep up with the processing appetite of a quantum computer.
 



