Use an Nvidia graphics card as a RAID controller – that’s the idea of SupremeRAID from US startup GRAID. For less than $4,000, it can control 32 NVMe SSDs and achieve read throughput of 110GBps and IOPS of 19 million.
By comparison, Broadcom’s LSI MegaRAID card – the closest competitor – only supports four NVMe SSDs, with maximum read throughput of 13.5GBps and fewer than 200,000 read IOPS.
The difference between the two cards? The Broadcom one uses a bespoke ASIC from the manufacturer.
“We started by trying to develop a RAID card from a specialised chip from the start, in our case an FPGA,” said Leander Yu, CEO of GRAID during a recent IT Press Tour presentation to ComputerWeekly.com sister publication LeMagIT. “But we were quickly disillusioned. To get the 19 million IOPS we wanted, our FPGAs would have cost $30,000 each, but that would make no sense.
“But RAID is just sharing access between parallel flows and GPUs are made to parallelise a flow of pixels, so we had the idea of taking a simple graphics card that’s commercially available for a few hundred dollars.”
So, the SupremeRAID SR-1010 is really just an Nvidia RTX A2000, an entry-level graphics card for workstations that is available for less than €600.
“Honestly, we use less than 50% of its performance,” said Yu. “But then we do use a professional-level card. Those aimed at the general public don’t have the memory for error correction and are less reliable. We can’t tolerate the loss of a single byte in storage use cases.”
SupremeRAID SR-1010 is the latest incarnation and runs on PCIe 4.0. Previously, there was an SR-1000 version on PCIe 3.0 that achieved 16 million read IOPS and sold for $2,500. The SR-1010 has a price of $3,995.
Yu said he thought the Broadcom solution was developed during the era of spinning-disk HDDs. That, he said, makes it obsolete for NVMe SSDs, which don’t communicate via SATA or SAS ports, but instead use PCIe lanes.
12x faster on writes
GRAID’s technical knowhow provides the software for RAID 0, 1, 5, 6, 10 and erasure coding on GPUs.
“We sell the complete solution, with the card,” said Tom Paquette, senior VP and GM at GRAID. “But we’re also in discussions with server makers and cloud providers who buy large quantities of graphics cards and we’re open to potentially just selling the software.”
In practice, the card only acts as a gateway during writes, because it is at that phase that it takes the incoming flow and creates the fragments that will be written in parallel at optimal speed. In this configuration, write throughput from the server’s point of view is 12GBps over the PCIe 4.0 x16 link by which the SupremeRAID is connected.
On the device side, the card communicates with each SSD at an individual throughput of 7GBps, roughly what a PCIe 4.0 x4 link delivers. Counting the aggregate throughput to the SSDs, that rises to 22GBps because of the duplicate data created by the RAID engine.
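The gap between the 12GBps the server sees and the 22GBps reaching the SSDs comes from RAID write amplification. The article does not state the exact RAID geometry behind the 22GBps figure, but the general relationship can be sketched with a back-of-envelope calculation:

```python
def device_throughput(host_gbps, data_disks, parity_disks):
    # Each stripe of `data_disks` data chunks also writes `parity_disks`
    # redundancy chunks, so device-side traffic exceeds host-side traffic
    # by the ratio (data + parity) / data.
    return host_gbps * (data_disks + parity_disks) / data_disks

# Host-side write throughput reported for SupremeRAID over PCIe 4.0:
host = 12.0  # GBps

# Illustrative geometries (not confirmed by the vendor):
print(device_throughput(host, 4, 1))  # RAID 5, 4+1 stripe  -> 15.0 GBps
print(device_throughput(host, 2, 2))  # RAID 6, 2+2 stripe  -> 24.0 GBps
print(device_throughput(host, 1, 1))  # RAID 1 mirroring    -> 24.0 GBps
```

The reported 22GBps sits between these cases, consistent with a mix of parity and mirrored traffic across 32 drives.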
GRAID compares its product to “a competitive hardware RAID controller”, without citing Broadcom directly, which can write a total of 4GBps to the SSDs.
No bottleneck during reads
It is during reads that the product is particularly efficient.
“The key point of our solution is that it doesn’t communicate to the server the data it must load,” said Paquette. “It tells it which SSDs to load them on and the server reads them directly from several SSDs in parallel.”
That is how 110GBps can be achieved on a server with two Xeon 6338 2GHz CPUs with 32 cores each, he said.
By contrast, the Broadcom product continues to operate as a gateway while the data is read from the SSD.
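The design Paquette describes is out-of-band: the card returns placement metadata, and the host then pulls the data from several SSDs concurrently, so no single gateway throttles reads. A minimal sketch of that pattern, with a hypothetical placement map standing in for what the RAID engine would supply:

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK = 4096  # illustrative stripe-chunk size

def read_chunk(path, offset, length=CHUNK):
    # The host opens the backing device itself and reads directly,
    # so the data path never passes through the RAID card.
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)

def parallel_read(placement):
    # placement: ordered list of (device_path, offset) pairs, as the
    # RAID engine's metadata might describe a striped file. This
    # interface is hypothetical, for illustration only.
    with ThreadPoolExecutor() as pool:
        parts = pool.map(lambda p: read_chunk(*p), placement)
    return b"".join(parts)

# Usage (hypothetical device paths):
# data = parallel_read([("/dev/nvme0n1", 0), ("/dev/nvme1n1", 0)])
```

The bandwidth then scales with the number of SSDs rather than with one controller’s PCIe slot, which is how the 110GBps figure becomes possible.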
Paquette added: “We are not limited to internal SSDs. We can communicate with an NVMe-over-fabrics controller installed in another PCIe slot that drives SSDs in an external shelf. In other words, we can, in extreme cases, provide RAID for reads and writes on all data across internal and external SSDs. We are the only vendor to offer RAID compatible with NVMe-over-fabrics.”
Paquette said the 32 SSD limit is not due to the capacity of the Nvidia card, but to the GRAID software. “These numbers will improve gradually with our updates and the capacities of hardware that supports the SSDs,” he added. “But it is already planned that our software can be used with CXL composable architecture.”
GPU speed applied to RAID rebuilds
Disk rebuilds, and the time they take, are a big deal in RAID. They are carried out when a drive fails, by reconstructing the lost data from parity or duplicate data on the remaining disks. Normally, that requires plenty of processing power, and it is here that the GPU on the graphics card can show its potential.
“It’s very simple – we can rebuild a RAID system in two hours that would normally take three weeks,” said Paquette, without specifying a capacity for such a rebuild.
According to more precise measurements, the GRAID SupremeRAID card sustains 5.5 million read IOPS and 1.1 million write IOPS during rebuilds. The Broadcom MegaRAID, meanwhile, achieves 36,000 and 18,000 IOPS, respectively.
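The parity arithmetic behind a RAID 5 rebuild is simple; what dominates rebuild time is repeating it across every stripe on the array, which is exactly the kind of embarrassingly parallel work a GPU accelerates. A minimal sketch of the XOR reconstruction itself:

```python
from functools import reduce

def xor_blocks(blocks):
    # RAID 5 parity is the byte-wise XOR of the data blocks in a stripe;
    # any single missing block equals the XOR of all the surviving ones.
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

# One stripe across four drives: three data blocks plus one parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Simulate losing drive 1 and rebuilding its block from the survivors.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
```

A real rebuild runs this over millions of stripes (and Reed-Solomon maths for RAID 6), which is where the gulf between 5.5 million and 36,000 rebuild IOPS comes from.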
For machine learning and continuous video recording
Among the use cases cited for SupremeRAID, GRAID highlights machine learning, where there is a constant need to put large volumes of data in memory. According to the vendor, an algorithm that usually takes 12 hours to run with data stored on NAS via NFS will take only two hours using NVMe-over-RoCE connectivity and a storage server equipped with a GRAID controller.
Another example is continuous video recording, with the case given of motor racing, where a throughput of 10GBps is required between the cameras and storage, and which can benefit from RAID 5 or 6 protection.
“Organisations buy increasing amounts of NVMe SSD for their performance, but find themselves in the paradoxical situation where this performance has an impact on their applications,” said Paquette. That is because CPUs have to devote cycles to managing RAID on these SSDs.
Coming soon: PCIe 5.0 and erasure coding
Right now, SupremeRAID software is moving to version 1.3, which adds support for more Linux distributions. It will also become possible to install two cards in a server for redundancy purposes (but not to double performance or the number of drives).
In 2023, the software will get an admin GUI in Windows, erasure coding that will allow RAID between several cards, and support for the next-gen PCIe 5.0. Roadmap items will also include VMware and Kubernetes compatibility, as well as compression, encryption and on-the-fly deduplication.