Vast targets AI checkpointing write performance with distributed RAID


Source is ComputerWeekly.com

Vast Data will boost write performance in its storage by 50% in an operating system upgrade in April, followed by a 100% boost expected later in 2024 in a further OS upgrade. Both moves are aimed at checkpointing operations in artificial intelligence (AI) workloads.

That roadmap pointer comes after Vast recently announced it would support Nvidia Bluefield-3 data processing units (DPUs) to create an AI architecture. Handily, it also struck a deal with Super Micro, whose servers are often used to build out graphics processing unit (GPU)-equipped AI compute clusters.

Vast’s core offer is based on bulk, relatively cheap and rapidly accessible QLC flash with fast cache to smooth reads and writes. It is file storage, mostly suited to unstructured or semi-structured data, and Vast envisages it as large pools of datacentre storage, an alternative to the cloud.

Last year, Vast – which is HPE’s file storage partner – announced the Vast Data Platform that aims to provide customers with a distributed net of AI and machine learning-focused storage.

To date, Vast’s storage operating system has been heavily biased towards read performance. That’s not unusual, however, as most workloads it targets major on reads rather than writes.

Vast therefore focused on that side of the input/output equation in its R&D, said John Mao, global head of business development. “For nearly all our customers, all they have needed are reads rather than writes,” he said. “So, we pushed the envelope on reads.”

To date, writes have been handled by simple RAID 1 mirroring. As soon as data landed in the storage, it was mirrored to duplicate media. “It was an easy win for something not many people needed,” said Mao.

The release of version 5.1 of Vast OS in April will bring a 50% improvement in write performance, with a further 100% improvement later in the year in v5.2.

The first of these – dubbed SCM RAID – comes from a change that sees writes distributed across multiple media, said Mao, with data RAIDed (in a 6+2 configuration) as soon as it hits the write buffer. “To boost performance here, we have upgraded to distributed RAID,” said Mao. “So, instead of the entirety of a write going to one storage target, it is now split between multiple QLC drives in parallel, cutting down on time taken per write.”
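Vast has not published the mechanics, but the shape of the change Mao describes can be sketched. In the sketch below, the function names are invented and XOR stands in for real erasure coding (actual RAID 6 derives its second parity via Reed-Solomon coding); the point is simply that an 8-way 6+2 stripe puts roughly a sixth of the payload on each drive, versus the whole payload landing twice under mirroring:

```python
import functools

def xor_parity(chunks: list[bytes]) -> bytes:
    """XOR equal-length chunks together to form a parity chunk."""
    return functools.reduce(
        lambda a, b: bytes(x ^ y for x, y in zip(a, b)), chunks)

def raid1_write(data: bytes) -> list[bytes]:
    """RAID 1 mirroring: the entire write lands twice, on a mirrored pair."""
    return [data, data]

def raid6_plus2_write(data: bytes, data_drives: int = 6) -> list[bytes]:
    """6+2 striping: split the write into 6 data chunks plus 2 parity
    chunks, each destined for a different drive in parallel."""
    chunk_size = -(-len(data) // data_drives)  # ceiling division
    chunks = [data[i * chunk_size:(i + 1) * chunk_size].ljust(chunk_size, b"\0")
              for i in range(data_drives)]
    p = xor_parity(chunks)
    # Placeholder: real RAID 6 computes an independent second parity (Q)
    # with Reed-Solomon coding, not a copy of P.
    q = p
    return chunks + [p, q]
```

Losing any one data chunk is then recoverable by XOR-ing the survivors with the parity, which is what lets the write fan out across drives without giving up redundancy.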

Later in the year, version 5.2 will detect more sustained bursts of write activity – such as checkpoint writes – and automatically offload those writes to QLC flash, in a set of functionality known as Spillover. “The one case where it will be very useful is in [write operations in] checkpointing in AI workloads,” he said. “You can have, for example, clusters of tens of thousands of GPUs. It can get very complex. You don’t want that many GPUs running and something goes wrong.”

Checkpointing periodically saves model state during AI training. It allows the model to be rolled back should a disruption occur during processing.
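A minimal illustration of the idea, using a plain dict and pickle as a stand-in for a real framework's model state (file names and the every-25-steps cadence are arbitrary):

```python
import os
import pickle
import tempfile

def save_checkpoint(path: str, step: int, state: dict) -> None:
    """Persist model state; write to a temp file then atomically rename,
    so a crash mid-save never leaves a half-written checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump({"step": step, "state": state}, f)
    os.replace(tmp, path)

def load_checkpoint(path: str) -> dict:
    with open(path, "rb") as f:
        return pickle.load(f)

# training-loop sketch: checkpoint every 25 steps
state = {"weights": [0.0]}
path = os.path.join(tempfile.gettempdir(), "model.ckpt")
for step in range(1, 101):
    state["weights"][0] += 0.1          # stand-in for a training step
    if step % 25 == 0:
        save_checkpoint(path, step, state)

# after a simulated crash, resume from the last saved state
ckpt = load_checkpoint(path)
```

At the scale Mao describes, each checkpoint is a large, sustained write across many GPUs, which is why it benefits from a write path tuned for bursts rather than the common read-heavy case.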

Vast recently announced it will support Nvidia Bluefield-3 DPUs in a move that will position it as storage for large-scale AI workloads.

Bluefield-3 is a smart NIC with a 16-core Arm processor that allows customers to offload security, networking and data services, usually on GPU-equipped servers.

Vast also announced a partnership with Super Micro in which Vast Data software is ported to commodity servers. “We’re talking x86 systems that build out to petabytes of storage,” said Mao. “Reading between the lines, Super Micro sells a lot of Nvidia GPU-equipped servers that will have Bluefield on board, so it’s a good fit for Vast.”

