Weka has taken its parallel file system multicloud, with version 4 extending cloud support beyond AWS to Microsoft Azure, Google Cloud Platform and Oracle Cloud Infrastructure. Weka works across datacentre and public cloud to provide file access storage, and is often targeted at artificial intelligence/machine learning and analytics workloads.
“The main advantage of being able to move from one cloud to another is to be able to store your data with the provider that is most advantageous for you,” said Nilesh Patel, chief product officer at Weka, in an interview with ComputerWeekly.com’s French sister publication LeMagIT.
“Without Weka, you can suffer from ‘seam effects’ that can range from the difficulty of converting data between clouds to raised costs resulting from the extraction of data from a cloud service,” added Patel.
So, Weka is able to present the same NAS to users and applications while the files could be on cheap object storage or high-performance block storage.
Weka’s system – initially called Matrix, now known as Data Platform, but usually referred to simply as Weka – can recognise media types and tier data between them, including very fast NVMe SSD and cheaper QLC flash, connected via 100Gbps networking.
Weka’s key strength lies in having multiple methods of connecting to its storage to optimise reads and writes for files. It interfaces with Nvidia’s GPUDirect on GPU-equipped processing clusters, and with Kubernetes container clusters via a CSI driver, for example. For “classic” storage access methods, Weka can share via NFS (up to v4.1), via SMB using its SMB-W variant, which accelerates access for small files, and via S3.
“In version 4 of Weka, we have added a new data reduction process that allows rapid movement of information from one medium to another and can drastically reduce consumed storage capacity,” said Patel. “That’s with files, their metadata and even snapshots that correspond to point-in-time images of drives. As an example, it is possible to extract archived data from S3 Glacier in barely milliseconds.”
S3 Glacier is the cheapest of all AWS storage services, but also the one with the longest access times. So, to access data from it in milliseconds requires a trick. Patel said Weka recovers archived data from fragments that are not all on S3 Glacier. In fact, while the user sees one set of directories, Weka makes use of others in the background to organise data as optimally as possible.
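The idea Patel describes can be illustrated with a minimal Python sketch. It is not Weka's implementation: the tier names, the latency figures, the `Fragment` type and the assumption that copies of fragments are kept on faster tiers are all hypothetical, chosen only to show how a read can avoid the slowest tier when each fragment is also available somewhere faster.

```python
from dataclasses import dataclass

# Hypothetical tiers with illustrative latencies in milliseconds.
# These numbers are assumptions for the sketch, not vendor figures.
TIER_LATENCY_MS = {"nvme": 0.1, "qlc": 1.0, "object": 50.0, "archive": 3_600_000.0}


@dataclass(frozen=True)
class Fragment:
    index: int
    tiers: tuple  # names of the tiers that hold a copy of this fragment


def fastest_tier(fragment):
    """Return the lowest-latency tier that holds a copy of the fragment."""
    return min(fragment.tiers, key=TIER_LATENCY_MS.__getitem__)


def plan_read(fragments):
    """Map each fragment index to the tier it would be read from."""
    ordered = sorted(fragments, key=lambda f: f.index)
    return {f.index: fastest_tier(f) for f in ordered}


def read_latency_ms(fragments):
    """Assume fragments are fetched in parallel, so the slowest one sets the latency."""
    return max(TIER_LATENCY_MS[tier] for tier in plan_read(fragments).values())
```

In this sketch, a file whose fragments all have a copy on flash is read at flash latency, while a single fragment that exists only on the archive tier pulls the whole read back to archive latency.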
WekaIO developed a file system that allows rapid access, via flash storage in particular, to very large sets of unstructured data.
It claims to have overcome the limits of the network file system (NFS) protocol – developed in the 1980s – and that it surpasses the performance of rivals such as NetApp and Dell EMC’s Isilon scale-out NAS.
Weka execs have pointed out that NFS was standardised in 1984, and claim it is “very chatty” and works in “a very serialised fashion”, and therefore doesn’t scale well.
Weka says it has parallelised access to directories and metadata by breaking things down into lots of smaller chunks, which it claims makes it faster than a local file system.
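As a rough illustration of what breaking metadata into smaller chunks can mean, the following Python sketch spreads directory entries across shards by hashing the path, then resolves several lookups concurrently. It is a generic sharding pattern under stated assumptions, not Weka's design: the shard count, the function names and the use of a thread pool are all hypothetical.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

NUM_SHARDS = 8  # hypothetical shard count for the sketch


def shard_for(path, num_shards=NUM_SHARDS):
    """Pick a shard from a stable hash of the path."""
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def build_shards(metadata, num_shards=NUM_SHARDS):
    """Split one metadata mapping into num_shards smaller mappings."""
    shards = [dict() for _ in range(num_shards)]
    for path, attrs in metadata.items():
        shards[shard_for(path, num_shards)][path] = attrs
    return shards


def lookup_many(shards, paths):
    """Look up a list of paths concurrently, each against its own shard."""
    def lookup(path):
        return shards[shard_for(path, len(shards))].get(path)

    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        # pool.map preserves input order, so results line up with paths.
        return dict(zip(paths, pool.map(lookup, paths)))
```

Because each path maps to exactly one shard, lookups for different paths do not have to queue behind a single directory structure, which is the contrast with the serialised behaviour Weka attributes to NFS.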
WekaIO targets workloads that need access to large amounts of unstructured data, including artificial intelligence/machine learning, financial analytics, life sciences and engineering design. It is aimed at customers that currently use scale-out NAS with file systems such as IBM’s GPFS/Spectrum Scale and the open source Lustre.
There was no talk of product costs during the LeMagIT interview. Customers pay for Weka on a subscription basis.
Weka is used by Hitachi Vantara in its HNAS platform gateways, where data can be shared from its VSP disk arrays and capacity can be extended to the public cloud.