Unstructured data is huge – in all senses. There is lots of it, and file or object sizes can be large.
Go back just a decade and the predominant method of storage for unstructured data would have been NAS, or rather parallel or scale-out NAS that allowed for very large numbers of files to be retained on a grid-like network of arrays.
But as the number and size of files grew, NAS began to creak a little. There are inherent problems of scale as file counts climb into the millions and billions.
And so, object storage – which lacks the tree-like hierarchical file system – began to emerge into the mainstream. That was also driven by the emergence of the cloud with its need to be able to address objects directly rather than via file paths.
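The difference between the two addressing models can be sketched in a few lines of Python. This is a toy in-memory illustration only, not any supplier's implementation, and every name in it (`Directory`, `write_file`, `read_file`, `put_object`, `get_object`) is hypothetical. The file side walks a tree one path component at a time, while the object side looks up the full key in a single flat namespace.

```python
class Directory:
    """A node in a tree-like file system: names map to files or subdirectories."""

    def __init__(self):
        self.entries = {}


def write_file(root, path, data):
    """Create any missing directories along the path, then store the file."""
    parts = path.strip("/").split("/")
    node = root
    for name in parts[:-1]:
        node = node.entries.setdefault(name, Directory())
    node.entries[parts[-1]] = data


def read_file(root, path):
    """Walk the tree one component at a time; return (data, lookups performed)."""
    parts = path.strip("/").split("/")
    node = root
    lookups = 0
    for name in parts[:-1]:
        node = node.entries[name]
        lookups += 1
    return node.entries[parts[-1]], lookups + 1


# Object storage: one flat namespace. The slashes in a key are just characters.
object_store = {}


def put_object(key, data):
    object_store[key] = data


def get_object(key):
    """Address the object directly by its full key, with no tree to traverse."""
    return object_store[key]


root = Directory()
write_file(root, "/scans/2023/chest.dcm", b"image-bytes")
put_object("scans/2023/chest.dcm", b"image-bytes")
```

In this sketch, reading `/scans/2023/chest.dcm` takes one lookup per path component, whereas `get_object` performs a single lookup whatever the key looks like.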
But there is no smooth path for object to supplant file-access storage. Many, if not most, enterprise applications are built for Posix-compliant file storage and are not easily refactored. At the same time, many newer applications, especially those that can port to the cloud, are built for object storage.
Therefore, many organisations have a need for file and object storage.
This article will look at how some storage providers are delivering file and object storage in the same system.
We asked some key storage providers about their stance on unified file and object storage. NetApp, Pure Storage and Scality responded, and revealed quite different approaches to how they provide unified file and object storage.
Overview: Pure, Scality and NetApp file and object
Pure Storage offers file and object storage together in its FlashBlade product line. It’s a hardware appliance approach – but with as-a-service purchasing options – in which file and object protocols can be enabled by the customer, with controller hardware handling each.
In FlashBlade, file and object work side by side, and the approach seems to be to overcome any issues or overheads by throwing hardware performance at them. Its storage back end comprises Pure’s proprietary flash modules, made up of QLC flash but with some wizardry to allow SLC-like performance on part of the drive for metadata storage.
Scality’s RING is a software-defined approach to file and object that deploys onto commodity hardware. RING is based around an object store with S3 access, but also allows for a Posix layer to provide access to file storage (NFS, SMB) integrated directly into the object store.
The mechanism for this is that the Posix metadata layer is in a database whose tables are stored in the distributed object store. Scality says that means that the file system shares all the same distributed structure (in terms of metadata, etc) as the underlying object store.
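As a rough illustration of the general idea of a file layer whose metadata lives in the object store, consider the Python sketch below. It is not Scality's implementation: the class names, the reserved `__fs_metadata__` key and the JSON table format are all assumptions made for the example, and a real system would use a distributed database rather than a single table object.

```python
import json
import uuid


class ObjectStore:
    """Toy flat object store: keys map to bytes."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]

    def keys(self):
        return list(self._objects)


class FileLayer:
    """Hypothetical file layer whose metadata table is itself an object."""

    TABLE_KEY = "__fs_metadata__"  # assumed reserved key, for illustration only

    def __init__(self, store):
        self.store = store
        if self.TABLE_KEY not in store.keys():
            self._save({})

    def _load(self):
        return json.loads(self.store.get(self.TABLE_KEY).decode("utf-8"))

    def _save(self, table):
        self.store.put(self.TABLE_KEY, json.dumps(table).encode("utf-8"))

    def write(self, path, data, mode=0o644):
        """Store file data as an object and record its path in the table."""
        table = self._load()
        key = table.get(path, {}).get("key") or f"data/{uuid.uuid4().hex}"
        self.store.put(key, data)
        table[path] = {"key": key, "size": len(data), "mode": mode}
        self._save(table)

    def read(self, path):
        """Resolve the path through the table, then fetch the object."""
        return self.store.get(self._load()[path]["key"])

    def listdir(self, directory):
        """Derive a directory listing from the paths held in the table."""
        prefix = directory.rstrip("/") + "/"
        names = set()
        for path in self._load():
            if path.startswith(prefix):
                names.add(path[len(prefix):].split("/")[0])
        return sorted(names)


store = ObjectStore()
fs = FileLayer(store)
fs.write("/pacs/a.dcm", b"abc")
fs.write("/pacs/archive/b.dcm", b"de")
```

Because both the file data and the metadata table are ordinary objects in this sketch, anything the object store does to distribute or protect objects would apply to the file layer too, which is the property Scality describes.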
NetApp’s Ontap OS and file system enabled S3 object storage access in 2020 and it is available in hardware, software-defined and cloud products alongside file and block protocols. Unlike Pure and Scality’s solutions, however, S3 access in Ontap appears intended as an ingest and/or pre-processing point, potentially for edge-type use cases, with customer requirements for object stores in excess of 300TB being directed to NetApp’s StorageGRID enterprise object storage.
Different target workloads
Pure Storage’s converged file and object product range – FlashBlade – is aimed at workloads that are demanding in terms of volume and performance, so the word “fast” figures heavily in its branding. With FlashBlade, Pure has in mind what would have been secondary use cases in the past, such as artificial intelligence (AI)/machine learning (ML)/analytics/high-performance computing (HPC), image-heavy workloads such as healthcare imaging and engineering, and even backup data that may need rapid restore.
“Organisations are dealing with rapidly growing amounts of unstructured data generated by modern applications,” says Amy Fowler, VP FlashBlade at Pure Storage. “We believe the market is looking for consolidation of diverse workloads on a unified fast file and object storage platform to deliver unmatched performance and the simplicity to support the demanding needs of unstructured data workloads.”
Scality makes more of the ability to handle both legacy Posix-compliant workloads and modern cloud-native applications, bringing file access together with RESTful protocols such as S3.
“The key advantage of file and object storage in the same system is that it provides customers with a single system to manage data from legacy applications and modern cloud-native applications,” says Paul Speciale, CMO at Scality. “From a business point of view, a combined file/object storage solution helps companies as they transform and modernise, because it provides a smooth pathway storage solution from legacy to modern applications.”
NetApp makes the point that its converged NAS (and SAN) and object storage capability is more an entry point than its main enterprise object storage offering, StorageGRID.
“The advantages of having file and object storage in the same system include the simplicity of managing one system and standardised features around data protection, management and security,” says Grant Caley, chief technologist at NetApp UK & Ireland. “If your existing NAS/SAN can offer object and the object requirement is small, then the cost of entry is reduced.
“NetApp’s StorageGRID might be the answer for those wanting a full-featured object global namespace that is scalable to hundreds of billions of objects with dynamic policy management.”
File and object: How unified?
As mentioned earlier, each supplier has a different way of architecting file and object access in its products, and this can determine the overall character of suitable deployments and workloads.
Pure, as we saw, goes big on high-performance hardware with access to file and object storage from the same FlashBlade arrays. In terms of how these relate to each other, it appears the two sides exist in parallel.
“FlashBlade offers consolidated storage with a natively built unified fast file and object platform with no bolt-on architecture and provides flash performance equally for all unstructured data,” says Fowler. “FlashBlade serves files from a file system and objects from an object store natively independent of each other, without impacting any of these workloads for any latency.”
In Scality’s RING architecture, file and object appear to co-exist in a more interleaved fashion, with the file system element relying on use of the object store and so expanding with it in a distributed fashion.
Having said that, RING still appears to access files as files and objects as objects.
“The file namespace cannot be accessed via the S3 protocol, but the S3 namespace has a lightweight NFS connector which can be used to access S3 data,” says Speciale. “Its targeted use is to allow for migrations from file-based systems over to S3, so essentially giving an NFS access method into data that now lives in S3 while the source application is migrated.”
NetApp is quite clear that object support in Ontap is limited in scale. Files and objects are only ever accessible as such.
“ONTAP provides S3 access only to objects and file access only to files,” says Caley. “Both files and objects are stored on our multi-PB scalable FlexGroups. When an object is written to ONTAP using the S3 protocol, we store that as an object and you can only use the S3 protocol to retrieve that object. When a file is written to ONTAP using NFS or SMB protocols, we store that as a file, and you can only retrieve that file using NFS or SMB.”
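The behaviour Caley describes, with shared capacity but no cross-protocol access, can be modelled with a short Python sketch. This is a toy model under stated assumptions, not Ontap code: the `UnifiedSystem` class and its `nfs_*` and `s3_*` methods are hypothetical names chosen for the example.

```python
class UnifiedSystem:
    """Toy model: one system, two namespaces, no cross-protocol access."""

    def __init__(self):
        self._files = {}    # reachable only through the file methods
        self._objects = {}  # reachable only through the S3-style methods

    def nfs_write(self, path, data):
        self._files[path] = data

    def nfs_read(self, path):
        if path not in self._files:
            raise FileNotFoundError(path)
        return self._files[path]

    def s3_put(self, bucket, key, data):
        self._objects[(bucket, key)] = data

    def s3_get(self, bucket, key):
        if (bucket, key) not in self._objects:
            raise KeyError(f"{bucket}/{key}")
        return self._objects[(bucket, key)]

    def used_bytes(self):
        """Both namespaces consume the same underlying capacity."""
        stored = list(self._files.values()) + list(self._objects.values())
        return sum(len(data) for data in stored)


system = UnifiedSystem()
system.s3_put("logs", "app/1.log", b"hello")
system.nfs_write("/vol/cad/part.dwg", b"drawing")
```

Here, data written with `s3_put` is invisible to `nfs_read` and vice versa, even though `used_bytes` counts both against the same pool.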
Workload examples
Pure says its target workload is “consolidation of diverse workloads on products that combine file and object protocols in the same system [to provide] the ability to simultaneously support multiple use cases”.
Here it has in mind AI/ML/analytics data pipelines, DevOps and containers, imaging such as medical PACS and VNA (vendor-neutral archives), financial simulations, genomics sequencing, seismic interpretation, log analytics and rapid restore, for example after a ransomware attack.
Fowler adds: “Many HPC manufacturing use cases have Windows-based applications and keep CAD and CNC drawings and image analytics over file protocols. After the analytics is done, the data can be moved to object storage on the same system or to the cloud.”
Meanwhile, Scality says about two-thirds of its customers have combined file and object workloads in place. These include media and entertainment customers that store and access media over file (SMB or NFS) interfaces from content creation and editing tools, but where the same media is streamed for content distribution using AWS S3 API RESTful interfaces.
“In hospitals, we have customers storing medical images from commercial PACS applications over SMB, but using modern backup applications such as Veeam over object interfaces,” says Speciale.
Scality emphasises the huge scale of some customer deployments, citing SMB shares of up to 20PB and as much as 150PB under object storage, with 220 billion objects stored in another deployment.