65TB per hour, from anywhere on earth to anywhere else, no matter the quality of the connection between them. That’s the transfer speed achieved by the shared storage solution from Arcitecta, which operates across cloud storage and on-site locations.
Like Hammerspace, Nasuni, Panzura and Ctera, Arcitecta’s Mediaflux solution enables file shares across multiple sites internationally, allowing staff to see files created by colleagues elsewhere, whether those files sit on local network-attached storage (NAS) or in the public cloud.
“Unlike the others, we are not a storage company,” Graham Beasley, operations director at Arcitecta, told Computer Weekly’s French sister site LeMagIT during a recent IT Press Tour event.
“Our expertise is in databases. We sell our database system to enterprises with arrays from NetApp, IBM Spectrum Scale, Dell Isilon or others to allow them to manage documents in file or object format across their sites. We make data available anywhere, whenever it is needed, via the Livewire module on Mediaflux,” he added.
“It has often been said that the transfer speeds we obtain are theoretically impossible,” said Jason Lohrey, CEO and founder of Arcitecta. “Our secret is that it’s not just a question of speed. We resolve the connection problem. And to do that is a question of managing data. You have to move the right data at the right time.”
The speed solution: An XODB database
“We work with millions of file formats,” said Lohrey. “That allows us to index trillions of data fragments in our metadata database. Each record in our database takes up about 1KB per file indexed. From there, this is synchronised across all enterprise sites and any file can be found from anywhere else.
“When you have one, 10 or 100 million files shared between branches in a multinational company, our search engine can find it and show it to you in a few dozen milliseconds, which would take hours in a competing product,” he added.
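To put the figure of about 1KB of metadata per indexed file in perspective, the following back-of-envelope sketch estimates the size of the index for the file counts Lohrey mentions. It assumes exactly 1,000 bytes per record and decimal units; the constant and function names are illustrative and not part of Mediaflux.

```python
BYTES_PER_RECORD = 1_000  # assumption: "about 1KB" taken as 1,000 bytes


def index_size_gb(file_count: int, bytes_per_record: int = BYTES_PER_RECORD) -> float:
    """Return the estimated metadata index size in decimal gigabytes."""
    return file_count * bytes_per_record / 1e9


if __name__ == "__main__":
    for files in (1_000_000, 10_000_000, 100_000_000):
        print(f"{files:>11,} files: about {index_size_gb(files):,.0f} GB of metadata")
```

On those assumptions, even 100 million files yield an index of roughly 100GB, which is small enough to replicate to every site.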
According to Lohrey, the genius of the product lies in the way files are fragmented. File fragments are transferred from one location to another over parallel channels, not necessarily in sequential order, and in some cases in anticipation of demand.
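The general idea of sending fragments over parallel channels and reassembling them regardless of arrival order can be sketched as follows. This is a minimal illustration and not Arcitecta's implementation: the `send` function is a hypothetical stand-in for a network channel, threads stand in for parallel connections, and the fragment size is kept tiny for readability.

```python
import random
from concurrent.futures import ThreadPoolExecutor, as_completed

FRAGMENT_SIZE = 4  # bytes; deliberately tiny for illustration


def split(data: bytes, size: int = FRAGMENT_SIZE) -> list[tuple[int, bytes]]:
    """Cut data into (offset, fragment) pairs."""
    return [(off, data[off:off + size]) for off in range(0, len(data), size)]


def send(offset: int, fragment: bytes) -> tuple[int, bytes]:
    """Hypothetical stand-in for sending one fragment over one channel."""
    return offset, fragment


def transfer(data: bytes, channels: int = 4) -> bytes:
    """Send fragments in arbitrary order over several channels and reassemble them."""
    fragments = split(data)
    random.shuffle(fragments)  # simulate fragments leaving out of order
    received = bytearray(len(data))
    with ThreadPoolExecutor(max_workers=channels) as pool:
        futures = [pool.submit(send, off, frag) for off, frag in fragments]
        for future in as_completed(futures):
            off, frag = future.result()
            # Each fragment carries its offset, so arrival order does not matter.
            received[off:off + len(frag)] = frag
    return bytes(received)
```

Because every fragment is placed by its offset, the receiving side ends up with an identical copy however the fragments were scheduled.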
“The aim is to move the minimum amount of data at the moment the user wants to open the document,” said Lohrey. “From the moment a file is ingested into the array, we reference it and copy its blocks to the locations where our data says it will be used.
“If you update a file of 70TB, we don’t have to rescan all the metadata,” said Lohrey. “Our system just updates it from the previous version of the file in the blink of an eye.”
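One common way to avoid reprocessing a whole file after an update is to compare per-block hashes with those of the previous version and handle only the blocks that differ. The sketch below illustrates that general technique; it is an assumption made for illustration, not a description of how Mediaflux works internally, and the block size is kept tiny for readability. In a real system the previous version's hashes would be stored in the index rather than recomputed.

```python
import hashlib

BLOCK_SIZE = 4  # bytes; deliberately tiny for illustration


def block_hashes(data: bytes, size: int = BLOCK_SIZE) -> list[str]:
    """Return one SHA-256 digest per fixed-size block."""
    return [
        hashlib.sha256(data[i:i + size]).hexdigest()
        for i in range(0, len(data), size)
    ]


def changed_blocks(old: bytes, new: bytes, size: int = BLOCK_SIZE) -> list[int]:
    """Return indices of blocks in `new` that differ from, or are absent in, `old`."""
    old_hashes = block_hashes(old, size)
    new_hashes = block_hashes(new, size)
    return [
        i for i, digest in enumerate(new_hashes)
        if i >= len(old_hashes) or digest != old_hashes[i]
    ]
```

For example, `changed_blocks(b"aaaabbbbcccc", b"aaaaBBBBccccdddd")` reports only the second block, which was modified, and the fourth, which is new. The sketch does not handle truncation, where the new version is shorter than the old one.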
Lohrey is keen to highlight the ability of Arcitecta’s database to manage temporal and spatial coordinates for each document. Known as XODB, it is a binary-XML object database embedded in Mediaflux that manages vectors between files, allowing it to work out which files might need to be replicated to other locations based on previous activity.
“If this system works efficiently, it is because we have not only written its database, we have also rewritten our NFS, SMB and S3 sharing protocols from scratch so they make better use of the information available in XODB,” said Lohrey. “That brings a few advantages, including being able to manage virtual hierarchies that correspond to a search.”
In the latest version, Livewire takes into account the available bandwidth on network links and compresses data on the fly, using different methods, before transferring it to another site. The way it resends lost packets during TCP/IP communication has also been optimised.
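The trade-off behind bandwidth-aware compression can be sketched as follows: on a slow link it pays to spend more CPU time on compression, while on a fast link compression may only add delay. This is a simplified illustration that varies the level of a single method (zlib) rather than switching between methods; the thresholds and function names are invented for the example and do not come from Arcitecta.

```python
import zlib


def choose_level(bandwidth_mbps: float) -> int:
    """Pick a zlib compression level from the link speed (thresholds are made up)."""
    if bandwidth_mbps >= 10_000:
        return 0  # very fast link: store without compressing
    if bandwidth_mbps >= 1_000:
        return 1  # fast link: cheapest compression
    if bandwidth_mbps >= 100:
        return 6  # moderate link: zlib's usual balance of speed and ratio
    return 9      # slow link: compress as hard as possible


def prepare(payload: bytes, bandwidth_mbps: float) -> bytes:
    """Compress a payload at the level chosen for the measured bandwidth."""
    return zlib.compress(payload, choose_level(bandwidth_mbps))
```

Whatever level is chosen, the receiving site can recover the original with `zlib.decompress`, so the decision can be made per transfer without coordinating with the other end.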
An interface you develop yourself
Mediaflux is not delivered with a console in which to search for files. In its place, Arcitecta supplies a kit to develop an interface best adapted to the customer.
“The capacity to exploit your data is often connected to its visualisation and use,” said Beasley. “It’s not possible to create a generic interface that knows how to manage all types of data. We have therefore developed a framework that allows users to assemble the best interface for their needs very quickly. We can also write that interface ourselves if the customer wants it. Recently, we developed an interface for a museum in just four days.”
Besides file searches, the management interface allows for definition of rules about data placement. These are manually set rules that help Mediaflux anticipate the placement of files to give the impression they have been moved rapidly from the other side of the world when a user tries to access them.
On the roadmap for Mediaflux, Arcitecta highlights increased density of metadata and the ability to feed content to generative artificial intelligence (GenAI) using retrieval-augmented generation (RAG). The latter requires data to be in a vector format, which suits XODB as it already has that structure.