Suggest to a chief information officer that they could soon store 10 million times as much data as the capacity of a single hard drive, and at the very least, they are likely to be sceptical.
But such advances could be possible – and within the next few years. The reason is DNA storage. Instead of using hard drives, magnetic tape or flash memory, DNA storage holds data using the code of life itself.
With today’s science, a DNA storage system can hold 10 zettabytes of data in a device the size of a shoebox, according to John Monroe, vice-president and analyst at industry researcher Gartner. “These beautiful four-letter codes could be the ideal way to store digital data,” he says. “It is huge in terms of capacity – it holds more promise than any other archival storage format.”
Researchers estimate that data stored in DNA could last between 700,000 and a million years, far beyond the lifespan of any current storage technology. Monroe sees DNA storage replacing tape or optical drives for nearline or offline storage.
DNA itself is extremely robust, able to withstand heat and cold. And once the information has been encoded and synthesised into DNA – the “write” phase – it needs no power to keep it in that state. DNA sequencing and decoding, the “read” phase, converts the DNA’s four-letter nucleotide code back into a form that a computer can process.
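To make the write and read phases concrete, here is a minimal sketch of how bytes could map onto the four-letter code and back. The two-bits-per-base mapping (A=00, C=01, G=10, T=11) and the function names `encode` and `decode` are assumptions for illustration only; production schemes typically add constraints and error correction that this toy version omits.

```python
# Assumed mapping for illustration only: two bits per nucleotide.
# A=00, C=01, G=10, T=11 (the index of each letter in BASES).
BASES = "ACGT"


def encode(data: bytes) -> str:
    """'Write' direction: turn each byte into four nucleotides, high bits first."""
    return "".join(
        BASES[(byte >> shift) & 0b11]
        for byte in data
        for shift in (6, 4, 2, 0)
    )


def decode(strand: str) -> bytes:
    """'Read' direction: pack every four nucleotides back into one byte."""
    if len(strand) % 4:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            # BASES.index raises ValueError on any letter outside A, C, G, T.
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


print(encode(b"Hi"))           # CAGACGGC
print(decode(encode(b"DNA")))  # b'DNA'
```

Under this mapping every byte becomes exactly four bases, so the round trip is lossless as long as the strand is read back without errors.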
But despite this promise, the idea is still some way from being a practical technology. The IT industry has yet to come up with workable, production-scale DNA storage devices. “People are still struggling with how that looks,” Monroe admits.
He believes the equipment will be the size of a kitchen appliance; others predict it could be the size of a school bus. Microsoft, working with the University of Washington, has already developed a more practically sized DNA encoding and retrieval machine. It is very much still a prototype, however, and not something an IT department could simply drop into an existing 19-inch IT rack.
Chemical romance
Current DNA encoding and sequencing is still largely a chemical process. That is why the Microsoft and University of Washington prototype looks closer to something you might find in a school science lab than a datacentre. And the process is currently expensive.
Sequencing 1MB of data costs about $3,500 (£2,500). And although costs are falling, this is vastly more than the cost of writing the same volume of data to flash or disk. Gartner believes the technology will not become mainstream until the cost falls to about $0.01 per gigabyte.
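The size of that gap is easier to see as arithmetic. The sketch below uses only the two figures quoted above and assumes decimal units (1GB = 1,000MB); the variable names are illustrative.

```python
import math

COST_PER_MB_USD = 3_500    # per-megabyte cost quoted in the article
TARGET_PER_GB_USD = 0.01   # Gartner's mainstream threshold, per the article
MB_PER_GB = 1_000          # assumption: decimal units

# Scale the per-megabyte figure up to a per-gigabyte figure.
cost_per_gb = COST_PER_MB_USD * MB_PER_GB

# How many times cheaper the process must become to hit the target.
gap = cost_per_gb / TARGET_PER_GB_USD
orders_of_magnitude = math.log10(gap)

print(f"Current cost: ${cost_per_gb:,.0f} per GB")
print(f"Reduction needed: {gap:,.0f}x")
print(f"That is about {orders_of_magnitude:.1f} orders of magnitude")
```

On those assumptions, the current figure works out at $3.5m per gigabyte, so the cost would need to fall roughly 350 million-fold, or between eight and nine orders of magnitude.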
Alternative technologies include enzymatic DNA synthesis (EDS), which is being developed by the Wyss Institute, part of Harvard University. Researchers believe this will reduce the cost of DNA synthesis by many orders of magnitude. The team at Wyss is developing an electronic device that can synthesise data into DNA. They believe this will allow synthesis to be parallelised, and so scaled up.
Researchers, though, are confident that the cost and practical barriers will be overcome, if only because few, if any, technologies offer the potential of storing the vast quantities of data that can be stored in DNA.
Unsurprisingly, governments and intelligence agencies are behind much of the interest in DNA storage. In the US, the Intelligence Advanced Research Projects Activity (IARPA), part of the Office of the Director of National Intelligence, runs MIST, the Molecular Information Storage programme, which is tasked with writing one terabyte and reading 10 terabytes of data within 24 hours at a cost of $1,000.
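For a sense of scale, the MIST targets can be turned into sustained throughput figures. This is a back-of-the-envelope sketch that assumes decimal terabytes and perfectly even throughput across the 24 hours; the helper name `sustained_mb_per_s` is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60
TB = 10**12  # assumption: decimal terabyte, in bytes


def sustained_mb_per_s(total_bytes: int, seconds: int = SECONDS_PER_DAY) -> float:
    """Average rate, in megabytes per second, needed to move total_bytes in the window."""
    return total_bytes / seconds / 10**6


write_rate = sustained_mb_per_s(1 * TB)   # MIST write target: 1TB in 24 hours
read_rate = sustained_mb_per_s(10 * TB)   # MIST read target: 10TB in 24 hours

print(f"Write: {write_rate:.1f} MB/s sustained")
print(f"Read:  {read_rate:.1f} MB/s sustained")
```

That comes to roughly 11.6MB/s for writing and 115.7MB/s for reading, sustained around the clock.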
Other researchers, at Los Alamos National Laboratory, are being funded by IARPA to work on systems to translate DNA information into computer-readable code. Their system, ADS Codex, handles encoding and decoding back to binary, independently of the method used for the DNA synthesis itself.
ADS Codex also provides advanced error correction. Write error rates are higher in DNA storage than in conventional digital storage, a problem made worse by the fact that DNA has four letter states, rather than binary’s zeros and ones. ADS Codex verifies the data and removes the errors. The code is available on GitHub.
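The general idea of removing errors can be illustrated with a much simpler technique than anything in ADS Codex: reading the same strand several times and taking a majority vote at each position. The sketch below is not the ADS Codex algorithm. The function name `consensus` and the sample reads are made up, and it only handles substituted letters, not inserted or deleted ones.

```python
from collections import Counter


def consensus(reads: list[str]) -> str:
    """Majority-vote each position across several equal-length reads of one strand."""
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        # Insertions and deletions would shift positions; this sketch ignores them.
        raise ValueError("reads must all be the same length")
    return "".join(
        Counter(column).most_common(1)[0][0]  # most frequent letter in this column
        for column in zip(*reads)
    )


# Hypothetical reads of one strand; the second and third each contain one wrong letter.
reads = ["CAGACGGC", "CAGTCGGC", "CAGACGGA"]
print(consensus(reads))  # CAGACGGC
```

Because each error appears in only one of the three reads, the other two outvote it and the original strand is recovered.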
Europe, too, has contributed to the field. The EU-based DNA DS project, coordinated by Slovenian researchers, is looking at storing 450 petabytes of data in a single molecule. Potentially, a whole datacentre could fit into a single vial of liquid. The researchers have also examined another benefit of DNA storage. Although writing data to DNA remains slow, even a full vial can be replicated in just hours, at almost no cost and using little energy.
Tech alliance
Now that academic researchers have proved DNA storage is possible, the focus is turning to practicalities.
In 2020, a group of computing industry heavyweights, including Microsoft and Western Digital, formed the DNA Data Storage Alliance along with biotech companies Twist Bioscience and Illumina, and academic researchers.
The goal is to create a viable ecosystem around DNA storage, with Microsoft and others pointing out that the field is moving from academic and scientific research towards practical data storage applications for IT. The most attractive application, at least at first, is cold storage: data that is written once and read rarely.
Other applications include media. Last year, Twist encoded – appropriately enough – an episode of the Netflix series Biohackers to DNA. Being able to record effectively limitless quantities of data, store them indefinitely and replicate them quickly could suit the movie and other creative industries.
Other potential applications include medical data storage, and legal and compliance archiving.
This does, however, pose a few other problems, and these are as much around standards as technology. “For data such as WORM – write once, read many – or WORN – write once, read never – it is important that the data is immutable,” cautions Gartner’s Monroe. “You need to know that what you write, say an image of a brain today, will be exactly the same in 10 years’ time.”
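One common way archives check that data has not changed, whatever the storage medium, is to record a cryptographic fingerprint at write time and compare it at read time. The sketch below uses SHA-256 from Python's standard library; the function names and the sample bytes are illustrative, and nothing here is specific to DNA storage or to any standard Monroe has in mind.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a 64-character hex string."""
    return hashlib.sha256(data).hexdigest()


def is_unchanged(data: bytes, recorded: str) -> bool:
    """Compare the data read back today with the fingerprint recorded at write time."""
    return fingerprint(data) == recorded


# Hypothetical example: record a fingerprint when the image is first written...
original = b"brain scan, 2021"
recorded = fingerprint(original)

# ...and check it when the data is read back years later.
print(is_unchanged(original, recorded))             # True
print(is_unchanged(b"brain scan, 2031", recorded))  # False
```

The fingerprint has to be kept somewhere trustworthy alongside the archive, since it is only useful if it cannot itself be altered.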
If researchers can ensure that is the case, then the double helix of life could yet emerge as the best way to store our data into the distant future.