Synthetic DNA is rapidly getting closer to becoming a viable commercial storage solution, according to Microsoft, which has demonstrated the first fully automated system to store and retrieve data in manufactured DNA. An entire warehouse-sized data centre’s material would fit on a dice-sized device were this to happen.
In the first demonstration of an automated system (rather than one involving numerous laboratory technicians and a wide range of intermediate steps) the researchers built a device that encodes data into a DNA sequence, which is then written to a short DNA molecule using a custom DNA synthesizer, pooled for liquid storage, and read using a nanopore sequencer and a “novel, minimal preparation protocol.”
The system cost just $10,000 to build – and costs could easily be brought down to as little as $3,000, the researchers said in a paper published in Nature Scientific Reports.
Successful commercial rollout would require the costs of synthesising DNA — custom building strands with meaningful sequences — and the sequencing process that extracts the information to fall markedly. Trends are moving rapidly in that direction, researchers say, after demonstrating a simple proof-of-concept test.
For the test, a team of Microsoft and University of Washington researchers successfully encoded the word “hello” in snippets of fabricated DNA and converted it back to digital data. Exciting: but the process took the process took 21 hours, mostly because of the slow chemical reactions involved in writing DNA, MIT explained.
A closer look at the research paper also reveals stupendously high error rates; expected for such an early stage technology but also revealing some of the challenges in turning this POC into anything closely resembling a commercially viable technology.
“Of 25,592 reads in this new dataset, 286 aligned well in the −100 to −1 region (score > 400) and contained enough bases to attempt decoding. Of those 251 had uncorrectable corruption, 11 had invalid checksum bases after correction, 8 were corrupted but correctable and of those 3 had hashes in agreement, 16 were perfect reads”.
DNA Data Storage: How it Works
Microsoft explained: “The automated DNA data storage system uses software developed by the Microsoft and UW team that converts the ones and zeros of digital data into the As, Ts, Cs and Gs that make up the building blocks of DNA.”
“Then it uses inexpensive, largely off-the-shelf lab equipment to flow the necessary liquids and chemicals into a synthesizer that builds manufactured snippets of DNA and to push them into a storage vessel.”
“When the system needs to retrieve the information, it adds other chemicals to properly prepare the DNA and uses microfluidic pumps to push the liquids into other parts of the system that “read” the DNA sequences and convert it back to information that a computer can understand. The goal of the project was simply to demonstrate that automation is possible…”
Synthetic DNA: “Orders of Magnitude” Improvements
The researchers added in the paper: “This device establishes a baseline from which new improvements may be made toward a device that eventually operates at a commercially viable scale and throughput. While 5 bytes in 21 hours is not yet commercially viable, there is precedent for many orders of magnitude improvement in data storage. Infact, recent storage advances by Erlich et al of 2 Mbytes and Organick et al. of 200 Mbytes demonstrate orders of magnitude improvements in the past two years and the underlying physics and chemistry show impressive upper bounds for density.”
See also: Microsoft Open Sources Homomorphic Encryption Library