DNA is emerging as a robust data storage medium that offers ultrahigh storage densities. However, existing DNA storage systems suffer from high latency caused by the inherently sequential writing process.
Scientists addressed this problem by expanding DNA’s molecular makeup and developing a precise new sequencing method. They transformed the double helix into a robust, sustainable data storage platform.
Their prototype of a DNA data storage system uses an extended molecular alphabet combining natural and chemically modified nucleotides.
Kasra Tabatabaei, a researcher at the Beckman Institute for Advanced Science and Technology and a co-author of this study, said, “Every day, several petabytes of data are generated on the internet. Only one gram of DNA would be sufficient to store that data. That’s how dense DNA is as a storage medium.”
Picturing the future of data storage, scientists in this new study examined DNA’s millennia-old MO. They then added a 21st-century twist.
Every strand of DNA contains four chemicals: adenine (A), guanine (G), cytosine (C), and thymine (T). These chemicals arrange and rearrange themselves along the double helix into combinations that scientists can decode, or sequence, to make meaning.
Scientists added seven synthetic nucleobases to the existing four-letter lineup to boost DNA’s capacity for data storage.
Tabatabaei said, “Imagine the English alphabet. If you only had four letters to use, you could only create so many words. If you had the full alphabet, you could produce limitless word combinations. That’s the same with DNA. Instead of converting zeroes and ones to A, G, C, and T, we can convert zeroes and ones to A, G, C, T, and the seven new letters in the storage alphabet.”
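To make the alphabet analogy concrete, here is a minimal Python sketch of what converting zeroes and ones into an 11-letter alphabet could look like. It is an illustration only, not the encoding used in the study: the placeholder symbols `B1` through `B7` and the functions `bits_per_symbol`, `encode`, and `decode` are hypothetical names, and a real system would also need error correction and synthesis constraints that are ignored here. The underlying arithmetic is simple: a four-letter alphabet carries log2(4) = 2 bits per nucleotide, while an 11-letter alphabet carries log2(11), or about 3.46 bits.

```python
import math
from typing import List, Sequence

# The four natural bases plus seven placeholder symbols for the modified
# nucleotides. The names B1..B7 are hypothetical; the article does not name them.
NATURAL = ["A", "C", "G", "T"]
EXTENDED = NATURAL + [f"B{i}" for i in range(1, 8)]


def bits_per_symbol(alphabet_size: int) -> float:
    """Upper bound on the information carried by one symbol, in bits."""
    return math.log2(alphabet_size)


def encode(data: bytes, alphabet: Sequence[str]) -> List[str]:
    """Rewrite the bytes as a number in base len(alphabet), one symbol per digit."""
    base = len(alphabet)
    # Prefix a 0x01 sentinel byte so leading zero bytes survive the round trip.
    value = int.from_bytes(b"\x01" + data, "big")
    symbols = []
    while value:
        value, digit = divmod(value, base)
        symbols.append(alphabet[digit])
    return symbols[::-1]  # most significant digit first


def decode(symbols: Sequence[str], alphabet: Sequence[str]) -> bytes:
    """Invert encode(): rebuild the number, then strip the sentinel byte."""
    base = len(alphabet)
    index = {symbol: i for i, symbol in enumerate(alphabet)}
    value = 0
    for symbol in symbols:
        value = value * base + index[symbol]
    raw = value.to_bytes((value.bit_length() + 7) // 8, "big")
    return raw[1:]


if __name__ == "__main__":
    message = b"hello DNA"
    short = encode(message, EXTENDED)
    long = encode(message, NATURAL)
    print(len(long), "symbols with 4 letters;", len(short), "with 11 letters")
    assert decode(short, EXTENDED) == message
```

Under these assumptions, the same message needs fewer symbols in the extended alphabet than in the natural one, which is the density gain the analogy describes.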
This is the first time scientists have used chemically modified nucleotides for information storage in DNA. They did so by combining machine learning and artificial intelligence to develop a first-of-its-kind DNA sequence readout processing method.
Chao Pan, a graduate student at the University of Illinois Urbana-Champaign, said, “We tried 77 different combinations of the 11 nucleotides, and our method was able to differentiate each of them perfectly. The deep learning framework as part of our method to identify different nucleotides is universal, which enables the generalizability of our approach to many other applications.”
This letter-perfect translation comes courtesy of nanopores: proteins with an opening in the middle through which a DNA strand can pass. Scientists found that nanopores can detect and distinguish each monomer unit along the DNA strand, whether the units have natural or chemical origins.
Charles Schroeder, the James Economy Professor of Materials Science and Engineering, said, “This work provides an exciting proof-of-principle demonstration of extending macromolecular data storage to non-natural chemistries, which hold the potential to drastically increase storage density in non-traditional storage media.”
- S. Kasra Tabatabaei et al. Expanding the Molecular Alphabet of DNA-Based Data Storage Systems with Neural Network Nanopore Readout Processing. DOI: 10.1021/acs.nanolett.1c04203