Ian Anzlowar, UC Irvine
Just how much space would you need to store all of the world’s data? A building? A block? A city?
The amount of global data is estimated to be around 44 zettabytes. A 15-million-square-foot warehouse can hold 1 billion gigabytes, or 0.001 zettabyte. So you would need 44,000 such warehouses, which together would cover nearly the entire state of West Virginia.
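The arithmetic behind that comparison can be checked with a short sketch. The data volume, warehouse capacity and warehouse footprint come from the figures above; the area used for West Virginia (roughly 24,230 square miles) is an outside approximation assumed here for illustration, as is the use of decimal units (1 zettabyte = 10^12 gigabytes).

```python
# Figures quoted in the article
GLOBAL_DATA_GB = 44 * 10**12          # 44 zettabytes, in decimal gigabytes
WAREHOUSE_CAPACITY_GB = 10**9         # 1 billion gigabytes (0.001 zettabyte)
WAREHOUSE_AREA_SQFT = 15_000_000      # 15 million square feet per warehouse

# Assumptions not stated in the article
SQFT_PER_SQMI = 5280 ** 2             # square feet in one square mile
WEST_VIRGINIA_SQMI = 24_230           # approximate total area of the state

# Number of warehouses needed to hold all the data
warehouses = GLOBAL_DATA_GB // WAREHOUSE_CAPACITY_GB

# Combined footprint of those warehouses, in square miles
total_sqmi = warehouses * WAREHOUSE_AREA_SQFT / SQFT_PER_SQMI

# Share of West Virginia that footprint would cover
fraction_of_state = total_sqmi / WEST_VIRGINIA_SQMI

print(f"{warehouses:,} warehouses, about {total_sqmi:,.0f} square miles")
print(f"roughly {fraction_of_state:.0%} of West Virginia")
```

Under these assumptions the warehouses would occupy a little under 24,000 square miles, which is consistent with "nearly the entire state."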
John Chaput is hoping to change all that.
Chaput is a professor of pharmaceutical sciences at UC Irvine with appointments in chemistry and molecular biology & biochemistry. He and his lab team are striving to improve a technique that’s already on the bleeding edge of synthetic biology and data storage. By employing an artificial variation of DNA, Chaput is transforming the field of semipermanent data storage.
“Unnatural genetic polymers offer a nice paradigm for developing novel soft materials that are capable of low-energy, high-density information storage without the liabilities of DNA,” he says.
Genetic data encoding is relatively new. Scientists have only been able to effectively record data on and recover data from DNA for about eight years, with the most significant advances happening over the past two. While the process is quickly becoming cheaper and faster, other drawbacks still limit the long-term practicality of the method.
For example, DNA is an inherently fragile molecule, susceptible to degradation from numerous naturally occurring enzymes, sunlight, and a slew of acids and bases. For a more robust medium of genetic storage, Chaput chose threose nucleic acid, or TNA. TNA is much hardier and less prone to degradation from biological and chemical agents, including enzymes, acids and bases, but it is not indestructible: It can still be damaged or destroyed by biological contamination, though this is uncommon.
What makes genetic storage so effective is the intrinsic complexity of each molecule versus digital techniques, which use a binary coding system of ones and zeros. Computers convert every symbol, image and sound into a binary sequence and transcribe it to a magnetic or solid-state drive. This process has made incredible leaps and bounds over the past few decades, but soon it may not be enough.
“At some time, we will start making more info than we can store,” Chaput says. “What do we do then?”
By employing the four-letter nucleotide code used in DNA, rather than the binary system, Chaput’s team can effectively transcribe data to a strand of DNA, which is made up of four components: adenine, thymine, cytosine and guanine, referred to as A, T, C and G. By sequentially assigning each nucleotide a specific binary number, the researchers can essentially write a binary sequence using these nucleotides. When retrieval of the genetic code is required, a special enzyme that connects the two sequences is added, and the genomic sequence is converted back into the original binary form.
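The idea of writing a binary sequence with nucleotides can be illustrated with a minimal sketch. Because there are four letters, each nucleotide can stand for two bits. The specific mapping below (00 to A, 01 to C, 10 to G, 11 to T) and the function names `encode` and `decode` are assumptions chosen for illustration; the article does not say which assignment Chaput’s team uses, and real schemes typically add error correction and avoid problematic repeats.

```python
# Hypothetical two-bit-per-nucleotide mapping (an assumption, not the lab's scheme)
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a nucleotide string, two bits per letter."""
    # Write every byte as eight binary digits, then join them into one bit string
    bits = "".join(f"{byte:08b}" for byte in data)
    # Map each consecutive pair of bits to one nucleotide
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Turn a nucleotide string back into the original bytes."""
    # Expand each nucleotide back into its two bits
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    # Regroup the bits into bytes, eight at a time
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


message = "We hold these truths".encode("ascii")
strand = encode(message)
print(strand)
print(decode(strand).decode("ascii"))
```

With this mapping every byte becomes four nucleotides, and decoding the strand returns the original text unchanged.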
TNA also comprises A, T, C and G, but it’s a synthetic genetic polymer created by organic chemist Albert Eschenmoser and modified by Chaput to carry information. It’s one of several synthetic alternatives developed to address the innate fragility of DNA.
Made using an artificial sugar called threose, TNA has quickly become an important synthetic genetic polymer because of its ability to base pair with complementary strands of DNA and RNA, as well as its high biostability and resistance to the enzymes that degrade DNA.
Chaput and his team have already tested this mechanism by transcribing the Declaration of Independence and the UC Irvine seal to a solution of TNA and recovering them.
He has theorized that — due to the medium’s incredible complexity — all of human history, every book ever written, every song ever sung and every Instagram brunch photo ever taken could be stored in half a cup of liquid TNA.
“These systems open the door to new possibilities,” Chaput says. They’re “quite different than the ones used by nature.”