In three months’ time, this story will be encoded into a sample of synthetic DNA and sent into deep space as part of an intergalactic communiqué.
If you have a message you’d like to share with whoever might discover this data on the other side, the comments section is all yours.
Our species is caught in a unique and entirely self-created double-bind: We produce more information than we know how to keep, and our attempts to hold on are destroying the planet we call home.
The digitization of everything is driving exponential increases in recorded data. Ninety percent of all digital information ever created was produced in the last two years. Beneath the familiar landscape of photos and videos lies a subvisible world of heavy-hitting, industrial-grade information. Data mining in robotics, smart cities, autonomous vehicles, astronomy and climate science is hyperscaling the demand for data storage, which is far outstripping the current supply.
We go to great lengths to preserve this titanic collection of memories. Digital data centers alone will consume 10% of global electricity by 2030. It seems that we really do not like to forget.
This built-in drive to remember and its accompanying energy consumption come at a staggering cost to our environment. ‘Data warming’ contributes 2% of all greenhouse gas emissions. As bioinformatician Dina Zielinski puts it, “Big data has become a big problem.”
There is hope, though. It lies not in the world around us, but in the worlds within us. Our very own DNA has spent millennia engineering the perfect hard drive. It’s energy-efficient, it’s space-efficient and it remembers for hundreds of thousands of years. Synthetic DNA company Twist Bioscience, whom I interview in Part 2 of this series, articulates this sprawling available storage space succinctly: Every movie released in the 21st century consists of 47 terabytes of data; the storage capacity of the DNA in one human body is 150 billion terabytes. If that doesn’t paint a picture, here’s another one: A single gram of DNA can store 215 million gigabytes. That’s 13 million times more storage capacity than this 2018 MacBook Pro and five times larger than the total annual data footprint of Twitter.
The last 10 years have seen breathtaking strides in writing data into DNA and retrieving it when we want. While many challenges lie ahead, genetic code presents the most promising solution to our unrelenting data storage needs.
This is the first installment of a five-part exploration into the hopeful world of DNA data storage. In this editorial package, EXO will explore the history, breakthroughs, mechanics, applications, obstacles and event horizon of this extraordinary development in the relationship between humans, our information and the building blocks of life.
There’s Plenty of Room At The Bottom
The very idea of storing information in DNA was born outside the laboratory, in the imagination of one particularly powerful mind. Two days before New Year’s Eve 1959, a 41-year-old physics superstar named Richard P. Feynman stepped in front of a packed banquet hall at the annual meeting of the American Physical Society in Pasadena, California. He was empty-handed, with not a single speaking note in sight. Feynman was about to invent DNA data writing, and conjure what would become the field of nanotechnology, seemingly in riff-mode. The title of his lecture was There's Plenty of Room at the Bottom: An Invitation to Enter a New Field of Physics. For the elite group of scientists in attendance, Feynman’s lecture was a rallying cry to take a bigger look at a tiny world. Its primary focus was on “manipulating and controlling things at a small scale.”
Feynman opened with a prediction: that all volumes of the Encyclopaedia Britannica could eventually be written onto the head of a pin. Pushing the premise further, he then considered what it might take to write down the contents of all the books ever written. Assuming that each of the 24 million books was the size of an encyclopedia volume, Feynman jellybean-counted the total info-load at around 10^15 bits (a million billion). If each bit was given an ample 100 atoms of space, Feynman concluded, every book in the world could be written “in a cube of material one two-hundredth of an inch wide, which is the barest piece of dust that can be made out by the human eye.”
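Feynman’s speck-of-dust figure is easy to sanity-check. The sketch below uses 10^15 bits, the figure given in the published lecture, and his allowance of 100 atoms per bit. The spacing between atoms (about 0.25 nanometers) is an assumption added here for illustration, not a number from the passage above.

```python
# Back-of-the-envelope check of the "speck of dust" estimate.
# ATOM_SPACING_M is an assumed typical spacing between atoms in a solid;
# it is an illustrative value, not taken from the article.

BITS_TOTAL = 10**15        # estimated information in all the world's books
ATOMS_PER_BIT = 100        # Feynman's "ample" allowance per bit
ATOM_SPACING_M = 0.25e-9   # assumption: ~0.25 nm between neighboring atoms
INCH_M = 0.0254            # one inch in meters


def cube_edge_m(bits, atoms_per_bit, spacing_m):
    """Edge length of a cube holding bits * atoms_per_bit atoms."""
    atoms = bits * atoms_per_bit
    atoms_per_edge = atoms ** (1 / 3)  # cube root: atoms along one edge
    return atoms_per_edge * spacing_m


edge = cube_edge_m(BITS_TOTAL, ATOMS_PER_BIT, ATOM_SPACING_M)
target = INCH_M / 200  # one two-hundredth of an inch

print(f"estimated cube edge: {edge * 1e3:.3f} mm")
print(f"1/200 of an inch:    {target * 1e3:.3f} mm")
```

Under that spacing assumption the two lengths land within about ten percent of each other, which is as close as an estimate of this kind is meant to get.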
For Feynman, establishing a meaningful dialogue with this miniature world would require an effort of interdisciplinary dimensions. In his lecture, he called for denser computer circuitry and more powerful electron microscopes. Ultimately though, according to Feynman, it was biologists that held the keys to this micro-kingdom. “All this information,” he reminded his peers that pivotal evening, “is contained in a very tiny fraction of the cell in the form of long-chain DNA molecules in which approximately 50 atoms are used for one bit of information.”
Feynman concluded his lecture with a challenge to the community. He offered $1,000 to anyone who could shrink a book’s page to 1/25,000th of its original size while ensuring the contents remained clear enough to be read by an electron microscope. Feynman closed the evening with the only off-base forecast of his lecture. “I do not expect that such prizes will have to wait very long for claimants,” he mused.
It would take more than a quarter of a century for applied technology to catch up with Feynman’s predictions. In 1985, a Stanford graduate student named Tom Newman successfully reduced the first paragraph of A Tale of Two Cities to 1/25,000th of its original size and collected the $1,000 Feynman Prize.
The Bitcoin Challenge
Another 30 years later, in 2015, European Bioinformatics Institute (EBI) senior scientist Nick Goldman took the stage at the World Economic Forum in Davos, Switzerland. Goldman was there to discuss his pioneering Nature paper on high-capacity DNA data storage. Goldman opened with a reminder to the audience that, “DNA is the hard drive, the memory, in every cell of every living organism.” In reference to the DNA sequence itself, he explained, "If you read out the information that's there, it's just like a ticker tape of letters...there's 3 billion of those and that mere 3 billion letters defines your genome and all the instructions to make a living human." For Goldman, "DNA is a digital storage medium. It's a sequence of a discrete alphabet of four letters. And if we could manipulate some DNA we could put a message in there ourselves."
To test DNA’s data storage capabilities, Goldman and his team at EBI devised an experiment. They selected all of Shakespeare's 154 sonnets, an mp3 file of Martin Luther King's I Have a Dream speech, and a PDF copy of Watson and Crick’s 1953 paper describing the helical structure of DNA. The data was sent to Agilent, a life sciences outfit in Santa Clara, California. Two weeks later, a test tube arrived back at EBI headquarters in the UK. The tube appeared almost empty, except for a tiny speck of matter. That speck contained all three files. The team was able to decode or ‘read’ the data. Its fidelity came in at a flawless 100%.
At the time of Goldman's speech in 2015, the cost of writing data into DNA came in at a prohibitive $12,500 per megabyte. Considering the exorbitant price tag, Goldman envisioned that for the near-term, only high-value information would be worth encoding into DNA. US Presidential Archives or perhaps a directory of nuclear waste disposal locations would get to go first.
Another high-value piece of information gaining recognition at the time was the cryptocurrency Bitcoin. To call attention to his mission, Goldman announced a race. He had bought one Bitcoin and encoded the key to the wallet containing the digital cash into multiple samples of DNA. Assistants then shuffled through the crowd handing out test tubes containing the samples. The first person to decode the sequence could collect the bounty, which at the time was worth $300. A deadline was set for three years from the announcement.
In December of 2017, Goldman’s Bitcoin remained unclaimed. With the value of the currency soaring to $19,500, roughly 65 times its 2015 level, he tweeted a reminder that the contest was ending in 50 days.
A computational microbiology PhD student in Belgium named Sander Wuyts saw the tweet. Wuyts wrote to Goldman to obtain a sample of the DNA. Goldman responded, requesting a set of reasons why Wuyts might be the man for the job. Wuyts outlined his bona fides in an email back to Goldman. One week later a test tube arrived.
Wuyts and his colleagues dove into the challenge, which is documented in detail on Wuyts’ blog. The team gathered their first clues for how to proceed from Goldman’s Nature paper. To encode information into DNA, a text or binary file is rewritten in base-3. This conversion transforms instructions in ones and zeroes (binary) to zeroes, ones, and twos (base-3). This base-3 layer is then used to encode the data into the four DNA nucleobases: cytosine, thymine, adenine and guanine. We’ll get into how DNA sequencing works in Part 2. But for now, for Wuyts, the order of operations was that same pipeline run backwards: sequenced DNA bases to base-3 digits, then base-3 digits to binary, then binary to text.
By reverse-engineering these steps, Wuyts was able to decode the genetic information into text.
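The base-3 idea can be sketched in a few lines. This is a simplified illustration, not the scheme from Goldman’s paper or the code Wuyts ran. It assumes a fixed six base-3 digits per byte, and a rotating lookup in which each digit selects one of the three bases that differ from the previous base, so the strand never repeats a letter back to back. All function names are hypothetical.

```python
# Illustrative sketch only. The fixed-width byte-to-trit conversion and the
# rotating trit-to-base rule are assumptions made for this example.

BASES = "ACGT"
TRITS_PER_BYTE = 6  # 3**6 = 729, enough to cover the 256 byte values


def bytes_to_trits(data):
    """Rewrite each byte as six base-3 digits, most significant first."""
    trits = []
    for byte in data:
        digits = []
        for _ in range(TRITS_PER_BYTE):
            byte, remainder = divmod(byte, 3)
            digits.append(remainder)
        trits.extend(reversed(digits))
    return trits


def trits_to_bytes(trits):
    """Inverse of bytes_to_trits."""
    out = bytearray()
    for i in range(0, len(trits), TRITS_PER_BYTE):
        value = 0
        for trit in trits[i:i + TRITS_PER_BYTE]:
            value = value * 3 + trit
        out.append(value)
    return bytes(out)


def trits_to_dna(trits, start="A"):
    """Each trit picks one of the three bases that differ from the last one."""
    previous, strand = start, []
    for trit in trits:
        choices = [b for b in BASES if b != previous]
        previous = choices[trit]
        strand.append(previous)
    return "".join(strand)


def dna_to_trits(strand, start="A"):
    """Inverse of trits_to_dna: recover the trit from each base transition."""
    previous, trits = start, []
    for base in strand:
        choices = [b for b in BASES if b != previous]
        trits.append(choices.index(base))
        previous = base
    return trits


message = b"Save As: DNA"
strand = trits_to_dna(bytes_to_trits(message))
decoded = trits_to_bytes(dna_to_trits(strand))
print(strand)
print(decoded)
```

Decoding simply walks the same steps in the opposite direction, which is the shape of the task Wuyts faced once the sample had been sequenced.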
The data was hidden behind one more veil of secrecy though. Goldman had used a keystream, which is a random series of characters designed to obfuscate the final meaning. The keystream code had been provided by Goldman in a document explaining the competition. Using this guide, Wuyts and his team were able to convert the DNA sequence into plain text, revealing the private key to the crypto wallet.
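To make the keystream idea concrete, here is a generic sketch in which a random keystream is combined with the message using XOR. How Goldman actually combined his keystream with the data is not described here, so the XOR step, the function name and the sample message are all assumptions for illustration.

```python
import secrets


def apply_keystream(data, keystream):
    """XOR each data byte with the matching keystream byte.

    Applying the same keystream twice restores the original data,
    so this one function both masks and unmasks.
    """
    if len(keystream) < len(data):
        raise ValueError("keystream must be at least as long as the data")
    return bytes(d ^ k for d, k in zip(data, keystream))


message = b"hypothetical private key"             # placeholder, not a real key
keystream = secrets.token_bytes(len(message))     # random bytes, shared with the decoder

masked = apply_keystream(message, keystream)      # what would be written to DNA
recovered = apply_keystream(masked, keystream)    # what the decoder gets back

print(recovered)
```

Without the keystream the masked bytes look like noise, which is why publishing it alongside the competition rules was what made the final step possible.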
Wuyts collected his hard-earned Bitcoin five days before the contest deadline. Dubious about the currency’s future value, he cashed it out immediately.
Current Status
Between Feynman’s challenge, Goldman’s race and the present day, DNA data storage has undergone a revolution in capacity. The decrease in costs, combined with a quantum leap in automation and miniaturization, has turbocharged adoption.
To get a better sense of how DNA data storage works, how it has evolved and where it is headed, I sat down with the leadership at Twist Bioscience. Twist is a leading synthetic DNA company, and a founding member of The DNA Data Storage Alliance.
Stay tuned for Part 2 of Save As: DNA.
I remember reading about this topic a few years ago. It is indeed very interesting.
But I see 2 obvious challenges:
1. Coming up with a standard "encoding" to use these four values (nucleobases) to represent actual data.
2. An affordable and efficient way to read and write. Because even a flash drive has formatting... I can only imagine what sort of device(s) you would need to go from "chemistry" to "zeros and ones", and vice versa.
But super cool post. Very informative.
"The Elon army is not good for mankind" -Rex Baird. This is going into space, right?