Peter Diamandis Describes the Next Data Storage Revolution – It’s Biological

April 24, 2018

684

April 24, 2018 – Can DNA, the molecule of life, become the next data storage medium? It appears likely as researchers recognize the versatility of the double-stranded helical molecule that constitutes all of the life we know that exists in The Universe. In his latest sharing about the growth in digitized information, Peter Diamandis provides both a bit of history lesson on data storage as well as a description of the future of archived information.

In the age of big data, we are quickly producing far more digital information than we can possibly store. Last year, $20 billion was spent on new data centers in the U.S. alone, doubling the capital expenditure on data center infrastructure from 2016. And even with skyrocketing investment in data storage, corporations and the public sector are falling behind.

But there’s hope. With a nascent technology leveraging DNA for data storage, this may soon become a problem of the past. By encoding bits of data into tiny molecules of DNA, researchers and companies like Microsoft hope to fit entire data centers in a few flasks of DNA by the end of the decade.

Where We Have Come From

After the 20th century, we graduated from magnetic tape, floppy disks and CDs to sophisticated semiconductor memory chips capable of holding data in countless tiny transistors. In keeping with Moore’s Law, we’ve seen an exponential increase in the storage capacity of silicon chips. At the same time, however, the rate at which humanity produces new digital information is exploding (as seen in the graph below).

*The size of the global* datasphere is increasing exponentially, predicted to reach 160 zettabytes (160 trillion gigabytes) by 2025.

As of 2016, digital users produced over 44 billion gigabytes of data per day. By 2025, the International Data Corporation (IDC) estimates this figure will surpass 460 billion. And with private sector efforts to improve global connectivity—such as OneWeb and Google’s Project Loon—we’re about to see an influx of data from 5 billion new minds.

*By 2020, 3 billion new minds will join the web bringing the total number of users to 5 billion.*

While companies and services are profiting enormously from this influx, it’s extremely costly to build data centers at the rate needed. At present, about $50 million worth of new data center construction is required annually just to keep up, not to mention the millions spent on furnishings, equipment, power and HVAC.

And memory-grade silicon, rarely found pure in nature, may soon run out with researchers predicting this happening as early as 2040.

DNA is the Answer

Take DNA, on the other hand. At its theoretical limit, we could fit 215 million gigabytes of data in a single gram of DNA. But how?

DNA is built from a double helix chain of four nucleotide bases—A, T, C and G. Once formed, these chains fold tightly to form extremely dense, space-saving data stores. To encode data files into these bases, we can use various algorithms that convert binary to base nucleotides—zeroes and ones become A, T, C and G. ‘00’ might be encoded as A, ‘01’ as G, ‘10’ as C, and ‘11’ as T, for instance. Once encoded, the information is then stored by synthesizing DNA with specific base patterns, and the final encoded sequences are stored in vials with an extraordinary shelf-life. To retrieve data, encoded DNA can then be read using any number of sequencing technologies, such as Oxford Nanopore’s portable MinION.

Still in its deceptive growth phase, DNA data storage—or NAM (nucleic acid memory)—is only beginning to approach the knee joint of the exponential growth curve. But while the process remains costly and slow, several players are beginning to crack its greatest challenge: retrieval. Just as you might click on a specific file and filter a search term on your desktop, random-access across large data stores has become a top priority for scientists at Microsoft Research and the University of Washington. Storing over 400 DNA-encoded megabytes of data, University of Washington’s DNA storage system now offers random access across all its data with no bit errors.

Applications of DNA Data Storage

Even before we guarantee random access for data retrieval, DNA data storage has immediate market applications. As seen in the graph below, a huge proportion of enterprise data goes straight to an archive.

*This graph compares the criticality of data over time. It shows that the majority of stored information becomes less critical for immediate retrieval.*

Particularly for storing past legal documents, medical records and other archive data, why waste precious computing power, infrastructure and overhead?

Data-encoded DNA can last ten thousand years—guaranteed—in cold, dark and dry conditions at a fraction of the storage cost. Now that we can easily use natural enzymes to replicate DNA, corporations and SMEs have tons to gain (literally) by using DNA as a backup system—duplicating files for later retrieval and risk mitigation. And as retrieval algorithms and biochemical technologies improve, random access across data-encoded DNA may become as easy as clicking a file on your desktop.

Researchers are already investigating the potential of molecular computing, completely devoid of silicon and electronics. Harvard University professor George Church and his research colleagues, for instance, envision capturing data directly into DNA. As Church has stated, “I’m interested in making biological cameras that don’t have any electronic or mechanical components,” whereby information “goes straight into DNA.” According to Church, DNA recorders would capture audiovisual data automatically. “You could paint it up on walls, and if anything interesting happens, just scrape a little bit off and read it—it’s not that far off.” And one day we may even be able to record biological events in the body. In pursuit of this end, Church’s lab is working to develop an in vivo DNA recorder of neural activity, skipping electrodes entirely.

Future Potential

Perhaps the most ultra-compact, long-lasting and universal storage mechanism at our fingertips, DNA offers us unprecedented applications in data storage—perhaps even computing. As DNA data storage plummets in tech costs and rises in speed, commercial user interfaces will become both critical and wildly profitable. Once corporations, startups and people alike can easily save files, images or even neural activity to DNA, opportunities for disruption abound.

Imagine uploading files to the cloud, which travel to encrypted DNA vials, as opposed to massive and inefficient silicon-enabled data centers. Corporations could have their own warehouses and local data networks could allow for heightened cybersecurity—particularly for archives. And since DNA lasts millennia without maintenance, forget the need to copy databases and power digital archives. As long as we’re human, regardless of technological advances and changes, DNA will always be relevant and readable for generations to come.

But perhaps the most exciting potential of DNA is its portability. If we were to send a single exabyte of data (one billion gigabytes) to Mars using silicon binary media, it would take 5 Falcon Heavies and cost $486 million in freight alone. With DNA, we would need 5 cubic centimeters. At scale, DNA has the true potential to dematerialize entire space colonies worth of data.

Throughout evolution, DNA has unlocked extraordinary possibilities—from humans to bacteria. Soon hosting limitless data in almost zero space, it may one day unlock many more.