Can DNA preserve our civilization’s data?

Mar 12, 2022

The desire to preserve and record is baked into humanity. Historically, humans have used different mediums to record and transfer knowledge for future generations. Cave paintings in Lascaux, Mesopotamian clay tablets, the Rosetta Stone and Egyptian hieroglyphs, Ogham stone engravings, the Dead Sea Scrolls and the Library of Alexandria are but a few examples.

We fondly admire these ancient relics of communication and information storage while basking in the glory of our smart devices - far removed from the hum and blinking lights of the data centre fortresses that put all this data at our fingertips.

Have we reached the technological pinnacle in this long lineage of archiving - or is there a next evolution still to come?

Today we are generating and accumulating data at rates unprecedented in human history. How we store, access and archive this data has revolutionized what we can do with information. Never before has it been possible to leverage billions of concurrent data points to understand, predict and even manipulate human behaviour on a global scale.

It’s remarkable - but just how enduring is the technology used to store our collective data?

Can our disks and drives withstand the test of time? What would survive of our servers in the case of an electromagnetic pulse (EMP) or meteor strike? How resilient is our technological architecture to power outages? How do hard drives cope even without environmental challenges?

HDD fossil

DNA is life’s hard drive

DNA is life’s hard drive. Tried and tested, it has faithfully preserved life for billions of years.

Now - can it preserve civilization’s data?

Many scientists and companies believe so.

Microsoft and the U.S. government are heavily invested in DNA data storage research, and Microsoft is a founding member of the DNA Data Storage Alliance.

My photo and meme collections arguably don’t need to be saved - but Earth’s history, humanity’s technological blueprints, scientific advancements and classic literature probably should.

Svalbard Seed Vault

It might sound far-fetched to encode all this data onto DNA molecules and store them in an Antarctic base as a fail-safe time capsule for future humanity - but it has already been done.

Things successfully encoded and retrieved from DNA include:

All of Your Data in a Drop of DNA

Current DNA storage research focuses on:

  • Increasing storage density
  • Improving encryption
  • Enabling random access retrieval

The need for DNA data storage

Hard disk drives (HDDs) are currently the standard for long-term storage — despite being guaranteed to function smoothly for only 3–5 years.

HDD vs. SSD

Solid-state drives (SSDs) are often considered more reliable, but Backblaze data shows they fail at roughly similar annualized rates. Cost and speed matter more than reliability differences.

The best long-term offline storage we have is Linear Tape-Open (LTO), with an approximate lifespan of 20 years. The current LTO-9 model holds 18TB uncompressed.

Modern data warehouses rely on mixtures of DRAM and HDD.

But beyond failure rates, there is a scaling issue: our current architecture cannot sustainably scale with the zettabytes of data being generated.

Server farm

Current data storage limitations

  • Physical material supply

    • Demand for silicon and rare metals far outweighs supply.
  • Hardware lifespan

    • Drives degrade and become unreliable.
  • Physical limits (Moore’s Law)

    • Miniaturization of magnetic storage is approaching its limits.
  • Space limitations

    • Server farms require enormous physical space.
    • Nick Goldman famously stated that all of Earth’s data could fit in the back of a Toyota HiAce if encoded on DNA.
  • Energy requirements

    • Data centres account for 4–8% of the global energy footprint.
  • Environmental toll

    • The ecological impact of internet infrastructure has been termed “Big Data Ecologies.”
    • California hosts ~800 data centres — yet their water and energy use is rarely discussed alongside drought and wildfire crises.

Properties of DNA as a storage medium

When considering long-term storage, DNA is arguably unmatched.

Scientists are not restricted to inserting data into existing organisms - we can synthesize custom DNA sequences.

This synthetic DNA can encode any series of bases required.

Why not encode binary bits into bases?

Example encoding:

A → 00

T → 01

C → 10

G → 11
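
As a minimal sketch of the example mapping above, the following Python turns a bit string into bases two bits at a time. The function names `text_to_bits` and `bits_to_bases` are illustrative, not taken from any library. Practical encoding schemes typically add error correction and avoid long runs of the same base; none of that is modeled here.

```python
# Two-bits-per-base mapping from the example above.
BITS_TO_BASE = {"00": "A", "01": "T", "10": "C", "11": "G"}


def text_to_bits(text: str) -> str:
    """Encode text as UTF-8 and render each byte as 8 binary digits."""
    return "".join(f"{byte:08b}" for byte in text.encode("utf-8"))


def bits_to_bases(bits: str) -> str:
    """Translate a string of 0s and 1s into DNA bases, two bits per base."""
    if len(bits) % 2 != 0:
        raise ValueError("bit string length must be even")
    if set(bits) - {"0", "1"}:
        raise ValueError("bit string may only contain '0' and '1'")
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


bits = text_to_bits("Hi")
print(bits)                 # 0100100001101001
print(bits_to_bases(bits))  # TACATCCT
```

Each byte becomes four bases, so under this simple mapping a strand is four times as long in bases as the file is in bytes.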

Advantages of DNA storage

  • No energy required once written
  • Highly stable
  • Extremely long lifespan

    • DNA has been recovered from ancient remains and mummified Egyptians
  • Ultra-dense

    • 1 g of DNA can store ~215 million GB
    • ~800x denser than HDDs

  • Self-replicating

    • Can be amplified via PCR
    • Can be inserted into living cells
  • Universal format

    • Not OS-dependent
    • Sequencing tech will likely always exist
  • Can be stored as liquid

DNA data storage medium comparison

So… how does it work?

  1. A digital file is converted to binary.

  2. Binary is encoded into DNA bases.

  3. The DNA strand is synthesized.

  4. DNA is stored.

  5. To retrieve data, DNA is sequenced and decoded back to binary.
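
The five steps above can be simulated end to end in software. This is only a sketch: synthesis, storage and sequencing (steps 3 to 5) are chemistry, so here they are stood in for by passing a Python string around, and the mapping assumed is A=00, T=01, C=10, G=11. The names `encode` and `decode` are illustrative.

```python
BASES = "ATCG"  # position in this string is the 2-bit value: A=00, T=01, C=10, G=11


def encode(data: bytes) -> str:
    """Steps 1-2: bytes -> bits -> bases (four bases per byte)."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):  # most significant bit pair first
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def decode(strand: str) -> bytes:
    """Step 5: sequenced bases -> bits -> bytes."""
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of 4 bases")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)  # ValueError on unknown base
        out.append(byte)
    return bytes(out)


original = "DNA is life's hard drive".encode("utf-8")
strand = encode(original)   # steps 1-2: file -> binary -> bases
stored = strand             # steps 3-4: synthesis and storage (simulated)
recovered = decode(stored)  # step 5: sequencing and decoding (simulated)

assert recovered == original
print(len(original), "bytes ->", len(strand), "bases")
```

A real pipeline would also have to cope with errors introduced during synthesis and sequencing, which this lossless simulation ignores.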

Challenges of DNA as a storage medium

The main bottleneck is DNA synthesis.

  • Long strands cannot be synthesized easily.

  • DNA must be written in short oligonucleotides.

  • DNA synthesis remains expensive.

To scale DNA as a storage platform, synthesis costs must dramatically decrease.
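
Because data has to be written as many short oligonucleotides that end up mixed in an unordered pool, each fragment needs some way to say where it belongs. One common idea is to prefix every fragment with an index written in bases. The sketch below illustrates that idea only; the lengths `INDEX_LEN` and `PAYLOAD_LEN` and all function names are hypothetical choices, not values from any real system.

```python
import random

BASES = "ATCG"     # A=00, T=01, C=10, G=11
INDEX_LEN = 8      # hypothetical: 8 index bases address up to 4**8 fragments
PAYLOAD_LEN = 100  # hypothetical: payload bases carried per oligo


def int_to_bases(value: int, length: int) -> str:
    """Write a non-negative integer as a fixed-length base-4 string of bases."""
    digits = []
    for _ in range(length):
        digits.append(BASES[value & 0b11])
        value >>= 2
    if value:
        raise ValueError("index does not fit in the index field")
    return "".join(reversed(digits))


def bases_to_int(bases: str) -> int:
    """Inverse of int_to_bases."""
    value = 0
    for base in bases:
        value = (value << 2) | BASES.index(base)
    return value


def split_into_oligos(strand: str) -> list[str]:
    """Cut a long strand into short oligos, each prefixed with its index."""
    return [
        int_to_bases(n, INDEX_LEN) + strand[start:start + PAYLOAD_LEN]
        for n, start in enumerate(range(0, len(strand), PAYLOAD_LEN))
    ]


def reassemble(oligos: list[str]) -> str:
    """Sort oligos by their index prefix and join the payloads."""
    ordered = sorted(oligos, key=lambda o: bases_to_int(o[:INDEX_LEN]))
    return "".join(o[INDEX_LEN:] for o in ordered)


random.seed(0)
strand = "".join(random.choice(BASES) for _ in range(1050))
pool = split_into_oligos(strand)
random.shuffle(pool)  # a tube of DNA has no inherent order
assert reassemble(pool) == strand
```

The index costs extra bases on every oligo, which is one reason synthesis cost per stored byte is higher than the raw two-bits-per-base figure suggests.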

Sci-fi future

We’re not quite ready to ditch our hard drives for strands of life - but DNA storage is an extraordinarily interesting field to watch.

It seems that much of the scientific groundwork is already in place to create a functional DNA data storage system. The chemistry works. The encoding works. Retrieval works. The barriers are cost, scale, and speed - not possibility.

In the short term, DNA will likely serve as an ultra-long-term archival medium rather than a replacement for everyday computing. But the question naturally follows: if we can store data in DNA, could we eventually compute directly on DNA as well?

DNA computing is not science fiction. Molecular logic gates and strand displacement systems already exist. Today they are slow and experimental - but so were the first transistors in the 1940s.

And in the very distant future, could humanity’s knowledge itself be archived in synthetic DNA vaults?

How large would that be, really? The entire English Wikipedia is roughly 16 GB uncompressed. Estimates suggest that all written works in human history might only amount to a few hundred terabytes - trivial compared to DNA’s storage density. One gram of DNA can theoretically store ~215 million gigabytes.
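
To make that comparison concrete, here is a back-of-the-envelope calculation using the figures quoted above. The 300 TB value is an assumption standing in for "a few hundred terabytes", and 1 TB is taken as 1000 GB.

```python
GB_PER_GRAM = 215e6     # ~215 million GB per gram, as quoted in the text
WIKIPEDIA_GB = 16       # English Wikipedia size, as quoted in the text
WRITTEN_WORKS_TB = 300  # assumption: one value for "a few hundred terabytes"


def grams_needed(size_gb: float) -> float:
    """Mass of DNA needed at the quoted theoretical density."""
    return size_gb / GB_PER_GRAM


wiki_g = grams_needed(WIKIPEDIA_GB)
works_g = grams_needed(WRITTEN_WORKS_TB * 1000)  # 1 TB = 1000 GB

print(f"Wikipedia: about {wiki_g * 1e9:.0f} nanograms of DNA")
print(f"Written works: about {works_g * 1e3:.2f} milligrams of DNA")
```

At the quoted density this works out to tens of nanograms for Wikipedia and on the order of a milligram for the written works estimate, before any overhead for indexing, redundancy or error correction.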

Civilization, condensed into a test tube.

Perhaps one day, long after our server farms have rusted and our magnetic tapes have degraded, fragments of synthetic DNA will remain - waiting quietly to be sequenced by whatever intelligence comes next.