The desire to preserve and record is baked into humanity. Historically, humans have used different mediums to record and transfer knowledge for future generations. Cave paintings in Lascaux, Mesopotamian clay tablets, the Rosetta Stone and Egyptian hieroglyphs, Ogham stone engravings, the Dead Sea Scrolls and the Library of Alexandria are but a few examples. We fondly admire these ancient relics of communication and information storage from our past while basking in the glory of our smart devices, dissociated from the hum and blinking lights of the data warehouse fortresses that put all this data and information at our fingertips.
Have we reached the technological pinnacle in this long lineage of archiving, or is there a next evolution to come?
Today we are generating and accumulating data at rates unprecedented in our history as a species. How we store, access and archive this data has revolutionized information access. Never before has it been possible to leverage billions of concurrent data points to understand, predict and even manipulate human behaviour and events on a global scale. It’s remarkable really, but just how enduring are the current technology and methods used to store our collective data?
Can our disks and drives withstand the test of time? What would survive of our hard drives and servers in the case of an electromagnetic pulse (EMP) or a meteor colliding with Earth? How resilient are our technological architecture and information storage to a power outage? How well do hard drives cope even without facing any environmental challenges?
DNA is life’s hard drive
DNA is life’s hard drive. Tried and tested, it has faithfully preserved life for billions of years. Now can it preserve our civilization’s data?
Many scientists and companies believe so. They are investing time and funding into exploring DNA as a data storage medium. Microsoft and the U.S. government are heavy into DNA data storage research. They’ve even formed a DNA data storage alliance.
My photo and meme collection arguably doesn’t need to be saved, but the Earth’s history, humanity’s technological and architectural blueprints, our scientific advancements and classic works of literature probably should.
It might sound far-fetched to lump all this data onto DNA molecules and store them in an Antarctic base as a fail-safe time capsule for future humanity but it has been done and is not a new idea.
Things which have been successfully encoded and retrieved from DNA:
- a Massive Attack album
- Linux operating system
- an 1895 French film, “Arrival of a train at La Ciotat”
- a $50 Amazon gift card
- a short film on living DNA in bacterial cells
- crypto wallet passwords
- a computer virus
- 70 billion copies of the book Regenesis
- English Wikipedia
Current work on DNA data storage focuses on increasing the amount of data that can be stored, improving encoding methods and enabling random access to the data stored in DNA.
The need for DNA data storage
Hard disk drives (HDDs) are currently the best way to store data long term, despite being guaranteed to function smoothly for only 3-5 years. Because they are mechanical devices with moving parts, their failures are hard to predict.
Solid state drives (SSDs), or flash storage, are purported to be more reliable than HDDs, but data released by Backblaze on annualized failure rates (AFR) suggest they actually fail at around the same rate. The advice is to base the choice between an SSD and an HDD on cost and speed, not reliability. How reassuring!
The best long-term offline data storage we have is Linear Tape-Open (LTO), a magnetic storage tape developed in the 1990s which has an approximate lifespan of 20 years. Its current model, the LTO-9, can hold 18 TB of uncompressed data.
A quick online search tells me that current data warehouse storage is a mixture of DRAM and HDDs.
I’m sure data warehouses have redundancy mechanisms in place to save data if a drive fails, but are they just going to keep transferring the data to new hardware and storage formats whenever drives threaten to fail or near the end of their lifespan? Do we even have enough physical resources to produce the hardware needed?
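To get a rough feel for what continuous migration means, here is a minimal sketch that counts how many successive generations of media would be needed to keep data alive, using the lifespans quoted above (3-5 years for HDDs, about 20 years for LTO tape). The 100-year horizon and the function name `media_generations` are assumptions chosen purely for illustration, and the model ignores redundancy and early failures.

```python
import math


def media_generations(retention_years: float, media_lifespan_years: float) -> int:
    """Return how many successive generations of media are needed to
    hold data for `retention_years` if each generation lasts
    `media_lifespan_years` before it must be replaced."""
    if retention_years <= 0 or media_lifespan_years <= 0:
        raise ValueError("both arguments must be positive")
    return math.ceil(retention_years / media_lifespan_years)


# Hypothetical archive horizon, chosen only for illustration.
RETENTION_YEARS = 100

hdd_short_lived = media_generations(RETENTION_YEARS, 3)   # HDD lasting 3 years
hdd_long_lived = media_generations(RETENTION_YEARS, 5)    # HDD lasting 5 years
lto_tape = media_generations(RETENTION_YEARS, 20)         # LTO tape, ~20 years

print(f"HDD: {hdd_long_lived}-{hdd_short_lived} generations of drives")
print(f"LTO tape: {lto_tape} generations of tape")
```

Even under these generous assumptions, a century of storage means buying, writing and verifying the same data many times over.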
So apart from the limited lifespan and inevitable failure of the drives, another issue is that our current data storage architecture simply cannot scale with the amount of data being generated. One could ask whether it really needs to scale with the zettabytes of data being generated (or whatever the latest buzzword for an insane quantity of 1s and 0s encoded on silicon is) when doing so comes with a steep ecological cost.
Current data storage limitations:
- Physical material supply: the projected demand for memory technology based on silicon and rare metals far outstrips the supply, which will lead to a shortage of the raw materials required to build storage media.
- Hardware memory life span: drives don’t last forever. The older they are, the greater the liability and unreliability.
- Hardware physical limitations: the amount of data that can be stored per square millimetre of disk is approaching its limits (Moore’s Law). Attempts to further miniaturize traditional storage architectures, such as hard drives and magnetic tapes, are becoming increasingly difficult.
- Space limitations: server farms and hardware eat up physical space and require a lot of maintenance. There’s a famous statement by Nick Goldman that all of Earth’s data could fit into the back of a Toyota HiAce if encoded on DNA.
- Energy requirements: maintaining the internet and server farms requires an awful lot of land, cables, water and electricity.
- Data storage in data centres makes up 4 to 8% of our global energy footprint.
- Environmental toll: scientists across disciplines have been documenting the environmental impacts of internet infrastructures, and of data storage in particular, with discussion around this movement being termed ‘Big Data Ecologies’.
- Huge financial and political support for Big Tech’s expansion is occurring without an honest conversation about, or examination of, its contribution to ecological and environmental instability.
- For example: California is seeing increases in drought and forest fires. California is also home to the largest cloud computing facilities in the US, with upward of 800 data centers, yet the drought is rarely discussed in relation to internet infrastructures, the electricity they consume, or the tremendous amount of water required by data centers to cool their servers. (The California Energy Code, 2013).
Properties of DNA as a storage medium
When considering long-term storage, there’s no better candidate for the job. Scientists aren’t even restricted to tacking information onto an existing organism’s DNA. We can create and design our own DNA templates.
This synthetic DNA can be constructed to encode any series of bases or letters required. So why not have an encoding scheme to map information (bits) onto these bases?
- DNA doesn’t consume any energy. You can write information to DNA and then store it without supplying any power.
- DNA is highly stable.
- DNA has a long lifespan.
- The half-life of DNA highly correlates with temperature and the fragment length.
- DNA can be recovered from the ancient remains of extinct animals or even mummified Egyptians, and can remarkably be decoded to reconstruct a picture of what their lives were like or who their living relatives are.
- DNA is ultra-dense. DNA information density is 1.47 terabit/mm² or 950 terabit/in², more than 800 times the density of HDDs. 1 g of DNA can store 215 million GB of information.
- DNA is self-replicating and self-preserving. DNA can easily be induced to make more copies of itself.
- Data encoded on DNA can even be inserted into a living organism such as a bacterium, which will faithfully reproduce copies of the DNA in every daughter cell.
- Exists in a terminal format. DNA’s format is universal and isn’t specific to one operating system or platform. It can also be presumed that we will want to continue reading DNA so sequencing technology will always be around.
- Can be stored as a liquid. This gives greater flexibility for storage.
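To put the density figures above in perspective, the following back-of-the-envelope sketch works out how much DNA a zettabyte would need at 215 million GB per gram, and checks that the two areal density figures agree with each other. It assumes decimal units (1 GB = 10^9 bytes, 1 ZB = 10^21 bytes), and the function names are made up for this example.

```python
BYTES_PER_GB = 10**9       # assumption: decimal gigabytes
BYTES_PER_ZB = 10**21      # assumption: decimal zettabytes
MM2_PER_IN2 = 25.4 ** 2    # one inch is 25.4 mm, so 645.16 mm² per in²

# Figure quoted in the text: 1 g of DNA stores 215 million GB.
DNA_BYTES_PER_GRAM = 215_000_000 * BYTES_PER_GB


def grams_of_dna_for(num_bytes: float) -> float:
    """Mass of DNA (in grams) needed to hold `num_bytes` at the quoted density."""
    return num_bytes / DNA_BYTES_PER_GRAM


def per_mm2_to_per_in2(density_per_mm2: float) -> float:
    """Convert an areal density from per-mm² to per-in²."""
    return density_per_mm2 * MM2_PER_IN2


grams_per_zb = grams_of_dna_for(BYTES_PER_ZB)
print(f"{grams_per_zb / 1000:.2f} kg of DNA per zettabyte")   # 4.65 kg
print(f"{per_mm2_to_per_in2(1.47):.0f} terabit/in²")          # 948, i.e. roughly 950
```

At the quoted density, a zettabyte comes to under five kilograms of DNA, which is what makes the "back of a van" image plausible.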
In a nutshell, how does it work?
The binary code of a file is encoded into a string of DNA bases. Binary can be encoded into bases a number of ways. One example of an encoding would be:
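A minimal sketch of one such scheme, assuming the simplest possible mapping of two bits per base (00 to A, 01 to C, 10 to G, 11 to T). This particular mapping is chosen here for illustration only. Real schemes typically add error correction and avoid sequences that are hard to synthesize or read, such as long runs of the same base.

```python
# Illustrative two-bits-per-base mapping (an assumption, not a standard).
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn raw bytes into a string of DNA bases, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Turn a string of DNA bases back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hi")
print(strand)           # CAGACGGC
print(decode(strand))   # b'Hi'
```

Each byte becomes four bases, so the two-byte message "Hi" becomes an eight-base strand.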
Then the DNA is synthesized with the order of nucleotides positioned to encode the binary file.
Challenges of DNA as a storage medium
So far the biggest challenge in using DNA as a storage medium lies in the synthesis step. Writing really long strings of DNA is not yet possible, so the data has to be written onto many shorter strings (oligonucleotides). DNA synthesis also remains very expensive. To realistically scale up DNA as a storage medium, the cost of DNA synthesis must be brought down.
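Splitting data across short strands means each strand has to say where it belongs, because a tube of DNA has no inherent order. The sketch below shows one hypothetical way to do that: cut a long base sequence into fixed-size fragments and prefix each with its position written in base 4 using the letters A, C, G and T. The fragment length, index width and all function names are assumptions for illustration, not a description of any real synthesis pipeline.

```python
BASES = "ACGT"


def index_to_bases(index: int, width: int) -> str:
    """Write a fragment number as a fixed-width base-4 string of A/C/G/T."""
    if not 0 <= index < 4 ** width:
        raise ValueError("index does not fit in the given width")
    digits = []
    for _ in range(width):
        index, digit = divmod(index, 4)
        digits.append(BASES[digit])
    return "".join(reversed(digits))


def bases_to_index(prefix: str) -> int:
    """Read a base-4 A/C/G/T prefix back into a fragment number."""
    value = 0
    for base in prefix:
        value = value * 4 + BASES.index(base)
    return value


def split_into_oligos(strand: str, payload_len: int = 100, index_len: int = 8) -> list[str]:
    """Cut a long sequence into short fragments, each prefixed with its index."""
    starts = range(0, len(strand), payload_len)
    return [
        index_to_bases(n, index_len) + strand[start:start + payload_len]
        for n, start in enumerate(starts)
    ]


def reassemble(oligos: list[str], index_len: int = 8) -> str:
    """Sort fragments by their index prefix and join the payloads."""
    ordered = sorted(oligos, key=lambda oligo: bases_to_index(oligo[:index_len]))
    return "".join(oligo[index_len:] for oligo in ordered)


long_strand = "ACGT" * 75                      # 300 bases of example data
oligos = split_into_oligos(long_strand)        # three fragments of 100 bases
shuffled = list(reversed(oligos))              # order is lost in storage
assert reassemble(shuffled) == long_strand
```

The index costs a few bases per fragment, which is one reason the usable capacity of a real system is lower than the theoretical density.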
We’re not quite ready to ditch our hard drives for the strands of life, but DNA as a storage platform is a very promising and exciting field to watch out for in the future!