Is DNA a 3.85 billion year old life-encoding data structure from space?

Jan 20, 2020

Abiogenesis is the theory of life originating from natural processes on Earth. It purports that organic compounds comprising the building blocks and constituents of life arose and assembled from non-living matter.

Simpsons evolution

Although experiments have shown that it is possible to form amino acids and other compounds in the lab under controlled conditions, this simulated scenario has failed to yield a double-stranded, progenitor molecule with instructions for creating life.

The molecule I’m talking about is of course DNA, the original subject of the chicken-and-egg conundrum. In a somewhat longer blog post than I anticipated, I am going to explore our current understanding of molecular evolution, look at why the origin of life is so tricky to pin down, and ask which came first: the DNA, the RNA or the enzyme?

The three greatest mysteries in science are generally considered to be the origin of the universe, the origin of life and the origin of consciousness. “How did we get here?”, “How was life formed?” and “Who created it?” are natural questions for the curious mind to ponder.

When studying the mechanisms of cell replication and genetics in university I was really fascinated by the topic. I accepted what my professors outlined as “the central dogma”: the established scientific worldview that from DNA, RNA is made, and from RNA, protein is made. This one-way flow of information through cellular construction was something I had to learn to get good grades, not to philosophically analyze.

For a biology module, Richard Dawkins’ The Selfish Gene was recommended reading. Cleverly written and entirely captivating, I was sold on the idea that humans are vessels or meat suits for genetic life to survive and reproduce through, and that the genetic information of life can be reduced to mechanistic nuts and bolts which randomly arose through chance conditions in a “primordial soup”.


DNA as a data structure

It wasn’t until after uni, when I started questioning my friends’ atheistic world views and delved deep inside to find some answers, that I returned to reflect on the complexities of cellular biology. It was as if I was seeing genetics and molecular biology in a new light and could finally appreciate the wonder that is DNA. Even now, the more I sit with the concept and mechanism of DNA, the more questions I have.

DNA is a huge and complex molecule found in nearly every cell of every organism on Earth. It is double-stranded and composed of 4 distinct nucleotides (A, T, C and G); the human genome runs to roughly 3 billion base pairs. Each strand is complementary to the other: every nucleotide on one strand bonds to its partner on the intertwining strand through A–T and C–G base pairing.
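The pairing rule is simple enough to sketch in a few lines of Python. This is only an illustration, assuming a strand is written as a plain string of A, T, C and G; the names `PAIRING`, `complement` and `reverse_complement` are my own and not taken from any library.

```python
# Watson-Crick pairing: A bonds with T, C bonds with G.
PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base that pairs with each position of the strand."""
    return "".join(PAIRING[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """Return the partner strand, read in its own 5'-to-3' direction."""
    return complement(strand)[::-1]


print(complement("ATGCCG"))          # TACGGC
print(reverse_complement("ATGCCG"))  # CGGCAT
```

Taking the complement twice gives back the original strand, which is why either strand is enough to rebuild the other.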

DNA helix and DNA nucleotides

Nice slides explaining DNA Structure visually & concisely

DNA has built-in encoded patterns of nucleotides that trigger the start and stop of transcription (the first step in protein synthesis/the central dogma). This linear sequence of letters forms the genetic code of life. Through the lens of a programmer you would be forgiven for comparing a DNA strand to an array (a data structure commonly used in programming to store and retrieve data), or a 2D array taking both strands into consideration.

Like an array, DNA cannot perform actions on itself. A team of helper molecules and enzymes that support and catalyze steps in DNA replication and protein synthesis are recruited to the DNA to perform their highly specialized roles. In a programming analogy, these molecules are the methods or functions of a class called on the data structure to manipulate it in some way. These methods constantly read and access DNA throughout the life of the cell to direct protein synthesis and orchestrate all cellular activity.
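To make the analogy concrete, here is a toy sketch of my own. The class name `DNA` and the methods `transcribe` and `find_start` are hypothetical, and real transcription involves far more than a string replacement; the point is only that the sequence is passive data and the methods do the work.

```python
class DNA:
    """Toy model: the sequence is inert data, the methods play the enzymes."""

    def __init__(self, coding_strand: str):
        self.coding_strand = coding_strand.upper()

    def transcribe(self) -> str:
        """Stand-in for RNA polymerase: the mRNA reads like the coding
        strand with U in place of T."""
        return self.coding_strand.replace("T", "U")

    def find_start(self) -> int:
        """Index of the first ATG start codon, or -1 if there is none."""
        return self.coding_strand.find("ATG")


gene = DNA("ccatgtgtgccacctaa")
print(gene.transcribe())  # CCAUGUGUGCCACCUAA
print(gene.find_start())  # 2
```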

DNA and its processes can be represented so well in code that a whole field, bioinformatics, has formed to tackle genetic questions computationally. Algorithms process files storing DNA or protein letters to compare sequence similarity, predict RNA and protein structure, and build phylogenetic trees that map a species’ evolutionary history.
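As a taste of what comparing sequence similarity means, here is a deliberately naive sketch. It assumes the two sequences are already aligned and the same length, which real tools do not assume; the function name `percent_identity` is my own.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions at which two aligned sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# 5 of the 7 positions match.
print(round(percent_identity("GATTACA", "GACTATA"), 1))  # 71.4
```

Real alignment algorithms also have to handle insertions and deletions, which is where most of the difficulty lies.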

To predict the proteins that will be made from a DNA sequence, the RNA codon table used in translation can be encoded as a dictionary/map. RNA is the intermediate strand, transcribed as a complement of the DNA template, that forms the basis of protein generation. Previously described as the “Rosetta stone” of nature, here is the mapping of RNA codons (nucleotide triplets) to their amino acid counterparts:

RNA_codons = {
    "UUU": "F", "CUU": "L", "AUU": "I", "GUU": "V",
    "UUC": "F", "CUC": "L", "AUC": "I", "GUC": "V",
    "UUA": "L", "CUA": "L", "AUA": "I", "GUA": "V",
    "UUG": "L", "CUG": "L", "AUG": "M", "GUG": "V",
    "UCU": "S", "CCU": "P", "ACU": "T", "GCU": "A",
    "UCC": "S", "CCC": "P", "ACC": "T", "GCC": "A",
    "UCA": "S", "CCA": "P", "ACA": "T", "GCA": "A",
    "UCG": "S", "CCG": "P", "ACG": "T", "GCG": "A",
    "UAU": "Y", "CAU": "H", "AAU": "N", "GAU": "D",
    "UAC": "Y", "CAC": "H", "AAC": "N", "GAC": "D",
    "UAA": "Stop", "CAA": "Q", "AAA": "K", "GAA": "E",
    "UAG": "Stop", "CAG": "Q", "AAG": "K", "GAG": "E",
    "UGU": "C", "CGU": "R", "AGU": "S", "GGU": "G",
    "UGC": "C", "CGC": "R", "AGC": "S", "GGC": "G",
    "UGA": "Stop", "CGA": "R", "AGA": "R", "GGA": "G",
    "UGG": "W", "CGG": "R", "AGG": "R", "GGG": "G"
}

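# A minimal translate(): read the RNA three letters at a time,
# look up each codon, and stop at the first stop codon.
def translate(rna):
    protein = ""
    for i in range(0, len(rna) - 2, 3):
        amino_acid = RNA_codons[rna[i:i + 3]]
        if amino_acid == "Stop":
            break
        protein += amino_acid
    return protein

RNA = "UGUGCCACCUAA"
protein = translate(RNA)
print(protein)
# prints: CAT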

# wow. stringing amino acids together like this forms proteins, the building blocks of nearly every process and structure in the body e.g.: enzymes (catalysts), structural (actin, keratin), transporters (hemoglobin), signallers (hormones, receptors).

Source: RNA codon table

The human source code can be found here: ftp://ftp.ensembl.org/pub/current_genbank/homo_sapiens/


The rise of self-replicating organisms

So if DNA is the universal data structure of life, how was it created or how did it evolve? First we need to understand the intricacies of cell replication and the problem this poses to abiogenesis.

Cell replication occurs in cells throughout our bodies. For replication to occur, an exact copy of the DNA needs to be transferred into each daughter cell. Genes written into DNA encode the proteins responsible for making every biomolecule in every living cell. DNA replication has an error rate of roughly 1 in 1 billion letters once proofreading and repair are taken into account. This astonishing fidelity is the basis of heredity. A single error can be detrimental, but sometimes a mutation leads to increased fitness, which is the raw material of neo-Darwinian evolution.
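To put that error rate in perspective, here is a back-of-the-envelope calculation. It takes the 1-in-1-billion figure and the 3 billion base pair genome size quoted in this post at face value; the doubling for a diploid cell (two copies of the genome) is my own added assumption, and the variable names are purely illustrative.

```python
GENOME_BASE_PAIRS = 3_000_000_000   # one copy of the human genome, as quoted above
LETTERS_PER_ERROR = 1_000_000_000   # one mistake per billion letters copied
COPIES_PER_CELL = 2                 # assumption: a diploid cell carries two copies

errors_per_genome_copy = GENOME_BASE_PAIRS / LETTERS_PER_ERROR
errors_per_cell_division = errors_per_genome_copy * COPIES_PER_CELL

print(errors_per_genome_copy)    # 3.0
print(errors_per_cell_division)  # 6.0
```

So even at this fidelity, each division would be expected to introduce a handful of new typos, which is roughly where mutations come from.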

Immortalist scientists study replication to defeat or combat the ageing process. Oncology scientists are trying to uncover the conditions whereby replication goes awry to prevent uncontrollable division of cells that form cancerous tumours.

DNA must form a molecular dance with DNA polymerase for replication to even begin. A number of other molecules are also recruited for structural and supporting roles. These molecular minions prise open the intertwined strands and replicate both simultaneously.

DNA replication

📺 Cell replication full video

Francis S. Collins, who led the Human Genome Project and wrote The Language of God, put it this way:

DNA, with its phosphate-sugar backbone and intricately arranged organic bases, stacked neatly on top of one another and paired together at each rung of the twisted double helix, seems an utterly improbable molecule to have “just happened” — especially since DNA seems to possess no intrinsic means of copying itself.

So here we are faced with a conundrum. If DNA or RNA came first then what made them? If the enzyme came first then how was it encoded?


Filling in the gaps

Let’s trace back to what we know about Earth’s history:

  • Earth is supposedly ~4.55 billion years old (isotope dating of radioactive elements)
  • Earth was inhospitable for its first 550 million years
  • Rocks dating 4 billion years old show no sign of genetic life forms
  • 3.85 billion years ago, evidence of flourishing microbial life (preserved microorganisms in fossilized rock)
  • What happened in 150 million years that triggered life in the form of single-celled organisms capable of information storage, replication, and evolution?
  • Several hypotheses exist on gene transfer between organisms but not on how the first single-cell organism originated
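The size of that window falls straight out of the figures in the list. Here is the arithmetic spelled out, working in millions of years and using only the numbers quoted above (the variable names are my own):

```python
# All values in millions of years, taken from the list above.
EARTH_AGE = 4550             # ~4.55 billion years old
INHOSPITABLE_PERIOD = 550    # first 550 million years
FIRST_MICROBES_AGO = 3850    # evidence of microbial life 3.85 billion years ago

habitable_since = EARTH_AGE - INHOSPITABLE_PERIOD
window_for_life = habitable_since - FIRST_MICROBES_AGO

print(habitable_since)   # 4000
print(window_for_life)   # 150
```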

Creation simulation

Stanley Miller joined Harold Urey’s lab as a PhD student in 1952 to test the Oparin-Haldane hypothesis. It proposed that hydrogen-rich conditions on early Earth combined with methane and water vapour could make organic compounds when exposed to lightning, volcanic heat or radiation.

By passing electrical sparks through a mixture of water (H₂O), methane (CH₄), ammonia (NH₃), and hydrogen (H₂), this landmark experiment found amino acids present in the solution within a week. Urey was so impressed he fully credited Miller in the discovery. Variations of the experiment have been run many times since, generating more amino acids, sugars and even nucleobases, the letters of nucleic acids.

The Urey-Miller experiment is often posited as evidence for abiogenesis. Despite all our efforts, however, we haven’t been able to mix up the right stuff to form life in the lab. Given enough time, is it possible a replicating scaffold with programmed purpose could form?


Panspermia

The structure and nature of DNA’s existence has troubled many scientists. Francis Crick (co-discoverer of DNA’s structure) and Leslie Orgel suggested that the molecule may have originated off-planet. Naming the theory directed panspermia, they presented their “highly unorthodox proposal” at a 1971 conference organized by Carl Sagan.

Two years later they published an article on directed panspermia. They argued that the universality of genetic code across species is evidence: if life had evolved multiple times independently there would be more variation in genetic codes. The integrity of the code, however, is highly conserved.

Amino acids have been found on meteorites, which has sparked versions of panspermia where DNA hitchhiked on a meteorite and seeded Earth. This also raises the possibility that life may be scattered throughout the universe.

🔗 The origins of directed panspermia


RNA world

Thomas Cech shared the 1989 Nobel Prize in Chemistry with Sidney Altman for showing that RNA can catalyze chemical reactions. Such catalytic RNAs, including the self-splicing RNA Cech studied, are called ribozymes.

RNA was pinned as the first molecule to carry genetic information and the likely candidate for both chicken and egg, due to its ability to:

  • self-replicate
  • act as an enzyme
  • be converted to DNA through reverse transcription
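The last of these, reverse transcription, is easy to picture in code. This is a rough sketch of my own that treats it as pure base pairing on strings; the names `RNA_TO_DNA` and `reverse_transcribe` are hypothetical, and the real reaction depends on an enzyme, reverse transcriptase.

```python
# Each RNA base pairs with a DNA base: A-T, U-A, C-G, G-C.
RNA_TO_DNA = {"A": "T", "U": "A", "C": "G", "G": "C"}


def reverse_transcribe(rna: str) -> str:
    """Return the complementary DNA strand, read in its own 5'-to-3' direction."""
    return "".join(RNA_TO_DNA[base] for base in reversed(rna))


print(reverse_transcribe("UGUGCCACCUAA"))  # TTAGGTGGCACA
```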

📺 Video: RNA World

RNA is now widely regarded as the leading candidate for the original progenitor molecule, though critics remain.


Quantum DNA

Erwin Schrödinger, in his 1944 book What is Life?, anticipated key features of the genetic material before DNA’s structure was known, and posited that life’s origin problem would be solved through quantum physics. He classified genes as aperiodic crystals: ordinary crystals repeat the same molecular structure over and over, whereas an aperiodic crystal could store information in a non-repeating arrangement. The fidelity of replication led him to believe heredity was governed by quantum, not classical, laws.

Is it possible that life emerged directly from the atomic world guided by quantum mechanics?

Both Watson and Crick credited Schrödinger’s book with steering them towards DNA research. More than 75 years later, quantum biology remains a frontier. Collaboration between physics and biology will be key to advancing it.


Conclusion

If you’ve made it this far, I commend you! Many aspects of life are perplexing and far-out. At the nanoscale, our cellular machinery works with mind-boggling complexity to sustain life.

Whether you are a materialist confident in life’s components guided by nature, or a Promethean dreamer with an imagination stretching across the universe, the concept of DNA and its role in life’s origins gives plenty to chew on.


Sources