Blockchain and the Future of Genomics
Jan 29, 2021
An anonymous programmer under the pseudonym Satoshi Nakamoto published the Bitcoin whitepaper on October 31, 2008.
This sparked the adoption of blockchain technology and consensus algorithms, first in fintech, and now across supply chains and beyond.
Not only did Satoshi catalyze cryptocurrency, they founded a movement valuing transparency, privacy, and decentralization above all else.
But how does this tie to genomics?
Genomics companies profit ruthlessly from your DNA data — often reselling it indefinitely, even though you paid them to sequence it.
To understand how blockchain can address this, we must first grasp what blockchain is. I have been trying to get up to speed with this space recently: what is the hype about, and is it still possible to make money from it?
Topics covered:
- What is blockchain?
- The mainstream genomics data business model.
- The need for privacy in genomics data.
- DNA storage concerns.
- Storing genomics data on a blockchain.
- Who currently occupies the genomics–blockchain space?
What is Blockchain?

At its core, blockchain is a database - but one that is:
- Decentralized: no single owner.
- Immutable: transactions cannot be altered.
- Transparent: a single source of truth, auditable by all.
Think of a massive spreadsheet:
- Each transaction = new row (sender, receiver, ID, timestamp).
- Everyone can view new rows in real time.
- History is permanent and uneditable.
- Rows are grouped into blocks, validated by miners.
- Miners run cryptographic algorithms (proof-of-work), ensuring validity and order.
- Miners are rewarded with tokens (currency).
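To make the spreadsheet picture concrete, here is a minimal sketch of a hash-linked chain with a toy proof-of-work. It is not how Bitcoin is actually implemented: the difficulty value, the field names and the helpers `block_hash`, `mine_block` and `is_valid` are all made up for illustration.

```python
import hashlib
import json

DIFFICULTY = 3  # assumed toy setting: hash must start with this many zero hex digits


def block_hash(block):
    """Hash the block's contents (everything except its stored hash)."""
    payload = {k: block[k] for k in ("index", "prev_hash", "transactions", "nonce")}
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()


def mine_block(index, prev_hash, transactions):
    """Toy proof-of-work: try nonces until the hash meets the difficulty target."""
    block = {"index": index, "prev_hash": prev_hash,
             "transactions": transactions, "nonce": 0}
    while True:
        digest = block_hash(block)
        if digest.startswith("0" * DIFFICULTY):
            block["hash"] = digest
            return block
        block["nonce"] += 1


def is_valid(chain):
    """Check each block's hash, its proof-of-work, and its link to the previous block."""
    for i, block in enumerate(chain):
        if block_hash(block) != block["hash"]:
            return False
        if not block["hash"].startswith("0" * DIFFICULTY):
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True


genesis = mine_block(0, "0" * 64, [])
row = {"sender": "alice", "receiver": "bob", "id": 1, "timestamp": 1611878400}
chain = [genesis, mine_block(1, genesis["hash"], [row])]
print(is_valid(chain))  # the untouched chain passes

chain[1]["transactions"][0]["receiver"] = "mallory"  # edit an old row
print(is_valid(chain))  # the stored hash no longer matches the contents
```

Editing a row changes the block's hash, so the edit is detectable unless the attacker re-mines that block and every block after it. On a real network that means outpacing all the honest miners.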
“Decentralization prevents a single entity from controlling the data; immutability guarantees that data cannot be altered; and security is ensured by protecting accounts with enhanced cryptographic methods.”
— Security and Privacy on Blockchain (2019)
Blockchain is still young. Proof-of-work (Bitcoin’s method) is energy-hungry, and newer algorithms (proof-of-stake, delegated PoS, proof-of-space) gain efficiency, arguably at some cost to decentralization or security.
The takeaway: cheating a blockchain is impractical by design. Every transaction is stored on the shared ledger, and rewriting an old one would mean redoing the work for every block that came after it.
Mainstream Genomic Data Model
23andMe: the genomic test of time
Companies like 23andMe and Ancestry hold DNA data from millions of customers (roughly 10M and 18M+ samples, respectively).
- ~80% of 23andMe users consent to their data being used - i.e. sold.
- Customers think they’re contributing to research, but their data is resold endlessly.
- Pharma giants (like GSK, $300M buy-in) get privileged access.
Imagine instead an open-source genome database - like PubMed - but for millions of anonymous genomes searchable by disease, traits, or demographics.
The Need for Privacy in Genomics
Data has become the new gold. Even social media alone predicts behaviour frighteningly well.

Genomes can reveal predispositions to addiction, mental health conditions, acne, heart disease, and more.
AI systems like AlphaFold can now predict protein structures with a median accuracy score of roughly 90 out of 100 on the CASP14 benchmark.
Yet giving DNA to corporations is risky:
- DNA + biometrics + wearables = total personal profile.
- DNA fraud could implicate you in crimes (real breaches have already happened).
- Insurers could raise premiums.
- Advertisers could hyper-target vulnerabilities.
We need fair, transparent DNA data transfer — balancing ownership, privacy, and research.
DNA Storage Concerns

Genomics generates huge files (FASTQ, BAM).
- Current model: centralized servers (cloud or physical).
- Risks: insider access, breaches, corruption, outages.
- No unified storage standard.
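To put a rough number on "huge", here is a back-of-envelope estimate of the uncompressed FASTQ size for one whole human genome. The genome length is approximate, and the coverage, read length and header size are assumptions I picked for illustration. `fastq_size_gb` is a hypothetical helper, not part of any genomics library.

```python
GENOME_BASES = 3.1e9  # approximate length of the human genome in base pairs


def fastq_size_gb(coverage=30, read_length=150, header_bytes=60):
    """Estimate uncompressed FASTQ size in GB under the assumed parameters."""
    n_reads = GENOME_BASES * coverage / read_length
    # Each FASTQ record has four newline-terminated lines:
    # header, sequence, a "+" separator, and one quality character per base.
    bytes_per_read = (header_bytes + 1) + (read_length + 1) + 2 + (read_length + 1)
    return n_reads * bytes_per_read / 1e9


print(f"{fastq_size_gb():.0f} GB uncompressed at 30x coverage")
```

Under these assumptions a single genome lands in the low hundreds of gigabytes before compression. That is why where the files live, and who pays for the storage, matters so much.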
Blockchain could help by:
- Rewarding resource sharing (compute/storage).
- Enabling decentralized data distribution.
- Protecting privacy.
- Promoting collaboration.
Cryptography and Compression
A 2020 study implemented SAMChain, a Python blockchain built on MultiChain.
- Stores DNA in compressed BAM files.
- Metadata (“data streams”) indexes transactions for fast retrieval.
- Helper modules can query BAMs directly.

This proof-of-concept shows how blockchain could underpin a genomic data ecosystem.
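The general idea of an append-only log with a keyed index over compressed records can be sketched in a few lines. To be clear, this is not SAMChain's code or MultiChain's API. `ToyStream`, `bin_key` and the bin size are hypothetical, and the records are plain tab-separated strings rather than real BAM data.

```python
import zlib
from collections import defaultdict

BIN_SIZE = 1_000_000  # assumed: group reads into 1 Mb genomic bins


def bin_key(chrom, pos):
    """Build an index key from a chromosome name and a position."""
    return f"{chrom}:{pos // BIN_SIZE}"


class ToyStream:
    """Append-only list of compressed items plus a key-to-positions index."""

    def __init__(self):
        self.items = []                 # the "ledger": items are only ever appended
        self.index = defaultdict(list)  # key -> positions of matching items

    def publish(self, key, text):
        self.items.append(zlib.compress(text.encode()))
        self.index[key].append(len(self.items) - 1)

    def query(self, key):
        positions = self.index.get(key, [])
        return [zlib.decompress(self.items[i]).decode() for i in positions]


stream = ToyStream()
stream.publish(bin_key("chr1", 1_500_000), "read1\tchr1\t1500000\tACGT")
stream.publish(bin_key("chr1", 1_700_000), "read2\tchr1\t1700000\tGGCA")
stream.publish(bin_key("chr2", 42), "read3\tchr2\t42\tTTAG")
print(stream.query("chr1:1"))  # both chr1 reads fall in the same bin
```

The key lets a query jump straight to the relevant items instead of scanning the whole log, which is the property that makes retrieval from an ever-growing ledger workable.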
Caveat: GDPR gives individuals the right to have their personal data erased, which clashes with blockchain immutability.
Who Occupies the Genomics–Blockchain Space?
Encrypgen
Encrypgen runs Gene-Chain, a decentralized genomics marketplace.
- Founded by ethicist David Koepsell + genomic scientist Dr. Vanessa Gonzalez.
- They famously opposed Myriad Genetics’ BRCA1/2 gene patent.
- Users can upload DNA, sell access, and earn DNA tokens (Ethereum-based).
- Whitepaper: Gene-Chain.
They also founded the Genomic Blockchain Consortium for open standards and democratic governance.
Blockchain logs: Etherscan
Encrypgen summary:
- Decentralized DNA marketplace
- Transparent goals + ethical founders
- Active since 2018
- Early-stage, limited data/metadata
Nebula Genomics

Founded by George Church, Nebula sequences genomes (~$300 for 30× WGS) and uses blockchain as a feature.
- Partners with Oasis Labs.
- Offers sophisticated analysis + updates (subscription model).
- Emphasis on corporate blockchain (centralized tendencies).

Nebula summary:
- Technical whitepaper
- Published in journals
- Sequencing + analysis product
- Well-funded, strong backers
- More corporate than open-source
- Blockchain marketed, not core philosophy
Conclusion
Hmmm, not at all nebulous.. ;). Do I trust the guy who’s trying to resurrect the woolly mammoth with my data?
The concept of genome data on blockchain is exciting.
- Encrypgen offers an open, user-controlled marketplace.
- Nebula offers a corporate sequencing service with blockchain as a feature.
The future could see open-source genomic blockchains, balancing privacy, accessibility, and collaboration. And these companies could maybe convince or recruit genome sequencing skeptics to offer forth their precious DNA under the guise of blockchain’s privacy.
That said, while storing genomes on blockchain is appealing in principle - and certainly is buzzy right now - it’s worth asking: does everything really need to live on a blockchain? It seems there already exists more genomic data on platforms like PubMed or NCBI than can be easily analyzed and managed, and in the competitive worlds of science publishing and precision-medicine startups, it might be naive to envision everyone willingly pooling their data in a transparent, ethical, open-source way.
Maybe the solution isn’t blockchain per se, but some form of massive, decentralized genomic database - with an intuitive, timeline-, sample type- and geography-based interface that makes exploring datasets far more user-friendly than today’s repositories. Still, in reality, data is the most valuable asset most companies hold. Until incentives change, they’ll keep capitalizing on it before releasing it for the greater good.
