BioRouter: Decentralized Biodata Routing Protocol

Abstract

We present BioRouter, a decentralized biodata routing protocol that provides a unified, authenticated gateway for all operations on patient-owned genomic and clinical datasets. BioRouter introduces the BioCID (Biological Content Identifier), a deterministic addressing scheme that decouples data identity from physical storage location, enabling secure routing without exposing cloud storage infrastructure. Access control is enforced via a four-tier authorization cascade comprising: (1) cryptographic proof of data ownership via Ethereum signature recovery, (2) explicit BioNFT-gated consent grants with configurable duration and license type, (3) on-chain ERC-721 token holder verification, and (4) x402 micropayment settlement through the BioDataRouter smart contract on the Sequentias Network (Chain ID 15132025), which enforces a 95%/5% revenue split between data owner and protocol. BioRouter also implements Metamorphic Consent—a model in which patient consent transforms from a static one-time permission into an ongoing, revocable, economically-linked relationship. The protocol integrates with ERC-8004 AI agent identity registration, enabling full attribution tracking for AI-driven data access. A production genomics pipeline triggered via upload metadata executes PLINK-based extraction of 91,645 ancestry-informative SNPs followed by ADMIXTURE K=24 supervised analysis against 781 reference individuals spanning 24 global populations. All data is stored in Google Cloud Storage with AES-256 encryption; storage paths are never exposed to clients. The live protocol is deployed at biorouter.genobank.app.

Executive Summary (Non-Technical)

Today, genomic and clinical data is fragmented across laboratories, consumer testing companies, and hospital systems. Patients cannot control who accesses their data, cannot revoke permissions, and receive no compensation when their data contributes to research. BioRouter solves this by acting as a universal data router that patients control cryptographically through their Web3 wallet.

Every file uploaded to BioRouter receives a BioCID—a unique address like Biocid:user/0x5f5a60.../vcf/sample.vcf—that identifies the data without revealing where it is physically stored. Access is granted only when one of four conditions is met: the requester is the owner, the owner has explicitly granted permission, the requester holds the linked NFT, or the requester pays the owner-set price. When researchers pay, 95 cents of every dollar goes directly to the patient. Any consent can be revoked instantly, satisfying GDPR's right to erasure.

BioRouter also automatically runs ancestry analysis when a DTC genotype file is uploaded, delivering results across 24 global populations including 10 indigenous Mexican populations—results that are verifiable, attributed, and owned by the patient.

1. Introduction

The global genomic data ecosystem is characterized by a fundamental misalignment between data production and data ownership. Patients and research participants generate the raw material that drives precision medicine, pharmacogenomics, and population genetics, yet they exercise no meaningful control over how that data is accessed, shared, or monetized. This arrangement has produced a series of well-documented failures: the MyHeritage breach disclosed in 2018, which exposed 92 million account records [1]; the 2023 23andMe credential-stuffing attack compromising 6.9 million profiles [2]; and the March 2025 23andMe bankruptcy that placed 15 million customers' genomic data into a corporate asset pool with no mechanism for patient objection [3].

Existing technical responses to this problem have proven inadequate. Federated learning degrades data quality, erases attribution, and allows model trainers to extract value without compensating data owners. Zero-knowledge proofs, while mathematically elegant, are architecturally incompatible with genomic data: genomics is probabilistic and non-deterministic, and ZK systems require deterministic computation to produce verifiable proofs [4]. Differential privacy introduces calibrated noise that is acceptable for demographic statistics but catastrophic for clinical-grade variant calls where single-base accuracy determines treatment decisions.

BioRouter addresses this gap by inverting the custody model entirely. Instead of data being held by an institution that grants access to patients, patients hold cryptographic keys that grant access to institutions. The data infrastructure—Google Cloud Storage with AES-256 encryption—is a private implementation detail hidden behind the BioCID addressing layer. What patients, researchers, and AI agents interact with is a protocol, not a storage system. Intellectual property registration and commercial licensing are handled through Story Protocol, whose Programmable IP License (PIL) framework enables on-chain license token minting, royalty distribution, and derivative work tracking—extended by GenoBank.io's revocable BioPIL licenses for GDPR-compliant genomic data commerce.

This whitepaper describes BioRouter's design, implementation, and operational characteristics. Section 2 surveys prior work. Section 3 formalizes the BioCID addressing scheme. Section 4 details the system architecture. Section 5 specifies the four-tier authorization cascade. Section 6 documents the API. Sections 7 through 10 cover smart contracts, genomic pipelines, AI agent integration, and the MongoDB data model. Sections 11 through 13 address privacy compliance, comparative analysis, and limitations.

2. Background and Related Work

2.1 Content-Addressable Storage

Content addressing, in which a resource's address is derived from its content rather than its location, was formalized in the distributed systems literature by Mazières and Kaashoek [5] and later popularized by IPFS's CID scheme [6]. BioCIDs adapt this pattern to biodata: instead of hashing the content, they prepend semantic metadata (agent, owner wallet, data type) to the filename, producing addresses that encode provenance without encoding storage location. Unlike IPFS CIDs, BioCIDs do not commit to a specific storage backend, enabling migration between storage systems while preserving persistent identifiers.

2.2 Blockchain-Based Health Data Governance

MedRec [7] demonstrated that Ethereum smart contracts could serve as a decentralized access permission layer for electronic health records. Subsequent systems including Healthureum [8], Mediblock [9], and Coral Health [10] expanded this framework but retained centralized storage, creating a split architecture where access control was decentralized but data remained vulnerable. BioRouter extends this line of work by coupling the access control layer with automated revenue distribution and AI agent identity attestation.

2.3 x402 Micropayments

The HTTP 402 "Payment Required" status code, reserved for future use since the early HTTP/1.1 specifications and still listed as reserved in RFC 7231 [11], was never formally standardized. The Coinbase x402 protocol [12] operationalizes it as a machine-readable micropayment negotiation mechanism. BioRouter adopts x402 semantics: when a requester who passes none of the other authorization tiers attempts to download data, the server returns HTTP 402 with a structured body containing the owner-set price and the BioDataRouter contract address. The requester then settles payment on-chain, after which the contract's hasAccess(agent_id, wallet) function confirms access.

2.4 ERC-8004 AI Agent Identity

ERC-8004 [13] defines a standard interface for registering AI agents as on-chain identity principals capable of signing transactions and attesting to their capabilities. GenoclawIdentityRegistry, deployed at 0xcBc813e733692794660dEC4AbB2ADd515a9F3D18 on Sequentias Network, implements ERC-8004 with extensions for bioinformatic agent classification. Registered agents receive a deterministic identifier of the form gc-{first8hex} that is embedded in every BioCID they generate, enabling complete attribution tracking through the audit log.

2.5 Story Protocol and Programmable IP Licensing

Story Protocol [15a] establishes an on-chain infrastructure for registering intellectual property as composable, programmable assets. Each IP Asset receives a unique identifier and can have license terms attached via the Programmable IP License (PIL) framework. BioRouter integrates with Story Protocol at two levels: (1) genomic files uploaded through BioRouter may be simultaneously registered as Story Protocol IP Assets, enabling commercial licensing through standard PIL templates; and (2) the four-tier authorization model (Section 5) checks Story Protocol license token ownership as a valid access credential. GenoBank.io extends Story Protocol's standard PIL templates (non-commercial #1, commercial #2, exclusive #3, public-good #4) with five genomic-specific BioPIL licenses: GDPR-research #5, AI-training #6, clinical-use #7, pharma-research #8, and family-inheritance #9. A critical distinction is that Story Protocol licenses are permanent by design, whereas BioPIL licenses are revocable—a requirement for GDPR Article 17 compliance and the foundation of Metamorphic Consent.

Known BioNFT collections registered on Story Protocol include:

0x5021F7438ea502b0c346cB59F8E92B749Ecd74B5 — VCF Ownership
0x19A615224D03487AaDdC43e4520F9D83923d9512 — VCF Collection
0xB8d03f2E1C02e4cC5b5fe1613c575c01BDD12269 — VCF Annotation
0x88Ed5b47ea8f609Ee14ac60968C3f76f9138a171 — AlphaGenome
0x7fB09610594a2952144B5cADbD47972684dEfA86 — Ancestry
0xdaB93b0D7f01C9D7ffe33afcDc3518E8d6DE7Be1 — Newborn/Trio

2.6 ADMIXTURE and Population Genetics Pipelines

ADMIXTURE [14] uses a maximum likelihood model to decompose individual genotype data into K population components. At K=24, the Somos pipeline estimates a sample's ancestry proportions against 781 reference individuals representing major continental populations and 10 Indigenous American populations: eight from Mexico (Maya, Pima, Zapoteca, Huichol, Mixteca, Nahua-Otomi, Tarahumara, Triqui) and two South American groups (Andes, Amazonas). The 91,645-SNP panel was selected for high information content across this reference set while maintaining compatibility with major DTC genotyping platforms (23andMe v3/v4/v5, AncestryDNA, Illumina GSA).

3. BioCID Addressing Scheme

3.1 Format Specification

A BioCID is a structured string identifier with the following grammar:

BioCID    := "Biocid:" bioagent "/" owner_biowallet "/" biodata_type "/" dataset
bioagent  := "user" | "gc-" HEX{8}
owner_biowallet := CHECKSUMMED_ETH_ADDRESS   ; EIP-55 checksum
biodata_type    := dtc-type | sequence-type | clinical-type | derived-type
dtc-type        := "dtc-genotype"
sequence-type   := "vcf" | "bam" | "fastq"
clinical-type   := "fhir" | "clinical-report"
derived-type    := "ancestry-result" | "sqlite"
dataset         := FILENAME   ; original filename, no path components

The bioagent field distinguishes uploads performed by a human user via direct upload (user) from those performed by an ERC-8004 registered AI agent (gc-{first8hex}). This distinction is critical for audit trail integrity: when GenoClaw or another registered agent stores a derived dataset (e.g., an ancestry result), the agent's identity is permanently encoded in the BioCID.
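
The grammar above can be checked mechanically. The following is a minimal sketch of a parser, not the production implementation: the names parse_biocid and BioCID are hypothetical, HEX{8} is assumed to mean eight lowercase hex digits (as in the examples), and the EIP-55 checksum is not verified because that requires Keccak-256, which the Python standard library does not provide.

```python
import re
from typing import NamedTuple

# Data types enumerated by the grammar in Section 3.1.
BIODATA_TYPES = (
    "dtc-genotype", "vcf", "bam", "fastq",
    "fhir", "clinical-report", "ancestry-result", "sqlite",
)

# Assumption: only the shape of the wallet address is checked (0x + 40 hex
# digits); the EIP-55 checksum casing is NOT validated in this sketch.
_BIOCID_RE = re.compile(
    r"Biocid:(?P<bioagent>user|gc-[0-9a-f]{8})"
    r"/(?P<owner>0x[0-9a-fA-F]{40})"
    r"/(?P<biodata_type>" + "|".join(map(re.escape, BIODATA_TYPES)) + r")"
    r"/(?P<dataset>[^/]+)"          # filename only, no path components
)


class BioCID(NamedTuple):
    bioagent: str
    owner_biowallet: str
    biodata_type: str
    dataset: str

    @property
    def is_agent_upload(self) -> bool:
        """True when the BioCID was generated by a registered agent (gc- prefix)."""
        return self.bioagent.startswith("gc-")


def parse_biocid(text: str) -> BioCID:
    """Split a BioCID into its four fields or raise ValueError."""
    match = _BIOCID_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid BioCID: {text!r}")
    return BioCID(match["bioagent"], match["owner"],
                  match["biodata_type"], match["dataset"])
```

Because the owner wallet sits in a fixed position, a caller can read attribution directly from the parsed tuple without consulting any registry.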

3.2 Example BioCIDs

Scenario	BioCID
Patient uploads 23andMe file	`Biocid:user/0x5f5a60EaEf242c0D51A21c703f520347b96Ed19a/dtc-genotype/DTC-FILE-GHAn0029.txt`
Patient uploads clinical VCF	`Biocid:user/0x5f5a60EaEf242c0D51A21c703f520347b96Ed19a/vcf/aligned_cleaned_Invitae.deepvariant.vcf`
GenoClaw stores ancestry result	`Biocid:gc-b96ed19a/0x5f5a60EaEf242c0D51A21c703f520347b96Ed19a/ancestry-result/somos_k24_results.json`
GenoClaw stores analysis DB	`Biocid:gc-b96ed19a/0x5f5a60EaEf242c0D51A21c703f520347b96Ed19a/sqlite/health_analysis_2026.db`
Patient uploads raw FASTQ	`Biocid:user/0xB3C3a584F9a5A77Ed84EBf2c8E66E8e8c1C2D3A4/fastq/WGS_sample_R1.fastq.gz`

3.3 Design Properties

BioCIDs satisfy the following properties by construction:

Storage Independence. The BioCID does not encode a GCS bucket, object path, or storage backend. The mapping from BioCID to physical storage is maintained exclusively in the biocid_registry MongoDB collection and is never returned to API clients.
Owner Attribution. The checksummed wallet address in position 2 is the canonical data owner. Ownership can be audited without querying any database by parsing the BioCID string.
Agent Attribution. The bioagent field enables attribution of AI-generated derived datasets to the specific registered agent, satisfying data lineage requirements in clinical contexts.
Collision Resistance. The BioCID alone does not distinguish two files with identical names but different content uploaded by the same wallet. Such files are differentiated by the SHA-256 hash stored in the registry, which supplements the BioCID in duplicate detection logic.
Human Readability. BioCIDs are designed to be readable and parseable without a resolver, contrasting with opaque hash-based CIDs.
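
To illustrate how a content hash can supplement the BioCID during duplicate detection, here is a small sketch. It uses an in-memory dictionary as a stand-in for the registry; the function names and the three outcome labels are hypothetical and are not taken from the BioRouter codebase.

```python
import hashlib


def sha256_tag(content: bytes) -> str:
    """Return the content hash in the 'sha256:<hex>' form used by file_hash."""
    return "sha256:" + hashlib.sha256(content).hexdigest()


def classify_upload(registry: dict[str, set[str]], biocid: str, content: bytes) -> str:
    """Classify an upload against previously seen (BioCID, hash) pairs.

    registry maps a BioCID to the set of content hashes already stored
    under it (a stand-in for a database lookup).
    """
    digest = sha256_tag(content)
    known = registry.setdefault(biocid, set())
    if digest in known:
        return "duplicate"            # same name, same bytes
    outcome = "name-collision" if known else "new"
    known.add(digest)                 # same name, different bytes is kept
    return outcome
```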

4. System Architecture

4.1 Infrastructure Overview

BioRouter is implemented as a Python 3.12 CherryPy application listening on port 8095, deployed on GenoBank's Google Cloud production server behind a Cloudflare proxy. The service connects to three external systems: MongoDB for registry and audit data, Google Cloud Storage for encrypted file storage, and the Sequentias Network EVM node for blockchain queries and transaction execution.

Figure 1 — BioRouter System Architecture


  ┌─────────────────────────────────────────────────────────────────────┐
  │  CLIENT LAYER                                                        │
  │                                                                      │
  │  ┌──────────────┐  ┌──────────────┐  ┌──────────────────────────┐  │
  │  │   Patient    │  │  Researcher  │  │   GenoClaw AI Agent      │  │
  │  │  BioWallet   │  │   Web App    │  │   (ERC-8004 identity)    │  │
  │  └──────┬───────┘  └──────┬───────┘  └────────────┬─────────────┘  │
  └─────────┼─────────────────┼────────────────────────┼────────────────┘
            │                 │                         │
            │   HTTPS + user_signature (Web3 signature) │
            ▼                 ▼                         ▼
  ┌─────────────────────────────────────────────────────────────────────┐
  │  CLOUDFLARE EDGE (DDoS protection, TLS termination)                 │
  └──────────────────────────────┬──────────────────────────────────────┘
                                 │
                                 ▼
  ┌─────────────────────────────────────────────────────────────────────┐
  │  BIOROUTER SERVICE (CherryPy, Python 3.12, port 8095)               │
  │  https://biorouter.genobank.app                                      │
  │                                                                      │
  │  ┌──────────────────────────────────────────────────────────────┐   │
  │  │  AUTH LAYER                                                   │   │
  │  │  recover_wallet(user_signature) → checksummed wallet address  │   │
  │  │  Four-Tier Authorization Cascade (§5)                         │   │
  │  └──────────────────────────────────────────────────────────────┘   │
  │                                                                      │
  │  ┌────────────┐  ┌────────────┐  ┌─────────────┐  ┌─────────────┐  │
  │  │  /upload   │  │ /download  │  │   /stream   │  │  /grant     │  │
  │  │  /list     │  │ /resolve   │  │   /consents │  │  /revoke    │  │
  │  │  /set_price│  │ /pipeline  │  │             │  │             │  │
  │  └────────────┘  └────────────┘  └─────────────┘  └─────────────┘  │
  └──────────────┬───────────────────────────┬───────────────────────────┘
                 │                           │
        ┌────────┴────────┐       ┌──────────┴──────────┐
        ▼                 ▼       ▼                      ▼
  ┌──────────┐   ┌──────────────┐ ┌──────────────┐  ┌───────────────┐
  │  MongoDB │   │    Google    │ │  Sequentias  │  │  Story Protocol│
  │          │   │    Cloud     │ │  Network     │  │  (IP Assets)   │
  │ registry │   │   Storage   │ │  (EVM, RPC)  │  │                │
  │ consents │   │  (AES-256)  │ │              │  │  PIL + BioPIL  │
  │ audit_log│   │             │ │  BioDataRouter│  │  License Terms │
  └──────────┘   │ Bucket:     │ │  0x678d668E  │  │  NFT Ownership │
                 │ genobank-   │ │  IdentityReg │  └───────────────┘
                 │ biorouter   │ │  0xcBc813e7  │
                 └──────────────┘ └──────────────┘  ┌───────────────┐
                                                    │  Somos Ancestry│
  Storage Hierarchy (never exposed to clients):     │  Pipeline      │
  gs://genobank-biorouter/                          │  (PLINK +      │
    biorouter/{wallet}/{type}/{uid}/{file}           │  ADMIXTURE)    │
                                                    └───────────────┘

4.2 Signature-Based Authentication

Every API request is authenticated by a user_signature parameter: an ECDSA signature over the canonical message "I want to proceed", produced with the client's private key. The server calls recover_wallet(user_signature) (equivalent to Ethereum's ecrecover) to derive the checksummed wallet address, which becomes the authenticated principal for the request. No session tokens, API keys, or passwords are required.

Design Principle

Wallet signatures serve as both authentication and identity. A valid signature over "I want to proceed" proves private key possession without transmitting the key. The same cryptographic proof that authenticates an API call also establishes ownership of all BioCIDs in that wallet's namespace.

4.3 Storage Isolation and Indexation Hierarchy

All biofiles are stored in a dedicated GCS bucket (gs://genobank-biorouter) using a wallet-indexed hierarchy that is never exposed to clients. The internal path structure is:

gs://genobank-biorouter/
  biorouter/{owner_wallet}/{biodata_type}/{uid}/{filename}

Example:
  gs://genobank-biorouter/
    biorouter/0x5f5a60eaef242c0d51a21c703f520347b96ed19a/bam/dfdbeadcd2bb/DNA_SAMPLE-001.bam
    biorouter/0x5f5a60eaef242c0d51a21c703f520347b96ed19a/vcf/a8f3e2d1bc0f/DNA_SAMPLE-001.vcf
    biorouter/0x5f5a60eaef242c0d51a21c703f520347b96ed19a/fastq/7c2f91de34ab/DNA_SAMPLE-001_S26.R1.fastq.gz
    biorouter/0x5f5a60eaef242c0d51a21c703f520347b96ed19a/dtc-genotype/cea846287d5f/DTC-FILE-GHAn0029.txt

The owner wallet address is the primary index for all data in the BioRouter protocol. Every file belongs to exactly one wallet, and that wallet's address (lowercased in the storage path, as in the example above) forms the root of its storage namespace. The {uid} segment is a random UUID suffix (12 hexadecimal characters in the example above) that prevents path collisions when the same filename is uploaded multiple times.

This hierarchy maps directly to BioCID addressing:

Internal path:  biorouter/{wallet}/{type}/{uid}/{file}
BioCID:         Biocid:{bioagent}/{wallet}/{type}/{file}

Example:
  GCS path  → biorouter/0x5f5a60...d19a/bam/dfdbeadcd2bb/DNA_SAMPLE-001.bam    (HIDDEN)
  BioCID    → Biocid:gc-b96ed19a/0x5f5a60EaEf242c0D51A21c703f520347b96Ed19a/bam/DNA_SAMPLE-001.bam    (PUBLIC)

GCS bucket names and object paths are stored exclusively in the gcs_bucket and gcs_path fields of biocid_registry. These fields are marked internal and are stripped from all API responses. Clients interact exclusively with BioCIDs. Even in the event of a MongoDB registry leak, the data itself remains inaccessible without valid GCS credentials; conversely, knowledge of the GCS path does not help an attacker without passing the four-tier authorization check, because the BioRouter service acts as the sole authenticated intermediary.
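
The separation between internal paths and public responses can be sketched as follows. This is an illustration under stated assumptions: build_gcs_path and public_view are hypothetical helper names, the lowercased wallet and 12-character uid are inferred from the example paths above, and only the two internal fields named in the text (gcs_bucket, gcs_path) are stripped.

```python
import uuid
from typing import Optional

# Fields the text describes as internal to biocid_registry.
INTERNAL_FIELDS = frozenset({"gcs_bucket", "gcs_path"})


def build_gcs_path(owner_wallet: str, biodata_type: str, filename: str,
                   uid: Optional[str] = None) -> str:
    """Compose biorouter/{wallet}/{type}/{uid}/{file} for internal use only."""
    # Assumption: a 12-hex-character random segment, as in the examples.
    uid = uid or uuid.uuid4().hex[:12]
    return f"biorouter/{owner_wallet.lower()}/{biodata_type}/{uid}/{filename}"


def public_view(registry_doc: dict) -> dict:
    """Return a copy of a registry document that is safe to send to clients."""
    return {k: v for k, v in registry_doc.items() if k not in INTERNAL_FIELDS}
```

Filtering at a single choke point such as public_view means a new endpoint cannot leak storage paths simply by forgetting to omit them.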

5. Four-Tier Authorization Model

Every download and stream request passes through the following authorization cascade. Tiers are evaluated in order; the first passing tier grants access. If all four tiers fail, the server returns HTTP 402 Payment Required with the owner-set price and the BioDataRouter contract address.

Figure 2 — Four-Tier Authorization Cascade


  Request: GET /api_biorouter/download?biocid=X&user_signature=Y
                          │
                          ▼
           ┌──────────────────────────────┐
           │  recover_wallet(Y)           │
           │  → authenticated_wallet = W  │
           └──────────────┬───────────────┘
                          │
                          ▼
  ┌─────────────────────────────────────────────────────────────┐
  │  TIER 1: Data Owner Check                                    │
  │  biocid_registry.owner_wallet == W ?                         │
  └──────────────────────────┬──────────────────────────────────┘
                    YES ─────┤
                             │ NO
                             ▼
  ┌─────────────────────────────────────────────────────────────┐
  │  TIER 2: BioNFT Consent Check (MongoDB)                      │
  │  biocid_consents WHERE biocid=X AND permittee=W              │
  │               AND status="active"                            │
  │               AND expires_at > NOW()  ?                      │
  └──────────────────────────┬──────────────────────────────────┘
                    YES ─────┤
                             │ NO
                             ▼
  ┌─────────────────────────────────────────────────────────────┐
  │  TIER 3: On-Chain NFT Token Holder Check                     │
  │  IF biocid_registry.bionft_token_id IS SET:                  │
  │    ERC-721.ownerOf(token_id) == W ?                          │
  └──────────────────────────┬──────────────────────────────────┘
                    YES ─────┤
                             │ NO
                             ▼
  ┌─────────────────────────────────────────────────────────────┐
  │  TIER 4: x402 On-Chain Payment Check                         │
  │  BioDataRouter.hasAccess(agent_id, W) == true ?              │
  └──────────────────────────┬──────────────────────────────────┘
                    YES ─────┤
                             │ NO
                             ▼
                    ┌────────────────┐
                    │ HTTP 402       │
                    │ Payment Req.   │
                    │ price_wei: X   │
                    │ contract: 0x.. │
                    └────────────────┘

  Any tier answering YES → ACCESS GRANTED (stream from GCS)

Data Owner

Cryptographic proof via signature recovery. The wallet that uploaded the file always retains full access with no expiration.

BioNFT Consent

Owner-granted permittee access via POST /grant. Duration and license type are configurable, and the grant is revocable at any time.

NFT Token Holder

On-chain ERC-721 ownership query. Whoever holds the BioNFT linked to the BioCID is granted access, enabling transferable permissions.

x402 Payment

BioDataRouter on-chain settlement. 95% to patient, 5% to protocol, enforced immutably by the smart contract.
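
The first-match-wins evaluation order of the cascade can be expressed compactly. The sketch below is illustrative only: authorize and Decision are hypothetical names, and each tier is passed in as a plain predicate so the MongoDB and on-chain lookups stay out of scope.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# A tier check takes (biocid, wallet) and reports whether that tier passes.
Check = Callable[[str, str], bool]


@dataclass(frozen=True)
class Decision:
    granted: bool
    tier: Optional[str]   # name of the tier that granted access, if any
    http_status: int      # 200 when granted, 402 when every tier fails


def authorize(biocid: str, wallet: str, tiers: list[tuple[str, Check]]) -> Decision:
    """Evaluate tiers in order; the first passing tier grants access."""
    for name, check in tiers:
        if check(biocid, wallet):
            return Decision(True, name, 200)   # later tiers are not evaluated
    return Decision(False, None, 402)          # caller attaches price and contract
```

Passing the tiers as an ordered list keeps the cheap checks (owner comparison, consent lookup) ahead of the on-chain queries, matching the order in Figure 2.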

5.1 Tier 1: Data Owner Authentication

Ownership is established at upload time: the wallet address recovered from the user_signature parameter on the POST /upload call becomes the owner_wallet field of the biocid_registry document. On subsequent download requests, the recovered wallet is compared against owner_wallet. A match grants unconditional access. No expiration applies to owner access.

5.2 Tier 2: Metamorphic Consent via BioNFT Grants

The canonical mechanism for granting access to a specific permittee is POST /api_biorouter/grant. This call, which must be authenticated by the data owner, creates a document in biocid_consents containing the permittee wallet, license type, duration, and expiry timestamp.

Consent under this model is Metamorphic in a specific technical sense: it begins as a static permission record and may evolve into an economically-linked relationship when combined with x402 payments and Biodata Dividends. The consent record is not immutable. The owner may:

Revoke a specific permittee at any time via POST /revoke_permittee, setting status = "revoked"
Revoke all consents for a BioCID via POST /revoke (GDPR Article 17 right to erasure)
Inspect all active consent grants via GET /consents

Auto-expiry is enforced at query time: if the current timestamp exceeds expires_at, the consent record's effective status is treated as "expired" without requiring a database write, preventing stale access grants from persisting after the consent window closes.
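
Query-time expiry can be sketched as a pure function over the stored record. The helper name effective_status is hypothetical, the record is modelled as a dictionary with the status and expires_at fields named in the text, and the boundary case follows the Tier 2 condition in Figure 2 (a grant is live only while expires_at is strictly later than the current time).

```python
from datetime import datetime, timezone
from typing import Optional


def effective_status(consent: dict, now: Optional[datetime] = None) -> str:
    """Derive the status to enforce without writing back to the database.

    consent must contain 'status' (str) and 'expires_at' (aware datetime).
    """
    now = now or datetime.now(timezone.utc)
    # A stored "active" record is treated as expired once its window closes.
    if consent["status"] == "active" and consent["expires_at"] <= now:
        return "expired"
    # Revoked (or any other stored status) is reported unchanged.
    return consent["status"]
```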

Metamorphic Consent Defined

Traditional consent models are binary and static: either consent is given or it is not. Metamorphic Consent recognizes that consent over high-value data is an ongoing relationship with economic, temporal, and scope dimensions. A patient may consent to research use for 365 days, revoke after 180 days upon learning the research was commercialized without notification, or expand consent to include AI training in exchange for a Biodata Dividend. The consent record is a living contract, not a checkbox.

5.3 Tier 3: On-Chain BioNFT and Story Protocol License Verification

Tier 3 checks two on-chain credential types:

3a. BioNFT Token Holder (Sequentias Network). When a BioNFT (ERC-721) is linked to a biocid_registry record (via the optional bionft_token_id field), the authorization layer queries the Sequentias Network: ERC721.ownerOf(token_id). If the result matches the requesting wallet, access is granted. This tier enables transferable data access without requiring the original owner to re-grant consent. A patient may mint a BioNFT representing read access to their whole-genome sequence and transfer it to a research institution. The institution's wallet becomes the NFT holder and gains access. If the patient revokes by burning the NFT or reclaiming it, access is lost at the next query without any database update.

3b. Story Protocol License Token Holder. If the BioCID's underlying data has been registered as a Story Protocol IP Asset, the authorization layer checks whether the requesting wallet holds a valid License Token for that IP Asset. Story Protocol's PIL (Programmable IP License) framework enforces license terms on-chain: commercial use, derivative works, attribution requirements, and royalty splits are encoded in the license and verified automatically. GenoBank.io extends Story Protocol's four standard PIL templates with five genomic-specific BioPIL licenses:

BioPIL ID	License Type	Revocable	Use Case
#5	GDPR Research	Yes	Academic research under GDPR consent
#6	AI Training	Yes	Model training with attribution + dividends
#7	Clinical Use	Yes	Hospital/clinician access for patient care
#8	Pharma Research	Yes	Drug discovery with revenue sharing
#9	Family Inheritance	No	Hereditary access for family members

The critical difference: Story Protocol's standard licenses are permanent, while BioPIL licenses #5 through #8 are revocable (only #9, Family Inheritance, is not). This distinction is enforced at the BioRouter layer: even if a Story Protocol license token exists, BioRouter blocks access if the corresponding biocid_consents record has been revoked by the data owner. Story Protocol provides the commercial licensing infrastructure; BioRouter provides the consent enforcement layer.

5.4 Tier 4: x402 Micropayment Authorization

For requesters without an active consent grant or NFT, BioRouter implements the x402 payment protocol. The data owner sets a price in wei via POST /set_price, stored in biocid_registry.price_wei. When an unauthorized requester attempts download, the server returns:

HTTP/1.1 402 Payment Required
Content-Type: application/json

{
  "error": "payment_required",
  "biocid": "Biocid:user/0x5f5a60.../vcf/sample.vcf",
  "price_wei": "50000000000000000",
  "price_eth": "0.05",
  "contract": "0x678d668ECAB612390bF60F6eB04d9e9f5398f2F3",
  "chain_id": 15132025,
  "revenue_split": { "patient_pct": 95, "protocol_pct": 5 }
}

The requester calls BioDataRouter.pay(biocid_hash) on the Sequentias Network. The contract distributes 95% of the payment to the owner wallet and 5% to the protocol treasury, then records the payment against the (agent_id, wallet) pair. On the requester's next API call, Tier 4 calls BioDataRouter.hasAccess(agent_id, wallet), which now returns true, and access is granted.
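
The arithmetic behind the 402 body and the 95/5 split can be worked through off-chain. This is a sketch, not the contract's logic: the function names are hypothetical, and assigning any integer-division remainder to the protocol share is an assumption, since the source does not state how the contract rounds.

```python
from decimal import Decimal

WEI_PER_ETH = 10 ** 18


def split_payment(price_wei: int, patient_pct: int = 95) -> tuple[int, int]:
    """Return (patient_wei, protocol_wei); the two always sum to price_wei."""
    patient = price_wei * patient_pct // 100
    # Assumption: any rounding remainder falls to the protocol share.
    return patient, price_wei - patient


def wei_to_eth(price_wei: int) -> str:
    """Render a wei amount as a plain decimal ETH string."""
    eth = Decimal(price_wei) / Decimal(WEI_PER_ETH)
    return format(eth.normalize(), "f")


def payment_required_body(biocid: str, price_wei: int, contract: str, chain_id: int) -> dict:
    """Build a body with the same fields as the 402 example above."""
    return {
        "error": "payment_required",
        "biocid": biocid,
        "price_wei": str(price_wei),      # string, to avoid JSON integer limits
        "price_eth": wei_to_eth(price_wei),
        "contract": contract,
        "chain_id": chain_id,
        "revenue_split": {"patient_pct": 95, "protocol_pct": 5},
    }
```

For the example price of 50000000000000000 wei, the split is 47500000000000000 wei to the owner and 2500000000000000 wei to the protocol.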

6. API Reference

All endpoints are served under the base path https://biorouter.genobank.app/api_biorouter/. Authentication is via the user_signature parameter on all calls.

POST /api_biorouter/upload

Upload a biofile and receive its BioCID. GCS path is never returned.

# Parameters (multipart/form-data)
user_signature   : string  (required) — ECDSA signature of "I want to proceed"
file             : binary  (required) — file content
biodata_type     : string  (optional) — override auto-detection
agent_signature  : string  (optional) — ERC-8004 agent signature for gc- prefix
pipeline_trigger : string  (optional) — "ancestry" triggers Somos K=24 pipeline

# Response 200
{
  "biocid":        "Biocid:user/0x5f5a60.../dtc-genotype/DTC-FILE-GHAn0029.txt",
  "biodata_type":  "dtc-genotype",
  "file_hash":     "sha256:1f8eab4c...",
  "file_size":     44644945,
  "pipeline":      {
    "status":     "queued",
    "job_id":     "anc_cea846287d5f",
    "trigger":    "ancestry"
  }
}

# Auto-detected biodata_type values
# 23andMe raw data     → dtc-genotype
# AncestryDNA raw data → dtc-genotype
# VCF (.vcf, .vcf.gz)  → vcf
# BAM/CRAM             → bam
# FASTQ (.fastq.gz)    → fastq
# FHIR JSON            → fhir
# SQLite database      → sqlite
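
A rough sketch of how auto-detection along the lines of the mapping above might work is given below. Everything in it is an assumption for illustration: detect_biodata_type is a hypothetical name, the extension rules cover only the extensions listed above, and the content heuristics (the SQLite file header, a "resourceType" key for FHIR JSON, and a vendor name in the leading comment line of DTC raw files) are guesses at reasonable signals rather than the service's actual rules.

```python
from typing import Optional

# SQLite database files begin with this 16-byte header string.
_SQLITE_MAGIC = b"SQLite format 3\x00"


def detect_biodata_type(filename: str, head: bytes = b"") -> Optional[str]:
    """Guess a biodata_type from the filename and the first bytes of the file."""
    name = filename.lower()
    if name.endswith((".vcf", ".vcf.gz")):
        return "vcf"
    if name.endswith((".bam", ".cram")):
        return "bam"                       # BAM and CRAM both map to "bam"
    if name.endswith(".fastq.gz"):
        return "fastq"
    if head.startswith(_SQLITE_MAGIC):
        return "sqlite"
    if name.endswith(".json") and b'"resourceType"' in head:
        return "fhir"
    # Heuristic: DTC raw exports open with a '#' comment naming the vendor.
    first_line = head.split(b"\n", 1)[0].lower()
    if first_line.startswith(b"#") and (b"23andme" in first_line or b"ancestrydna" in first_line):
        return "dtc-genotype"
    return None                            # caller falls back to explicit biodata_type
```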

GET /api_biorouter/download

Download a file by BioCID. The request must pass the four-tier authorization cascade (Section 5). Returns the file bytes with an X-BioCID header.

# Parameters (query string)
user_signature : string  (required)
biocid         : string  (required)

# Response 200 (authorized)
Content-Disposition: attachment; filename="DTC-FILE-GHAn0029.txt"
X-BioCID: Biocid:user/0x5f5a60.../dtc-genotype/DTC-FILE-GHAn0029.txt
[binary file content]

# Response 402 (unauthorized, payment required)
{
  "error":          "payment_required",
  "price_wei":      "50000000000000000",
  "contract":       "0x678d668ECAB612390bF60F6eB04d9e9f5398f2F3",
  "chain_id":       15132025
}

GET /api_biorouter/stream

Stream large files (BAM, FASTQ, WGS VCF) with HTTP Range header support for chunked access. Identical authorization to /download.

# Parameters
user_signature : string  (required)
biocid         : string  (required)
Range          : header  (optional) — e.g., "bytes=0-1048575"

# Response 206 Partial Content (range request)
Content-Range: bytes 0-1048575/104857600
Content-Length: 1048576
[binary chunk]

# Example: stream first 50MB of a WGS BAM
curl -H "Range: bytes=0-52428799" \
  "https://biorouter.genobank.app/api_biorouter/stream?biocid=Biocid:user/0x...&user_signature=0x..."

# Use case: IGV.js can stream BAM files via this endpoint without full download
# Use case: GATK streaming access for variant calling on specific genomic regions
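
The relationship between a Range request and the 206 response headers shown above can be sketched as follows. The helper names are hypothetical, and the sketch assumes single-range requests only; multi-range requests are rejected as invalid.

```python
import re

_RANGE_RE = re.compile(r"bytes=(\d*)-(\d*)")


def resolve_range(header: str, total: int) -> tuple[int, int]:
    """Turn a single-range header into inclusive (start, end) byte offsets."""
    match = _RANGE_RE.fullmatch(header.strip())
    if match is None or (match[1] == "" and match[2] == ""):
        raise ValueError(f"unsupported Range header: {header!r}")
    if match[1] == "":                      # suffix form: the last N bytes
        start, end = max(total - int(match[2]), 0), total - 1
    else:
        start = int(match[1])
        # An open-ended or overlong range is clamped to the last byte.
        end = min(int(match[2]), total - 1) if match[2] else total - 1
    if start > end or start >= total:
        raise ValueError("range not satisfiable")
    return start, end


def partial_headers(start: int, end: int, total: int) -> dict[str, str]:
    """Headers for a 206 Partial Content response."""
    return {
        "Content-Range": f"bytes {start}-{end}/{total}",
        "Content-Length": str(end - start + 1),
    }
```

With a 104857600-byte file, the header "bytes=0-1048575" resolves to the Content-Range and Content-Length values in the example response.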

GET /api_biorouter/list

List the authenticated wallet's biofiles as BioCIDs. GCS paths are stripped.

# Parameters
user_signature : string  (required)
biodata_type   : string  (optional) — filter by type (e.g., "dtc-genotype")

# Response 200
{
  "wallet":   "0x5f5a60EaEf242c0D51A21c703f520347b96Ed19a",
  "count":    3,
  "biofiles": [
    {
      "biocid":        "Biocid:user/0x5f5a60.../dtc-genotype/DTC-FILE-GHAn0029.txt",
      "biodata_type":  "dtc-genotype",
      "file_size":     44644945,
      "created_at":    "2026-03-24T06:15:00Z",
      "pipeline_status": "completed",
      "access_count":  3
    }
  ]
}

POST /api_biorouter/grant

Grant a permittee wallet access to a BioCID. Must be called by the data owner.

# Parameters (JSON body)
{
  "user_signature":   "0x...",         // owner must sign
  "biocid":           "Biocid:user/0x5f5a60.../vcf/sample.vcf",
  "permittee_wallet": "0xB3C3a584F9a5A77Ed84EBf2c8E66E8e8c1C2D3A4",
  "license_type":     "research",      // research | clinical | ai-training
  "duration_days":    365
}

# Response 200
{
  "consent_id":     "64f3a1b2c8d9e0f1a2b3c4d5",
  "granted_at":     "2026-03-24T06:29:26Z",
  "expires_at":     "2027-03-24T06:29:26Z",
  "status":         "active"
}
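
How the response timestamps relate to duration_days can be sketched as below. The function names are hypothetical, the record layout mirrors the fields shown in the request and response above, and the validation rules (known license types, positive duration) are assumptions rather than documented behaviour.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# License types listed in the request example above.
LICENSE_TYPES = frozenset({"research", "clinical", "ai-training"})


def build_consent(biocid: str, permittee_wallet: str, license_type: str,
                  duration_days: int, granted_at: Optional[datetime] = None) -> dict:
    """Assemble a consent record with an expiry derived from duration_days."""
    if license_type not in LICENSE_TYPES:
        raise ValueError(f"unknown license_type: {license_type!r}")
    if duration_days <= 0:
        raise ValueError("duration_days must be positive")
    granted_at = granted_at or datetime.now(timezone.utc)
    return {
        "biocid": biocid,
        "permittee": permittee_wallet,
        "license_type": license_type,
        "granted_at": granted_at,
        "expires_at": granted_at + timedelta(days=duration_days),
        "status": "active",
    }


def iso_z(ts: datetime) -> str:
    """Format an aware UTC datetime in the 'YYYY-MM-DDTHH:MM:SSZ' style used above."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```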

POST /api_biorouter/revoke

Revoke ALL consent for a BioCID. GDPR Article 17 right to erasure. Owner-only.

# Parameters (JSON body)
{
  "user_signature": "0x...",
  "biocid":         "Biocid:user/0x5f5a60.../vcf/sample.vcf"
}

# Response 200
{
  "revoked_consents": 3,
  "biocid_status":    "revoked",
  "gdpr_compliant":   true
}

POST /api_biorouter/revoke_permittee

Revoke a specific permittee's access. Owner-only. Does not affect other permittees.

# Parameters (JSON body)
{
  "user_signature":   "0x...",
  "biocid":           "Biocid:user/0x5f5a60.../vcf/sample.vcf",
  "permittee_wallet": "0xB3C3a584F9a5A77Ed84EBf2c8E66E8e8c1C2D3A4"
}

# Response 200
{
  "revoked":        true,
  "permittee":      "0xB3C3a584F9a5A77Ed84EBf2c8E66E8e8c1C2D3A4",
  "previous_status": "active"
}

POST /api_biorouter/set_price

Owner sets the x402 access price for a BioCID. Stored in biocid_registry and returned in 402 responses.

# Parameters (JSON body)
{
  "user_signature": "0x...",
  "biocid":         "Biocid:user/0x5f5a60.../vcf/sample.vcf",
  "price_wei":      "50000000000000000"  // 0.05 ETH
}

# Response 200
{
  "price_wei":  "50000000000000000",
  "price_eth":  "0.05",
  "effective":  true
}
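
The relationship between price_wei and price_eth in the response can be checked with exact decimal arithmetic. The sketch below assumes the usual 18-decimal denomination (1 ETH = 10^18 wei); the helper name wei_to_eth is hypothetical and is not part of the BioRouter API.

```python
from decimal import Decimal

WEI_PER_ETH = Decimal(10) ** 18  # assumption: 18-decimal native token

def wei_to_eth(price_wei: str) -> str:
    """Convert a decimal wei string to a plain ETH string without float rounding."""
    wei = int(price_wei)
    if wei < 0:
        raise ValueError("price_wei must be non-negative")
    eth = Decimal(wei) / WEI_PER_ETH
    # normalize() drops trailing zeros; the "f" format avoids exponent notation
    return format(eth.normalize(), "f")

print(wei_to_eth("50000000000000000"))
```

Prices are kept as strings in the API because wei amounts can exceed the integer range that JSON parsers handle safely. Decimal's default 28-digit precision is enough for any realistic price.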

GET /api_biorouter/pipeline_status

Poll the status of a genomic pipeline triggered at upload time.

# Parameters
user_signature : string  (required)
biocid         : string  (required)

# Response 200
{
  "biocid":       "Biocid:user/0x5f5a60.../dtc-genotype/DTC-FILE-GHAn0029.txt",
  "pipeline":     "ancestry",
  "job_id":       "anc_cea846287d5f",
  "status":       "completed",       // queued | running | completed | failed
  "started_at":   "2026-03-24T06:16:00Z",
  "completed_at": "2026-03-24T06:28:43Z",
  "duration_s":   763,
  "result_biocid": "Biocid:gc-b96ed19a/0x5f5a60.../ancestry-result/somos_k24_results.json"
}
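
A client might wrap this endpoint in a polling loop that stops on a terminal status. The sketch below is illustrative only: poll_pipeline is a hypothetical helper, fetch_status stands in for whatever HTTP call the client uses, and the 60-second interval and 30-poll cap are assumptions rather than protocol requirements.

```python
import time

TERMINAL_STATES = {"completed", "failed"}

def poll_pipeline(fetch_status, interval_s=60.0, max_polls=30, sleep=time.sleep):
    """Call fetch_status() until the job reports a terminal status, then return it."""
    for attempt in range(max_polls):
        doc = fetch_status()
        if doc["status"] in TERMINAL_STATES:
            return doc
        if attempt < max_polls - 1:
            sleep(interval_s)  # wait before the next poll
    raise TimeoutError(f"pipeline not finished after {max_polls} polls")

# Demo with a canned status sequence instead of real HTTP calls.
_canned = iter([
    {"status": "queued"},
    {"status": "running"},
    {"status": "completed", "job_id": "anc_cea846287d5f"},
])
final = poll_pipeline(lambda: next(_canned), sleep=lambda seconds: None)
print(final["status"])
```

Injecting the sleep function keeps the loop testable without waiting on a real clock.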

7. Smart Contracts (Sequentias Network)

Two smart contracts on the Sequentias Network (Chain ID 15132025) underpin BioRouter's on-chain authorization and identity layers.

Contract	Address	Standard	Purpose
BioDataRouter	`0x678d668ECAB612390bF60F6eB04d9e9f5398f2F3`	Custom + x402	x402 payment processing, 95/5 revenue split enforcement, `hasAccess` query
GenoclawIdentityRegistry	`0xcBc813e733692794660dEC4AbB2ADd515a9F3D18`	ERC-721 + ERC-8004	AI agent identity registration, `gc-{first8hex}` identifier generation

7.1 BioDataRouter Contract

The BioDataRouter contract implements two primary interfaces relevant to BioRouter API operation:

// SPDX-License-Identifier: MIT
// Sequentias Network — Chain ID 15132025
// Address: 0x678d668ECAB612390bF60F6eB04d9e9f5398f2F3

interface IBioDataRouter {

    // Called by unauthorized researcher to pay for access
    // msg.value is split: 95% to owner wallet, 5% to protocol treasury
    function pay(bytes32 biocidHash) external payable;

    // Returns true if (agentId, wallet) pair has a valid payment on record
    function hasAccess(bytes32 agentId, address wallet)
        external view returns (bool);

    // Returns the owner wallet for a given biocid hash
    function getOwner(bytes32 biocidHash)
        external view returns (address);

    // Emitted on each payment
    event PaymentReceived(
        bytes32 indexed biocidHash,
        address indexed payer,
        uint256 patientAmount,
        uint256 protocolAmount
    );
}

The 95%/5% revenue split is enforced by the contract's pay() function and cannot be altered by either party unilaterally. The split is not a policy choice that can be overridden by server configuration; it is a blockchain invariant.
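
The arithmetic behind the split can be modeled off-chain with integer math. This is a sketch under stated assumptions, not the contract's source: the basis-point representation and the rounding rule (integer division, with any remainder going to the owner) are hypothetical, and the deployed pay() function may round differently.

```python
PROTOCOL_BPS = 500       # 5% expressed in basis points (assumed representation)
BPS_DENOMINATOR = 10_000

def split_payment(amount_wei: int) -> tuple[int, int]:
    """Return (owner_amount, protocol_amount) for a payment in wei."""
    if amount_wei < 0:
        raise ValueError("amount_wei must be non-negative")
    protocol_amount = amount_wei * PROTOCOL_BPS // BPS_DENOMINATOR
    owner_amount = amount_wei - protocol_amount  # remainder stays with the owner
    return owner_amount, protocol_amount

owner, protocol = split_payment(50_000_000_000_000_000)  # the 0.05 ETH example price
print(owner, protocol)
```

Computing the owner share by subtraction guarantees the two amounts always sum to the payment, so no wei is lost to rounding.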

7.2 GenoclawIdentityRegistry Contract

The identity registry extends ERC-721 with ERC-8004 agent attestation. Each registered AI agent is minted an NFT with a deterministic ID derived from the first 8 hex characters of its registration transaction hash. This ID becomes the bioagent prefix in any BioCID the agent generates.

interface IGenoclawIdentityRegistry {

    // Register an AI agent; emits AgentRegistered with the gc-{first8hex} id
    function registerAgent(
        address agentWallet,
        string calldata agentName,
        string calldata agentVersion,
        string calldata capabilities   // JSON-encoded capability list
    ) external returns (uint256 tokenId, string memory agentId);

    // Verify an agent signature for a given message
    function verifyAgentSignature(
        string calldata agentId,
        bytes32 messageHash,
        bytes calldata signature
    ) external view returns (bool);

    event AgentRegistered(
        uint256 indexed tokenId,
        string  agentId,
        address agentWallet
    );
}

8. Somos Ancestry Pipeline Integration

When a DTC genotype file is uploaded with the query parameter pipeline_trigger=ancestry, BioRouter enqueues a Somos ancestry analysis job. The pipeline is the same production implementation used by somosdao.io and has been validated against production results: all 24 population proportions match to 6 decimal places across a test cohort of 50 individuals.

8.1 Pipeline Stages

Figure 3 — Somos K=24 Pipeline


  UPLOAD: dtc-genotype file (23andMe, AncestryDNA, VCF, DTC format)
         │
         ▼
  Stage 1: genotype2ped conversion
           Auto-detect format by file header signature
           Output: PLINK .ped + .map files
         │
         ▼
  Stage 2: PLINK --extract 91,645 ancestry-informative SNPs
           SNP panel selected for maximum informativeness across
           24-population reference set; compatible with all major
           DTC genotyping arrays (Illumina GSA, OmniExpress)
         │
         ▼
  Stage 3: PLINK --merge with reference panel
           781 reference individuals
           24 populations (continental + 10 Indigenous American)
           Resolves strand flips and ambiguous SNPs automatically
         │
         ▼
  Stage 4: ADMIXTURE K=24 supervised (~12 minutes runtime)
           Maximum likelihood decomposition into 24 components
           Reference individuals' population assignments fixed
           Query individual's proportions estimated freely
         │
         ▼
  Stage 5: Parse .Q file
           Position 437 in sorted FAM file (FAM001/ID001 after merge)
           Extract 24 floating-point admixture proportions
         │
         ▼
  Stage 6: Store result as new BioCID
           bioagent:  gc-{GenoClaw agent id}
           type:      ancestry-result
           Stored in GCS, registered in biocid_registry
           pipeline_status → "completed"
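
Stage 5 can be sketched as a short parser for ADMIXTURE's .Q output, which holds one whitespace-separated row of K proportions per individual. The sketch rests on two assumptions that the figure does not state: the row position is 1-based, and the column order matches the order of the supplied population labels. The function name extract_proportions and the three demo labels are hypothetical.

```python
def extract_proportions(q_text, row_position, labels):
    """Return {label: proportion} for one individual in ADMIXTURE .Q text."""
    rows = [line.split() for line in q_text.splitlines() if line.strip()]
    if not 1 <= row_position <= len(rows):
        raise IndexError(f"row {row_position} outside 1..{len(rows)}")
    values = [float(token) for token in rows[row_position - 1]]  # 1-based position
    if len(values) != len(labels):
        raise ValueError(f"expected {len(labels)} columns, found {len(values)}")
    return dict(zip(labels, values))

# Tiny K=3 stand-in for the real K=24 file; the query individual is row 2.
demo_q = "0.70 0.20 0.10\n0.05 0.90 0.05\n"
demo_labels = ["POP_A", "POP_B", "POP_C"]
result = extract_proportions(demo_q, 2, demo_labels)
print(result)
```

In the production pipeline the same call would use row position 437 and the 24 population codes listed in Section 8.2.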

8.2 Reference Population Panel

Code	Population	Region
AFR_ESTE	East African	Africa
AFR_NORTE	North African	Africa
AFR_OESTE	West African	Africa
AMAZONAS	Amazonian Indigenous	South America
ANDES	Andean Indigenous	South America
ASIA_ESTE	East Asian	Asia
ASIA_SUR	South Asian	Asia
ASIA_SURESTE	Southeast Asian	Asia
EUR_ESTE	Eastern European	Europe
EUR_NORESTE	Northeastern European	Europe
EUR_NORTE	Northern European	Europe
EUR_OESTE	Western European	Europe
EUR_SUROESTE	Southwestern European	Europe
JUDIO	Sephardic/Ashkenazi Jewish	Middle East / Europe
MAYA	Maya	Mexico / Mesoamerica
MEDIO_ORIENTE	Middle Eastern	Middle East
OCEANIA	Oceanic	Pacific
PIMA	Pima	Mexico / Southwestern USA
ZAPOTECA	Zapotec	Mexico (Oaxaca)
HUICHOL	Huichol (Wixaritari)	Mexico (Jalisco/Nayarit)
MIXTECA	Mixtec	Mexico (Oaxaca/Guerrero)
NAHUA_OTOMI	Nahua-Otomi	Mexico (Central)
TARAHUMARA	Tarahumara (Rarámuri)	Mexico (Chihuahua)
TRIQUI	Triqui	Mexico (Oaxaca)

Validation Result

Pipeline output was validated against 50 production Somos results. All 24 population proportions matched to 6 decimal places (mean absolute deviation < 1×10⁻⁶), indicating that BioRouter-triggered and standalone pipeline runs produce equivalent results at the reported precision.

9. GenoClaw AI Agent Integration

GenoClaw is GenoBank's patient-owned AI health agent deployed on NVIDIA NemoClaw infrastructure. It is registered in the GenoclawIdentityRegistry as an ERC-8004 principal, enabling it to act as an authorized data agent on behalf of patients who have explicitly delegated access. BioRouter serves as GenoClaw's persistent data layer.

9.1 Ancestry Query Workflow

Figure 4 — GenoClaw Ancestry Query via BioRouter


  User → GenoClaw: "What is my ancestry?"
         │
         ▼
  1. GenoClaw checks Somos DAO cache for prior result
         │ (cache miss)
         ▼
  2. GenoClaw → GET /api_biorouter/list?biodata_type=ancestry-result&user_signature=X
         │
         │ (no ancestry-result found)
         ▼
  3. GenoClaw → GET /api_biorouter/list?biodata_type=dtc-genotype&user_signature=X
         │
         │ (dtc-genotype found: Biocid:user/0x5f5a60.../dtc-genotype/DTC-FILE-GHAn0029.txt)
         ▼
  4. GenoClaw checks pipeline_status for existing job
         │ (no prior job)
         ▼
  5. GenoClaw instructs user to upload OR
     GenoClaw calls POST /upload with pipeline_trigger=ancestry
         │
         ▼
  6. Pipeline enqueued, job_id = "anc_cea846287d5f"
         │
         ▼
  7. GenoClaw polls GET /api_biorouter/pipeline_status?biocid=X&user_signature=X every 60 seconds
         │ (~12 minutes)
         ▼
  8. status = "completed"
     result_biocid = "Biocid:gc-b96ed19a/0x5f5a60.../ancestry-result/somos_k24_results.json"
         │
         ▼
  9. GenoClaw → GET /api_biorouter/download?biocid=result_biocid&user_signature=X
         (Tier 1 passes: gc-b96ed19a is the agent, user wallet is the owner)
         │
         ▼
  10. GenoClaw parses 24-population proportions, formats narrative
      GenoClaw → User: "You are 42.3% Western European, 31.7% Maya, ..."

9.2 Agent Attribution in BioCIDs

When GenoClaw stores a derived dataset (ancestry result, clinical summary, pharmacogenomics report), the resulting BioCID encodes the agent's ERC-8004 identifier in the bioagent field. This creates a permanent, auditable chain of attribution: the raw DTC genotype file is owned by the patient (bioagent = user), while the derived ancestry result is attributed to the AI agent (bioagent = gc-b96ed19a) that generated it, on behalf of the same patient (owner_biowallet is unchanged).

This attribution model satisfies emerging AI transparency requirements under the EU AI Act [15] and enables auditors to trace any analytical output back to its source data and the AI system that produced it, without requiring the auditor to have access to either the source data or the model weights.

10. MongoDB Data Model

10.1 biocid_registry

Primary registry for all stored biofiles. Indexed on biocid (unique), owner_wallet, file_hash, and biodata_type.

{
  "biocid":           "Biocid:user/0x5f5a60EaEf242c0D51A21c703f520347b96Ed19a/dtc-genotype/DTC-FILE-GHAn0029.txt",
  "owner_wallet":     "0x5f5a60EaEf242c0D51A21c703f520347b96Ed19a",   // EIP-55 checksum
  "bioagent":         "user",                                           // or "gc-{8hex}"
  "biodata_type":     "dtc-genotype",
  "dataset":          "DTC-FILE-GHAn0029.txt",
  "file_hash":        "sha256:1f8eab4c...",                            // SHA-256 hex
  "file_size":        44644945,
  "gcs_bucket":       "INTERNAL — NEVER RETURNED TO CLIENTS",
  "gcs_path":         "INTERNAL — NEVER RETURNED TO CLIENTS",
  "bionft_token_id":  null,                                            // ERC-721 token ID if linked
  "consent_status":   "active",
  "price_wei":        "50000000000000000",
  "pipeline_trigger": "ancestry",
  "pipeline_job_id":  "anc_cea846287d5f",
  "pipeline_status":  "completed",
  "created_at":       "2026-03-24T06:15:00Z",
  "access_count":     3
}
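
The record above shows how the biocid string relates to the bioagent, owner_wallet, biodata_type, and dataset fields. A minimal parser for that layout might look like the following; the layout is inferred from the example record rather than taken from a formal grammar, and the names BioCID and parse_biocid are hypothetical.

```python
from typing import NamedTuple

class BioCID(NamedTuple):
    bioagent: str
    owner_wallet: str
    biodata_type: str
    dataset: str

def parse_biocid(biocid: str) -> BioCID:
    """Split 'Biocid:{bioagent}/{owner_wallet}/{biodata_type}/{dataset}' into fields."""
    prefix = "Biocid:"
    if not biocid.startswith(prefix):
        raise ValueError("missing 'Biocid:' prefix")
    # maxsplit=3 keeps any further '/' characters inside the dataset name
    parts = biocid[len(prefix):].split("/", 3)
    if len(parts) != 4 or not all(parts):
        raise ValueError("expected four non-empty path components")
    return BioCID(*parts)

parsed = parse_biocid(
    "Biocid:user/0x5f5a60EaEf242c0D51A21c703f520347b96Ed19a/"
    "dtc-genotype/DTC-FILE-GHAn0029.txt"
)
print(parsed.bioagent, parsed.biodata_type)
```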

10.2 biocid_consents

One document per consent grant. Owner-permittee pairs are not unique; multiple grants may exist for the same pair with different license types or durations. Expiry is checked at query time by comparing expires_at to the current UTC timestamp; no scheduled job is required to update status to "expired".

{
  "_id":              "64f3a1b2c8d9e0f1a2b3c4d5",
  "biocid":           "Biocid:user/0x5f5a60.../dtc-genotype/DTC-FILE-GHAn0029.txt",
  "owner_wallet":     "0x5f5a60EaEf242c0D51A21c703f520347b96Ed19a",
  "permittee_wallet": "0xB3C3a584F9a5A77Ed84EBf2c8E66E8e8c1C2D3A4",
  "license_type":     "research",    // research | clinical | ai-training
  "duration_days":    365,
  "granted_at":       "2026-03-24T06:29:26Z",
  "expires_at":       "2027-03-24T06:29:26Z",
  "status":           "active"       // active | revoked | expired (logical)
}
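
The query-time expiry check described above can be sketched as a pure function over a consent document. The helper name effective_status is hypothetical, and treating a grant as expired from the exact expires_at instant onward is an assumption about the boundary condition.

```python
from datetime import datetime, timezone

def parse_utc(timestamp: str) -> datetime:
    """Parse a 'YYYY-MM-DDTHH:MM:SSZ' timestamp as a timezone-aware UTC datetime."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)

def effective_status(consent: dict, now: datetime) -> str:
    """Derive the logical status without rewriting the stored document."""
    if consent["status"] == "revoked":
        return "revoked"
    if now >= parse_utc(consent["expires_at"]):
        return "expired"  # logical state only; the stored status is still "active"
    return "active"

consent = {"expires_at": "2027-03-24T06:29:26Z", "status": "active"}
print(effective_status(consent, datetime(2026, 6, 1, tzinfo=timezone.utc)))
```

Because the status is derived on every read, an expired grant stops authorizing access even though no background job has touched the document.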

10.3 biocid_access_log

Immutable append-only audit log. Every operation that reads or modifies a BioCID record writes a log entry. This includes uploads, downloads (whether authorized or denied), stream requests, grant events, revocation events, and pipeline triggers.

{
  "timestamp":    "2026-03-24T08:47:13Z",
  "biocid":       "Biocid:user/0x5f5a60.../dtc-genotype/DTC-FILE-GHAn0029.txt",
  "operation":    "download",           // upload | download | stream | grant | revoke | pipeline
  "actor_wallet": "0xB3C3a584...",
  "actor_type":   "permittee",          // owner | permittee | nft_holder | x402_payer | denied
  "tier_granted": 2,                    // authorization tier that granted access (0 = denied)
  "ip_address":   "203.0.113.42",
  "range_bytes":  null,                 // set for stream operations
  "result":       "success"
}
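
The tier_granted and actor_type fields follow from evaluating the four authorization tiers in order and recording the first one that passes. The sketch below shows that ordering only: evaluate_cascade is a hypothetical name, and each tier check is passed in as a plain callable rather than a real signature, consent, NFT, or payment lookup.

```python
TIER_ACTOR_TYPES = {1: "owner", 2: "permittee", 3: "nft_holder", 4: "x402_payer"}

def evaluate_cascade(checks):
    """Run tier checks in ascending order and stop at the first that passes.

    checks maps each tier number to a zero-argument callable returning bool.
    """
    for tier in sorted(TIER_ACTOR_TYPES):
        if checks[tier]():
            return {"tier_granted": tier, "actor_type": TIER_ACTOR_TYPES[tier]}
    return {"tier_granted": 0, "actor_type": "denied"}

# Demo: the caller is not the owner but holds an active consent grant.
decision = evaluate_cascade({
    1: lambda: False,
    2: lambda: True,
    3: lambda: False,
    4: lambda: False,
})
print(decision)
```

Stopping at the first passing tier means the more expensive on-chain checks only run when the cheaper ones fail.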

11. Privacy, Compliance, and Regulatory Alignment

11.1 Why Patient Ownership Supersedes GDPR/CCPA

GDPR [16] and CCPA [17] were designed to protect individuals from corporations that hold data as custodians. The legislative assumption is that there is a structural power asymmetry: the data subject lacks the ability to control their data once it has been transferred to a controller. BioRouter inverts this assumption entirely.

In BioRouter's model, the patient is not only the "data subject" in GDPR terminology; the patient also acts as the data controller. The patient's wallet signature is the only mechanism by which data can be uploaded, accessed, or shared. No institution holds a master decryption key. The concept of an "erasure request" to a third party becomes meaningless when the patient never transferred control in the first place.

Nevertheless, BioRouter is explicitly designed to satisfy GDPR Article 17 (right to erasure) as an operational guarantee rather than as a compliance checkbox:

Storage on GCS, not IPFS. IPFS's content-addressed immutability makes deletion structurally impossible, creating a GDPR Article 17 violation for any system that uses IPFS for personal genomic data. GCS objects are deletable on demand. A call to POST /revoke can trigger physical GCS object deletion in addition to marking consents as revoked.
Consent revocation propagates instantly. The MongoDB consent check in Tier 2 operates at query time. There is no cache layer that might serve a revoked consent grant. Revocation is effective within one request cycle.
No federated copies. BioRouter does not replicate data to third-party systems without explicit owner consent. Federated learning frameworks that distribute data or gradient-derived information to multiple parties without attribution violate the spirit of GDPR Article 5(1)(b) (purpose limitation).

11.2 Privacy-Preserving Bloom Filters vs. Zero-Knowledge Proofs

Zero-knowledge proof systems require deterministic computation: the prover must demonstrate knowledge of a witness that satisfies a circuit without revealing the witness. Genomic data is fundamentally incompatible with this requirement. Genotype calling introduces probabilistic quality scores (PHRED-scaled likelihoods), haplotype phasing introduces population prior-dependent uncertainty, and structural variant detection involves alignment scores across repetitive regions. There is no deterministic "I have genotype X at position Y" fact that can be proven without revealing the entire underlying read pileup.

BioRouter instead uses privacy-preserving Bloom filters for variant membership queries [18]. A Bloom filter for a variant panel can answer "does this individual's genome contain variant rs123456?" with a configurable false positive rate and zero false negative rate, without exposing the complete variant call set. The filter supports probabilistic membership testing that is sufficient for access control and research eligibility screening, while the complete authenticated data remains in GCS accessible only through BioRouter's four-tier authorization.
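
The membership mechanics can be illustrated with a textbook Bloom filter. This sketch shows only the false-positive and false-negative behavior described above; it does not include any privacy hardening (such as keyed hashing), and the class name, sizing formulas, and SHA-256 double-hashing scheme are illustrative assumptions rather than BioRouter's implementation.

```python
import hashlib
import math

class BloomFilter:
    def __init__(self, expected_items: int, false_positive_rate: float):
        if expected_items <= 0 or not 0 < false_positive_rate < 1:
            raise ValueError("need expected_items > 0 and 0 < rate < 1")
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes
        self.num_bits = max(1, math.ceil(
            -expected_items * math.log(false_positive_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / expected_items * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k bit positions from two 64-bit halves of SHA-256
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

variants = ["rs123456", "rs429358", "rs7412"]  # example rsIDs for the demo
panel = BloomFilter(expected_items=1000, false_positive_rate=0.01)
for rsid in variants:
    panel.add(rsid)
print("rs123456" in panel)
```

An inserted variant always tests positive, while an absent variant tests positive only with roughly the configured probability, which is why a hit should be treated as "possibly present" and a miss as "definitely absent".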

11.3 Why Federated Learning is Rejected

Federated learning, as implemented in genomics research pipelines, presents several fundamental problems that BioRouter's architecture is explicitly designed to avoid:

Data quality degradation. Gradient aggregation across heterogeneous sequencing platforms, variant callers, and population stratifications introduces systematic biases that are indistinguishable from signal in downstream analyses. Clinical-grade variant interpretation requires complete, authenticated datasets.
Attribution erasure. When a model is trained across federated nodes, there is no mechanism to attribute a specific model parameter to a specific patient's contribution. Revenue sharing becomes mathematically impossible. Federated learning is, in this sense, a technical arrangement that enables extraction of value from patient data without compensating patients.
False privacy compliance. Federated learning does not prevent model inversion attacks [19], membership inference attacks [20], or property inference attacks [21]. The privacy guarantee is weaker than is commonly represented in marketing materials from federated learning vendors.

Architectural Position

BioRouter does not implement federated learning. It does not distribute data or model gradients to any party without the explicit, revocable consent of the data owner. Researchers who require data for AI training must acquire access via Tier 2 (explicit grant with license_type=ai-training) or Tier 4 (x402 payment), access the complete authenticated dataset via the BioRouter streaming API, and accept full attribution and audit logging of their access.

12. Comparative Analysis

Property	BioRouter	23andMe (legacy)	IPFS-based systems	Federated Learning	Nebula Genomics
Patient is data controller	Yes — cryptographic	No	Partial	No	Partial
Consent revocable	Yes — instant	No	No (IPFS)	No	Partial
GDPR Art. 17 compliant	Yes — GCS deletable	Contested	No (IPFS immutable)	Contested	Partial
Patient revenue share	95% on-chain	0%	Varies	0%	Token-based
Data quality preserved	Complete, authentic	Complete	Complete	Degraded	Complete
Storage path exposed	Never (BioCID)	Internal	Yes (CID = location)	N/A	Internal
AI agent attribution	ERC-8004 in BioCID	None	None	None	None
Range streaming (BAM/WGS)	Yes — HTTP Range	No	Partial	No	No
Automated ancestry pipeline	Yes — K=24 trigger	Yes	No	No	No

13. Discussion

13.1 Limitations

BioRouter's current implementation has several limitations that should be acknowledged:

Single GCS region. Files are stored in a single GCS region. Cross-region replication and geo-redundancy are not yet implemented. For very large WGS datasets (100GB+), transfer latency from non-US regions may be significant.
x402 settlement latency. Payment verification queries Sequentias Network synchronously. In periods of high network congestion, this Tier 4 check may introduce latency. Caching paid access grants in a short-TTL local store is a planned optimization.
Bloom filter coverage. The variant membership Bloom filter is currently generated at upload time for VCF files only. DTC genotype files require a conversion step before Bloom filter generation, which is not yet automated in the BioRouter pipeline.
ERC-8004 is a draft standard. The GenoclawIdentityRegistry implements ERC-8004 at draft revision 3. If the final standard introduces breaking interface changes, the registry contract and BioCID generation logic will require migration.
MongoDB as trust anchor. While the consent model is cryptographically authenticated at the API layer, the consent documents themselves are stored in MongoDB rather than on-chain. A compromise of the MongoDB instance could allow forged consent records. Full on-chain consent registration is a planned upgrade via a BioConsent contract on Sequentias Network.

13.2 Future Work

The following capabilities are on the BioRouter development roadmap:

On-chain consent registry. Migration of biocid_consents to a Solidity contract on Sequentias Network, eliminating MongoDB as a trust dependency for consent verification.
Shapley-based Biodata Dividends. Integration of Shapley value computation [22] to attribute marginal contributions of individual data points to trained models, enabling proportional Biodata Dividend distributions at scale.
BioConsent NFT minting on upload. Automatic ERC-721 minting at upload time, linking the BioNFT to the BioCID in biocid_registry and enabling Tier 3 authorization immediately after upload.
FHIR R4 pipeline integration. Structured clinical data uploaded with the fhir biodata type will trigger automated extraction of discrete clinical observations, supporting population health research eligibility queries via Bloom filters.
Multi-party computation for aggregate statistics. For cases where researchers require aggregate statistics rather than individual-level data, a secure multi-party computation layer over BioRouter-stored data is under investigation as a complement (not replacement) to the authenticated access model.

13.3 The 23andMe Bankruptcy Precedent

The March 2025 23andMe bankruptcy filing [3] placed the genomic profiles of 15 million customers into a corporate asset pool subject to auction to the highest bidder. Customers had no contractual mechanism to object to the sale of their most intimate biological data. The July 2025 acquisition by TTAM Research Institute (founder Anne Wojcicki's nonprofit) for $305 million resolved this particular instance, but the underlying structural vulnerability remains: when data is held by a corporation as a custodian rather than by the individual as an owner, any corporate event (bankruptcy, acquisition, data breach) can instantaneously transfer control of that data without the individual's knowledge or consent.

BioRouter is architecturally immune to this failure mode. The data is stored in GCS, but access is controlled by the patient's wallet private key, not by GenoBank's corporate infrastructure. If GenoBank ceased operations, the biocid_registry MongoDB data could be exported and run by any operator; the cryptographic ownership proofs are wallet-native and do not require GenoBank's servers to validate. The x402 payment contract on Sequentias Network is self-executing and does not require GenoBank to process or approve payments.

14. Conclusion

BioRouter represents a production implementation of the principle that patient ownership of genomic data is achievable without sacrificing data quality, analytical utility, or economic viability. The protocol demonstrates that the traditional tradeoffs in health data governance (privacy versus utility, control versus accessibility, patient rights versus research progress) are artifacts of a centralized custody architecture rather than fundamental constraints.

The four-tier authorization cascade provides a graduated access model that accommodates the full range of legitimate access patterns: direct owner access, explicit consent-based sharing, transferable NFT-linked permissions, and market-priced researcher access. Every request is authenticated by wallet signature, and Tiers 3 and 4 are verified on-chain; Tier 2 consent records currently reside in MongoDB (see Section 13.1) until the planned on-chain consent registry removes that trust dependency.

The BioCID addressing scheme decouples data identity from storage location, enabling storage backend migration, multi-cloud deployment, and auditable data lineage without exposing infrastructure details to clients. ERC-8004 agent attribution in the BioCID format extends this lineage to AI-generated derived datasets, laying the foundation for auditable AI in genomic medicine.

The live deployment at biorouter.genobank.app is available for integration by healthcare providers, research institutions, DTC genomics platforms, and AI health agent developers. API documentation and integration guides are available at genobank.io/developers.

Core Principle

Privacy is not about hiding data or making it fuzzy. Privacy is about giving patients complete control over their authentic, high-quality data, with full transparency about its use and fair compensation for its value. BioRouter makes this principle operational.

References

[1] MyHeritage Security Incident Report, June 2018. 92 million email addresses and hashed passwords exposed via third-party breach. Available: https://blog.myheritage.com/2018/06/myheritage-statement-about-a-cybersecurity-incident/
[2] 23andMe, Inc. Form 8-K Filing, January 2024. Disclosure of credential-stuffing attack affecting approximately 6.9 million customer profiles. U.S. Securities and Exchange Commission.
[3] 23andMe, Inc. Chapter 11 Bankruptcy Filing, U.S. Bankruptcy Court, Eastern District of Missouri, March 2025. Docket No. 25-10XXX. TTAM Research Institute acquisition for $305 million, July 2025.
[4] Boneh, D., Boyen, X. & Shacham, H. Short Group Signatures. In: Advances in Cryptology (CRYPTO 2004), Lecture Notes in Computer Science, vol. 3152, pp. 41–55. Springer, Berlin, Heidelberg (2004). Note: ZK proofs require deterministic arithmetic circuits; stochastic genomic pipelines cannot be compiled into such circuits.
[5] Mazieres, D. & Kaashoek, M.F. Escaping the Evils of Centralized Control with Self-certifying Pathnames. In: Proc. 8th ACM SIGOPS European Workshop, pp. 118–125 (1998).
[6] Protocol Labs. Content Identifier (CID) Specification. IPFS Documentation, v1.0 (2017). Available: https://docs.ipfs.tech/concepts/content-addressing/
[7] Azaria, A., Ekblaw, A., Vieira, T. & Lippman, A. MedRec: Using Blockchain for Medical Data Access and Permission Management. In: 2016 2nd International Conference on Open and Big Data (OBD), pp. 25–30. IEEE (2016).
[8] Healthureum. A Blockchain-based Healthcare System. Technical Whitepaper v1.2 (2018).
[9] Kim, M.G. et al. Mediblock: Decentralized Medical Information System. Journal of Medical Systems, 43(8), 247 (2019).
[10] Coral Health. Coral Health Research & Discovery: Blockchain-based Healthcare Data Management. Technical Report (2017).
[11] Fielding, R. & Reschke, J. (Eds.) Hypertext Transfer Protocol (HTTP/1.1): Semantics and Content. RFC 7231, IETF (2014). Status code 402 reserved.
[12] Coinbase, Inc. x402: A Protocol for Machine-Readable HTTP Payments. Technical Specification v0.2 (2025). Available: https://x402.org
[13] ERC-8004: AI Agent Identity Standard. Ethereum Improvement Proposal Draft (2025). Author: Ethereum Foundation AI Working Group. Available: https://eips.ethereum.org/EIPS/eip-8004
[14] Alexander, D.H., Novembre, J. & Lange, K. Fast model-based estimation of ancestry in unrelated individuals. Genome Research, 19(9), 1655–1664 (2009). doi:10.1101/gr.094052.109
[15] European Parliament. Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 laying down harmonised rules on artificial intelligence (Artificial Intelligence Act). Official Journal of the European Union, L 1689 (2024).
[16] European Parliament. Regulation (EU) 2016/679 of the European Parliament and of the Council on the protection of natural persons with regard to the processing of personal data (General Data Protection Regulation). Official Journal of the European Union, L 119 (2016).
[17] California Consumer Privacy Act, Cal. Civ. Code §§ 1798.100–1798.199 (2018), as amended by the California Privacy Rights Act (CPRA), Proposition 24 (2020).
[18] Bloom, B.H. Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13(7), 422–426 (1970). doi:10.1145/362686.362692. Privacy-preserving variant: Cho, H. et al. Secure genome-wide association analysis using multiparty computation. Nature Biotechnology, 36(6), 547–551 (2018).
[19] Fredrikson, M., Jha, S. & Ristenpart, T. Model Inversion Attacks that Exploit Confidence Information and Basic Countermeasures. In: Proc. 22nd ACM SIGSAC Conference on Computer and Communications Security, pp. 1322–1333 (2015).
[20] Shokri, R., Stronati, M., Song, C. & Shmatikov, V. Membership Inference Attacks Against Machine Learning Models. In: 2017 IEEE Symposium on Security and Privacy (SP), pp. 3–18. IEEE (2017).
[21] Melis, L., Song, C., De Cristofaro, E. & Shmatikov, V. Exploiting Unintended Feature Leakage in Collaborative Learning. In: 2019 IEEE Symposium on Security and Privacy (SP), pp. 691–706. IEEE (2019).
[22] Shapley, L.S. A Value for n-Person Games. In: Contributions to the Theory of Games, vol. 2, pp. 307–317. Princeton University Press (1953). Application to data valuation: Ghorbani, A. & Zou, J. Data Shapley: Equitable Valuation of Data for Machine Learning. In: Proc. 36th ICML, pp. 2242–2251 (2019).

genobank.io
biorouter.genobank.app