The Intra-LLM BioFilesystem
A Consent-Gated, Provenance-Verifiable Protocol for Cross-Vendor Multi-Agent Genomic Interpretation
Abstract
Large language models are entering clinical genomics, yet their reasoning is typically unauditable, single-vendor, and disconnected from the consent that governs the data they read. We present the Intra-LLM BioFilesystem, a protocol that lets AI agents from different vendors co-interpret one patient genome on a single shared workspace, and produces a record that a third party can independently re-verify. The protocol rests on three combined ideas. First, content-addressed conversation: every agent statement is cryptographically bound, through a biocid and a content hash, to the exact immutable data version it was made from, so the record asserts not merely that an agent said something but that it said it about specific, provable bytes. Second, cross-vendor adversarial co-interpretation: models with different training and different failure modes cross-check each other through an append-only, hash-chained turn log, turning model diversity into an error-reduction mechanism with a tamper-evident trail. Third, consent-gated agent cognition: agent data access is governed by revocable patient consent (a four-tier authorization aligned with GDPR Article 17), and the conversation log itself is treated as a clinical-scope artifact. We describe the architecture, which comprises a single server (biofs-node) that is the sole authority for sequence assignment and chain integrity, a Model Context Protocol server exposing sixteen tools, and a conductor that drives Claude Code, Grok Build, and Gemini through their subscription command-line interfaces in headless mode, consuming no metered API tokens. We report a working implementation: per-agent EIP-712 signatures for non-repudiable authorship, confidence-weighted cross-agent consensus over structured ACMG classifications, an independent offline verifier that re-checks the chain and the signatures with zero trust, and a live exchange in which Claude and Grok debated a de-identified somatic case and produced a tamper-evident, chain-valid record.
Keywords: multi-agent systems, model context protocol, clinical genomics, ACMG, reproducibility, provenance, hash chain, EIP-712, patient consent, GDPR, BioNFT
1. Introduction
1.1 The problem
Clinical variant interpretation is high-stakes, contested, and increasingly assisted by large language models. Three structural weaknesses follow when an LLM is pointed at a genome.
Unauditability. An LLM produces a verdict and a rationale, but there is usually no durable, verifiable link between the words and the exact data the model read. A reviewer cannot later confirm that a stated allele fraction or a cited variant corresponds to a specific file at a specific version. The reasoning is plausible but unprovable.
Single-vendor monoculture. One model carries one training distribution and one set of systematic blind spots. A second instance of the same model tends to agree with the first for the same wrong reasons. Genuine error reduction requires diversity of failure modes, not repetition.
Consent disconnection. The data an agent reads belongs to a patient who may grant or revoke access. Most agent tooling treats data as a static file path, with no notion that access is conditional, revocable, and regulated, and no notion that the transcript of the analysis is itself sensitive.
1.2 Previous approaches and why they fall short
Multi-agent orchestration frameworks coordinate tool-using agents, but they keep shared state in process memory, with no cryptographic provenance and no model of patient consent. Multi-agent debate methods improve answers by having models critique each other, but the debate is typically between instances of the same model and leaves no tamper-evident, independently verifiable record. The Model Context Protocol (MCP) standardizes how one client reaches tools and data, but it is a single client-to-server channel, not a substrate for several heterogeneous agents to share mutable state safely. Agent-to-agent messaging protocols define how agents talk, but they do not bind statements to immutable data, enforce consent, or treat the transcript as a regulated artifact. None of these target auditable, reproducible, consent-governed reasoning over a clinical genome.
1.3 Our contribution
This paper presents a protocol that combines, for the first time to our knowledge, three properties in one system aimed at clinical genomics.
- Content-addressed conversation. Every agent claim is bound to the exact immutable data version it was made from, by citing a biocid and a content hash; the turn log is append-only and hash-chained, so the whole record is tamper-evident and re-verifiable offline.
- Cross-vendor adversarial co-interpretation. Agents from different vendors (in our implementation, Anthropic Claude, xAI Grok, and Google Gemini) cross-check each other on one shared workspace; their structured verdicts are compared by a confidence-weighted consensus, so disagreement becomes a measurable signal rather than hidden divergence.
- Consent-gated agent cognition. Agent access to genomic bytes is mediated by a four-tier authorization with revocable, on-chain patient consent (aligned with GDPR Article 17), and the conversation log is governed as a clinical-scope artifact, de-identified by design.
The remainder of this paper describes the architecture (Section 2), the three innovations in mechanism detail (Section 3), the protocol surface (Section 4), reproducibility and verification (Section 5), a worked clinical example (Section 6), the implementation and results (Section 7), security and compliance (Section 8), and limitations and future work (Section 9).
2. System Architecture
2.1 Design principles
Immutability removes the corruption problem. Genomic artifacts (VCF, aligned reads, annotation databases) are write-once and content-addressed. Two agents cannot corrupt a read-only object. Therefore the only shared mutable state that needs concurrency control is the conversation itself.
One authority for order and integrity. A single server, biofs-node, assigns every turn a gap-free monotonic sequence number and links it into a hash chain. Because it is the sole sequencer, the total order is clean and the chain is always well-formed, even when several agents write at once. A central coordinator thus supplies ordering and integrity without requiring the agents to trust each other.
Thin clients, shared server-side state. Each agent reaches the workspace through its own MCP server process; all shared state lives server-side. No agent holds authoritative state, so there is nothing to reconcile between them.
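The single-sequencer principle can be illustrated with a minimal in-memory sketch. It is not the biofs-node implementation: the class name `Sequencer`, the choice to start sequence numbers at 1, and the use of a thread lock are all assumptions made for illustration. The point it shows is that when one authority assigns the number and stores the turn in a single atomic step, concurrent writers cannot produce gaps or duplicates.

```python
import threading


class Sequencer:
    """Hypothetical in-memory stand-in for a single ordering authority."""

    def __init__(self):
        self._lock = threading.Lock()
        self._log = []

    def append(self, agent_id, content):
        # The lock makes "pick the next seq, then store the turn" one atomic
        # step, so two writers can never be handed the same sequence number.
        with self._lock:
            seq = len(self._log) + 1  # assumption: numbering starts at 1
            self._log.append({"seq": seq, "agent_id": agent_id, "content": content})
            return seq

    def seqs(self):
        with self._lock:
            return [turn["seq"] for turn in self._log]


def run_concurrent_appends(n_writers=20, per_writer=50):
    """Have n_writers threads append concurrently; return the stored seq order."""
    sequencer = Sequencer()

    def worker(i):
        for k in range(per_writer):
            sequencer.append(f"agent-{i}", f"turn {k}")

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_writers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sequencer.seqs()
```

With twenty writers appending fifty turns each, the stored sequence is exactly 1 through 1000 regardless of thread interleaving, because ordering is decided in one place.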
2.2 The protocol stack
Figure: the protocol stack.
- Conductor (CDR: subscription CLIs, headless, no API tokens) drives three agents.
- claude -p (Claude Opus 4.8) -> biofs MCP (agent_id=claude-code) -> biofs-node
- grok -p (Grok Build) -> biofs MCP (agent_id=grok-build) -> biofs-node
- gemini -p (Gemini 3) -> biofs MCP (agent_id=gemini-cli) -> biofs-node
- biofs-node (single authority: seq + hash-chain + CAS + signature verify + anchor) -> MongoDB (append-only turn log, case header, leases)
- biofs-node -> Sequentia (chain 15132025; tamper-evident anchor)
- biofs MCP (drawn for agent_id=claude-code) -> BioRouter (4-tier consent, GDPR Art.17 revocation) -> Immutable biodata (GCS; content-addressed by biocid + content_hash), over a read-only, consent-gated path
Five layers compose the system.
- L0, immutable biodata. VCF, aligned reads, and annotation outputs in object storage, addressed by biocid and content hash, resolved through BioRouter under consent.
- L1, the workspace authority (biofs-node). The single sequencer and integrity authority. It exposes the workspace over HTTP and persists to MongoDB, with an in-process memory fallback for development.
- L2, the biofs MCP server. A Model Context Protocol server exposing sixteen tools to any MCP-aware client. Each agent runs its own instance, stamped with a distinct agent identity.
- L3, the conductor. A neutral orchestrator that drives each agent through its subscription command-line interface in headless mode.
- L4, the trust anchors. BioRouter for consent and Sequentia for on-chain, tamper-evident anchoring of the conversation digest.
2.3 Subscription-based, headless orchestration
A distinguishing operational property: the agents are driven through the vendors' own subscription command-line tools in non-interactive mode (claude -p, grok -p, gemini -p), not through metered API endpoints. The conductor strips provider API-key environment variables from each child process so the tool authenticates from its stored subscription login. The turn prompt carries almost nothing; each agent hydrates the conversation by reading the shared workspace, so the record, not the prompt, is the source of truth. This keeps cost low and, more importantly, keeps the durable state in the verifiable log rather than in any single agent's private context.
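A minimal sketch of the environment handling described above, under stated assumptions. The variable names in `PROVIDER_KEY_VARS` are guesses at which keys a conductor might strip (the text does not list them), and `build_invocation` is a hypothetical helper, not the conductor's actual code. The command names and the `-p` flag are taken from the text.

```python
import os

# Assumed names: the source does not say which variables are stripped.
PROVIDER_KEY_VARS = (
    "ANTHROPIC_API_KEY",
    "XAI_API_KEY",
    "GEMINI_API_KEY",
    "GOOGLE_API_KEY",
)

# Headless invocations as named in the text, keyed by agent identity.
AGENT_COMMANDS = {
    "claude-code": ["claude", "-p"],
    "grok-build": ["grok", "-p"],
    "gemini-cli": ["gemini", "-p"],
}


def child_env(parent_env):
    """Copy the parent environment without any provider API-key variables."""
    return {k: v for k, v in parent_env.items() if k not in PROVIDER_KEY_VARS}


def build_invocation(agent_id, prompt, parent_env=None):
    """Return (argv, env) for one headless turn; nothing is executed here."""
    source_env = dict(os.environ) if parent_env is None else parent_env
    argv = AGENT_COMMANDS[agent_id] + [prompt]
    return argv, child_env(source_env)
```

The returned pair could then be handed to `subprocess.run(argv, env=env, capture_output=True, text=True)`. Because the key variables are absent from `env`, a CLI that falls back to its stored login would authenticate from the subscription rather than from a metered key.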
3. The Three Innovations
3.1 Content-addressed conversation
Each turn an agent writes may carry a list of references, each a triple {biocid, content_hash, kind}. The biocid names the data object; the content hash pins its exact version. A statement such as a variant call is therefore not free-floating prose but a claim bound to specific, immutable bytes. Because the underlying artifact is write-once, a reviewer who later resolves the same biocid reads identical content and can confirm the binding.
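The reviewer's check can be sketched as follows. This assumes the content hash is a lowercase hex SHA-256 digest of the object's bytes, which the text does not specify; the `Ref` class, the `ref_matches` helper, and the biocid string are hypothetical placeholders.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    biocid: str        # names the data object
    content_hash: str  # assumption: lowercase hex SHA-256 of the object's bytes
    kind: str          # for example "vcf"; the allowed values are not given in the text


def ref_matches(ref, resolved_bytes):
    """True if the bytes resolved for ref.biocid are the exact version cited."""
    return hashlib.sha256(resolved_bytes).hexdigest() == ref.content_hash


# Illustrative data only: a stand-in payload and a placeholder biocid.
payload = b"##fileformat=VCFv4.2\n"
ref = Ref(
    biocid="biocid-placeholder",
    content_hash=hashlib.sha256(payload).hexdigest(),
    kind="vcf",
)
```

Here `ref_matches(ref, payload)` holds, while any altered byte in the resolved content makes the check fail, which is what lets a claim be tied to one specific version.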
The turns themselves form an append-only, hash-chained log. biofs-node computes each turn's hash over a fixed field set and links it to its predecessor:
turn_hash = sha256( canonical( {case_id, seq, agent_id, model, role, ts,
content, tool_calls, tool_results, refs,
claim, meta, prev_hash} ) )
prev_hash(turn n) = turn_hash(turn n-1) // genesis prev_hash = 64 zero hex digits (32 zero bytes)
Any later edit to a turn's content changes its recomputed hash and breaks the chain at that point, which an independent verifier detects deterministically. The data layer is immutable and the conversation layer is tamper-evident; together they make the record reproducible.
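The chaining rule above can be sketched in a few lines. The canonical form is an assumption here (sorted-key, whitespace-free JSON encoded as UTF-8); the text does not define `canonical`, and a real verifier would have to match the authority's encoding exactly. The genesis value is taken to be a 64-character string of zeros, and `append_turn` is a hypothetical helper.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumption: hex-encoded all-zero predecessor for the first turn

# The field set named in the formula above.
HASHED_FIELDS = (
    "case_id", "seq", "agent_id", "model", "role", "ts", "content",
    "tool_calls", "tool_results", "refs", "claim", "meta", "prev_hash",
)


def canonical(obj):
    # Assumed canonicalization: sorted keys, no insignificant whitespace, UTF-8.
    return json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def turn_hash(turn):
    """SHA-256 over the canonical form of the hashed fields (missing -> null)."""
    body = {field: turn.get(field) for field in HASHED_FIELDS}
    return hashlib.sha256(canonical(body)).hexdigest()


def append_turn(log, fields):
    """Assign the next seq, link to the predecessor, and store the hash."""
    prev = log[-1]["turn_hash"] if log else GENESIS
    turn = dict(fields, seq=len(log) + 1, prev_hash=prev)
    turn["turn_hash"] = turn_hash(turn)
    log.append(turn)
    return turn
```

Because `prev_hash` is itself one of the hashed fields, each turn's hash depends on every turn before it, so an edit anywhere upstream changes all downstream recomputed hashes.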
3.2 Cross-vendor adversarial co-interpretation
The protocol runs models from different vendors against the same case. In the reference implementation these are Claude Opus 4.8 (agent claude-code), Grok Build (agent grok-build), and Gemini 3 (agent gemini-cli). Each reads the others' turns from the shared log and is prompted to advance or to challenge, not to repeat. Because the models carry different training distributions, the disagreements they surface are more informative than the agreements a single model produces with itself.
To make disagreement measurable rather than rhetorical, agents emit structured verdicts through a classification turn that carries a claim object: the subject (for example, an HGVS variant string), the classification (for example, Pathogenic or Likely pathogenic or VUS), the ACMG criteria invoked, and a confidence in the unit interval. A consensus function aggregates the latest claim per agent per subject and computes a confidence-weighted call:
weight[classification] = sum of confidence over agents asserting it
consensus_call = argmax(weight)
agreement = (number of distinct classifications <= 1)
agreement_rate = agreed_subjects / total_subjects
Agreement, disagreement, the per-subject consensus call, and an overall agreement rate become first-class, queryable outputs. An optional referee role (a designated model, or a rule engine) reads the consensus and adjudicates the remaining disagreements into a final call.
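A minimal sketch of the consensus function described above. The input shape (a list of turns carrying `seq`, `agent_id`, and a `claim` dict) follows the text, but the function name, the return structure, and the tie-breaking behavior (first classification reaching the top weight wins) are assumptions; the text does not say how ties are resolved.

```python
from collections import defaultdict


def consensus(claim_turns):
    """Confidence-weighted consensus over the latest claim per agent per subject."""
    # Later turns (higher seq) overwrite earlier ones from the same agent.
    latest = {}
    for turn in sorted(claim_turns, key=lambda t: t["seq"]):
        claim = turn["claim"]
        latest[(claim["subject"], turn["agent_id"])] = claim

    by_subject = defaultdict(list)
    for (subject, _agent), claim in latest.items():
        by_subject[subject].append(claim)

    subjects = {}
    for subject, claims in by_subject.items():
        weight = defaultdict(float)
        for claim in claims:
            weight[claim["classification"]] += claim["confidence"]
        subjects[subject] = {
            "consensus_call": max(weight, key=weight.get),  # argmax(weight)
            "agreement": len(weight) <= 1,
            "weight": dict(weight),
        }

    agreed = sum(1 for s in subjects.values() if s["agreement"])
    rate = agreed / len(subjects) if subjects else 0.0
    return {"subjects": subjects, "agreement_rate": rate}


# Illustrative input: the subject string and confidences are made up.
SUBJECT = "GENE1:c.100A>G"
example_turns = [
    {"seq": 1, "agent_id": "claude-code",
     "claim": {"subject": SUBJECT, "classification": "Pathogenic", "confidence": 0.9}},
    {"seq": 2, "agent_id": "grok-build",
     "claim": {"subject": SUBJECT, "classification": "Pathogenic", "confidence": 0.8}},
    {"seq": 3, "agent_id": "gemini-cli",
     "claim": {"subject": SUBJECT, "classification": "Likely pathogenic", "confidence": 0.7}},
]
```

On this input, two agents assert Pathogenic and one asserts Likely pathogenic, so the call is Pathogenic while `agreement` is false: the split is reported alongside the computed call rather than hidden by it.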
3.3 Consent-gated agent cognition
Access to genomic bytes is not a file path; it is a consent decision. BioRouter enforces a four-tier authorization cascade (owner, BioNFT consent, on-chain license, and payment) and returns one of authorized, payment required, license required, or revoked. Consent is revocable: a patient exercising the right to erasure (GDPR Article 17) flips the relevant record, and subsequent resolution returns revoked.
Two safeguards keep this honest in the multi-agent setting. First, by default an agent has no data session and cannot resolve raw bytes; it reasons at the methodology level until the operator explicitly authorizes resolution. This means that pointing an analysis at a third-party model does not, by itself, send patient bytes anywhere. Second, the conversation log is governed as a clinical-scope artifact: data subjects are referenced by biowallet address or anonymous case label, never by name, so the transcript is de-identified by construction.
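One way to picture the four-tier cascade is the sketch below. Only the tier names and the four outcomes come from the text; the order of checks, the rule that revocation overrides every tier (including the owner), and the `AccessRecord` structure are assumptions made for illustration and may differ from BioRouter's actual logic.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    AUTHORIZED = "authorized"
    PAYMENT_REQUIRED = "payment required"
    LICENSE_REQUIRED = "license required"
    REVOKED = "revoked"


@dataclass
class AccessRecord:
    """Hypothetical per-object access state; wallets are plain strings here."""
    owner: str
    consented: set = field(default_factory=set)  # wallets holding BioNFT consent
    licensed: set = field(default_factory=set)   # wallets holding an on-chain license
    paid: set = field(default_factory=set)       # licensed wallets that have paid
    revoked: bool = False


def resolve(record, requester):
    # Assumption: revocation is checked first and overrides every other tier.
    if record.revoked:
        return Decision.REVOKED
    if requester == record.owner:          # tier 1: owner
        return Decision.AUTHORIZED
    if requester in record.consented:      # tier 2: BioNFT consent
        return Decision.AUTHORIZED
    if requester in record.licensed:       # tier 3: on-chain license
        if requester in record.paid:       # tier 4: payment
            return Decision.AUTHORIZED
        return Decision.PAYMENT_REQUIRED
    return Decision.LICENSE_REQUIRED
```

Flipping `revoked` to true changes the answer for every later call without touching earlier results, which mirrors the text's description of subsequent resolution returning revoked.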
Why this matters. The combination is the point. Content-addressing without consent leaks; consent without provenance is unverifiable; either without cross-vendor diversity inherits one model's blind spots. The protocol integrates all three so that a clinical AI conversation is at once reproducible, auditable, diverse, and consent-governed.
4. Protocol Surface
4.1 The workspace authority (biofs-node)
biofs-node exposes the shared workspace over HTTP. The principal endpoints:
| Endpoint | Purpose |
|---|---|
| POST /agent/workspace/open | Open or create a case; returns header, ordered turns, and a cursor. |
| GET /agent/workspace/read | Read turns after a sequence cursor; how each agent sees new turns. |
| POST /agent/workspace/append | Append exactly one turn; assigns sequence, hashes, chains, and verifies the signature when present. |
| POST /agent/workspace/case | Optimistic-concurrency update of the case header; 409 on version mismatch. |
| POST /agent/workspace/lease | Advisory turn-taking lease on a named resource. |
| GET /agent/workspace/replay | Full ordered log plus end-to-end chain verification. |
| GET /agent/workspace/consensus | Per-subject agreement, confidence-weighted call, and agreement rate. |
| GET /agent/workspace/stream | Server-sent events; pushes new turns in real time. |
| POST /agent/workspace/anchor | Compute a segment digest; broadcast it on-chain when enabled. |
4.2 The data model
A turn is the unit of record. Its persisted shape:
Turn {
  case_id, seq,                          // server-assigned, gap-free, monotonic
  agent_id, model:{name,version},        // who authored, which model
  role, ts, content,                     // the message
  refs:[{biocid, content_hash, kind}],   // content-addressed evidence
  claim:{subject, classification, criteria[], strength, confidence}, // structured verdict
  tool_calls, tool_results, meta,
  prev_hash, turn_hash,                  // chain
  sig, signer, signed                    // per-agent signature (optional)
}
The case header (title, owner wallet, biocids, status, active editor, and a version counter) is the only shared scalar state, and it is mutated only by optimistic compare-and-set. Leases provide advisory turn-taking. Anchors record a segment digest for on-chain notarization.
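The optimistic compare-and-set on the case header can be sketched as below. The `CaseHeader` class, the `VersionConflict` exception (standing in for the HTTP 409 mentioned in the endpoint table), and the field values are hypothetical; only the version-counter idea comes from the text.

```python
import threading


class VersionConflict(Exception):
    """Stands in for the 409 response on a version mismatch."""


class CaseHeader:
    """Hypothetical in-memory case header guarded by a version counter."""

    def __init__(self, title, owner_wallet):
        self.fields = {
            "title": title,
            "owner_wallet": owner_wallet,
            "status": "open",        # illustrative value
            "active_editor": None,
        }
        self.version = 0
        self._lock = threading.Lock()

    def read(self):
        """Return a snapshot of the fields together with the version seen."""
        with self._lock:
            return dict(self.fields), self.version

    def update(self, expected_version, changes):
        """Apply changes only if the caller saw the current version."""
        with self._lock:
            if expected_version != self.version:
                raise VersionConflict(
                    f"expected v{expected_version}, current v{self.version}"
                )
            self.fields.update(changes)
            self.version += 1
            return self.version
```

If two agents both read version 0 and both try to update, the first succeeds and bumps the version to 1; the second is rejected and must re-read before retrying, so no update is silently overwritten.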
4.3 The MCP tools
The biofs MCP server exposes sixteen tools to any MCP-aware client: seven data-access tools and nine workspace tools.
| Group | Tools |
|---|---|
| Data access (consent-gated) | bio_discover, bio_authenticate, bio_load_manifest, bio_resolve, bio_stream_chunk, bio_run_skill, bio_pay_x402 |
| Shared workspace | workspace_open, workspace_read, workspace_append, workspace_classify, workspace_consensus, workspace_case, workspace_lease, workspace_replay, workspace_anchor |
4.4 Per-agent non-repudiable authorship
When signing is enabled, each agent's MCP process holds a persistent key and signs the canonical form of the fields it authored:
signed_payload = canonical( {agent_id, role, content, refs, claim} )
sig = sign(signed_payload, agent_private_key)
biofs-node verifies, on append, that the recovered address matches the declared signer, and rejects the turn otherwise. The signature binds the agent's key to the content it produced, independent of the server-assigned ordering, so authorship is non-repudiable. Signatures follow the EIP-712 domain of GenoBank-BioContext on Sequentia (chain id 15132025).
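The sketch below illustrates only one property of the scheme: the signed payload covers the authored fields and nothing the server assigns. It does not perform EIP-712 typed-data hashing or any signing; a plain SHA-256 digest stands in so that two payloads can be compared, and the sorted-key JSON canonical form is an assumption.

```python
import hashlib
import json

# The authored fields named in the signed_payload formula above.
SIGNED_FIELDS = ("agent_id", "role", "content", "refs", "claim")


def signed_payload(turn):
    """Canonical bytes of the authored fields (assumed: sorted-key compact JSON)."""
    body = {name: turn.get(name) for name in SIGNED_FIELDS}
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def payload_digest(turn):
    # Stand-in digest for comparison only; EIP-712 hashing is not shown here.
    return hashlib.sha256(signed_payload(turn)).hexdigest()


# Illustrative turns: same authored content, different server-assigned fields.
authored = {"agent_id": "grok-build", "role": "agent",
            "content": "PM2 at supporting strength.", "refs": [], "claim": None}
as_seq_2 = dict(authored, seq=2, ts="t1", prev_hash="a" * 64)
as_seq_7 = dict(authored, seq=7, ts="t2", prev_hash="b" * 64)
tampered = dict(as_seq_2, content="PM2 at moderate strength.")
```

The two placements of the same authored turn yield identical payloads, while the tampered content yields a different one, which is why a signature over this payload would bind the key to the content independently of ordering.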
5. Reproducibility and Verification
Reproducibility here means a precise thing. Because language models sample, no architecture guarantees that a model would emit identical words on a rerun. What the protocol makes reproducible is the record: what was said, in what order, citing exactly which data, with which tool results, signed by which agent. A third party can replay it and independently verify every provenance claim.
To demonstrate this without any trust in GenoBank infrastructure, the protocol ships an independent offline verifier. It is a separate reimplementation of the chain specification, in a different language and package from the authority server. Given an exported log, it recomputes every turn hash, checks every chain link, and re-verifies every signature, contacting no server. When the independent verifier agrees with the authority, the record is reproducible by anyone.
A deliberately tampered turn changes its recomputed hash; the verifier reports the break at the exact sequence number and exits non-zero. The same mechanism that makes the conversation auditable makes a single altered byte detectable.
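A minimal offline check in the spirit of the verifier described above, under stated assumptions: the exported log is taken to be a JSON array of turns, the canonical form is sorted-key compact JSON, the genesis predecessor is 64 zeros, and signature re-verification is omitted. The function names are hypothetical and this is not the shipped verifier.

```python
import hashlib
import json
from typing import Optional

GENESIS = "0" * 64  # assumed genesis predecessor
HASHED_FIELDS = (
    "case_id", "seq", "agent_id", "model", "role", "ts", "content",
    "tool_calls", "tool_results", "refs", "claim", "meta", "prev_hash",
)


def turn_hash(turn):
    """Recompute a turn's hash from its hashed fields (assumed canonical form)."""
    body = {name: turn.get(name) for name in HASHED_FIELDS}
    encoded = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def find_break(log) -> Optional[int]:
    """Return the seq of the first invalid turn, or None if the chain is valid."""
    prev = GENESIS
    for expected_seq, turn in enumerate(log, start=1):
        if (
            turn.get("seq") != expected_seq          # gap or reordering
            or turn.get("prev_hash") != prev         # broken link
            or turn_hash(turn) != turn.get("turn_hash")  # edited content
        ):
            return expected_seq
        prev = turn["turn_hash"]
    return None


def main(path):
    """Verify an exported log file; returns a process exit code."""
    with open(path, encoding="utf-8") as handle:
        log = json.load(handle)
    broken = find_break(log)
    if broken is None:
        print("chain valid")
        return 0
    print(f"chain broken at seq {broken}")
    return 1
```

A command-line wrapper could call `sys.exit(main(path))` so that a tampered export produces a non-zero exit status, matching the behavior described in the text.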
6. A Worked Clinical Example
We ran the protocol on a de-identified somatic case. The conductor opened a shared case, then drove two agents in turn through their subscriptions.
Claude (round one) set the interpretive frame: it described the expected file types and reference build, proposed a tiered triage under the ACMG and ClinGen sequence-variant-interpretation guidelines, and, anticipating the second agent, pre-staked two contested points: that the PM2 criterion should be applied at supporting strength rather than moderate, and that any predicted loss-of-function call must be backed by explicit caller and coverage quality control before earning full PVS1 strength.
Grok (round one) read Claude's turn and rebutted it point by point: it challenged the default reference assumption, arguing that without inspecting the alignment header the build is not settled and that lift-over between builds can invert criteria, and it qualified the PM2 position, noting that for ultra-rare founder variants in endogamous populations a moderate strength can be defensible if population-frequency filtering accounts for ancestry-matched controls.
Neither agent could fetch raw bytes, because no data session was authorized; both reasoned at the methodology level and said so. The full exchange was recorded as an append-only, hash-chained log whose integrity verified end to end. In a three-agent configuration that adds Gemini, the structured verdicts produced a measurable split, two agents at Pathogenic and one at Likely pathogenic, with the confidence-weighted consensus resolving to Pathogenic, an outcome the system reports as an explicit disagreement with a computed call rather than a hidden divergence.
This is the core demonstration: two models from different vendors, driven on their owners' subscriptions, genuinely disagreeing on expert ACMG points over one shared workspace, leaving a tamper-evident record that anyone can re-verify.
7. Implementation and Results
The reference implementation comprises the workspace authority (biofs-node), the biofs MCP server (sixteen tools), and a command-line client that provides both the workspace verbs and the conductor. Three vendor agents are registered, each with a distinct identity and, optionally, its own signing key: Claude through its client configuration, Grok through its configuration, and Gemini through its settings, scoped so that each loads only the biofs server.
| Property | Result |
|---|---|
| Core invariants (unit tests) | 29 of 29 passing, including gap-free sequence under 20-way concurrent append, hash-chain integrity, tamper detection, confidence-weighted consensus, compare-and-set conflict, and lease expiry. |
| Per-agent signatures | MCP signs; biofs-node verifies on append and rejects mismatches; signatures re-verified by the independent offline verifier. |
| Independent reproducibility | Offline verifier recomputes chain and signatures from an exported log; agrees with the authority; reports a tampered record's exact break point and exits non-zero. |
| Cross-vendor co-work | Claude, Grok, and Gemini connect to the workspace as distinct agents; consensus computed across all three. |
| Live exchange | Real Claude and Grok subscription invocations debated a de-identified case; chain valid end to end; zero metered API tokens. |
| Real-time presence | Server-sent-events stream delivers new turns live to observers. |
8. Security, Privacy, and Compliance
De-identification by design. The conversation log is a clinical-scope artifact. Data subjects are referenced by biowallet address or anonymous case label, never by name. This is intended to keep the transcript consistent with HIPAA, GDPR, and CCPA de-identification expectations even though it is a working record.
Revocable consent. Access is mediated by a four-tier authorization with on-chain consent. A patient exercising erasure under GDPR Article 17 causes subsequent resolution to return revoked. The on-chain audit persists as proof of prior authorization while the data itself remains erasable in object storage.
Third-party model disclosure. Routing real patient bytes to any third-party model is a deliberate, gated decision. By default agents cannot resolve raw data and reason only at the methodology level. Enabling real resolution is an explicit operator action and, for protected data routed to an external vendor, a business-associate and disclosure decision the operator must make.
Tamper evidence and notarization. The hash chain makes any edit detectable. When on-chain anchoring is enabled, the digest of a log segment is broadcast to Sequentia, giving the record a court-grade and journal-grade, tamper-evident timestamp.
9. Limitations and Future Work
Reproducibility scope. The record is reproducible and verifiable; the generation is not bit-for-bit deterministic, by the nature of sampled models. We are explicit about this distinction throughout.
Quantitative evaluation. This paper establishes the protocol and a worked example. A controlled evaluation, comparing single-model interpretation against same-model self-debate against cross-vendor debate on a gold-standard set such as expert-curated ClinVar or ClinGen variant expert panel classifications, and measuring accuracy against expert consensus together with errors caught by the second and third agents, is the natural next study and the strongest test of the central claim that vendor diversity reduces misclassification.
Deployment maturity. On-chain anchoring of segment digests and consent-gated resolution of real data are implemented and gated behind operator-controlled configuration; production deployment with a durable store and a funded anchoring key is in progress.
Generality. The structured-claim schema is not specific to variant classification; the same machinery applies to other interpretive tasks where heterogeneous agents should be cross-checked and the conclusion must be reproducible and consent-governed.
10. Conclusion
Clinical genomics does not need agents that decide; it needs an infrastructure in which agent reasoning is auditable, diverse, reproducible, and bound to patient consent. The Intra-LLM BioFilesystem provides that infrastructure by combining content-addressed conversation, cross-vendor adversarial co-interpretation, and consent-gated agent cognition on a single, hash-chained, server-authoritative workspace. Models from different vendors, driven on their owners' subscriptions, co-interpret one genome and leave a record any third party can independently verify. The data stays immutable and consent-governed; the conversation stays tamper-evident; the disagreements that matter become measurable. This is a foundation on which trustworthy, collaborative clinical AI can be built and, crucially, checked.
References
- Richards, S., Aziz, N., Bale, S., et al. (2015). Standards and guidelines for the interpretation of sequence variants. Genetics in Medicine, 17(5), 405-424.
- ClinGen Sequence Variant Interpretation Working Group. Recommendations for application of the ACMG and AMP criteria.
- Anthropic. Model Context Protocol specification. modelcontextprotocol.io.
- GenoBank.io. The BioFS Protocol: NFT-gated genomic data access. genobank.io/whitepapers/biofs-protocol.
- GenoBank.io. BioRouter and the x402 Biodata Router. genobank.io/whitepapers/x402-biodata-router.
- European Parliament and Council. Regulation (EU) 2016/679 (General Data Protection Regulation), Article 17, Right to erasure.
- Ethereum Improvement Proposal 712: Typed structured data hashing and signing.
For inquiries: https://genobank.io