How it works

We taught an AI to read
every sacred text at once

Scholars spend lifetimes noticing that a Sumerian flood story predates Genesis by a thousand years, or that the Buddha and Osiris share the same death-and-return arc. We built a system that finds these patterns across every tradition, simultaneously — and shows its work.

01 — THE PROBLEM

Humanity's knowledge is fragmented

Fourteen traditions. Thousands of years. No shared index.

The world's sacred texts — the Quran, the Bhagavad Gita, the Book of Enoch, the Tao Te Ching, the Zohar — were written across 4,000 years, in a dozen languages, on three continents. They contain overlapping flood myths, parallel creation stories, mirrored hero journeys, and suspiciously similar cosmologies. But connecting them has always required a scholar who happened to read both Akkadian cuneiform and Sanskrit. That's vanishingly rare.

The result? The most interesting questions in comparative religion — did the dying-and-rising god motif originate once and spread, or did every culture invent it independently? — remain trapped behind institutional gates, scattered across journal articles that cite each other in circles.

We thought: what if you could put that question to all 26 texts at once and get an answer that traces every claim back to its source, scores its confidence, and tells you where the scholars disagree?

02 — THE CORPUS

26 texts. 14 traditions. Two databases.

"In the beginning was the Word" — and the word needed structure.

26 sacred texts
14 traditions
1,700+ passages
374 entities extracted
106 themes identified
226 structural links

We don't just store text. Every passage lives in a knowledge graph — a web of relationships where Marduk is linked to Tiamat is linked to the concept of primordial chaos is linked to Genesis 1:2 ("the earth was without form, and void") is linked to the Enuma Elish's Tablet I. Every node has a type. Every edge has a confidence score. Every claim has a citation.

Alongside the graph, a vector database stores 1024-dimensional semantic embeddings of every passage. This means we can find connections that aren't lexical — passages that mean similar things even when they use completely different words, in different languages, from different millennia.

03 — THE PIPELINE

From raw text to living knowledge

Six stages. Each one adds a layer of understanding.

Stage 1

Parse

Raw text arrives in wildly different formats — Bible verses, Quran ayahs, numbered stanzas, free prose. Format-specific parsers normalize everything into structured sections with canonical references. Genesis 1:1 becomes addressable. Surah 96, Ayah 1 becomes addressable. The Poetic Edda's Voluspa, stanza 3 becomes addressable.
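To make "addressable" concrete, here is a minimal sketch of how three citation styles could be normalized into one key. The `CanonicalRef` class, the regex patterns, and the `work:division:unit` key format are all illustrative assumptions, not the parsers The Archive actually runs.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class CanonicalRef:
    work: str       # e.g. "Genesis", "Quran", "Voluspa"
    division: int   # chapter or surah; 0 when the work has no such level
    unit: int       # verse, ayah, or stanza

    def key(self) -> str:
        # One uniform, addressable key regardless of the source format.
        return f"{self.work.lower()}:{self.division}:{self.unit}"


# Hypothetical patterns, one per citation style mentioned above.
BIBLE = re.compile(r"^(?P<book>[1-3]?\s?[A-Za-z]+)\s+(?P<ch>\d+):(?P<v>\d+)$")
QURAN = re.compile(r"^Surah\s+(?P<s>\d+),\s*Ayah\s+(?P<a>\d+)$", re.IGNORECASE)
STANZA = re.compile(r"^(?P<poem>[A-Za-z]+),\s*stanza\s+(?P<n>\d+)$", re.IGNORECASE)


def parse_reference(raw: str) -> CanonicalRef:
    """Turn a human-style citation into a CanonicalRef, or raise ValueError."""
    raw = raw.strip()
    if m := QURAN.match(raw):
        return CanonicalRef("Quran", int(m["s"]), int(m["a"]))
    if m := STANZA.match(raw):
        return CanonicalRef(m["poem"], 0, int(m["n"]))
    if m := BIBLE.match(raw):
        return CanonicalRef(m["book"], int(m["ch"]), int(m["v"]))
    raise ValueError(f"unrecognized reference format: {raw!r}")


for citation in ("Genesis 1:1", "Surah 96, Ayah 1", "Voluspa, stanza 3"):
    print(citation, "->", parse_reference(citation).key())
```

The point of the single key is that every later stage (chunking, embedding, graph loading) can cite a passage the same way, whatever format it arrived in.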

Stage 2

Chunk

Sections are split into 500–800 token chunks that respect structural boundaries — verse breaks, paragraph breaks, stanza endings. Overlapping windows (100 tokens) ensure no context is lost at seams. Each chunk gets a deterministic ID tied to its source.
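A rough sketch of boundary-respecting chunking with an overlap window follows. It makes several assumptions: whitespace-separated words stand in for model tokens, only the upper bound is enforced (not the 500-token floor), and the SHA-256 ID scheme is one plausible way to get a deterministic ID, not necessarily the one used here.

```python
import hashlib


def count_tokens(text: str) -> int:
    # Assumption: whitespace words approximate model tokens.
    return len(text.split())


def chunk_units(units, max_tokens=800, overlap_tokens=100):
    """Group structural units (verses, stanzas, paragraphs) into chunks.

    A chunk is closed before it would exceed max_tokens, so no unit is ever
    split. The next chunk starts with whole trailing units from the previous
    one, totaling at most overlap_tokens.
    """
    chunks, current = [], []
    for unit in units:
        size = sum(count_tokens(u) for u in current)
        if current and size + count_tokens(unit) > max_tokens:
            chunks.append(current)
            # Carry whole trailing units forward as the overlap window.
            carried, budget = [], overlap_tokens
            for prev in reversed(current):
                if count_tokens(prev) > budget:
                    break
                carried.insert(0, prev)
                budget -= count_tokens(prev)
            current = carried
        current.append(unit)
    if current:
        chunks.append(current)
    return chunks


def chunk_id(source_ref: str, text: str) -> str:
    # Deterministic: the same source reference and text always give the same ID.
    digest = hashlib.sha256(f"{source_ref}\n{text}".encode("utf-8"))
    return digest.hexdigest()[:16]
```

Because the ID depends only on the source reference and the text, re-running the pipeline on an unchanged passage produces the same ID, which keeps re-ingestion from creating duplicates.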

Stage 3

Embed

Every chunk is transformed into a 1024-dimensional vector via Voyage AI's voyage-3-large model. This is the magic that lets us find semantic siblings — passages that feel alike even across languages and millennia. A Sumerian prayer and a Vedic hymn addressing the same cosmic question will cluster together in vector space, even though they share zero vocabulary.
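"Cluster together in vector space" comes down to cosine similarity between embedding vectors. The sketch below uses tiny made-up 3-dimensional vectors in place of real 1024-dimensional embeddings, and the chunk IDs are placeholders; it shows only the comparison step, not the call to the embedding model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def nearest(query, candidates, k=3):
    """Return the k candidate ids most similar to the query vector."""
    scored = [(cosine_similarity(query, vec), cid) for cid, vec in candidates.items()]
    scored.sort(reverse=True)
    return [cid for _, cid in scored[:k]]


# Toy vectors for illustration only; real embeddings have 1024 dimensions.
toy_index = {
    "chunk-a": [1.0, 0.0, 0.0],
    "chunk-b": [0.9, 0.1, 0.0],
    "chunk-c": [0.0, 1.0, 0.0],
}
print(nearest([1.0, 0.0, 0.0], toy_index, k=2))
```

In production this ranking would be done inside the vector database rather than in a Python loop, but the measure is the same.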

Stage 4

Extract

This is where AI meets scholarship. Claude reads each passage and extracts structured data using tool-use — not free-form generation, but constrained, schema-bound extraction:

// What Claude extracts from each passage:
{
  "entities": [
    { "name": "Tiamat", "type": "deity",
      "attributes": ["primordial", "sea", "chaos", "feminine"] }
  ],
  "themes": [
    { "category": "creation",
      "description": "cosmos fashioned from body of slain deity" }
  ],
  "relationships": [
    { "source": "Marduk", "target": "Tiamat",
      "type": "defeats", "confidence": 0.95 }
  ]
}

Entity types include deities, figures, places, concepts, symbols, rituals, artifacts, creatures, and events. Theme categories span 14 archetypal patterns — from creation myths to trickster figures to cosmic battles. Every extraction is tied to its source passage with a citation trail.
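Schema-bound extraction means the model is handed a tool definition and has to answer in that shape. Below is a sketch of what such a definition might look like, in the `name` / `description` / `input_schema` layout that Claude tool use expects. The tool name `record_extraction`, the exact schema, and the `check_extraction` helper are assumptions for illustration; theme categories are left as free strings because the 14 categories are not enumerated here.

```python
ENTITY_TYPES = ["deity", "figure", "place", "concept", "symbol",
                "ritual", "artifact", "creature", "event"]

# Hypothetical tool definition; only the entity types come from the text above.
EXTRACTION_TOOL = {
    "name": "record_extraction",
    "description": "Record entities, themes, and relationships found in one passage.",
    "input_schema": {
        "type": "object",
        "properties": {
            "entities": {"type": "array", "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "type": {"type": "string", "enum": ENTITY_TYPES},
                    "attributes": {"type": "array", "items": {"type": "string"}},
                },
                "required": ["name", "type"],
            }},
            "themes": {"type": "array", "items": {
                "type": "object",
                "properties": {
                    "category": {"type": "string"},
                    "description": {"type": "string"},
                },
                "required": ["category", "description"],
            }},
            "relationships": {"type": "array", "items": {
                "type": "object",
                "properties": {
                    "source": {"type": "string"},
                    "target": {"type": "string"},
                    "type": {"type": "string"},
                    "confidence": {"type": "number", "minimum": 0, "maximum": 1},
                },
                "required": ["source", "target", "type", "confidence"],
            }},
        },
        "required": ["entities", "themes", "relationships"],
    },
}


def check_extraction(payload):
    """Return a list of problems; an empty list means the payload passed."""
    problems = []
    for entity in payload.get("entities", []):
        if entity.get("type") not in ENTITY_TYPES:
            problems.append(f"unknown entity type: {entity.get('type')!r}")
    for rel in payload.get("relationships", []):
        conf = rel.get("confidence")
        if not isinstance(conf, (int, float)) or not 0.0 <= conf <= 1.0:
            problems.append(f"confidence out of range: {conf!r}")
    return problems
```

A second check on our side, even after the schema has constrained the model, is cheap insurance: anything that fails can be dropped or retried before it reaches the graph.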

Stage 5

Load

Extracted entities, themes, and relationships are loaded into Neo4j as graph nodes and edges. MERGE operations prevent duplicates. Every entity links to its tradition, every passage links to its text, every text links to its canonical work. The graph grows with each passage processed.
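Here is a sketch of how one extracted relationship could become an idempotent Cypher statement. The `Entity` label and the `confidence` / `citation` property names are assumptions about the graph model. One detail worth knowing: Cypher cannot take a relationship type as a query parameter, so the type is validated and then interpolated, while all values travel as parameters.

```python
import re

# Relationship types are interpolated into the query, so allow only
# plain identifier characters.
_REL_TYPE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def merge_relationship_query(source, target, rel_type, confidence, passage_ref):
    """Build a (query, params) pair that upserts two entities and one edge."""
    rel = rel_type.upper()
    if not _REL_TYPE.match(rel):
        raise ValueError(f"unsafe relationship type: {rel_type!r}")
    query = (
        "MERGE (a:Entity {name: $source}) "
        "MERGE (b:Entity {name: $target}) "
        f"MERGE (a)-[r:{rel}]->(b) "
        "SET r.confidence = $confidence, r.citation = $passage_ref"
    )
    params = {
        "source": source,
        "target": target,
        "confidence": confidence,
        "passage_ref": passage_ref,
    }
    return query, params


query, params = merge_relationship_query(
    "Marduk", "Tiamat", "defeats", 0.95, "Enuma Elish, Tablet IV"
)
print(query)
# With the official neo4j Python driver this pair would typically be
# passed to session.run(query, params).
```

Running the same statement twice leaves the graph unchanged, which is what lets the pipeline be re-run safely over passages it has already processed.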

Stage 6

Cross-Reference

The most exciting stage. The system takes entity pairs from different traditions — say, Osiris (Egyptian) and Dionysus (Greek) — and evaluates their connection across six dimensions of evidence. This is where decades of comparative mythology work happens in minutes.
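Selecting candidates is the simple part: pair up entities whose traditions differ. This is a minimal sketch, assuming a plain name-to-tradition mapping; the real system presumably prunes candidates first, since the number of pairs grows quadratically with the number of entities.

```python
from itertools import combinations


def cross_tradition_pairs(entities):
    """Yield pairs of entity names whose traditions differ.

    entities: mapping of entity name -> tradition name.
    """
    items = sorted(entities.items())
    for (name_a, trad_a), (name_b, trad_b) in combinations(items, 2):
        if trad_a != trad_b:
            yield name_a, name_b


sample = {"Osiris": "Egyptian", "Dionysus": "Greek", "Isis": "Egyptian"}
print(list(cross_tradition_pairs(sample)))
```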

04 — CONFIDENCE SCORING

Six dimensions. No hand-waving.

Every connection earns its place — or gets flagged as disputed.

This is what separates us from "vibes-based" comparisons. When The Archive says "Gilgamesh's flood narrative likely influenced the Genesis account", it doesn't just assert it. It scores the claim across six weighted dimensions:

Textual Similarity

25% weight — cosine similarity of passage embeddings

Scholarly Consensus

20% weight — academic source agreement

Linguistic Evidence

20% weight — cognates, loanwords, etymology

Temporal Proximity

15% weight — the gap in time between the two sources; under 200 years scores high

Archaeological Evidence

10% weight — physical corroboration

Geographic Plausibility

10% weight — transmission corridor feasibility

The AI ceiling: Any connection based solely on AI inference — without corroborating human scholarship, archaeological evidence, or linguistic proof — is capped at 0.7 confidence. The system literally cannot claim certainty about something only it has noticed. It has to show its work.

Connections scoring below 0.3, or where scholarly consensus is below 0.4, are automatically flagged as disputed. They're still visible — interesting hypotheses are valuable — but they're clearly marked so nobody mistakes an emerging pattern for established fact.
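The scoring rules above fit in a few lines. This sketch takes the six weights, the 0.7 AI ceiling, and the two disputed thresholds from the text. It assumes each dimension is already scored between 0 and 1, and it takes "AI-only" as an input flag, since how that is determined is not spelled out here. The dictionary keys are illustrative names.

```python
WEIGHTS = {
    "textual_similarity": 0.25,
    "scholarly_consensus": 0.20,
    "linguistic_evidence": 0.20,
    "temporal_proximity": 0.15,
    "archaeological_evidence": 0.10,
    "geographic_plausibility": 0.10,
}
AI_ONLY_CAP = 0.7       # ceiling for connections only the AI has noticed
DISPUTED_BELOW = 0.3    # overall score below this is flagged
CONSENSUS_FLOOR = 0.4   # scholarly consensus below this is flagged


def score_connection(dimensions, ai_only=False):
    """Return (confidence, disputed) for one cross-tradition connection."""
    missing = set(WEIGHTS) - set(dimensions)
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    total = sum(WEIGHTS[name] * dimensions[name] for name in WEIGHTS)
    if ai_only:
        total = min(total, AI_ONLY_CAP)
    disputed = (
        total < DISPUTED_BELOW
        or dimensions["scholarly_consensus"] < CONSENSUS_FLOOR
    )
    return round(total, 4), disputed
```

Note that the two disputed conditions are independent: a connection with strong textual and linguistic scores is still flagged if scholars mostly reject it.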

05 — CROSS-REFERENCING

The associative intelligence layer

Not flattening traditions — mapping how they actually interacted.

When the cross-reference analyzer compares two entities from different traditions, it doesn't just say "these are similar." It classifies the mechanism:

Literary Borrowing

Direct textual influence. The Genesis flood account shares specific narrative elements with the Gilgamesh epic: a boat built to stated dimensions, and a dove and a raven released to find dry land.

Oral Diffusion

Myths that spread through trade routes, migration, and storytelling without direct textual contact. The hero's journey motif appears along Silk Road corridors.

Independent Invention

Structural parallels that arise from shared human cognition, not contact. Flood myths in Mesoamerica likely emerged independently from Near Eastern ones.

Shared Historical Event

Multiple traditions remembering the same event differently — a real catastrophic flood at the end of the Ice Age, filtered through different cultural lenses.

Structural Inevitability

Patterns that emerge from the structure of narrative itself — Lévi-Strauss's binary oppositions, the cognitive science of religion's "minimally counterintuitive" concepts.

Every cross-reference also stores falsification criteria: what evidence would disprove the connection? And counter-evidence: what existing scholarship argues against it? This isn't a system that confirms what you want to hear. It's a system that tries to prove itself wrong.
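One way to hold a cross-reference is a record that carries its own mechanism, falsification criteria, and counter-evidence. The field names are assumptions, and refusing to create a record without at least one falsification criterion is an illustrative design choice rather than a documented rule of The Archive.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mechanism(Enum):
    LITERARY_BORROWING = "literary_borrowing"
    ORAL_DIFFUSION = "oral_diffusion"
    INDEPENDENT_INVENTION = "independent_invention"
    SHARED_HISTORICAL_EVENT = "shared_historical_event"
    STRUCTURAL_INEVITABILITY = "structural_inevitability"


@dataclass
class CrossReference:
    source_entity: str
    target_entity: str
    mechanism: Mechanism
    # What evidence would disprove this connection?
    falsification_criteria: list = field(default_factory=list)
    # What existing scholarship argues against it?
    counter_evidence: list = field(default_factory=list)

    def __post_init__(self):
        # Assumed rule: a claim that cannot be tested is not stored.
        if not self.falsification_criteria:
            raise ValueError(
                "a cross-reference needs at least one falsification criterion"
            )


# Placeholder text only; not a real entry from the graph.
example = CrossReference(
    source_entity="Osiris",
    target_entity="Dionysus",
    mechanism=Mechanism.ORAL_DIFFUSION,
    falsification_criteria=["placeholder: evidence that would rule out contact"],
)
print(example.mechanism.value)
```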

06 — STRUCTURAL ANALYSIS

Three frameworks. Centuries of scholarship. One graph.

Propp. Campbell. Thompson. Applied at passage level.

Beyond entity extraction, we map every narrative against three foundational frameworks from comparative mythology and folklore studies:

Vladimir Propp's 31 Narrative Functions — The recurring building blocks of folk tales, from "the hero receives a magical agent" to "the villain is punished." We tag passages with their Propp function, then compare across traditions. It turns out the Sumerian Descent of Inanna and the Greek Persephone myth hit many of the same functions in nearly the same order.

Joseph Campbell's Monomyth — The 17 stages of the Hero's Journey, from "The Call to Adventure" through "The Return." We map where each sacred text's narrative falls on this arc, enabling cross-tradition comparison at the structural level.

Thompson Motif Index — The standard classification system for recurring elements in world folklore, with 23 lettered categories from A (Mythological Motifs) to Z (Miscellaneous Groups of Motifs); the letters I, O, and Y are unused. When you see that motif A1010 ("Deluge") appears in Mesopotamian, Hebrew, Hindu, and Norse traditions, that's the Thompson Index at work.
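"Many of the same functions in nearly the same order" can be measured. One option, shown here as a sketch and not necessarily the comparison The Archive uses, is the longest common subsequence of two tagged function sequences. The two sequences below are invented Propp function numbers (1 to 31) for illustration; they are not real tags from any text.

```python
def longest_common_subsequence(a, b):
    """Longest sequence of items that appears in both a and b in order."""
    rows, cols = len(a), len(b)
    # table[i][j] = LCS length of a[:i] and b[:j]
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            if a[i] == b[j]:
                table[i + 1][j + 1] = table[i][j] + 1
            else:
                table[i + 1][j + 1] = max(table[i][j + 1], table[i + 1][j])
    # Walk back through the table to recover the shared items.
    shared, i, j = [], rows, cols
    while i and j:
        if a[i - 1] == b[j - 1]:
            shared.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return shared[::-1]


def order_similarity(a, b):
    """Share of the longer sequence that both narratives hit in the same order."""
    if not a or not b:
        return 0.0
    return len(longest_common_subsequence(a, b)) / max(len(a), len(b))


# Invented function-number sequences for two hypothetical narratives.
narrative_a = [1, 8, 11, 14, 16, 18, 20, 31]
narrative_b = [8, 11, 12, 14, 18, 20]
print(longest_common_subsequence(narrative_a, narrative_b))
print(order_similarity(narrative_a, narrative_b))
```

Unlike a plain set overlap, this rewards narratives that share functions in the same order, which is the structural claim being made.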

07 — IN PRACTICE

What this actually looks like

Ask a question. Get the evidence chain.

Say you ask: "Is there a connection between the Egyptian weighing of the heart and the Christian Last Judgment?"

The system doesn't guess. It:

1. Finds the relevant passages via semantic search (vector similarity across the Book of the Dead and Revelation).
2. Retrieves the extracted entities (Ma'at, Osiris, Anubis / Christ, the Lamb, the Book of Life) and their relationships from the knowledge graph.
3. Checks the cross-reference table for existing Osiris–Christ scholarly analysis (Frazer, Mettinger, Smith — with dates, claims, and counter-claims).
4. Scores the connection across all six confidence dimensions.
5. Synthesizes an answer in your chosen reasoning mode — authoritative, transparent, dialectical, or Socratic — with every claim hyperlinked to its source passage.

"And I saw the dead, great and small, standing before the throne, and books were opened."

Revelation 20:12 (ESV)

"O my heart of my mother! Do not stand up as a witness against me, do not be opposed to me in the tribunal."

Egyptian Book of the Dead, Spell 30B (Faulkner)

Two texts, two millennia apart, two completely different cultures — and the same existential architecture: a cosmic courtroom where the dead are judged by the contents of their heart. The Archive doesn't flatten these into equivalence. It maps exactly how similar they are, why they might be connected, and who disagrees.

08 — WHY THIS MATTERS

Accessible scholarship at the speed of curiosity

The patterns were always there. We built the lens.

I built this because I kept hitting the same wall. I'd read the Bhagavad Gita and think "this sounds exactly like what the Stoics said" — and then spend three hours on JSTOR trying to find out if anyone had noticed that, only to discover it was buried in a 1987 monograph behind a $40 paywall. That's insane. These are humanity's most important texts. The connections between them shouldn't be locked up.

This isn't a toy. The confidence scoring system means you can take a connection's score at face value, or drill into the evidence chain if you're skeptical. The AI ceiling means the system can't claim more than 0.7 confidence for something only it has noticed. The falsification criteria mean every hypothesis comes pre-loaded with the seeds of its own disproof.

And we're just getting started. 21 passages enriched so far, with roughly 1,683 more to go. Every passage we process adds entities, themes, and cross-references to the graph. The intelligence compounds. A connection that's invisible with 100 entities becomes obvious with 1,000.

The goal is simple: make the study of comparative religion as navigable as Google Maps made geography. Zoom out, see the patterns. Zoom in, read the primary sources. Every level of detail is grounded in evidence.

09 — UNDER THE HOOD

The technical stack

For the engineers and investors who want to see the engine.

// Architecture
Frontend         Next.js 15 + React 19 (TypeScript, CSS Modules)
Knowledge Graph  Neo4j Aura (2,600+ nodes, 30+ relationship types)
Vector Search    PostgreSQL + pgvector (Neon) — 1024-dim Voyage AI
AI Engine        Claude Sonnet 4.6 (entity extraction, analysis, chat)
Embeddings       Voyage AI voyage-3-large (1024 dimensions)
TTS              Kokoro 82M on Modal (GPU, scales to zero)
Payments         Stripe (3 tiers: Free / Scholar / Oracle)
Hosting          Vercel (edge functions, ISR)
Monorepo         pnpm + Turborepo (web, mobile, shared packages)

// Enrichment cost
Per passage      ~$0.11 (Sonnet) or ~$0.006 (Haiku)
Remaining        ~1,683 passages × $0.006 (Haiku rate) = ~$10 to complete
Value created    A knowledge graph that took 26 sacred texts and
                 wove them into a single, queryable, citation-backed
                 map of human religious thought.

This is what I'm building.

A tool that makes humanity's deepest questions navigable. Not by simplifying them — by finally giving them the structure they deserve.

