Metaphorex: A Bot-Sourced Knowledge Graph
The situation
I wanted a reference catalog of metaphors to test a hunch: that such a catalog could help both me AND my code-writing AI partners improve our higher-level system design and architecture skills. Cognitive linguists have studied metaphors for decades, but the material is scattered across books, papers, and folklore, with no single structured, browsable catalog.
Building it by hand would take years, but not if I directed AI agents to do the volume work.
What I built
A synthetic dataset, not a chatbot.
I’m calling it a “materials library” because that’s how it’s meant to be used. Engineers consult materials libraries to understand not just the strengths of their building materials but the failure modes: where steel fatigues, where concrete cracks, where wood splits along the grain. Knowing the weaknesses is what lets you build something that actually holds up. Metaphors work the same way. “Data is water” is useful until you need to talk about data that doesn’t flow downhill. “Argument is war” illuminates tactics but obscures collaboration. Each entry’s “Where It Breaks” section documents those failure modes, so you can choose your abstractions deliberately instead of inheriting them by accident.
Each entry is a structured markdown file with YAML frontmatter. 400+ mappings, 100+ frames, 20+ categories so far. The catalog lives in a GitHub repo. GitHub is the CMS: pull requests are drafts, merged is published.
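To show the shape, here is a hypothetical entry in that format; the field names and values are illustrative, not the project's actual schema:

```markdown
---
title: Argument Is War
slug: argument-is-war
category: conflict
frames: [war, strategy]
---

## Mapping
Positions are territory; claims are attacks; concessions are retreats.

## Where It Breaks
Obscures collaboration: two people refining a shared idea are not
opponents, and "winning" can mean losing the better conclusion.
```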
A five-agent pipeline. The agents are built as a Claude Code plugin and operate on the repo through normal git workflows:
- Prospector surveys a source (a book, paper, or corpus), builds an extraction playbook, writes parsing scripts, and files sub-issues for each candidate mapping.
- Surveyor reviews Prospector playbooks before mining begins.
- Miner follows the playbook and extracts structured entries, opening PRs.
- Smelter does mechanical cleanup: validation, formatting, metadata normalization. The cheapest model handles the least creative work.
- Assayer reviews Miner output for quality, accuracy, and completeness.
Each agent has scoped permissions and a defined trust level. The Prospector can write scripts (human-reviewed via CODEOWNERS). The Miner follows playbooks but doesn’t write code. The Assayer can request changes but can’t merge. Nine import projects have run through this pipeline so far, sourcing from Lakoff and Johnson, Jungian archetypes, design patterns, AI engineering, and more.
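The human-review gate on Prospector-written scripts can be expressed with GitHub's CODEOWNERS mechanism; the paths and handle below are hypothetical, a sketch of the idea rather than the repo's actual file:

```
# Hypothetical CODEOWNERS sketch: any PR touching extraction
# scripts requires approval from the human maintainer, while
# catalog entries flow through the agent review pipeline.
/scripts/  @maintainer-handle
```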
Validation and quality gates. A Python validator enforces the content schema: required frontmatter fields, slug consistency, frame existence, cross-references. Zero warnings, zero errors is the standard. Every PR must pass before merge.
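A minimal sketch of what such a validator might check, in Python. The required field names and the flat `key: value` frontmatter parser are assumptions for illustration; a real implementation would use a YAML library and also verify frame existence and cross-references.

```python
import re
from pathlib import Path

# Hypothetical required fields, not the project's actual schema.
REQUIRED_FIELDS = {"title", "slug", "category", "frames"}

def parse_frontmatter(text: str) -> dict:
    """Extract the frontmatter block between the leading '---' fences.

    Minimal flat key: value parsing for illustration only."""
    match = re.match(r"---\n(.*?)\n---", text, re.DOTALL)
    if not match:
        return {}
    fields = {}
    for line in match.group(1).splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return fields

def validate_entry(path: Path) -> list[str]:
    """Return a list of error strings; an empty list means the entry passes."""
    errors = []
    fields = parse_frontmatter(path.read_text(encoding="utf-8"))
    for field in sorted(REQUIRED_FIELDS - fields.keys()):
        errors.append(f"{path.name}: missing required field '{field}'")
    # Slug consistency: the frontmatter slug must match the filename stem.
    if fields.get("slug") and fields["slug"] != path.stem:
        errors.append(f"{path.name}: slug '{fields['slug']}' != filename")
    return errors
```

Running this over every entry in CI, and failing the PR on any non-empty result, gives the "zero warnings, zero errors" gate.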
A browsable site. The Astro static site at metaphorex.org renders the catalog with full-text search via Pagefind. Deploys nightly from main via GitHub Actions.
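A nightly Pages deploy like this can be wired up with a scheduled GitHub Actions workflow; this is a sketch under assumptions (job names, cron time, and build commands are illustrative, not the project's actual config):

```yaml
# Hypothetical nightly deploy workflow.
name: nightly-deploy
on:
  schedule:
    - cron: "0 4 * * *"   # once a night, UTC
  workflow_dispatch:       # allow manual runs
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm run build   # Astro build + Pagefind indexing
      - uses: actions/upload-pages-artifact@v3
        with:
          path: dist
  deploy:
    needs: build
    runs-on: ubuntu-latest
    permissions:
      pages: write
      id-token: write
    steps:
      - uses: actions/deploy-pages@v4
```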
What’s interesting about it
The agents don’t chat. They operate on files through git, just like a human contributor would. PRs, branches, code review, CI checks. The orchestration is the repo workflow itself, not a custom runtime.
Layering contributions from agents with different models and prompts produces better output than a single frontier model would alone: the extraction, cleanup, and review stages each catch errors the others miss.
The whole system runs on a single content repo with no database, no API server, and no hosting costs beyond GitHub Pages. The agent plugin is portable: fork the repo, bring your own Claude API key, and the pipeline works.
The outcome
A growing, structured knowledge base that would have taken years to build manually. The pipeline processes a new source (survey, extract, review, publish) in hours, and runs continuously. The project is open source under CC BY-SA 4.0 (content) and MIT (code). Browse the catalog at metaphorex.org or read the source on GitHub.