18.9 KB · updated 2026-05-19 · md

Architecture

<!-- translations:start -->

한국어 · 中文 · 日本語 · Русский · Español · Français · Deutsch

<!-- translations:end -->

Tesserae turns a directory of source material into a controlled, typed knowledge graph and projects that graph through a durable markdown wiki layer into a static, AI-friendly website. The April 2026 redesign reorganised the system around a Karpathy three-layer model: raw evidence stays raw, a typed graph governs ontology, and a markdown wiki layer sits between the graph and any rendered output. The static site is now a renderer of that wiki layer rather than a direct dump of the graph, with the controlled ontology in MD0 as the schema.

The Karpathy three-layer model

Andrej Karpathy's framing for LLM-friendly knowledge bases distinguishes three layers, each with its own durability guarantee:

| Layer | Concern | Repo location | Owner |
| --- | --- | --- | --- |
| L1: Raw sources | The literal bytes the user authored or harvested. Append-only. | data/, docs/, project trees referenced in .tesserae/config.json | the user |
| L2: Wiki | Typed markdown pages (sources, concepts, entities, papers, repos, topics, syntheses, questions) with YAML frontmatter. Idempotent: regenerated each compile, but only rewritten when content hashes change. | .tesserae/wiki/ | WikiPageStore, WikiLayerProjector, SynthesisProjector |
| L3: Rendered | The static HTML site, AI-sibling exports, search index, sitemaps, JSON-LD. Wiped and rewritten every compile, but byte-stable across reruns. | .tesserae/site/ | StaticSiteBuilder (tesserae/site/) |

The schema sits across all three layers as a separate axis: ResearchGraph in graph.json is the controlled ontology that L2 pages link against, and the ResearchNodeType / edge whitelist in MD3 is the source of truth for what types exist at all.

The redesign added L2 explicitly. Before April 2026 the static site was projected straight from graph.json; the wiki layer existed only inside the Obsidian vault export. Splitting it out gave us:

  • A single human-editable surface (open .tesserae/wiki/ in Obsidian or any markdown editor).
  • Idempotent rebuilds: re-running project compile produces zero file diffs unless source content changed.
  • An evolution log: synthesis pages accumulate over time and let the project narrate itself.

Pipeline

data/, docs/, src/                                    (L1 raw)
        │
        ▼  project compile  (tesserae/project.py)
┌───────────────────────────┐
│ ResearchGraphExtractor    │   deterministic + selective Claude
│ + canonicalization        │
└───────────┬───────────────┘
            │
            ▼
┌───────────────────────────┐
│ ResearchGraph (graph.json)│   schema: research_graph.py
└───────────┬───────────────┘
            │
            ├──▶ WikiLayerProjector   (one page per L1/L2 node)
            ├──▶ SynthesisProjector   (pulse, daily, weekly, topic, …)
            │
            ▼
┌───────────────────────────┐
│ .tesserae/wiki/  (L2 md)  │   sources/, concepts/, entities/,
│                            │   papers/, repos/, topics/,
│                            │   syntheses/, questions/
└───────────┬───────────────┘
            │
            ▼  StaticSiteBuilder.write_site
┌───────────────────────────┐
│ .tesserae/site/  (L3 html)│   index.html, <kind>/index.html,
│                            │   <kind>/<slug>.html,
│                            │   per-page .txt + .json siblings,
│                            │   llms.txt, llms-full.txt,
│                            │   graph.json, graph.jsonld,
│                            │   search-index.json,
│                            │   sitemap.xml, rss.xml,
│                            │   robots.txt, ai-readme.md,
│                            │   manifest.json
└───────────────────────────┘

Extraction and wiki projection are incremental. The graph extractor uses manifest.json content hashes to skip unchanged source files. WikiPageStore.write_page returns False (and skips the write) when the body hash matches what's already on disk. StaticSiteBuilder is the exception: it wipes and rewrites .tesserae/site/ on every compile, but its output is deterministic (see "Idempotence story" below).
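The skip-on-unchanged-body behaviour of the wiki write can be sketched as follows. This is a minimal illustration, not the actual WikiPageStore code: the function signature, the helper names split_frontmatter and body_hash, and the `---` frontmatter delimiters are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def split_frontmatter(text: str) -> tuple[str, str]:
    """Split a page into (frontmatter, body); frontmatter is '' if absent."""
    if text.startswith("---\n"):
        end = text.find("\n---\n", 4)
        if end != -1:
            return text[4:end], text[end + 5:]
    return "", text


def body_hash(body: str) -> str:
    """sha256 of the body only, so frontmatter changes never affect it."""
    return hashlib.sha256(body.strip().encode("utf-8")).hexdigest()


def write_page(path: Path, frontmatter: str, body: str) -> bool:
    """Write the page; return False and skip the write if the body is unchanged."""
    if path.exists():
        _, old_body = split_frontmatter(path.read_text(encoding="utf-8"))
        if body_hash(old_body) == body_hash(body):
            # Same body: leave the file alone even if frontmatter differs
            # (for example a newer generated_at timestamp).
            return False
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(f"---\n{frontmatter}\n---\n{body}", encoding="utf-8")
    return True
```

Calling write_page twice with the same body and a different generated_at value leaves the file on disk untouched, which is what keeps rebuild diffs empty.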

Module map

Wiki + synthesis (L2)

| Module | Responsibility |
| --- | --- |
| MD0 | WikiPage dataclass, WikiPageStore for filesystem I/O. Stdlib-only YAML-subset frontmatter parser. Body-hash idempotence. |
| MD0 | WikiLayerProjector: maps each ResearchGraph node of a wiki-layer type to a markdown page in the right kind/ folder. |
| MD0 | SynthesisProjector: deterministic templates for pulse, daily_digest, weekly, topic, comparison, field_overview. Adds Synthesis nodes and synthesizes / summarizes edges back into the graph. |

Graph + ontology

| Module | Responsibility |
| --- | --- |
| MD0 | ResearchNodeType enum (incl. SYNTHESIS), edge-type whitelist (incl. synthesizes, summarizes), validation. |
| MD0 | Alias canonicalization + near-duplicate review queue. |
| MD0 | Deterministic Python AST extractor for the development slice. |
| MD0 | Claude CLI/OAuth selective extractor. |
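As a rough illustration of what an enum plus edge-type whitelist gives you, the sketch below validates a single edge. Only the names ResearchNodeType, SYNTHESIS, synthesizes, and summarizes come from this document; the enum values, the edge dictionary shape, and validate_edge are hypothetical.

```python
from enum import Enum


class ResearchNodeType(str, Enum):
    # Illustrative subset; the real enum defines many more members.
    CONCEPT = "Concept"
    PAPER = "Paper"
    SYNTHESIS = "Synthesis"


# Hypothetical whitelist: only these two edge types are named in this document.
ALLOWED_EDGE_TYPES = frozenset({"synthesizes", "summarizes"})


def validate_edge(edge: dict, node_types: dict[str, ResearchNodeType]) -> list[str]:
    """Return a list of problems with one edge; an empty list means it is valid."""
    errors = []
    if edge["type"] not in ALLOWED_EDGE_TYPES:
        errors.append(f"edge type {edge['type']!r} is not whitelisted")
    for end in ("source", "target"):
        if edge[end] not in node_types:
            errors.append(f"{end} {edge[end]!r} is not a known node")
    return errors
```

Keeping validation as a pure function that returns errors, rather than raising on the first one, makes it easy to report every ontology violation in a single pass.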

Site renderer (L3)

| Module | Responsibility |
| --- | --- |
| MD0 | StaticSiteBuilder.write_site: wipes + rebuilds the site, walks every route, emits exports + AI siblings + manifest. |
| MD0 | One renderer per route (home, indexes, detail pages, timeline, graph, about). SiteContext carries precomputed indices so renderers stay pure. |
| MD0 | HTML primitives: breadcrumbs, card, badge, node_table, edge_list, sparkline_svg, heatmap_svg, toc, page_shell, ai_siblings_footer. |
| MD0 | Design tokens: CSS variables, light + dark themes, layout, typography, all components styled here. |
| MD0 | Client JS bundle: search palette, theme toggle, sigma + 3D-force graph view. |
| MD0 | Stdlib-only markdown renderer (links, autolinks, code, emphasis, headings). No external dependency. |
| MD0 | Four-signal relevance scoring (direct link, source overlap, Adamic-Adar, type affinity) used by every Related section. |
| MD0 | search-index.json builder. Wiki-layer kinds only. |
| MD0 | Session index/detail renderers for imported harness history: project-memory summary sections, conversation turn rail, markdown transcript rendering, and collapsed tool-use blocks. |
| MD0 | llms.txt, llms-full.txt, graph.jsonld, sitemap.xml, rss.xml, robots.txt, ai-readme.md, per-page .txt/.json siblings. |

Pipeline orchestration

| Module | Responsibility |
| --- | --- |
| MD0 | ProjectWiki.compile: drives extraction → graph → wiki layer → site. Owns ProjectPaths (config, graph, manifest, wiki, site, etc.). |
| MD0 | All tesserae project … subcommands, including compile, build-site, serve, watch, deploy. |
| MD0 | project deploy: pushes .tesserae/site/ to a gh-pages branch via worktree, optionally enables Pages via gh. |

External adapters (unchanged this round)

| Module | Responsibility |
| --- | --- |
| MD0 | Obsidian vault projection (graph coloring, Dataview dashboard, raw assets). |
| MD0 | Claude Code / Codex / Gemini / Kiro / Cursor / OpenCode harness exports. |
| MD0 | Inbound Claude Code/Codex session discovery, normalization, storage under .tesserae/harness_sessions/, and redacted markdown summaries. |
| MD0 | Temporal-fact JSONL + optional live Graphiti sync. |
| MD0 | Cognee nodes/edges JSONL bundle and direct add/cognify path. |
| MD0 | MCP stdio server exposing schema, graph_summary, search_nodes, node_context, search_facts, timeline. |

Project workspace layout

.tesserae/
  config.json                 project name, source kind, source list
  graph.json                  validated ResearchGraph (incl. Synthesis nodes)
  manifest.json               per-source content hashes (input dedup)
  sqlite.db                   SQLite graph store
  temporal_facts.jsonl        Graphiti-style temporal projection
  graphiti_episodes.jsonl     dependency-free Graphiti episode export
  report.md                   graph quality / summary
  competitive_report.md       comparison vs. MegaMem / Graphiti / others
  markdown_projection/        flat human-readable markdown
  obsidian_vault/             Obsidian projection w/ .obsidian/, raw/assets/
  agent_harness/              Claude Code / Codex / etc. harness files
  harness_sessions/           imported local Claude Code/Codex sessions
  cognee_bundle/              Cognee nodes/edges/manifest JSONL
  wiki/                       L2 markdown wiki — see below
  site/                       L3 static site — see below

.tesserae/wiki/ (L2)

wiki/
  sources/<slug>.md           raw documents from data/ + docs/, with frontmatter
  concepts/<slug>.md          Concept / TechnicalTerm / Algorithm / etc.
  entities/<slug>.md          Model / Dataset / Benchmark / Metric / Org / Person
  papers/<slug>.md            Paper hub
  repos/<slug>.md             Repository / Project / CodeProject
  topics/<slug>.md            ResearchField / ResearchTopic / ApproachFamily / Trend
  syntheses/<slug>.md         pulse, daily_digest, weekly, topic, comparison, field_overview
  questions/<slug>.md         OpenQuestion

Each file is editable by hand; the next compile honours user edits as long as the body hash differs from what the projector would write. (Editing only the body wins; editing the frontmatter loses on next compile because frontmatter is regenerated.) Obsidian users can open .tesserae/wiki/ directly; the existing obsidian_vault/ adapter is a separate projection, not a substitute.

.tesserae/site/ (L3)

site/
  index.html                  home + project pulse
  about.html                  schema, build info
  assets/{style.css,app.js}   single CSS bundle + single JS bundle
  sources/index.html
  sources/<slug>.html
  sources/<slug>.txt          AI sibling — plain text
  sources/<slug>.json         AI sibling — structured record
  concepts/…  entities/…  papers/…  repos/…  topics/…  syntheses/…  questions/…
  sessions/index.html          imported harness-session index
  sessions/<project>/<id>.html session detail: summary, metadata, turn rail, markdown turns, collapsed tools
  timeline/index.html
  graph/index.html            interactive 2D + 3D force layout
  graph.json                  full graph payload (incl. code nodes, for tooling)
  graph.jsonld                schema.org Dataset, wiki-layer nodes only
  search-index.json           palette + page search; wiki-layer kinds only
  llms.txt                    llmstxt.org — short index
  llms-full.txt               llmstxt.org — every page body, capped 5MB
  sitemap.xml                 every emitted route
  rss.xml                     last 30 syntheses
  robots.txt                  permissive (crawl + index)
  ai-readme.md                machine-readable site map
  manifest.json               sha256 + size for every emitted file

What's deliberately excluded

The redesign drew an explicit line: code-class and code-function nodes stay in graph.json (so MCP, Cognee, and Graphiti consumers still see them) but never get HTML pages, never appear in search-index.json, and never appear in the navigation. That's the user-facing contract — the wiki is a document-first knowledge base, not a function browser.

Concretely, StaticSiteBuilder skips any node whose type is not in the L2 wiki kind map (tesserae/wiki_projector.py::_KIND_FOR_TYPE):

  • Excluded from L2 + L3: CodeClass, CodeFunction, CodeModule, Dependency, EvidenceSpan, SourceFile, all Claim variants (Claim, ContributionClaim, PerformanceClaim, ComparisonClaim, LimitationClaim, CausalClaim).
  • Surface where they still appear: as bullets, badges, neighbour counts, or evidence excerpts inline on related wiki pages, and in graph.json for downstream tooling.

If you need code-level browsing, point an LSP / call-graph tool at the source tree directly — that's a different problem from "wiki of what this project knows."
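The exclusion rule amounts to a dictionary lookup: a node gets a page only if its type maps to a wiki kind. The sketch below uses a partial, hypothetical map assembled from the folder listing earlier in this document; the real _KIND_FOR_TYPE may differ, and the node dictionary shape and wiki_route helper are assumptions.

```python
from typing import Optional

# Hypothetical excerpt of a type -> kind map; the real map may differ.
KIND_FOR_TYPE = {
    "Concept": "concepts",
    "TechnicalTerm": "concepts",
    "Algorithm": "concepts",
    "Model": "entities",
    "Dataset": "entities",
    "Paper": "papers",
    "Repository": "repos",
    "ResearchTopic": "topics",
    "Synthesis": "syntheses",
    "OpenQuestion": "questions",
}


def wiki_route(node: dict) -> Optional[str]:
    """Return the site route for a node, or None if its type has no wiki kind."""
    kind = KIND_FOR_TYPE.get(node["type"])
    if kind is None:
        # CodeClass, CodeFunction, Claim variants, etc. land here: no page.
        return None
    return f"{kind}/{node['slug']}.html"


def pages_to_render(nodes: list[dict]) -> list[str]:
    """Sorted routes for every node that belongs to the wiki layer."""
    routes = (wiki_route(n) for n in nodes)
    return sorted(r for r in routes if r is not None)
```

Because excluded types are simply absent from the map, adding a new code-level node type cannot accidentally produce HTML pages; a type has to be opted in.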

Idempotence story

The redesign aims for byte-identical output across two consecutive project compile runs over unchanged inputs. The pieces:

  1. Source extraction uses manifest.json content hashes; unchanged files are skipped, so the graph remains stable.
  2. Wiki layer writes are idempotent at the body level. WikiPageStore.write_page reads the existing file, strips frontmatter, sha256s the body, and short-circuits if the new body hashes the same — even if the new frontmatter has a different generated_at timestamp. This is the key trick that keeps git diffs tight on rebuild.
  3. Synthesis output carries a content_hash: sha256-… in its frontmatter. The body hash is computed without generated_at so repeated compiles on the same graph produce the same hash, and Synthesis nodes carry the same content_hash in graph metadata.
  4. Site rendering wipes site/ at the start of write_site, then writes deterministically: routes are sorted, dictionaries dumped with sort_keys=True, manifest.json walked via sorted(rglob("*")). Two runs produce byte-identical files including the manifest.

This is verified by tests/test_site_pages.py and the end-to-end smoke in tests/test_project_e2e_redesign.py (compile twice, diff sites, expect zero file deltas).
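The deterministic manifest from step 4 can be approximated as below: walk the site in sorted order, record sha256 and size per file, and dump with sort_keys=True. This is a sketch under assumptions, not the StaticSiteBuilder code; in particular, build_manifest is a hypothetical name, and skipping manifest.json itself and keying entries by POSIX-style relative path are choices made for the example.

```python
import hashlib
import json
from pathlib import Path


def build_manifest(site_dir: Path) -> str:
    """Return manifest JSON text that depends only on file paths and contents."""
    entries = {}
    # sorted() fixes the walk order, so filesystem enumeration order is irrelevant.
    for path in sorted(site_dir.rglob("*")):
        if path.is_file() and path.name != "manifest.json":
            data = path.read_bytes()
            entries[path.relative_to(site_dir).as_posix()] = {
                "sha256": hashlib.sha256(data).hexdigest(),
                "size": len(data),
            }
    # sort_keys=True makes the serialized key order independent of insertion order.
    return json.dumps(entries, sort_keys=True, indent=2) + "\n"
```

Nothing in the output depends on timestamps or creation order, so two site trees with the same files yield the same manifest text.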

Scaling notes

  • Graph view node cap. MD0 bounds the page-embedded payload for the interactive force layout. Beyond ~1500 nodes the browser-side simulation gets sluggish on mid-range hardware, so the page drops the lowest-degree wiki-layer nodes first when the count exceeds the cap. The exported graph.json is unaffected — it always contains the full graph. Code nodes are filtered out before the cap is applied.
  • llms-full.txt cap. A 5 MB safety cap applies in MD1; the file ends with a [TRUNCATED — see graph.jsonld for the full set] marker if the cap is hit. graph.jsonld is uncapped because JSON-LD consumers expect the full set.
  • Search index. Wiki-layer kinds only. Code-graph nodes never enter search-index.json; the redesign target is < 500 KB for the dogfood corpus and we're well under that today.
  • Per-page byte budget (rule of thumb). Each detail page < 60 KB gz HTML, shared CSS < 30 KB, shared JS < 25 KB, sigma vendor on the graph page only (~60 KB). The graph view uses 3D-force-graph + Three.js loaded once; all other pages stay vanilla.
  • Compile time on dogfood. ~300 markdown files extract in under 5 s on a recent dev machine; site render adds another ~2 s. The wiki layer's idempotence means subsequent compiles touch only the changed paths.
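The graph-view cap described in the first bullet (drop the lowest-degree nodes first once the count exceeds the cap) might look like the sketch below. The function name cap_nodes, the input shapes, and the default of 1500 are assumptions; the document only says the view gets sluggish beyond roughly that size.

```python
def cap_nodes(nodes: list[str], edges: list[tuple[str, str]], cap: int = 1500) -> list[str]:
    """Keep at most `cap` nodes, dropping the lowest-degree nodes first."""
    if len(nodes) <= cap:
        return list(nodes)
    degree = {n: 0 for n in nodes}
    for a, b in edges:
        # Edges touching nodes outside the candidate set are ignored.
        if a in degree:
            degree[a] += 1
        if b in degree:
            degree[b] += 1
    # Highest degree first; ties break on node id so the result is deterministic.
    ranked = sorted(nodes, key=lambda n: (-degree[n], n))
    return ranked[:cap]
```

Breaking ties on node id matters here: without it, two compiles over the same graph could embed different node sets and defeat the byte-identical site guarantee.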

Frontend interaction surface

  • Search palette: cmd+k / ctrl+k / /. Fuzzy match over search-index.json, scoped to wiki kinds. Recent pages persisted in localStorage.
  • Theme toggle: top-right button; data-theme="dark" is stored in localStorage and applied before paint to avoid flash.
  • Sticky right TOC: desktop only; collapses to a <details> drawer on mobile. Generated from <h2> / <h3> in the page body.
  • Activity heatmap: 26-week SVG with month + weekday labels. Cells link to the day's digest.md source page when one exists. (Per-day timeline detail pages, /timeline/<YYYY-MM-DD>.html, are an explicit follow-up; the inline notice in render_timeline flags it. ⚠ in-progress.)
  • Graph view: /graph/. 3D force layout (3d-force-graph + Three.js) with hover tooltips, edge labels, cursor-anchored zoom, and a 2D fallback view. Node colors come from ResearchNodeType.
  • Mobile shell: drawer rail, bottom nav, fluid type, touch-safe hit targets (≥ 44 px).

Testing strategy

  • Unit: tests/test_wiki_store.py, tests/test_synthesis.py, tests/test_site_components.py, tests/test_site_pages.py, tests/test_site_exports.py, tests/test_relevance.py.
  • Idempotence: tests/test_project_e2e_redesign.py compiles twice and asserts zero diffs in wiki/ and site/.
  • Link integrity: tests/test_frontend.py parses every emitted HTML for hrefs and asserts every internal link resolves to a generated file. No nodes/codeclass-*.html is produced.
  • AI siblings: for every path/foo.html, the test suite asserts path/foo.txt and path/foo.json exist; the JSON parses and contains {title, kind, body, links}.
  • No Playwright: vanilla pytest under PYTEST_DISABLE_PLUGIN_AUTOLOAD=1.
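A link-integrity check of the kind described above needs nothing beyond the standard library. The sketch below is an illustration, not the contents of tests/test_frontend.py: HrefCollector and broken_links are hypothetical names, and treating a directory link as its index.html is an assumption about how the site is served.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urldefrag, urlparse


class HrefCollector(HTMLParser):
    """Collect the href of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def broken_links(site_dir: Path) -> list[tuple[str, str]]:
    """Return (page, href) pairs whose internal target file does not exist."""
    broken = []
    for page in sorted(site_dir.rglob("*.html")):
        collector = HrefCollector()
        collector.feed(page.read_text(encoding="utf-8"))
        for href in collector.hrefs:
            target, _fragment = urldefrag(href)
            parsed = urlparse(target)
            if not target or parsed.scheme or parsed.netloc:
                continue  # same-page anchor or external link
            if target.startswith("/"):
                resolved = site_dir / target.lstrip("/")  # root-relative
            else:
                resolved = page.parent / target  # page-relative
            if resolved.is_dir():
                resolved = resolved / "index.html"
            if not resolved.is_file():
                broken.append((page.relative_to(site_dir).as_posix(), href))
    return broken
```

In a pytest test this reduces to `assert broken_links(site_dir) == []`, and a failure prints exactly which page carries which dead link.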