synthesis-llm.md
docs/synthesis-llm.md
LLM-Backed Synthesis Prose
<!-- translations:start -->
한국어 · 中文 · 日本語 · Русский · Español · Français · Deutsch
<!-- translations:end --> Tesserae ships with two synthesis paths. The default is a deterministic heuristic that never calls a network: it produces predictable, idempotent markdown templates from the research graph. The optional LLM upgrade path replaces those templates with prose written by Claude on every compile, while keeping every other invariant (idempotence, citation tracking, hash-stable bodies) intact.
This page covers when to enable it, what it costs, what data leaves your machine, and how to inspect the output.
What it does
Both paths consume the same _PagePlan inputs (node ids, names, types, descriptions, source paths). The difference is the body.
Heuristic (generator: heuristic-v1)
# Project Pulse
## Counts
- Paper: 14
- Repository: 4
...
## Recently added
- Geometry-Grounded Gaussian Splatting (Paper)
- Volumetric Rendering Revisited (Paper)
...
## Tagline
Tesserae — a self-evolving research notebook.
Reads like a database dump. Useful, deterministic, and shipped today.
LLM (generator: llm-claude-sonnet-4-6)
## Recent activity
The wiki tightened around 3D reconstruction this week. Two papers landed
under the Splatting Family [ApproachFamily:splatting:a86ed11b9524], both
foregrounding photometric and depth supervision for stable splat geometry
[Paper:geometry-grounded-gaussian-splatting:f188522141a2]. The dominant
through-line is volumetric rendering refinements
[Concept:volumetric-rendering:b05846130d24].
Reads like an editorial digest. The model is constrained to restate facts present in the inputs — every paragraph that names a node ends with a [node_id] citation, and bodies that omit citations (or are shorter than 80 chars) are rejected and fall back to the heuristic.
Prompt shape
Two blocks: a long, stable system block wrapped in cache_control: ephemeral and a per-page user message that varies by kind.
System block (cached, identical across pages)
You are an Tesserae synthesis writer. Your job is to summarize a controlled
knowledge graph into a single Markdown page. Rules you follow ABSOLUTELY:
RULE 1 — DO NOT INVENT FACTS. Restate or summarize ONLY material you find
in the inputs. ...
RULE 2 — CITE EVERY CLAIM. Every paragraph that names a node MUST end
with one or more citation markers in square brackets, where the bracket
body is the node's id (e.g. ``[Paper:arxiv-2604.20329:abcd1234]``).
...
RULE 3 — STAY ON TOPIC. The synthesis kind decides the shape:
* pulse : project-wide weekly snapshot. 5-9 sentences max.
* daily_digest : one paragraph per noteworthy paper that day.
* weekly : 3 themes from the week, 1 paragraph each.
* topic : narrative about a research topic / approach family.
* comparison : one paragraph per family with shared task/benchmark.
* field_overview: 1-2 paragraphs per linked sub-topic.
RULE 4 — TONE. Direct, terse, technical. ...
RULE 5 — FORMAT. Output is pure Markdown. No frontmatter. ...
RULE 6 — LANGUAGE. Match the dominant language of the input materials.
If 80%+ of input titles/descriptions are in Korean, write in Korean.
Otherwise English.
The current ontology is:
Paper, Repository, Concept, Algorithm, Model, Dataset, Benchmark, Metric,
Person, Organization, ResearchTopic, ApproachFamily, Synthesis, ...
A node id has the shape ``Type:slug:hash``.