2026-05-19-session-graph-extractor-design.md
docs/superpowers/specs/2026-05-19-session-graph-extractor-design.md
Session graph extractor — design
Status: design • 2026-05-19 • owner: Tesserae maintainers • supersedes: nothing
Tesserae compiles a typed knowledge graph from a data/ directory of documents. Today that graph stops at what's in the documents — every conversation the user and their coding agent have about those documents is lost on close-tab. The point of this spec is to make those conversations a first-class part of the graph, so the agent keeps accumulating project-specific understanding that the next session can query back.
Goal
After tesserae project compile, the typed graph contains structured findings extracted from local Claude Code / Codex / equivalent sessions, each linked to the documents that came up in that session. An agent can ask tesserae project ask "what did we conclude about 3D Gaussian Splatting?" and get back specific insight nodes with provenance back to the session that produced them and the paper(s) they reference.
Non-goals
- Real-time session capture (we read what's on disk, we don't hook into the agent's IPC).
- Cross-project memory (each project's
.tesserae/keeps its own sessions; no global aggregation). - Re-summarising the same session every compile when it hasn't changed (caching is core, not a stretch goal).
- Replacing the existing
Synthesisprojector. Daily/weekly digests continue to render the same way; session findings are an input to the next compile, not an alternative output channel. - Editing or filtering transcripts. We read what's already normalised by the existing
discover_harness_sessionsflow; this spec doesn't touch transcript normalisation.
User-visible behaviour
tesserae project compile # auto-imports sessions, runs extraction
tesserae project compile --no-sessions # skip session extraction entirely
tesserae project compile --sessions-llm=false # structural pass only, no LLM
The default is: if .tesserae/harness_sessions/ exists (auto-populated by the implicit discover_harness_sessions call), the session graph extractor runs. The LLM extraction layer runs only when ANTHROPIC_API_KEY (or the configured equivalent) is set; without it, only the structural pass runs and the graph gains SessionActivity metadata but no finding nodes.
Query examples the agent can run after a compile:
tesserae project ask "what decisions did we make about extractor dedup?"tesserae project ask --backend wiki --kind SessionDecision- MCP:
find_related_nodes(node_id="Paper:arxiv-2601-17835:...", edge_type="discussed_in")
Architecture
A new module tesserae/session_graph.py exposes SessionGraphExtractor with the same shape as ResearchGraphExtractor: it produces a ResearchGraph slice that project.merge_graphs combines with the document slice and the code slice. The extractor runs after the document pass during ProjectWiki._compile_pipeline:
document extractors ─┐
code graph extractor ─┼─→ merge_graphs ─→ projection layer
session extractor ─┘ (id-dedup,
cross-type and
same-type merges)
Module layout
tesserae/
session_graph.py # new: orchestration entrypoint
session_graph_structural.py # new: deterministic activity pass
session_graph_llm.py # new: LLM extraction pass + JSON parsing
llm_synthesis.py # existing: reused for LLM backend abstraction
harness_sessions.py # existing: source of truth for session records
Splitting structural and LLM passes into separate files keeps the deterministic path independently testable (no LLM fixtures needed) and lets us add a pure-deterministic fallback in the rare case the LLM never returns parseable JSON.
Where session findings live on disk
.tesserae/
harness_sessions/ # normalised HarnessSession JSON
<date>-<slug>-<hash>.json
session_findings/ # NEW: extractor output cache
<session_id>.findings.json # cached LLM output, keyed by content_hash
<session_id>.activity.json # cached structural output
The cache files let the next compile skip both passes when neither the transcript nor the session metadata has changed (content_hash of the normalised session record). Stale caches are pruned by id-comparison against the live harness_sessions/ set.
Schema additions
Node types (six new entries in ResearchNodeType)
| Type | Semantics | Example body |
|---|---|---|
SessionInsight | An observation / learned fact / pattern noticed. | "GS-Slam reuses NeRF-Studio's bundle adjustment loop, not its rasteriser." |
SessionDecision | An explicit choice the user and agent agreed on. | "Cache LLM outputs by (session_id, content_hash) rather than session_id alone." |
SessionQuestion | An unresolved question raised during the session. (Distinct from OpenQuestion which captures unresolved questions surfaced inside academic papers.) | "Does the spec say whether the dedup pass runs before or after the link-fixup?" |
SessionTODO | An actionable follow-up. | "Add a --sessions-llm=false smoke test to test_session_graph.py." |
SessionHypothesis | A testable assumption not yet verified. | "The slug collisions on ssim go away once we use public-aware ownership." |
SessionTakeaway | A condensed key point worth remembering. | "The whole-graph compile is fast enough to re-run end-to-end after every extractor change." |
Distinction from OpenQuestion: the existing OpenQuestion type captures questions surfaced from academic papers' "future work" sections. SessionQuestion captures questions raised during agent/user conversation. They're never collapsed even when text matches, because the provenance source and lifecycle differ.
The six are added at the end of the enum to keep historical id ordering stable.
Edge types (new entries in ALLOWED_EDGE_TYPES)
| Edge | From | To | Meaning |
|---|---|---|---|
derived_from_session | any Session* | Session node (sub-type) | Provenance back to the session that produced this finding. |
discussed_in | any doc node (Paper / Repository / Concept / …) | Session node | A document came up during this session. |
references | any Session* | any doc node | This finding refers to / is about that doc. |
supersedes | Session* | Session* | A later finding refines / replaces an earlier one (LLM-emitted when transcripts cite "we previously thought X, now Y"). |
A Session node type
We also add a single Session node type per discovered harness session. It carries the structural data already in the HarnessSession dataclass (started_at, project_root, files_touched, tools_used, token totals, etc.) as metadata. Every finding edges back to its parent Session via derived_from_session. This lets queries answer "what did we work on yesterday" and "which session produced this insight" in one hop.
Session nodes are private — is_public_research_node(Session) == False — so they don't get vault pages by default. They live in the graph for query reachability and MCP retrieval.
Metadata on finding nodes
Each Session* node carries:
{
"session_id": "...", // links back to Session node
"turn_ids": [12, 13, 17], // turns the LLM cited as source
"extractor": "session-llm", // or "session-structural" for activity-only nodes
"llm_model": "claude-sonnet-4-...", // when extractor=session-llm
"content_hash": "sha256-...", // for cache invalidation
"confidence": 0.0..1.0 // LLM-self-reported, optional
}
Extraction pipeline
Pass 1 — structural (always-on, no LLM)
For each HarnessSession in .tesserae/harness_sessions/:
- Mint one
Sessionnode, id-seed =harness:<session_id>. - Read
files_touched→ emitdiscussed_inedges to the matching doc node IDs (resolved via the same path-to-node-id index the ResearchGraphExtractor already builds for cross-reference linking). - Read
commands_run→ store as activity metadata on the Session node. Don't mint nodes for shell commands. - Read
decisions(already populated bydiscover_harness_sessions) → mint oneSessionDecisionper entry withextractor = "session-structural". These are the structurally-recoverable subset; the LLM pass enriches and adds more.
This pass alone gives the user a graph where every session is queryable (which sessions worked on paper Y?), without paying for any LLM calls. It runs on every compile, every session.
Pass 2 — LLM extraction (when API key present)
For each session not already in cache (session_findings/<session_id>.findings.json exists AND its content_hash matches the current session record):
- Build the LLM prompt:
- System: a brief on what tesserae is + the six finding types + the JSON schema for the response.
- Context: the list of doc node IDs in the current project graph (truncated/scored so the LLM has plausible reference targets, not the whole 2000-node corpus).
- User: the normalised transcript (assistant + user turns, with turn IDs).
- Call the configured backend via the existing
llm_synthesis._synthesizerabstraction. One call per session for sessions under the context budget; over-budget sessions split into overlapping windows of ~30 turns with a 5-turn overlap. - Parse the returned JSON list of findings: ``
json [ { "kind": "decision", "body": "...", "turn_ids": [17, 19, 22], "references": ["Paper:arxiv-2601-...", "Concept:gaussian-splatting:..."] }, ... ]`` - For each finding, mint a
Session<Kind>node with: id_seed = session:<session_id>:<kind>:<sha1(body)[:12]>so the id is stable across re-extractions of the same content.derived_from_sessionedge to the Session node.referencesedges to each cited doc node, OR fall back todiscussed_infrom the structural pass if the LLM cited nothing.- Persist the raw JSON to
session_findings/<session_id>.findings.jsonfor the next compile's cache check.
If the LLM returns malformed JSON or fails entirely, the session keeps only its structural data. The structural pass is the floor; the LLM pass is enrichment.
Pass 3 — graph merge
The session extractor returns a ResearchGraph slice. merge_graphs combines it with the document and code slices via the existing id-dedup → same-type-aliased-merge → cross-type-merge pipeline.
One nuance for the cross-type merger: Session* types are NOT in _CROSS_TYPE_MERGE_PRIORITY, so a same-named Paper and SessionInsight will not be collapsed. That's correct — a Paper called "Gaussian Splatting" and an insight about Gaussian Splatting are different entities.
Linking strategy (recap of the design call)
Primary: the LLM's references: [<id>, ...] list. We trust it because the prompt gives it the actual graph node IDs as the only legal targets, and the JSON-mode response constrains the format.
Fallback: when the LLM omits citations on a finding, fall back to structural — look at the files_touched for the turn range the finding cites in turn_ids, and emit references edges to whichever doc nodes resolve from those file paths. If neither LLM cite nor structural fallback yields a target, the finding is kept but with no references edges; it still has derived_from_session so it's reachable.
Caching and idempotence
Two cache files per session:
.tesserae/session_findings/<session_id>.findings.json
.tesserae/session_findings/<session_id>.activity.json
findings.json contains the LLM-extracted JSON list plus the content_hash of the source session record at extraction time. activity.json contains the structural pass output. On compile:
- Compute
content_hash(session)from the normalised HarnessSession dict. - If cache file exists AND its
content_hashmatches, skip the relevant pass and use the cached findings. - Otherwise, run the pass and write the cache atomically (tmp + rename, matching the pattern from
project.compile's graph write).
Stale caches (no matching session in harness_sessions/) are pruned at the start of each compile.
Node IDs are deterministic and content-addressed (stable_id over session:<session_id>:<kind>:<sha1(body)[:12]>), so two compiles producing the same findings produce the same IDs. The same body extracted from the same session always lands on the same node — no duplicate Insight nodes accumulating per recompile.
Configuration
.tesserae/config.json gains:
{
"sessions": {
"enabled": true, // default true
"llm_enabled": "auto", // auto | true | false
"max_turns_per_chunk": 30, // LLM-chunking threshold
"max_tokens_per_call": 30000, // soft budget
"model": "claude-sonnet-4-7-20251201", // override default backend model
"include_doc_id_context": 200 // top-N doc ids passed in prompt
}
}
CLI flags map onto these:
--sessions / --no-sessions→sessions.enabled--sessions-llm=auto|true|false→sessions.llm_enabled--sessions-model <name>→sessions.model
When llm_enabled = "auto" (default), the LLM pass runs iff the configured backend has credentials. true errors loudly if there's no backend; false always runs structural-only.
Vault projection
Findings render under a new vault directory:
<vault>/
sessions/
<date>-<session-slug>/
_session.md # Session-level overview (when/what files/who)
insights.md # all SessionInsight findings from this session
decisions.md # all SessionDecision findings
questions.md # SessionQuestion
todos.md # SessionTODO
hypotheses.md # SessionHypothesis
takeaways.md # SessionTakeaway
Each finding renders as a list item with the body, the source session link, and wikilinks to its referenced docs. The doc page on the other side of the link gains an ## Discussed in sessions section in Incoming form when sessions referenced it (using the existing render_edge_section flow with edge.type = "discussed_in").
Session nodes themselves stay private — is_public_research_node returns False for them — so they're queryable in MCP but don't get flooded into the public vault.
CLI / MCP surface changes
CLI: see Configuration section above. Additionally, tesserae sessions list is extended to show per-session finding counts:
2026-05-13-paper-deep-dive-3d-gaussian-splatting insights=4 decisions=2 questions=1
2026-05-14-concept-trace-multi-view-consistency insights=7 decisions=0 questions=3
MCP: two new tools
list_sessions(project_id, since=None, limit=20)→ array of Session nodesfind_session_findings(node_id, kinds=None)→ array ofSession*nodes linked tonode_idviadiscussed_in/references
The existing find_related_nodes already handles the new edge types generically.
Test strategy
| Layer | Test type | What |
|---|---|---|
| Structural extractor | unit | Fixed HarnessSession JSON → fixed graph slice. No LLM. |
| LLM JSON parser | unit | Various LLM responses (well-formed, partial, malformed, no citations) → expected node/edge counts. |
| Caching | unit | Same content_hash → no LLM call. Changed content_hash → new LLM call. Stale caches pruned. |
| End-to-end compile | integration | Demo corpus + examples/demo-corpus/.harness-sessions/claude-code/*.json → finding counts match a golden file. |
| Idempotence | integration | Two compiles produce byte-identical graph.json. |
| LLM-off path | integration | --sessions-llm=false produces same graph minus finding nodes; Session nodes and structural edges survive. |
Existing test invariants are preserved: 1042 tests pass + 13 pre-existing failures unchanged. New tests added in tests/test_session_graph_structural.py, tests/test_session_graph_llm.py, tests/test_session_graph_e2e.py.
Open questions / risks
- Cost containment. A user with 100 historical sessions of 50 turns each on first compile would trigger 100 LLM calls. Mitigations: the content-hash cache only re-extracts changed sessions on subsequent compiles; the
sessions.max_tokens_per_callbudget bounds per-call cost;--no-sessionsdisables entirely. - LLM citation hallucination. The LLM may cite a doc id that doesn't exist. Mitigation: at parse time, we validate every
referencesentry against the livenode_by_idindex; invalid ids are dropped and alink-fixaudit line is written to.tesserae/sessions/extraction.log. - Privacy of transcripts. Transcripts can contain user-confidential content. The opt-in semantics are explicit and tiered:
- With
sessions.llm_enabled = false(or--sessions-llm=false): zero outbound transmission. Only the structural pass runs. - With
sessions.llm_enabled = "auto"(default) and no LLM backend credentials configured: zero outbound transmission. Structural pass only, no error. - With
sessions.llm_enabled = "auto"AND a configured backend (e.g.ANTHROPIC_API_KEYset): the act of configuring the backend IS the opt-in. The full transcript of each not-yet-cached session goes to the LLM. Theredacted_previewfield is for human-readable session lists, not for LLM prompts — extraction quality on a redacted preview is too poor to be worth it. - With
sessions.llm_enabled = true: same as the previous case, but errors loudly if no backend is configured.
In all cases the transcript itself stays on the user's disk; only the LLM's structured JSON output is persisted to the graph and the per-session cache.
- Stale findings when a session is deleted. If a user manually removes a session JSON, the next compile prunes its findings cache and the merge_graphs id-dedup means no leftover Session* nodes survive (because they're only present in the cached extractor slice).
- Schema-version migration. Adding six node types is additive; no migration needed for existing
.tesserae/graph.jsonfiles because the graph is regenerated on every compile.
Out of scope (deferred follow-ups)
- Synthesis-as-input (the "D" sub-feature): re-extracting from the projector's own
Synthesispages would give a different feedback loop. Deferred until this lands and we see how often Synthesis pages produce findings worth re-extracting from. - Cross-project session aggregation (one tesserae setup pulling sessions from many projects'
harness_sessions/). - Real-time hooks into the agent so findings appear without a full
project compilepass. - LLM-generated summary deltas ("here's what changed in the graph since last week") — that's a Synthesis-projector concern.