2.4 KB · updated 2026-07-06 · md

prompts/research_graph_extraction.md

Research Graph Extraction Prompt

Use this prompt with Claude/Cognee extraction. The deterministic tesserae.research_graph module is the schema authority.

You are extracting a literature intelligence graph for a user's research field. Do not create arbitrary node types. Every node type MUST be one of:

- ResearchField, ResearchTopic, ProblemArea, ApproachFamily, Trend
- SourceDocument, Paper, Repository, Project, Model, Dataset, Benchmark, Metric, Result, Organization, Person
- Concept, TechnicalTerm, MathematicalConcept, MethodologicalConcept, Algorithm, ObjectiveFunction, ArchitecturePattern, TrainingParadigm, InferenceStrategy, EvaluationProtocol, Task, Capability
- Claim, ContributionClaim, PerformanceClaim, ComparisonClaim, LimitationClaim, CausalClaim, OpenQuestion, EvidenceSpan
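
A minimal sketch of how a caller might check extracted nodes against the list above. The names `ALLOWED_NODE_TYPES` and `invalid_node_types`, and the `{"type": ...}` node shape, are assumptions for illustration; they are not the `tesserae.research_graph` API, which remains the schema authority.

```python
# Hypothetical validation helper; the node dict shape is an assumption.
ALLOWED_NODE_TYPES = frozenset({
    "ResearchField", "ResearchTopic", "ProblemArea", "ApproachFamily", "Trend",
    "SourceDocument", "Paper", "Repository", "Project", "Model", "Dataset",
    "Benchmark", "Metric", "Result", "Organization", "Person",
    "Concept", "TechnicalTerm", "MathematicalConcept", "MethodologicalConcept",
    "Algorithm", "ObjectiveFunction", "ArchitecturePattern", "TrainingParadigm",
    "InferenceStrategy", "EvaluationProtocol", "Task", "Capability",
    "Claim", "ContributionClaim", "PerformanceClaim", "ComparisonClaim",
    "LimitationClaim", "CausalClaim", "OpenQuestion", "EvidenceSpan",
})


def invalid_node_types(nodes):
    """Return the sorted node types that are not in the allowed list."""
    return sorted({node["type"] for node in nodes} - ALLOWED_NODE_TYPES)
```

An empty return value means every node uses an allowed type; anything else can be sent back for re-extraction or manual mapping.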

Never use generic types like Entity, software, technique, domain, topic, technology, feature. Map them instead:

- software -> Repository / Project (including tool-like projects) / ArchitecturePattern
- technique -> MethodologicalConcept / Algorithm / InferenceStrategy / TrainingParadigm
- domain -> ResearchField / ResearchTopic / ProblemArea / Task
- feature -> Capability
- topic -> ResearchTopic / Concept
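
The mapping above could be kept as a lookup table for post-processing, sketched here under the assumption that a reviewer or a second pass picks one candidate per node. `GENERIC_TYPE_CANDIDATES` and `candidate_types` are hypothetical names. Generic types with no mapping listed above (such as Entity and technology) return an empty list and need a manual decision.

```python
# Hypothetical lookup table mirroring the mapping in this prompt.
GENERIC_TYPE_CANDIDATES = {
    "software": ["Repository", "Project", "ArchitecturePattern"],
    "technique": ["MethodologicalConcept", "Algorithm",
                  "InferenceStrategy", "TrainingParadigm"],
    "domain": ["ResearchField", "ResearchTopic", "ProblemArea", "Task"],
    "feature": ["Capability"],
    "topic": ["ResearchTopic", "Concept"],
}


def candidate_types(generic_type):
    """Return allowed replacement types for a generic type (case-insensitive)."""
    return GENERIC_TYPE_CANDIDATES.get(generic_type.strip().lower(), [])
```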

Every factual claim must be represented as a Claim subtype and grounded with an EvidenceSpan. Papers and repositories are sources/artifacts; concepts/methods/tasks/benchmarks are reusable canonical nodes.
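
One way to check the grounding rule after extraction is sketched below. It assumes nodes are dicts with `id` and `type`, edges are dicts with `source`, `relation`, and `target`, and that a claim points to its EvidenceSpan through an `evidenced_by` edge. It also assumes OpenQuestion is not a Claim subtype. None of these shapes are confirmed by the schema module.

```python
# Hypothetical grounding check; graph shapes are assumptions, not the real schema.
CLAIM_TYPES = {
    "Claim", "ContributionClaim", "PerformanceClaim",
    "ComparisonClaim", "LimitationClaim", "CausalClaim",
}


def ungrounded_claims(nodes, edges):
    """Return ids of claim nodes with no evidenced_by edge to an EvidenceSpan."""
    node_type = {node["id"]: node["type"] for node in nodes}
    grounded = {
        edge["source"]
        for edge in edges
        if edge["relation"] == "evidenced_by"
        and node_type.get(edge["target"]) == "EvidenceSpan"
    }
    return [
        node["id"]
        for node in nodes
        if node["type"] in CLAIM_TYPES and node["id"] not in grounded
    ]
```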

Use only relations from this set:

is_a, part_of, subfield_of, introduces, uses, extends, improves_on, compares_against, criticizes, addresses, optimizes_for, uses_dataset, evaluated_on, uses_metric, reports_result, achieves_score, belongs_to_approach_family, shares_concept_with, derived_from, supports_claim, contradicts_claim, attributes_improvement_to, has_limitation, evidenced_by, mentioned_in, authored_by, released_by, implemented_in, rising_in, declining_in, emerged_after.
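
A small sketch for flagging relations outside the set above so they can be reviewed. `ALLOWED_RELATIONS`, `off_list_relations`, and the `{"relation": ...}` edge shape are hypothetical and not taken from `tesserae.research_graph`.

```python
# Hypothetical relation check; the edge dict shape is an assumption.
ALLOWED_RELATIONS = frozenset("""
    is_a part_of subfield_of introduces uses extends improves_on
    compares_against criticizes addresses optimizes_for uses_dataset
    evaluated_on uses_metric reports_result achieves_score
    belongs_to_approach_family shares_concept_with derived_from
    supports_claim contradicts_claim attributes_improvement_to
    has_limitation evidenced_by mentioned_in authored_by released_by
    implemented_in rising_in declining_in emerged_after
""".split())


def off_list_relations(edges):
    """Return the sorted relation names that are not in the allowed set."""
    return sorted({edge["relation"] for edge in edges} - ALLOWED_RELATIONS)
```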

Extraction priorities:

1. Identify the Paper/Repository/SourceDocument.
2. Extract reusable concepts, mathematical concepts, methods, algorithms, tasks, datasets, benchmarks, metrics.
3. Extract contribution/performance/comparison/limitation/causal claims.
4. Ground each claim in an EvidenceSpan copied from the source.
5. Assign candidate ApproachFamily nodes only when the paper clearly shares a method pattern with other papers.
6. Avoid over-extraction: do not create nodes for generic adjectives or one-off phrases unless they are reusable research concepts.
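
Because each EvidenceSpan should be copied from the source, a post-extraction check can confirm that the span text actually occurs there. The sketch below assumes whitespace differences (for example line wraps from PDF extraction) are acceptable and everything else must match exactly; `is_verbatim_span` is a hypothetical helper name.

```python
def is_verbatim_span(span_text, source_text):
    """Return True if the span occurs in the source, ignoring whitespace runs."""
    def normalize(text):
        # Collapse any run of whitespace (spaces, tabs, newlines) to one space.
        return " ".join(text.split())

    span = normalize(span_text)
    # An empty span never counts as evidence.
    return bool(span) and span in normalize(source_text)
```

A span that fails this check is likely paraphrased or hallucinated, and the claim it supports can be dropped or re-extracted.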