rag-anything.md
docs/integrations/rag-anything.md
RAG-Anything multimodal companion
<!-- translations:start -->
한국어 · 中文 · 日本語 · Русский · Español · Français · Deutsch
<!-- translations:end -->
RAG-Anything is a multimodal RAG framework (built on LightRAG) that parses PDFs, Office documents, images, and equations through MinerU/Docling/PaddleOCR. Tesserae integrates it both as a multimodal ingestion pipeline (UA-style native graph projection) and as a runtime memory backend alongside Cognee.
Why use both?
- Tesserae — long-lived agent memory, wiki compilation, graph projection.
- RAG-Anything — multimodal ingestion + LightRAG runtime retrieval.
The two complement each other: RAG-Anything brings PDF/Office/image understanding that Tesserae's text-first source loaders don't provide; Tesserae keeps the long-lived, queryable memory that survives across sessions.
Current low-friction workflow
The recommended path is the setup wizard:
tesserae project setup
For automation:
tesserae project setup \
--yes \
--with-raganything \
--install-raganything \
--raganything-parser mineru \
--run-raganything
tesserae project compile
The setup wizard installs both raganything and docling together. MinerU stays opt-in: install it with pip install 'mineru[core]' only if you have PDFs or images to ingest.
Tesserae stores a managed refresh command rather than asking users to invent one:
tesserae project refresh-raganything --parser mineru
During compile, Tesserae:
- checks whether
.tesserae/external/raganything/manifest.jsonexists and matches the current git commit (via the storedmeta.json#gitCommitHash); - runs the managed refresh wrapper if missing/stale or
--refresh-external-toolsis passed; - discovers non-code sources (PDFs, Office docs, images, markdown) and parses them via the configured parser;
- writes
manifest.json+meta.json; - continues the normal memory compile.
You can force all configured external refresh commands before a compile:
tesserae project compile --refresh-external-tools
Manual equivalent
pip install 'raganything[all]'
python -m tesserae.raganything_refresh --project . --parser mineru
tesserae project compile
Compile-time vs runtime
Tesserae splits the integration cleanly:
- Compile-time parsing (
refresh-raganythingandcompile): runs parsers directly — native read for.md/.txt/.rst,docling.DocumentConverterfor everything else. RAG-Anything's full pipeline is not invoked here, so no LLM/embedding/vision keys are needed for compile to succeed. - Runtime queries (
project ask):raganything_query.pyinstantiatesRAGAnythingwith the project's configured LLM/embedding/vision functions and runsaqueryagainst LightRAG's store. This path requires API keys.
The split means compile is fast, deterministic, and key-free; only retrieval-time operations cost LLM tokens.
Native graph synchronization
Tesserae imports the parsed manifest natively during compile when the configured tool uses sync_mode: native_graph.
The native adapter reads .tesserae/external/raganything/manifest.json, projects each parsed document into a SourceFile node with multimodal block metadata, and writes a sync manifest:
.tesserae/external/raganything-sync.json
Current mapping:
| RAG-Anything | Tesserae direction |
|---|---|
documents[*] | SourceFile node, metadata.parser="raganything" |
content_list[type=text] | folded into SourceFile.description; concepts via existing extractor |
content_list[type=image] | SourceFile.metadata.multimodal_blocks[] (img_path, caption) |
content_list[type=table] | SourceFile.metadata.multimodal_blocks[] (table_body, caption) |
content_list[type=equation] | SourceFile.metadata.multimodal_blocks[] and metadata.equations[] (LaTeX preserved) |
Provenance is preserved on each node:
{"system": "rag-anything", "id": "doc-<sha256>", "type": "document", "artifact": ".tesserae/external/raganything/manifest.json"}
Note: the interactive graph view hides sources-group nodes by default to focus on concepts and entities — projected raganything SourceDocuments stay in graph.json (MCP, Cognee, search, per-page wiki views still see them), they just don't flood the canvas. Set graph_view.show_sources = true in .tesserae/config.json to restore the dense view.
Runtime memory backend
memory_backends.raganything (default produced by default_raganything_backend_config) coexists with Cognee. project ask tries backends in priority order; per-project priority can be set via memory_backends.priority. RAG-Anything is opt-in (default enabled: false); the setup flag --with-raganything flips it on.
LLM provider (no API key needed)
RAG-Anything's runtime backend needs an LLM to answer queries. Tesserae defaults to its existing OAuth-based CLI integrations — no API key required:
| Provider | How it authenticates | Setup flag |
|---|---|---|
codex (default) | codex CLI OAuth (you logged in once with codex login) | --raganything-llm-provider codex |
claude | claude -p CLI; respects CLAUDE_CONFIG_DIR for multi-account setups | --raganything-llm-provider claude --raganything-claude-config-dir ~/.claude-personal2 |
For multi-account Claude setups (e.g., ~/.claude-personal1, ~/.claude-personal2), pass --raganything-claude-config-dir <path> at setup. The runtime backend will export CLAUDE_CONFIG_DIR=<path> before each invocation so the chosen account's auth is used without touching your default ~/.claude.
Embeddings
| Provider | When to use |
|---|---|
deterministic (default) | No external deps. Hash-based; low semantic quality but enough for LightRAG to construct an index. Good baseline for proving the integration works. |
ollama | Local Ollama running with an embedding model (e.g., nomic-embed-text). Pass --raganything-embedding ollama; the backend defaults to http://localhost:11434. |
Direct OpenAI embedding support is not wired through these flags in v1 — users with OpenAI keys can set OPENAI_API_KEY and override memory_backends.raganything.embedding.provider directly in .tesserae/config.json (RAGAnything will pick up the env var via LightRAG's defaults).
Invoking from the CLI
# Auto mode: tries RAG-Anything (when enabled), then Cognee, then compiled-wiki search.
tesserae project ask "What does the integration spec say about parser routing?"
# Force a specific backend.
tesserae project ask "..." --backend raganything
tesserae project ask "..." --backend cognee
tesserae project ask "..." --backend wiki
--backend raganything calls tesserae.raganything_query.query directly. A relative working_dir in memory_backends.raganything is resolved against the project root before the call.
Top-level ask (uses the multi-project registry)
For workflows where you want to ask across multiple registered Tesserae projects without cd-ing into each one, the top-level tesserae ask command resolves the project via the persistent registry shared with the MCP server:
# One-time: register your projects (saved to ~/.tesserae/registry.json).
tesserae wiki register ~/Developer/Projects/Tesserae --name tesserae --activate
tesserae wiki register ~/Developer/Projects/Other --name other
# List registered projects.
tesserae wiki list
# Ask the currently active project.
tesserae ask "How does the parser routing work?"
# Ask a specific registered project (no need to activate it).
tesserae ask "What is the architecture?" --wiki other
# Force a backend or pass a direct path.
tesserae ask "..." --wiki tesserae --backend raganything --json
tesserae ask "..." --project /tmp/somewhere
The dispatch logic — --project > --wiki > active project — is implemented in _top_level_ask_handler and the answer formatting / backend selection is shared with project ask and the MCP ask tool through tesserae.query.ask_project. The registry is file-backed (~/.tesserae/registry.json by default), so it persists across sessions and stays in sync with the MCP server's project list.
Querying across multiple vaults (--scope all-registered)
Bet B2 — when you have several registered projects (research vault, work vault, side-project vault) and you want to ask the same question against all of them, use --scope all-registered:
# Fan out across every registered project. The aggregated envelope is
# {"scope": "all-registered", "question": "...", "by_project": {"<alias>": <envelope>}}.
tesserae ask "What did I write about RLHF?" --scope all-registered --json
# Restrict to a hand-picked subset of aliases.
tesserae ask "..." --scope all-registered --scope-aliases research side-projects
The handler iterates registered projects in alphabetical order, calls ask_project against each, and aggregates the per-project envelopes. A single project failing — missing config, RAG-Anything not enabled, Cognee down — is captured as {"error": "..."} in that alias's slot and never aborts the rest of the fan-out. The same scope argument is accepted by the MCP ask tool, so MCP-driven coding agents get the same fan-out without extra plumbing.
Multi-project registry (tesserae wiki)
| Command | Purpose |
|---|---|
tesserae wiki list [--json] | Show registered projects and which one is active. |
tesserae wiki register <path> [--name <alias>] [--activate] | Add a project to the registry; alias defaults to the sanitized directory name. |
tesserae wiki activate <name> | Mark an entry as the active project for subsequent tesserae ask calls without --wiki. |
tesserae wiki unregister <name> | Remove an entry; clears the active pointer when it matched. |
These commands operate directly on tesserae.mcp_server.ProjectRegistry — no MCP roundtrip — so they can be scripted without running the MCP server.
Invoking from MCP
The stdio MCP server exposes an ask tool with the same backend selector:
{
"name": "ask",
"arguments": {
"question": "What does the integration spec say about parser routing?",
"backend": "auto",
"project": "tesserae"
}
}
The dispatch order (raganything → cognee → compiled-wiki search) and working_dir resolution mirror the CLI handler exactly, so coding agents and human operators converge on the same answers.
System prerequisites
- Python 3.10+ is required for RAG-Anything (the upstream
raganythingpackage ≥1.3.0 transitively depends onmineru[core], which is Python 3.10+). On older Pythons Tesserae disables the integration with a clear warning rather than silently installing a broken placeholder. - LibreOffice for
.doc/.docx/.ppt/.pptx/.xls/.xlsxparsing — install separately via your platform's package manager. RAG-Anything skips Office documents with a warning when LibreOffice is missing. - MinerU model weights are downloaded on first parse and cached (~GBs). Subsequent runs reuse the cache.
- OpenAI-compatible LLM/embedding/vision keys for the runtime memory backend (
OPENAI_API_KEY,OPENAI_BASE_URL). Parser-only mode does not require keys.
Parser routing
Tesserae auto-routes sources to the right parser per file extension:
| Extension | Parser | Reason |
|---|---|---|
.md, .markdown, .txt, .rst | docling | Lightweight; no MinerU model download. |
.doc, .docx, .ppt, .pptx, .xls, .xlsx | docling | Better Office structure preservation per upstream. |
.pdf, .png, .jpg, .jpeg, .gif, .bmp, .tiff, .webp | configured default (--raganything-parser, default mineru) | OCR + table extraction. |
Override per-bucket with --text-parser and --office-parser on refresh-raganything. The configured default still applies to PDFs and images.
Before the parse loop runs, Tesserae probes whether each required parser's Python package is importable (importlib.import_module(...)) and bails fast with a single aggregated error listing every missing parser and its install command. We deliberately don't use upstream RAGAnything.check_parser_installation() because it only inspects the parser configured on the instance and folds in model-weight readiness checks that don't fit a pre-flight stage.
Tesserae also picks RAGAnything's construction-time parser from the actual routing distribution (most-common picked parser wins) rather than from --raganything-parser directly. This avoids the failure mode where RAGAnything.__init__ tries to initialize a heavy parser (e.g. mineru) whose model weights aren't yet on disk and brick the entire run before per-call parser= overrides can take effect. The --raganything-parser flag still controls the default for non-text, non-Office sources (PDFs, images).
Parser packages
The compile-time parse path uses docling.DocumentConverter directly for every non-text source; install it once and you're covered:
| Parser | Install command |
|---|---|
docling (compile-time default for everything except native text) | bundled when you run --with-raganything --install-raganything (or pip install docling standalone) |
paddleocr (optional OCR alternative) | pip install 'raganything[paddleocr]>=1.3.0' and pip install paddlepaddle (platform-specific wheel) |
Note:
mineruis currently not invoked at compile-time. The compile path bypasses RAG-Anything's full pipeline (which would require LLM/embedding/vision callables) and routes every non-text source through docling directly. MinerU support is reserved for a future direct-import path that ingests an externally-producedcontent_list.json.
When a configured parser is missing, refresh-raganything bails fast — listing every missing parser in a single error with the right install command — instead of cascading per-file failures.
Per-page ask widget
Every detail page (concept, paper, repo, synthesis, entity, topic, question, source) includes an inline "ask about this page" widget. It POSTs to /api/ask on the local tesserae project serve instance, which calls tesserae.query.ask_project and renders the answer inline. The widget prepends the current page's node name to the user's question as a natural-language context hint (e.g. ` About <NodeName>: <question> ); a future PR can wire real subgraph scoping into ask_project` itself.
The widget detects backend availability via /api/ask/health on load. When the wiki is served statically (GitHub Pages, file://, S3, any plain static host) the widget collapses to a one-line note pointing readers at tesserae project serve for local interactive use. No requests fail and nothing blocks page rendering — the widget is a deferred JS island, separate from the heavier graph bundle.
Pair this with the multi-project registry (tesserae wiki register) and you can ask any registered project's wiki from any of its detail pages.
Collaboration principle
Tesserae remains the memory compiler. RAG-Anything remains an independent companion: a multimodal parser + LightRAG retrieval engine.