15.3 KB · updated 2026-05-19 · md

RAG-Anything multimodal companion

RAG-Anything is a multimodal RAG framework (built on LightRAG) that parses PDFs, Office documents, images, and equations through MinerU/Docling/PaddleOCR. Tesserae integrates it both as a multimodal ingestion pipeline (UA-style native graph projection) and as a runtime memory backend alongside Cognee.

Why use both?

  • Tesserae — long-lived agent memory, wiki compilation, graph projection.
  • RAG-Anything — multimodal ingestion + LightRAG runtime retrieval.

The two complement each other: RAG-Anything brings PDF/Office/image understanding that Tesserae's text-first source loaders don't provide; Tesserae keeps the long-lived, queryable memory that survives across sessions.

Current low-friction workflow

The recommended path is the setup wizard:

tesserae project setup

For automation:

tesserae project setup \
  --yes \
  --with-raganything \
  --install-raganything \
  --raganything-parser mineru \
  --run-raganything
tesserae project compile

The setup wizard installs both raganything and docling together. MinerU stays opt-in: install it with pip install 'mineru[core]' only if you have PDFs or images to ingest.

Tesserae stores a managed refresh command rather than asking users to invent one:

tesserae project refresh-raganything --parser mineru

During compile, Tesserae:

  1. checks whether .tesserae/external/raganything/manifest.json exists and matches the current git commit (via the stored meta.json#gitCommitHash);
  2. runs the managed refresh wrapper if missing/stale or --refresh-external-tools is passed;
  3. discovers non-code sources (PDFs, Office docs, images, markdown) and parses them via the configured parser;
  4. writes manifest.json + meta.json;
  5. continues the normal memory compile.
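The staleness check in steps 1 and 2 can be sketched as follows. This is a minimal illustration, not Tesserae's actual implementation: the function name needs_refresh is hypothetical, meta.json is assumed to sit next to manifest.json, and the current commit is passed in by the caller rather than read from git.

```python
import json
from pathlib import Path

# Assumption: meta.json lives beside manifest.json in this directory.
EXTERNAL_DIR = Path(".tesserae") / "external" / "raganything"


def needs_refresh(project_root: Path, current_commit: str, force: bool = False) -> bool:
    """Return True when the managed refresh wrapper should run before compile."""
    if force:  # mirrors --refresh-external-tools
        return True
    manifest = project_root / EXTERNAL_DIR / "manifest.json"
    meta = project_root / EXTERNAL_DIR / "meta.json"
    if not manifest.exists() or not meta.exists():
        return True  # missing artifacts count as stale
    try:
        data = json.loads(meta.read_text(encoding="utf-8"))
    except (json.JSONDecodeError, OSError):
        return True  # unreadable metadata is treated as stale
    stored = data.get("gitCommitHash") if isinstance(data, dict) else None
    return stored != current_commit
```

A caller would obtain current_commit from something like git rev-parse HEAD and run the refresh wrapper only when this returns True.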

You can force all configured external refresh commands before a compile:

tesserae project compile --refresh-external-tools

Manual equivalent

pip install 'raganything[all]'
python -m tesserae.raganything_refresh --project . --parser mineru
tesserae project compile

Compile-time vs runtime

Tesserae splits the integration cleanly:

  • Compile-time parsing (refresh-raganything and compile): runs parsers directly — native read for .md/.txt/.rst, docling.DocumentConverter for everything else. RAG-Anything's full pipeline is not invoked here, so no LLM/embedding/vision keys are needed for compile to succeed.
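  • Runtime queries (project ask): raganything_query.py instantiates RAGAnything with the project's configured LLM/embedding/vision functions and runs aquery against LightRAG's store. This path needs working LLM credentials: CLI OAuth with the default providers, or API keys when you configure an OpenAI-compatible provider.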

The split means compile is fast, deterministic, and key-free; only retrieval-time operations cost LLM tokens.
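A rough sketch of the key-free compile-time path, assuming the text extensions listed above are read natively and everything else goes through docling. The function name parse_at_compile_time is hypothetical, and exporting to Markdown is one plausible output choice rather than what Tesserae necessarily stores.

```python
from pathlib import Path

# Extensions read natively, per the compile-time description above.
NATIVE_TEXT_EXTS = {".md", ".txt", ".rst"}


def parse_at_compile_time(path: Path) -> str:
    """Return document text without calling any LLM, embedding, or vision API."""
    if path.suffix.lower() in NATIVE_TEXT_EXTS:
        return path.read_text(encoding="utf-8")
    # Imported lazily so text-only projects do not need docling installed.
    from docling.document_converter import DocumentConverter

    result = DocumentConverter().convert(str(path))
    return result.document.export_to_markdown()
```

Because neither branch touches a model provider, this path stays deterministic for a given input file and parser version.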

Native graph synchronization

Tesserae imports the parsed manifest natively during compile when the configured tool uses sync_mode: native_graph.

The native adapter reads .tesserae/external/raganything/manifest.json, projects each parsed document into a SourceFile node with multimodal block metadata, and writes a sync manifest:

.tesserae/external/raganything-sync.json

Current mapping:

| RAG-Anything | Tesserae direction |
| --- | --- |
| documents[*] | SourceFile node, metadata.parser="raganything" |
| content_list[type=text] | folded into SourceFile.description; concepts via existing extractor |
| content_list[type=image] | SourceFile.metadata.multimodal_blocks[] (img_path, caption) |
| content_list[type=table] | SourceFile.metadata.multimodal_blocks[] (table_body, caption) |
| content_list[type=equation] | SourceFile.metadata.multimodal_blocks[] and metadata.equations[] (LaTeX preserved) |

Provenance is preserved on each node:

{"system": "rag-anything", "id": "doc-<sha256>", "type": "document", "artifact": ".tesserae/external/raganything/manifest.json"}
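The mapping and provenance record can be illustrated with a small projection function. This is a sketch under stated assumptions: project_document is a hypothetical name, the input keys path, text, and latex are guesses at the manifest shape, and the value hashed into doc-<sha256> is assumed to be the document path.

```python
import hashlib


def project_document(doc: dict) -> dict:
    """Project one parsed document into a SourceFile-style node dict."""
    text_parts, blocks, equations = [], [], []
    for item in doc.get("content_list", []):
        kind = item.get("type")
        if kind == "text":
            text_parts.append(item.get("text", ""))
        elif kind == "image":
            blocks.append({"type": "image", "img_path": item.get("img_path"),
                           "caption": item.get("caption")})
        elif kind == "table":
            blocks.append({"type": "table", "table_body": item.get("table_body"),
                           "caption": item.get("caption")})
        elif kind == "equation":
            latex = item.get("latex", "")
            blocks.append({"type": "equation", "latex": latex})
            equations.append(latex)  # LaTeX is kept verbatim
    # Assumption: the id hashes the document path; the real hash input may differ.
    doc_id = "doc-" + hashlib.sha256(doc["path"].encode("utf-8")).hexdigest()
    return {
        "type": "SourceFile",
        "name": doc["path"],
        "description": "\n".join(part for part in text_parts if part),
        "metadata": {"parser": "raganything", "multimodal_blocks": blocks,
                     "equations": equations},
        "provenance": {"system": "rag-anything", "id": doc_id, "type": "document",
                       "artifact": ".tesserae/external/raganything/manifest.json"},
    }
```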

Note: the interactive graph view hides sources-group nodes by default to focus on concepts and entities. Projected raganything SourceFile nodes stay in graph.json (MCP, Cognee, search, and per-page wiki views still see them); they just don't flood the canvas. Set graph_view.show_sources = true in .tesserae/config.json to restore the dense view.

Runtime memory backend

memory_backends.raganything (default produced by default_raganything_backend_config) coexists with Cognee. project ask tries backends in priority order; per-project priority can be set via memory_backends.priority. RAG-Anything is opt-in (default enabled: false); the setup flag --with-raganything flips it on.
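Trying backends in priority order might look like the sketch below. The function name ask_in_priority_order and the callable-per-backend shape are illustrative assumptions, as is the choice to fall through to the next backend when one raises or returns nothing.

```python
from typing import Callable, Dict, List, Optional

# Each backend is modelled as a callable that returns an answer or None.
Backend = Callable[[str], Optional[str]]


def ask_in_priority_order(question: str, backends: Dict[str, Backend],
                          priority: List[str]) -> dict:
    """Return the first non-empty answer, remembering why earlier backends failed."""
    errors: Dict[str, str] = {}
    for name in priority:
        backend = backends.get(name)
        if backend is None:
            continue  # backend disabled or not configured
        try:
            answer = backend(question)
        except Exception as exc:  # assumed policy: a failure does not abort the chain
            errors[name] = str(exc)
            continue
        if answer:
            return {"backend": name, "answer": answer, "errors": errors}
    return {"backend": None, "answer": None, "errors": errors}
```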

LLM provider (no API key needed)

RAG-Anything's runtime backend needs an LLM to answer queries. Tesserae defaults to its existing OAuth-based CLI integrations — no API key required:

| Provider | How it authenticates | Setup flag |
| --- | --- | --- |
| codex (default) | codex CLI OAuth (you logged in once with codex login) | --raganything-llm-provider codex |
| claude | claude -p CLI; respects CLAUDE_CONFIG_DIR for multi-account setups | --raganything-llm-provider claude --raganything-claude-config-dir ~/.claude-personal2 |

For multi-account Claude setups (e.g., ~/.claude-personal1, ~/.claude-personal2), pass --raganything-claude-config-dir <path> at setup. The runtime backend will export CLAUDE_CONFIG_DIR=<path> before each invocation so the chosen account's auth is used without touching your default ~/.claude.
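The per-invocation export can be pictured as building a private environment for the child process instead of mutating the parent's. The helper name claude_env is hypothetical; only CLAUDE_CONFIG_DIR and the claude -p invocation come from the text above.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def claude_env(config_dir: Optional[str],
               base_env: Optional[Mapping[str, str]] = None) -> dict:
    """Copy the environment and point CLAUDE_CONFIG_DIR at the chosen account."""
    env = dict(os.environ if base_env is None else base_env)
    if config_dir:
        env["CLAUDE_CONFIG_DIR"] = str(Path(config_dir).expanduser())
    return env


# Typical use (not executed here):
#   subprocess.run(["claude", "-p", prompt],
#                  env=claude_env("~/.claude-personal2"),
#                  capture_output=True, text=True)
```

Working on a copy means the default ~/.claude configuration of the calling shell is never touched.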

Embeddings

| Provider | When to use |
| --- | --- |
| deterministic (default) | No external deps. Hash-based; low semantic quality but enough for LightRAG to construct an index. Good baseline for proving the integration works. |
| ollama | Local Ollama running with an embedding model (e.g., nomic-embed-text). Pass --raganything-embedding ollama; the backend defaults to http://localhost:11434. |
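To make "hash-based" concrete, here is one way a deterministic embedding could work. It is a generic feature-hashing sketch, not Tesserae's actual function; the name deterministic_embedding, the 64-dimension default, and whitespace tokenization are all assumptions.

```python
import hashlib
import math
from typing import List


def deterministic_embedding(text: str, dim: int = 64) -> List[float]:
    """Hash each token into a signed bucket, then L2-normalize the vector."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % dim  # bucket chosen by the hash
        sign = 1.0 if digest[4] % 2 == 0 else -1.0       # sign reduces collision bias
        vec[index] += sign
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec
```

Identical text always yields the identical vector with no network call, which is why this kind of embedding is enough to build an index even though it captures token overlap rather than meaning.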

Direct OpenAI embedding support is not wired through these flags in v1 — users with OpenAI keys can set OPENAI_API_KEY and override memory_backends.raganything.embedding.provider directly in .tesserae/config.json (RAGAnything will pick up the env var via LightRAG's defaults).

Invoking from the CLI

# Auto mode: tries RAG-Anything (when enabled), then Cognee, then compiled-wiki search.
tesserae project ask "What does the integration spec say about parser routing?"

# Force a specific backend.
tesserae project ask "..." --backend raganything
tesserae project ask "..." --backend cognee
tesserae project ask "..." --backend wiki

--backend raganything calls tesserae.raganything_query.query directly. A relative working_dir in memory_backends.raganything is resolved against the project root before the call.
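Resolving a relative working_dir against the project root is a small step, sketched here with a hypothetical helper name; absolute paths are assumed to pass through unchanged.

```python
from pathlib import Path


def resolve_working_dir(project_root: Path, working_dir: str) -> Path:
    """Anchor relative working_dir values at the project root."""
    path = Path(working_dir).expanduser()
    if path.is_absolute():
        return path
    return (project_root / path).resolve()
```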

Top-level ask (uses the multi-project registry)

For workflows where you want to ask across multiple registered Tesserae projects without cd-ing into each one, the top-level tesserae ask command resolves the project via the persistent registry shared with the MCP server:

# One-time: register your projects (saved to ~/.tesserae/registry.json).
tesserae wiki register ~/Developer/Projects/Tesserae --name tesserae --activate
tesserae wiki register ~/Developer/Projects/Other --name other

# List registered projects.
tesserae wiki list

# Ask the currently active project.
tesserae ask "How does the parser routing work?"

# Ask a specific registered project (no need to activate it).
tesserae ask "What is the architecture?" --wiki other

# Force a backend or pass a direct path.
tesserae ask "..." --wiki tesserae --backend raganything --json
tesserae ask "..." --project /tmp/somewhere

The dispatch logic — --project > --wiki > active project — is implemented in _top_level_ask_handler and the answer formatting / backend selection is shared with project ask and the MCP ask tool through tesserae.query.ask_project. The registry is file-backed (~/.tesserae/registry.json by default), so it persists across sessions and stays in sync with the MCP server's project list.
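The precedence rule can be sketched as a single resolver. This is an illustration, not the body of _top_level_ask_handler: the function name resolve_project and the registry shape (an "active" alias plus a "projects" mapping of alias to path) are assumptions.

```python
from pathlib import Path
from typing import Optional


def resolve_project(registry: dict, project: Optional[str] = None,
                    wiki: Optional[str] = None) -> Path:
    """Apply the precedence --project > --wiki > active project."""
    if project:
        return Path(project).expanduser()  # a direct path always wins
    projects = registry.get("projects", {})
    if wiki:
        if wiki not in projects:
            raise KeyError(f"unknown registered project: {wiki}")
        return Path(projects[wiki])
    active = registry.get("active")
    if active and active in projects:
        return Path(projects[active])
    raise LookupError("no --project, no --wiki, and no active project")
```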

Querying across multiple vaults (--scope all-registered)

Bet B2 — when you have several registered projects (research vault, work vault, side-project vault) and you want to ask the same question against all of them, use --scope all-registered:

# Fan out across every registered project. The aggregated envelope is
# {"scope": "all-registered", "question": "...", "by_project": {"<alias>": <envelope>}}.
tesserae ask "What did I write about RLHF?" --scope all-registered --json

# Restrict to a hand-picked subset of aliases.
tesserae ask "..." --scope all-registered --scope-aliases research side-projects

The handler iterates registered projects in alphabetical order, calls ask_project against each, and aggregates the per-project envelopes. A single project failing — missing config, RAG-Anything not enabled, Cognee down — is captured as {"error": "..."} in that alias's slot and never aborts the rest of the fan-out. The same scope argument is accepted by the MCP ask tool, so MCP-driven coding agents get the same fan-out without extra plumbing.
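The fan-out with per-project error capture can be sketched like this. The name ask_all_registered and the ask_one callable are stand-ins for the real handler and ask_project; treating an unknown alias as an error slot is an assumption.

```python
from typing import Callable, Dict, Iterable, Optional


def ask_all_registered(question: str, projects: Dict[str, str],
                       ask_one: Callable[[str, str], dict],
                       aliases: Optional[Iterable[str]] = None) -> dict:
    """Ask every selected project; one failure never aborts the others."""
    selected = sorted(projects if aliases is None else aliases)  # alphabetical order
    by_project: Dict[str, dict] = {}
    for alias in selected:
        if alias not in projects:
            by_project[alias] = {"error": f"unknown alias: {alias}"}
            continue
        try:
            by_project[alias] = ask_one(projects[alias], question)
        except Exception as exc:
            by_project[alias] = {"error": str(exc)}  # captured in this alias's slot
    return {"scope": "all-registered", "question": question, "by_project": by_project}
```

Catching the exception inside the loop is the design choice that matters: the aggregated envelope always has one slot per alias, so callers can distinguish "no answer" from "backend failed" per project.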

Multi-project registry (tesserae wiki)

| Command | Purpose |
| --- | --- |
| tesserae wiki list [--json] | Show registered projects and which one is active. |
| tesserae wiki register <path> [--name <alias>] [--activate] | Add a project to the registry; alias defaults to the sanitized directory name. |
| tesserae wiki activate <name> | Mark an entry as the active project for subsequent tesserae ask calls without --wiki. |
| tesserae wiki unregister <name> | Remove an entry; clears the active pointer when it matched. |

These commands operate directly on tesserae.mcp_server.ProjectRegistry — no MCP roundtrip — so they can be scripted without running the MCP server.
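The behaviour of these commands can be mimicked with a tiny file-backed class. This is an illustrative stand-in, not tesserae.mcp_server.ProjectRegistry: the class name FileRegistry, the JSON layout, and the alias sanitizer are all assumptions.

```python
import json
import re
from pathlib import Path
from typing import Optional


class FileRegistry:
    """Toy file-backed registry with register / activate / unregister."""

    def __init__(self, path: Path):
        self.path = path
        self.data = {"active": None, "projects": {}}
        if path.exists():
            self.data = json.loads(path.read_text(encoding="utf-8"))

    def _save(self) -> None:
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(self.data, indent=2), encoding="utf-8")

    def register(self, project_path: str, name: Optional[str] = None,
                 activate: bool = False) -> str:
        # Assumed sanitizer: lowercase, other character runs collapse to "-".
        default = re.sub(r"[^a-z0-9_-]+", "-", Path(project_path).name.lower()).strip("-")
        alias = name or default
        self.data["projects"][alias] = str(project_path)
        if activate:
            self.data["active"] = alias
        self._save()
        return alias

    def activate(self, name: str) -> None:
        if name not in self.data["projects"]:
            raise KeyError(f"unknown project: {name}")
        self.data["active"] = name
        self._save()

    def unregister(self, name: str) -> None:
        self.data["projects"].pop(name, None)
        if self.data["active"] == name:
            self.data["active"] = None  # clear the pointer when it matched
        self._save()
```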

Invoking from MCP

The stdio MCP server exposes an ask tool with the same backend selector:

{
  "name": "ask",
  "arguments": {
    "question": "What does the integration spec say about parser routing?",
    "backend": "auto",
    "project": "tesserae"
  }
}

The dispatch order (raganything → cognee → compiled-wiki search) and working_dir resolution mirror the CLI handler exactly, so coding agents and human operators converge on the same answers.
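For clients that speak to the stdio server directly, the tool call above travels inside a JSON-RPC 2.0 tools/call request, one JSON message per line. The helper below is a sketch with a hypothetical name; only the ask tool name and its three arguments come from this page.

```python
import json
from typing import Optional


def build_ask_request(question: str, backend: str = "auto",
                      project: Optional[str] = None, request_id: int = 1) -> str:
    """Serialize an MCP tools/call request for the ask tool as a single line."""
    arguments = {"question": question, "backend": backend}
    if project is not None:
        arguments["project"] = project
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "ask", "arguments": arguments},
    }
    return json.dumps(message)  # json.dumps emits no raw newlines by default
```

In practice an MCP client library handles the initialize handshake and framing; the sketch only shows what the request payload looks like on the wire.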

System prerequisites

  • Python 3.10+ is required for RAG-Anything (the upstream raganything package ≥1.3.0 transitively depends on mineru[core], which requires Python 3.10+). On older Python versions Tesserae disables the integration with a clear warning rather than silently installing a broken placeholder.
  • LibreOffice for .doc/.docx/.ppt/.pptx/.xls/.xlsx parsing — install separately via your platform's package manager. RAG-Anything skips Office documents with a warning when LibreOffice is missing.
  • MinerU model weights are downloaded on first parse and cached (~GBs). Subsequent runs reuse the cache.
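  • OpenAI-compatible LLM/embedding/vision keys (OPENAI_API_KEY, OPENAI_BASE_URL) are needed only when the runtime memory backend is pointed at an OpenAI-compatible provider; the default OAuth-based CLI providers work without them. Parser-only mode does not require keys.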

Parser routing

Tesserae auto-routes sources to the right parser per file extension:

| Extension | Parser | Reason |
| --- | --- | --- |
| .md, .markdown, .txt, .rst | docling | Lightweight; no MinerU model download. |
| .doc, .docx, .ppt, .pptx, .xls, .xlsx | docling | Better Office structure preservation per upstream. |
| .pdf, .png, .jpg, .jpeg, .gif, .bmp, .tiff, .webp | configured default (--raganything-parser, default mineru) | OCR + table extraction. |

Override per-bucket with --text-parser and --office-parser on refresh-raganything. The configured default still applies to PDFs and images.
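The routing table and its overrides reduce to a lookup on the file extension. The sketch below uses a hypothetical function name, and returning None for unrecognized extensions is an assumption about how unknown files are handled.

```python
from pathlib import Path
from typing import Optional

TEXT_EXTS = {".md", ".markdown", ".txt", ".rst"}
OFFICE_EXTS = {".doc", ".docx", ".ppt", ".pptx", ".xls", ".xlsx"}
MEDIA_EXTS = {".pdf", ".png", ".jpg", ".jpeg", ".gif", ".bmp", ".tiff", ".webp"}


def route_parser(path: str, default_parser: str = "mineru",
                 text_parser: str = "docling",
                 office_parser: str = "docling") -> Optional[str]:
    """Pick a parser per bucket; the keyword arguments mirror the CLI overrides."""
    ext = Path(path).suffix.lower()
    if ext in TEXT_EXTS:
        return text_parser      # --text-parser
    if ext in OFFICE_EXTS:
        return office_parser    # --office-parser
    if ext in MEDIA_EXTS:
        return default_parser   # --raganything-parser
    return None                 # assumed: unknown extensions are not routed
```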

Before the parse loop runs, Tesserae probes whether each required parser's Python package is importable (importlib.import_module(...)) and bails fast with a single aggregated error listing every missing parser and its install command. We deliberately don't use upstream RAGAnything.check_parser_installation() because it only inspects the parser configured on the instance and folds in model-weight readiness checks that don't fit a pre-flight stage.
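A pre-flight probe of this kind can be sketched with importlib alone. The function name preflight is hypothetical, and the sketch assumes each parser's importable module shares the parser's name; the install hints reuse commands quoted elsewhere on this page.

```python
import importlib
from typing import Dict, Iterable

INSTALL_HINTS: Dict[str, str] = {
    "docling": "pip install docling",
    "mineru": "pip install 'mineru[core]'",
    "paddleocr": "pip install 'raganything[paddleocr]>=1.3.0'",
}


def preflight(required: Iterable[str], hints: Dict[str, str] = INSTALL_HINTS) -> None:
    """Raise one aggregated error that lists every missing parser package."""
    missing = []
    for module in sorted(set(required)):
        try:
            importlib.import_module(module)  # importability only, no model checks
        except ImportError:
            missing.append(module)
    if missing:
        lines = [f"  - {name}: {hints.get(name, 'pip install ' + name)}" for name in missing]
        raise RuntimeError("Missing parser packages:\n" + "\n".join(lines))
```

Collecting all failures before raising is what turns a cascade of per-file errors into a single actionable message.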

Tesserae also picks RAGAnything's construction-time parser from the actual routing distribution (the most commonly routed parser wins) rather than from --raganything-parser directly. This avoids the failure mode where RAGAnything.__init__ tries to initialize a heavy parser (e.g. mineru) whose model weights aren't yet on disk and bricks the entire run before per-call parser= overrides can take effect. The --raganything-parser flag still controls the default for non-text, non-Office sources (PDFs, images).
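"Most common parser wins" is a one-line count over the routing results. The function name construction_parser and the fallback behaviour for an empty source list are assumptions.

```python
from collections import Counter
from typing import List


def construction_parser(routed: List[str], fallback: str) -> str:
    """Choose the parser that the largest number of sources were routed to."""
    if not routed:
        return fallback  # assumed: nothing to parse, keep the configured default
    return Counter(routed).most_common(1)[0][0]
```

For a project with many Markdown files and a single PDF, this selects docling at construction time, and mineru is only touched by the per-call override for that one PDF.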

Parser packages

The compile-time parse path uses docling.DocumentConverter directly for every non-text source; install it once and you're covered:

| Parser | Install command |
| --- | --- |
| docling (compile-time default for everything except native text) | bundled when you run --with-raganything --install-raganything (or pip install docling standalone) |
| paddleocr (optional OCR alternative) | pip install 'raganything[paddleocr]>=1.3.0' and pip install paddlepaddle (platform-specific wheel) |

Note: mineru is currently not invoked at compile-time. The compile path bypasses RAG-Anything's full pipeline (which would require LLM/embedding/vision callables) and routes every non-text source through docling directly. MinerU support is reserved for a future direct-import path that ingests an externally-produced content_list.json.

When a configured parser is missing, refresh-raganything bails fast — listing every missing parser in a single error with the right install command — instead of cascading per-file failures.

Per-page ask widget

Every detail page (concept, paper, repo, synthesis, entity, topic, question, source) includes an inline "ask about this page" widget. It POSTs to /api/ask on the local tesserae project serve instance, which calls tesserae.query.ask_project and renders the answer inline. The widget prepends the current page's node name to the user's question as a natural-language context hint (e.g. "About <NodeName>: <question>"); a future PR can wire real subgraph scoping into ask_project itself.

The widget detects backend availability via /api/ask/health on load. When the wiki is served statically (GitHub Pages, file://, S3, any plain static host) the widget collapses to a one-line note pointing readers at tesserae project serve for local interactive use. No requests fail and nothing blocks page rendering — the widget is a deferred JS island, separate from the heavier graph bundle.

Pair this with the multi-project registry (tesserae wiki register) and you can ask any registered project's wiki from any of its detail pages.

Collaboration principle

Tesserae remains the memory compiler. RAG-Anything remains an independent companion: a multimodal parser + LightRAG retrieval engine.