Offline-first MDN Web Docs RAG-MCP server ready for semantic search with hybrid vector (1024-d) and full‑text (BM25) retrieval.

The dataset covers the core MDN documentation sections. See the dataset repo on Hugging Face for more details.
```
npx -y @deepsweet/mdn@latest download
```

Both the dataset (~260 MB) and the embedding model GGUF file (~438 MB) will be downloaded directly from Hugging Face and stored in its default cache location (typically `~/.cache/huggingface/`), just like the `hf download` command does.
```json
{
  "mcpServers": {
    "mdn": {
      "command": "npx",
      "args": [
        "-y",
        "@deepsweet/mdn@latest",
        "server"
      ],
      "env": {}
    }
  }
}
```

> [!TIP]
> Remove `@latest` for a fully offline experience, but keep in mind that this will cache a fixed version without auto-updating.
The stdio server will spawn llama.cpp under the hood, load the embedding model (~655 MB RAM/VRAM), and query the dataset – all on demand.
| Env variable | Default value | Description |
|---|---|---|
| `MDN_DATASET_PATH` | Hugging Face cache | Custom dataset directory path |
| `MDN_MODEL_PATH` | Hugging Face cache | Custom model file path |
| `MDN_MODEL_TTL` | `1800` | How long llama.cpp with the embedding model should be kept loaded in memory, in seconds; `0` to prevent unloading |
| `MDN_QUERY_DESCRIPTION` | Natural language query for hybrid vector and full-text search | Custom search query description, in case your LLM does a poor job asking the MCP tool |
| `MDN_SEARCH_RESULTS_LIMIT` | `3` | Total search results limit |
| `HF_TOKEN` | | Optional Hugging Face access token; helps with occasional "HTTP 429 Too Many Requests" errors |
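These variables go into the `env` section of the MCP server config. A hypothetical example pointing the server at a pre-downloaded dataset and model and keeping the model loaded indefinitely (the paths shown are illustrative, not defaults):

```json
{
  "mcpServers": {
    "mdn": {
      "command": "npx",
      "args": ["-y", "@deepsweet/mdn@latest", "server"],
      "env": {
        "MDN_DATASET_PATH": "/data/mdn-dataset",
        "MDN_MODEL_PATH": "/data/models/embedding.gguf",
        "MDN_MODEL_TTL": "0"
      }
    }
  }
}
```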
To clean up the Hugging Face cache afterwards:

```
hf cache prune
```

The RAG-MCP server itself and the processing scripts are available under MIT.