feat (memory): Implement new (opt-in) QMD memory backend

2026-04-19 01:57:27 +00:00 · 2026-01-27 21:57:15 -08:00
parent e9f182def7
commit 5d3af3bc62
24 changed files with 1828 additions and 601 deletions
--- a/docs/concepts/memory.md
+++ b/docs/concepts/memory.md
@@ -1,14 +1,12 @@
 ---
-summary: "How OpenClaw memory works (workspace files + automatic memory flush)"
+summary: "How Moltbot memory works (workspace files + automatic memory flush)"
 read_when:
  - You want the memory file layout and workflow
  - You want to tune the automatic pre-compaction memory flush
-title: "Memory"
 ---
-
 # Memory

-OpenClaw memory is **plain Markdown in the agent workspace**. The files are the
+Moltbot memory is **plain Markdown in the agent workspace**. The files are the
 source of truth; the model only "remembers" what gets written to disk.

 Memory search tools are provided by the active memory plugin (default:
@@ -26,7 +24,7 @@ The default workspace layout uses two memory layers:
  - **Only load in the main, private session** (never in group contexts).

 These files live under the workspace (`agents.defaults.workspace`, default
-`~/.openclaw/workspace`). See [Agent workspace](/concepts/agent-workspace) for the full layout.
+`~/clawd`). See [Agent workspace](/concepts/agent-workspace) for the full layout.

 ## When to write memory

@@ -38,9 +36,9 @@ These files live under the workspace (`agents.defaults.workspace`, default

 ## Automatic memory flush (pre-compaction ping)

-When a session is **close to auto-compaction**, OpenClaw triggers a **silent,
+When a session is **close to auto-compaction**, Moltbot triggers a **silent,
 agentic turn** that reminds the model to write durable memory **before** the
-context is compacted. The default prompts explicitly say the model _may reply_,
+context is compacted. The default prompts explicitly say the model *may reply*,
 but usually `NO_REPLY` is the correct response so the user never sees this turn.

 This is controlled by `agents.defaults.compaction.memoryFlush`:
@@ -55,16 +53,15 @@ This is controlled by `agents.defaults.compaction.memoryFlush`:
          enabled: true,
          softThresholdTokens: 4000,
          systemPrompt: "Session nearing compaction. Store durable memories now.",
-          prompt: "Write any lasting notes to memory/YYYY-MM-DD.md; reply with NO_REPLY if nothing to store.",
-        },
-      },
-    },
-  },
+          prompt: "Write any lasting notes to memory/YYYY-MM-DD.md; reply with NO_REPLY if nothing to store."
+        }
+      }
+    }
+  }
 }
 ```

 Details:
-
 - **Soft threshold**: flush triggers when the session token estimate crosses
  `contextWindow - reserveTokensFloor - softThresholdTokens`.
 - **Silent** by default: prompts include `NO_REPLY` so nothing is delivered.
@@ -78,15 +75,13 @@ For the full compaction lifecycle, see

 ## Vector memory search

-OpenClaw can build a small vector index over `MEMORY.md` and `memory/*.md` (plus
-any extra directories or files you opt in) so semantic queries can find related
-notes even when wording differs.
+Moltbot can build a small vector index over `MEMORY.md` and `memory/*.md` so
+semantic queries can find related notes even when wording differs.

 Defaults:
-
 - Enabled by default.
 - Watches memory files for changes (debounced).
- Uses remote embeddings by default. If `memorySearch.provider` is not set, OpenClaw auto-selects:
+- Uses remote embeddings by default. If `memorySearch.provider` is not set, Moltbot auto-selects:
  1. `local` if a `memorySearch.local.modelPath` is configured and the file exists.
  2. `openai` if an OpenAI key can be resolved.
  3. `gemini` if a Gemini key can be resolved.
@@ -94,13 +89,91 @@ Defaults:
 - Local mode uses node-llama-cpp and may require `pnpm approve-builds`.
 - Uses sqlite-vec (when available) to accelerate vector search inside SQLite.

-Remote embeddings **require** an API key for the embedding provider. OpenClaw
+Remote embeddings **require** an API key for the embedding provider. Moltbot
 resolves keys from auth profiles, `models.providers.*.apiKey`, or environment
 variables. Codex OAuth only covers chat/completions and does **not** satisfy
 embeddings for memory search. For Gemini, use `GEMINI_API_KEY` or
 `models.providers.google.apiKey`. When using a custom OpenAI-compatible endpoint,
 set `memorySearch.remote.apiKey` (and optional `memorySearch.remote.headers`).

+### QMD backend (experimental)
+
+Set `memory.backend = "qmd"` to swap the built-in SQLite indexer for
+[QMD](https://github.com/tobi/qmd): a local-first search sidecar that combines
+BM25 + vectors + reranking. Markdown stays the source of truth; Moltbot shells
+out to QMD for retrieval. Key points:
+
+**Prereqs**
+- Disabled by default. Opt in per-config (`memory.backend = "qmd"`).
+- Install the QMD CLI separately (`bun install -g github.com/tobi/qmd` or grab
+  a release) and make sure the `qmd` binary is on the gateway’s `PATH`.
+- QMD needs an SQLite build that allows extensions (`brew install sqlite` on
+  macOS). The gateway sets `INDEX_PATH`/`QMD_CONFIG_DIR` automatically.
+
+**How the sidecar runs**
+- The gateway writes a self-contained QMD home under
+  `~/.clawdbot/agents/<agentId>/qmd/` (config + cache + sqlite DB).
+- Collections are rewritten from `memory.qmd.paths` (plus default workspace
+  memory files) into `index.yml`, then `qmd update` + `qmd embed` run on boot and
+  on a configurable interval (`memory.qmd.update.interval`, default 5 m).
+- Searches run via `qmd query --json`. If QMD fails or the binary is missing,
+  Moltbot automatically falls back to the builtin SQLite manager so memory tools
+  keep working.
+
+**Config surface (`memory.qmd.*`)**
+- `command` (default `qmd`): override the executable path.
+- `includeDefaultMemory` (default `true`): auto-index `MEMORY.md` + `memory/**/*.md`.
+- `paths[]`: add extra directories/files (`path`, optional `pattern`, optional
+  stable `name`).
+- `sessions`: opt into session JSONL indexing (`enabled`, `retentionDays`,
+  `exportDir`, `redactToolOutputs`—defaults to redacting tool payloads).
+- `update`: controls refresh cadence (`interval`, `debounceMs`, `onBoot`).
+- `limits`: clamp recall payload (`maxResults`, `maxSnippetChars`,
+  `maxInjectedChars`, `timeoutMs`).
+- `scope`: same schema as [`session.sendPolicy`](/reference/configuration#session-sendpolicy).
+  Default is DM-only (`deny` all, `allow` direct chats); loosen it to surface QMD
+  hits in groups/channels.
+- Snippets sourced outside the workspace show up as
+  `qmd/<collection>/<relative-path>` in `memory_search` results; `memory_get`
+  understands that prefix and reads from the configured QMD collection root.
+- When `memory.qmd.sessions.enabled = true`, Moltbot exports sanitized session
+  transcripts (User/Assistant turns) into a dedicated QMD collection under
+  `~/.clawdbot/agents/<id>/qmd/sessions/`, so `memory_search` can recall recent
+  conversations without touching the builtin SQLite index.
+- `memory_search` snippets now include a `Source: <path#line>` footer when
+  `memory.citations` is `auto`/`on`; set `memory.citations = "off"` to keep
+  the path metadata internal (the agent still receives the path for
+  `memory_get`, but the snippet text omits the footer and the system prompt
+  warns the agent not to cite it).
+
+**Example**
+
+```json5
+memory: {
+  backend: "qmd",
+  citations: "auto",
+  qmd: {
+    includeDefaultMemory: true,
+    update: { interval: "5m", debounceMs: 15000 },
+    limits: { maxResults: 6, timeoutMs: 4000 },
+    scope: {
+      default: "deny",
+      rules: [{ action: "allow", match: { chatType: "direct" } }]
+    },
+    paths: [
+      { name: "docs", path: "~/notes", pattern: "**/*.md" }
+    ]
+  }
+}
+```
+
+**Citations & fallback**
+- `memory.citations` applies regardless of backend (`auto`/`on`/`off`).
+- When `qmd` runs, we tag `status().backend = "qmd"` so diagnostics show which
+  engine served the results. If the QMD subprocess exits or JSON output can’t be
+  parsed, the search manager logs a warning and returns the builtin provider
+  (existing Markdown embeddings) until QMD recovers.
+
 ### Additional memory paths

 If you want to index Markdown files outside the default workspace layout, add
@@ -142,7 +215,6 @@ agents: {
 ```

 Notes:
-
 - `remote.baseUrl` is optional (defaults to the Gemini API base URL).
 - `remote.headers` lets you add extra headers if needed.
 - Default model: `gemini-embedding-001`.
@@ -170,12 +242,10 @@ If you don't want to set an API key, use `memorySearch.provider = "local"` or se
 `memorySearch.fallback = "none"`.

 Fallbacks:
-
 - `memorySearch.fallback` can be `openai`, `gemini`, `local`, or `none`.
 - The fallback provider is only used when the primary embedding provider fails.

 Batch indexing (OpenAI + Gemini):
-
 - Enabled by default for OpenAI and Gemini embeddings. Set `agents.defaults.memorySearch.remote.batch.enabled = false` to disable.
 - Default behavior waits for batch completion; tune `remote.batch.wait`, `remote.batch.pollIntervalMs`, and `remote.batch.timeoutMinutes` if needed.
 - Set `remote.batch.concurrency` to control how many batch jobs we submit in parallel (default: 2).
@@ -183,7 +253,6 @@ Batch indexing (OpenAI + Gemini):
 - Gemini batch jobs use the async embeddings batch endpoint and require Gemini Batch API availability.

 Why OpenAI batch is fast + cheap:
-
 - For large backfills, OpenAI is typically the fastest option we support because we can submit many embedding requests in a single batch job and let OpenAI process them asynchronously.
 - OpenAI offers discounted pricing for Batch API workloads, so large indexing runs are usually cheaper than sending the same requests synchronously.
 - See the OpenAI Batch API docs and pricing for details:
@@ -209,12 +278,10 @@ agents: {
 ```

 Tools:
-
 - `memory_search` — returns snippets with file + line ranges.
 - `memory_get` — read memory file content by path.

 Local mode:
-
 - Set `agents.defaults.memorySearch.provider = "local"`.
 - Provide `agents.defaults.memorySearch.local.modelPath` (GGUF or `hf:` URI).
 - Optional: set `agents.defaults.memorySearch.fallback = "none"` to avoid remote fallback.
@@ -222,34 +289,31 @@ Local mode:
 ### How the memory tools work

 - `memory_search` semantically searches Markdown chunks (~400 token target, 80-token overlap) from `MEMORY.md` + `memory/**/*.md`. It returns snippet text (capped ~700 chars), file path, line range, score, provider/model, and whether we fell back from local → remote embeddings. No full file payload is returned.
- `memory_get` reads a specific memory Markdown file (workspace-relative), optionally from a starting line and for N lines. Paths outside `MEMORY.md` / `memory/` are allowed only when explicitly listed in `memorySearch.extraPaths`.
+- `memory_get` reads a specific memory Markdown file (workspace-relative), optionally from a starting line and for N lines. Paths outside `MEMORY.md` / `memory/` are rejected.
 - Both tools are enabled only when `memorySearch.enabled` resolves true for the agent.

 ### What gets indexed (and when)

- File type: Markdown only (`MEMORY.md`, `memory/**/*.md`, plus any `.md` files under `memorySearch.extraPaths`).
- Index storage: per-agent SQLite at `~/.openclaw/memory/<agentId>.sqlite` (configurable via `agents.defaults.memorySearch.store.path`, supports `{agentId}` token).
- Freshness: watcher on `MEMORY.md`, `memory/`, and `memorySearch.extraPaths` marks the index dirty (debounce 1.5s). Sync is scheduled on session start, on search, or on an interval and runs asynchronously. Session transcripts use delta thresholds to trigger background sync.
- Reindex triggers: the index stores the embedding **provider/model + endpoint fingerprint + chunking params**. If any of those change, OpenClaw automatically resets and reindexes the entire store.
+- File type: Markdown only (`MEMORY.md`, `memory/**/*.md`).
+- Index storage: per-agent SQLite at `~/.clawdbot/memory/<agentId>.sqlite` (configurable via `agents.defaults.memorySearch.store.path`, supports `{agentId}` token).
+- Freshness: watcher on `MEMORY.md` + `memory/` marks the index dirty (debounce 1.5s). Sync is scheduled on session start, on search, or on an interval and runs asynchronously. Session transcripts use delta thresholds to trigger background sync.
+- Reindex triggers: the index stores the embedding **provider/model + endpoint fingerprint + chunking params**. If any of those change, Moltbot automatically resets and reindexes the entire store.

 ### Hybrid search (BM25 + vector)

-When enabled, OpenClaw combines:
-
+When enabled, Moltbot combines:
 - **Vector similarity** (semantic match, wording can differ)
 - **BM25 keyword relevance** (exact tokens like IDs, env vars, code symbols)

-If full-text search is unavailable on your platform, OpenClaw falls back to vector-only search.
+If full-text search is unavailable on your platform, Moltbot falls back to vector-only search.

 #### Why hybrid?

 Vector search is great at “this means the same thing”:
-
 - “Mac Studio gateway host” vs “the machine running the gateway”
 - “debounce file updates” vs “avoid indexing on every write”

 But it can be weak at exact, high-signal tokens:
-
 - IDs (`a828e60`, `b3b9895a…`)
 - code symbols (`memorySearch.query.hybrid`)
 - error strings (“sqlite-vec unavailable”)
@@ -262,21 +326,17 @@ good results for both “natural language” queries and “needle in a haystack

 Implementation sketch:

-1. Retrieve a candidate pool from both sides:
-
+1) Retrieve a candidate pool from both sides:
 - **Vector**: top `maxResults * candidateMultiplier` by cosine similarity.
 - **BM25**: top `maxResults * candidateMultiplier` by FTS5 BM25 rank (lower is better).

-2. Convert BM25 rank into a 0..1-ish score:
-
+2) Convert BM25 rank into a 0..1-ish score:
 - `textScore = 1 / (1 + max(0, bm25Rank))`

-3. Union candidates by chunk id and compute a weighted score:
-
+3) Union candidates by chunk id and compute a weighted score:
 - `finalScore = vectorWeight * vectorScore + textWeight * textScore`

 Notes:
-
 - `vectorWeight` + `textWeight` is normalized to 1.0 in config resolution, so weights behave as percentages.
 - If embeddings are unavailable (or the provider returns a zero-vector), we still run BM25 and return keyword matches.
 - If FTS5 can’t be created, we keep vector-only search (no hard failure).
@@ -306,7 +366,7 @@ agents: {

 ### Embedding cache

-OpenClaw can cache **chunk embeddings** in SQLite so reindexing and frequent updates (especially session transcripts) don't re-embed unchanged text.
+Moltbot can cache **chunk embeddings** in SQLite so reindexing and frequent updates (especially session transcripts) don't re-embed unchanged text.

 Config:

@@ -340,13 +400,12 @@ agents: {
 ```

 Notes:
-
 - Session indexing is **opt-in** (off by default).
 - Session updates are debounced and **indexed asynchronously** once they cross delta thresholds (best-effort).
 - `memory_search` never blocks on indexing; results can be slightly stale until background sync finishes.
 - Results still include snippets only; `memory_get` remains limited to memory files.
 - Session indexing is isolated per agent (only that agent’s session logs are indexed).
- Session logs live on disk (`~/.openclaw/agents/<agentId>/sessions/*.jsonl`). Any process/user with filesystem access can read them, so treat disk access as the trust boundary. For stricter isolation, run agents under separate OS users or hosts.
+- Session logs live on disk (`~/.clawdbot/agents/<agentId>/sessions/*.jsonl`). Any process/user with filesystem access can read them, so treat disk access as the trust boundary. For stricter isolation, run agents under separate OS users or hosts.

 Delta thresholds (defaults shown):

@@ -367,7 +426,7 @@ agents: {

 ### SQLite vector acceleration (sqlite-vec)

-When the sqlite-vec extension is available, OpenClaw stores embeddings in a
+When the sqlite-vec extension is available, Moltbot stores embeddings in a
 SQLite virtual table (`vec0`) and performs vector distance queries in the
 database. This keeps search fast without loading every embedding into JS.

@@ -389,10 +448,9 @@ agents: {
 ```

 Notes:
-
 - `enabled` defaults to true; when disabled, search falls back to in-process
  cosine similarity over stored embeddings.
- If the sqlite-vec extension is missing or fails to load, OpenClaw logs the
+- If the sqlite-vec extension is missing or fails to load, Moltbot logs the
  error and continues with the JS fallback (no vector table).
 - `extensionPath` overrides the bundled sqlite-vec path (useful for custom builds
  or non-standard install locations).
@@ -426,6 +484,5 @@ agents: {
 ```

 Notes:
-
 - `remote.*` takes precedence over `models.providers.openai.*`.
 - `remote.headers` merge with OpenAI headers; remote wins on key conflicts. Omit `remote.headers` to use the OpenAI defaults.