mirror of
https://github.com/openclaw/openclaw.git
synced 2026-04-19 08:58:37 +00:00
feat: add PDF analysis tool with native provider support (#31319)
* feat: add PDF analysis tool with native provider support New `pdf` tool for analyzing PDF documents with model-powered analysis. Architecture: - Native PDF path: sends raw PDF bytes directly to providers that support inline document input (Anthropic via DocumentBlockParam, Google Gemini via inlineData with application/pdf MIME type) - Extraction fallback: for providers without native PDF support, extracts text via pdfjs-dist and rasterizes pages to images via @napi-rs/canvas, then sends through the standard vision/text completion path Key features: - Single PDF (`pdf` param) or multiple PDFs (`pdfs` array, up to 10) - Page range selection (`pages` param, e.g. "1-5", "1,3,7-9") - Model override (`model` param) and file size limits (`maxBytesMb`) - Auto-detects provider capability and falls back gracefully - Same security patterns as image tool (SSRF guards, sandbox support, local path roots, workspace-only policy) Config (agents.defaults): - pdfModel: primary/fallbacks (defaults to imageModel, then session model) - pdfMaxBytesMb: max PDF file size (default: 10) - pdfMaxPages: max pages to process (default: 20) Model catalog: - Extended ModelInputType to include "document" alongside "text"/"image" - Added modelSupportsDocument() capability check Files: - src/agents/tools/pdf-tool.ts - main tool factory - src/agents/tools/pdf-tool.helpers.ts - helpers (page range, config, etc.) - src/agents/tools/pdf-native-providers.ts - direct API calls for Anthropic/Google - src/agents/tools/pdf-tool.test.ts - 43 tests covering all paths - Modified: model-catalog.ts, openclaw-tools.ts, config schema/types/labels/help * fix: prepare pdf tool for merge (#31319) (thanks @tyler6204)
This commit is contained in:
@@ -835,6 +835,12 @@ Time format in system prompt. Default: `auto` (OS preference).
|
||||
primary: "openrouter/qwen/qwen-2.5-vl-72b-instruct:free",
|
||||
fallbacks: ["openrouter/google/gemini-2.0-flash-vision:free"],
|
||||
},
|
||||
pdfModel: {
|
||||
primary: "anthropic/claude-opus-4-6",
|
||||
fallbacks: ["openai/gpt-5-mini"],
|
||||
},
|
||||
pdfMaxBytesMb: 10,
|
||||
pdfMaxPages: 20,
|
||||
thinkingDefault: "low",
|
||||
verboseDefault: "off",
|
||||
elevatedDefault: "on",
|
||||
@@ -853,6 +859,11 @@ Time format in system prompt. Default: `auto` (OS preference).
|
||||
- `imageModel`: accepts either a string (`"provider/model"`) or an object (`{ primary, fallbacks }`).
|
||||
- Used by the `image` tool path as its vision-model config.
|
||||
- Also used as fallback routing when the selected/default model cannot accept image input.
|
||||
- `pdfModel`: accepts either a string (`"provider/model"`) or an object (`{ primary, fallbacks }`).
|
||||
- Used by the `pdf` tool for model routing.
|
||||
- If omitted, the PDF tool falls back to `imageModel`, then to best-effort provider defaults.
|
||||
- `pdfMaxBytesMb`: default PDF size limit for the `pdf` tool when `maxBytesMb` is not passed at call time.
|
||||
- `pdfMaxPages`: default maximum pages considered by extraction fallback mode in the `pdf` tool.
|
||||
- `model.primary`: format `provider/model` (e.g. `anthropic/claude-opus-4-6`). If you omit the provider, OpenClaw assumes `anthropic` (deprecated).
|
||||
- `models`: the configured model catalog and allowlist for `/model`. Each entry can include `alias` (shortcut) and `params` (provider-specific, for example `temperature`, `maxTokens`, `cacheRetention`, `context1m`).
|
||||
- `params` merge precedence (config): `agents.defaults.models["provider/model"].params` is the base, then `agents.list[].params` (matching agent id) overrides by key.
|
||||
|
||||
Reference in New Issue
Block a user