feat(secrets): expand SecretRef coverage across user-supplied credentials (#29580)

* feat(secrets): expand secret target coverage and gateway tooling
* docs(secrets): align gateway and CLI secret docs
* chore(protocol): regenerate swift gateway models for secrets methods
* fix(config): restore talk apiKey fallback and stabilize runner test
* ci(windows): reduce test worker count for shard stability
* ci(windows): raise node heap for test shard stability
* test(feishu): make proxy env precedence assertion windows-safe
* fix(gateway): resolve auth password SecretInput refs for clients
* fix(gateway): resolve remote SecretInput credentials for clients
* fix(secrets): skip inactive refs in command snapshot assignments
* fix(secrets): scope gateway.remote refs to effective auth surfaces
* fix(secrets): ignore memory defaults when enabled agents disable search
* fix(secrets): honor Google Chat serviceAccountRef inheritance
* fix(secrets): address tsgo errors in command and gateway collectors
* fix(secrets): avoid auth-store load in providers-only configure
* fix(gateway): defer local password ref resolution by precedence
* fix(secrets): gate telegram webhook secret refs by webhook mode
* fix(secrets): gate slack signing secret refs to http mode
* fix(secrets): skip telegram botToken refs when tokenFile is set
* fix(secrets): gate discord pluralkit refs by enabled flag
* fix(secrets): gate discord voice tts refs by voice enabled
* test(secrets): make runtime fixture modes explicit
* fix(cli): resolve local qr password secret refs
* fix(cli): fail when gateway leaves command refs unresolved
* fix(gateway): fail when local password SecretRef is unresolved
* fix(gateway): fail when required remote SecretRefs are unresolved
* fix(gateway): resolve local password refs only when password can win
* fix(cli): skip local password SecretRef resolution on qr token override
* test(gateway): cast SecretRef fixtures to OpenClawConfig
* test(secrets): activate mode-gated targets in runtime coverage fixture
* fix(cron): support SecretInput webhook tokens safely
* fix(bluebubbles): support SecretInput passwords across config paths
* fix(msteams): make appPassword SecretInput-safe in onboarding/token paths
* fix(bluebubbles): align SecretInput schema helper typing
* fix(cli): clarify secrets.resolve version-skew errors
* refactor(secrets): return structured inactive paths from secrets.resolve
* refactor(gateway): type onboarding secret writes as SecretInput
* chore(protocol): regenerate swift models for secrets.resolve
* feat(secrets): expand extension credential secretref support
* fix(secrets): gate web-search refs by active provider
* fix(onboarding): detect SecretRef credentials in extension status
* fix(onboarding): allow keeping existing ref in secret prompt
* fix(onboarding): resolve gateway password SecretRefs for probe and tui
* fix(onboarding): honor secret-input-mode for local gateway auth
* fix(acp): resolve gateway SecretInput credentials
* fix(secrets): gate gateway.remote refs to remote surfaces
* test(secrets): cover pattern matching and inactive array refs
* docs(secrets): clarify secrets.resolve and remote active surfaces
* fix(bluebubbles): keep existing SecretRef during onboarding
* fix(tests): resolve CI type errors in new SecretRef coverage
* fix(extensions): replace raw fetch with SSRF-guarded fetch
* test(secrets): mark gateway remote targets active in runtime coverage
* test(infra): normalize home-prefix expectation across platforms
* fix(cli): only resolve local qr password refs in password mode
* test(cli): cover local qr token mode with unresolved password ref
* docs(cli): clarify local qr password ref resolution behavior
* refactor(extensions): reuse sdk SecretInput helpers
* fix(wizard): resolve onboarding env-template secrets before plaintext
* fix(cli): surface secrets.resolve diagnostics in memory and qr
* test(secrets): repair post-rebase runtime and fixtures
* fix(gateway): skip remote password ref resolution when token wins
* fix(secrets): treat tailscale remote gateway refs as active
* fix(gateway): allow remote password fallback when token ref is unresolved
* fix(gateway): ignore stale local password refs for none and trusted-proxy
* fix(gateway): skip remote secret ref resolution on local call paths
* test(cli): cover qr remote tailscale secret ref resolution
* fix(secrets): align gateway password active-surface with auth inference
* fix(cli): resolve inferred local gateway password refs in qr
* fix(gateway): prefer resolvable remote password over token ref pre-resolution
* test(gateway): cover none and trusted-proxy stale password refs
* docs(secrets): sync qr and gateway active-surface behavior
* fix: restore stability blockers from pre-release audit
* Secrets: fix collector/runtime precedence contradictions
* docs: align secrets and web credential docs
* fix(rebase): resolve integration regressions after main rebase
* fix(node-host): resolve gateway secret refs for auth
* fix(secrets): harden secretinput runtime readers
* gateway: skip inactive auth secretref resolution
* cli: avoid gateway preflight for inactive secret refs
* extensions: allow unresolved refs in onboarding status
* tests: fix qr-cli module mock hoist ordering
* Security: align audit checks with SecretInput resolution
* Gateway: resolve local-mode remote fallback secret refs
* Node host: avoid resolving inactive password secret refs
* Secrets runtime: mark Slack appToken inactive for HTTP mode
* secrets: keep inactive gateway remote refs non-blocking
* cli: include agent memory secret targets in runtime resolution
* docs(secrets): sync docs with active-surface and web search behavior
* fix(secrets): keep telegram top-level token refs active for blank account tokens
* fix(daemon): resolve gateway password secret refs for probe auth
* fix(secrets): skip IRC NickServ ref resolution when NickServ is disabled
* fix(secrets): align token inheritance and exec timeout defaults
* docs(secrets): clarify active-surface notes in cli docs
* cli: require secrets.resolve gateway capability
* gateway: log auth secret surface diagnostics
* secrets: remove dead provider resolver module
* fix(secrets): restore gateway auth precedence and fallback resolution
* fix(tests): align plugin runtime mock typings

---------

Co-authored-by: Peter Steinberger <steipete@gmail.com>
2026-04-19 06:57:26 +00:00 · 2026-03-02 20:58:20 -06:00
parent f212351aed
commit 806803b7ef
236 changed files with 16810 additions and 2861 deletions
--- a/docs/gateway/secrets.md
+++ b/docs/gateway/secrets.md
@@ -1,35 +1,70 @@
 ---
 summary: "Secrets management: SecretRef contract, runtime snapshot behavior, and safe one-way scrubbing"
 read_when:
-  - Configuring SecretRefs for providers, auth profiles, skills, or Google Chat
-  - Operating secrets reload/audit/configure/apply safely in production
-  - Understanding fail-fast and last-known-good behavior
+  - Configuring SecretRefs for provider credentials and `auth-profiles.json` refs
+  - Operating secrets reload, audit, configure, and apply safely in production
+  - Understanding startup fail-fast, inactive-surface filtering, and last-known-good behavior
 title: "Secrets Management"
 ---

 # Secrets management

-OpenClaw supports additive secret references so credentials do not need to be stored as plaintext in config files.
+OpenClaw supports additive SecretRefs so supported credentials do not need to be stored as plaintext in configuration.

-Plaintext still works. Secret refs are optional.
+Plaintext still works. SecretRefs are opt-in per credential.

 ## Goals and runtime model

 Secrets are resolved into an in-memory runtime snapshot.

 - Resolution is eager during activation, not lazy on request paths.
-- Startup fails fast if any referenced credential cannot be resolved.
-- Reload uses atomic swap: full success or keep last-known-good.
-- Runtime requests read from the active in-memory snapshot.
+- Startup fails fast when an effectively active SecretRef cannot be resolved.
+- Reload uses atomic swap: full success, or keep the last-known-good snapshot.
+- Runtime requests read from the active in-memory snapshot only.

-This keeps secret-provider outages off the hot request path.
+This keeps secret-provider outages off hot request paths.
+
+## Active-surface filtering
+
+SecretRefs are validated only on effectively active surfaces.
+
+- Active surfaces: unresolved refs block startup/reload.
+- Inactive surfaces: unresolved refs do not block startup/reload.
+- Inactive refs emit non-fatal diagnostics with code `SECRETS_REF_IGNORED_INACTIVE_SURFACE`.
+
+Examples of inactive surfaces:
+
+- Disabled channel/account entries.
+- Top-level channel credentials that no enabled account inherits.
+- Disabled tool/feature surfaces.
+- Web search provider-specific keys that are not selected by `tools.web.search.provider`.
+  In auto mode (provider unset), provider-specific keys are also active for provider auto-detection.
+- `gateway.remote.token` / `gateway.remote.password` SecretRefs that meet none of the conditions below. They are active (when `gateway.remote.enabled` is not `false`) if one of these is true:
+  - `gateway.mode=remote`
+  - `gateway.remote.url` is configured
+  - `gateway.tailscale.mode` is `serve` or `funnel`
+    In local mode without those remote surfaces:
+  - `gateway.remote.token` is active when token auth can win and no env/auth token is configured.
+  - `gateway.remote.password` is active only when password auth can win and no env/auth password is configured.
+
+## Gateway auth surface diagnostics
+
+When a SecretRef is configured on `gateway.auth.password`, `gateway.remote.token`, or
+`gateway.remote.password`, gateway startup/reload logs the surface state explicitly:
+
+- `active`: the SecretRef is part of the effective auth surface and must resolve.
+- `inactive`: the SecretRef is ignored for this runtime because another auth surface wins, or
+  because remote auth is disabled/not active.
+
+These entries are logged with `SECRETS_GATEWAY_AUTH_SURFACE` and include the reason used by the
+active-surface policy, so you can see why a credential was treated as active or inactive.

 ## Onboarding reference preflight

-When onboarding runs in interactive mode and you choose secret reference storage, OpenClaw performs a fast preflight check before saving:
+When onboarding runs in interactive mode and you choose SecretRef storage, OpenClaw runs preflight validation before saving:

 - Env refs: validates env var name and confirms a non-empty value is visible during onboarding.
-- Provider refs (`file` or `exec`): validates the selected provider, resolves the provided `id`, and checks value type.
+- Provider refs (`file` or `exec`): validates provider selection, resolves `id`, and checks resolved value type.

 If validation fails, onboarding shows the error and lets you retry.

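Illustrative note (not part of this commit): the active-surface rule for `gateway.remote.token` / `gateway.remote.password` in the hunk above can be read as a small decision function. The sketch below is a minimal TypeScript rendering under stated assumptions: the `GatewayConfig` type, the `classifyRemoteRefSurface` name, and the `"off"` tailscale mode are hypothetical, and the local-mode fallback rules ("token auth can win", "password auth can win") are deliberately not modeled.

```typescript
// Hypothetical config shape. Field names mirror the documented config keys,
// but these types are assumptions made for illustration only.
interface GatewayConfig {
  mode?: "local" | "remote";
  remote?: { enabled?: boolean; url?: string };
  tailscale?: { mode?: "off" | "serve" | "funnel" };
}

interface SurfaceDecision {
  active: boolean;
  reason: string;
}

// Classifies gateway.remote.* SecretRefs using only the three remote-surface
// conditions listed in the doc. Local-mode fallback rules are not modeled.
function classifyRemoteRefSurface(gateway: GatewayConfig): SurfaceDecision {
  if (gateway.remote?.enabled === false) {
    return { active: false, reason: "gateway.remote.enabled is false" };
  }
  if (gateway.mode === "remote") {
    return { active: true, reason: "gateway.mode=remote" };
  }
  if (gateway.remote?.url) {
    return { active: true, reason: "gateway.remote.url is configured" };
  }
  const tailscaleMode = gateway.tailscale?.mode;
  if (tailscaleMode === "serve" || tailscaleMode === "funnel") {
    return { active: true, reason: `gateway.tailscale.mode=${tailscaleMode}` };
  }
  return { active: false, reason: "no remote surface (local-mode fallback not modeled)" };
}
```

Returning a reason alongside the boolean mirrors the doc's statement that the `SECRETS_GATEWAY_AUTH_SURFACE` log entries include the reason used by the active-surface policy; the reason strings here are made up.
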
@@ -122,22 +157,24 @@ Define providers under `secrets.providers`:
 - `mode: "json"` expects JSON object payload and resolves `id` as pointer.
 - `mode: "singleValue"` expects ref id `"value"` and returns file contents.
 - Path must pass ownership/permission checks.
+- Windows fail-closed note: if ACL verification is unavailable for a path, resolution fails. For trusted paths only, set `allowInsecurePath: true` on that provider to bypass path security checks.

 ### Exec provider

 - Runs configured absolute binary path, no shell.
 - By default, `command` must point to a regular file (not a symlink).
 - Set `allowSymlinkCommand: true` to allow symlink command paths (for example Homebrew shims). OpenClaw validates the resolved target path.
-- Enable `allowSymlinkCommand` only when required for trusted package-manager paths, and pair it with `trustedDirs` (for example `["/opt/homebrew"]`).
-- When `trustedDirs` is set, checks apply to the resolved target path.
+- Pair `allowSymlinkCommand` with `trustedDirs` for package-manager paths (for example `["/opt/homebrew"]`).
 - Supports timeout, no-output timeout, output byte limits, env allowlist, and trusted dirs.
-- Request payload (stdin):
+- Windows fail-closed note: if ACL verification is unavailable for the command path, resolution fails. For trusted paths only, set `allowInsecurePath: true` on that provider to bypass path security checks.
+
+Request payload (stdin):

 ```json
 { "protocolVersion": 1, "provider": "vault", "ids": ["providers/openai/apiKey"] }
 ```

-- Response payload (stdout):
+Response payload (stdout):

 ```json
 { "protocolVersion": 1, "values": { "providers/openai/apiKey": "sk-..." } }
@@ -242,37 +279,33 @@ Optional per-id errors:
 }
 ```

-## In-scope fields (v1)
+## Supported credential surface

-### `~/.openclaw/openclaw.json`
+Canonical supported and unsupported credentials are listed in:

-- `models.providers.<provider>.apiKey`
-- `skills.entries.<skillKey>.apiKey`
-- `channels.googlechat.serviceAccount`
-- `channels.googlechat.serviceAccountRef`
-- `channels.googlechat.accounts.<accountId>.serviceAccount`
-- `channels.googlechat.accounts.<accountId>.serviceAccountRef`
+- [SecretRef Credential Surface](/reference/secretref-credential-surface)

-### `~/.openclaw/agents/<agentId>/agent/auth-profiles.json`
-
-- `profiles.<profileId>.keyRef` for `type: "api_key"`
-- `profiles.<profileId>.tokenRef` for `type: "token"`
-
-OAuth credential storage changes are out of scope.
+Runtime-minted or rotating credentials and OAuth refresh material are intentionally excluded from read-only SecretRef resolution.

 ## Required behavior and precedence

-- Field without ref: unchanged.
-- Field with ref: required at activation time.
-- If plaintext and ref both exist, ref wins at runtime and plaintext is ignored.
+- Field without a ref: unchanged.
+- Field with a ref: required on active surfaces during activation.
+- If both plaintext and a ref are present, the ref takes precedence on supported precedence paths.

-Warning code:
+Warning and audit signals:

-- `SECRETS_REF_OVERRIDES_PLAINTEXT`
+- `SECRETS_REF_OVERRIDES_PLAINTEXT` (runtime warning)
+- `REF_SHADOWED` (audit finding when `auth-profiles.json` credentials take precedence over `openclaw.json` refs)
+
+Google Chat compatibility behavior:
+
+- `serviceAccountRef` takes precedence over plaintext `serviceAccount`.
+- Plaintext value is ignored when sibling ref is set.

 ## Activation triggers

-Secret activation is attempted on:
+Secret activation runs on:

 - Startup (preflight plus final activation)
 - Config reload hot-apply path
@@ -283,9 +316,9 @@ Activation contract:

 - Success swaps the snapshot atomically.
 - Startup failure aborts gateway startup.
-- Runtime reload failure keeps last-known-good snapshot.
+- Runtime reload failure keeps the last-known-good snapshot.

-## Degraded and recovered operator signals
+## Degraded and recovered signals

 When reload-time activation fails after a healthy state, OpenClaw enters degraded secrets state.

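Illustrative note (not part of this commit): the activation contract and the degraded state described above can be sketched as a tiny state holder. This is a hedged TypeScript sketch, not OpenClaw's implementation. `SecretsRuntime`, `activate`, and the `"degraded"` / `"recovered"` event labels are hypothetical names, and the real system event and log codes are not shown in this diff.

```typescript
type Snapshot = Readonly<Record<string, string>>;
type SecretsEvent = "degraded" | "recovered"; // hypothetical labels, not the real codes

class SecretsRuntime {
  private active: Snapshot | null = null;
  private degraded = false;
  readonly events: SecretsEvent[] = [];

  // resolveAll stands in for resolving every active SecretRef; it throws on failure.
  // Returns true when a new snapshot was swapped in, false when last-known-good was kept.
  activate(resolveAll: () => Snapshot): boolean {
    let next: Snapshot;
    try {
      next = resolveAll();
    } catch (err) {
      if (this.active === null) {
        // Startup: there is no snapshot to fall back to, so fail fast
        // without emitting a degraded event.
        throw err;
      }
      if (!this.degraded) {
        // First failure after a healthy state: emit the one-shot event.
        this.degraded = true;
        this.events.push("degraded");
      }
      return false; // keep the last-known-good snapshot
    }
    this.active = next; // one reference assignment: readers see old or new, never a mix
    if (this.degraded) {
      this.degraded = false;
      this.events.push("recovered");
    }
    return true;
  }

  // Request paths read only from the active in-memory snapshot.
  get(key: string): string | undefined {
    return this.active?.[key];
  }
}
```

Resolving the full snapshot before touching `this.active` is what keeps a partial failure from leaking into request paths in this sketch.
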
@@ -297,13 +330,22 @@ One-shot system event and log codes:
 Behavior:

 - Degraded: runtime keeps last-known-good snapshot.
-- Recovered: emitted once after a successful activation.
+- Recovered: emitted once after the next successful activation.
 - Repeated failures while already degraded log warnings but do not spam events.
-- Startup fail-fast does not emit degraded events because no runtime snapshot exists yet.
+- Startup fail-fast does not emit degraded events because runtime never became active.
+
+## Command-path resolution
+
+Credential-sensitive command paths that opt in (for example `openclaw memory` remote-memory paths and `openclaw qr --remote`) can resolve supported SecretRefs via gateway snapshot RPC.
+
+- When the gateway is running, those command paths read from the active snapshot.
+- If a configured SecretRef is required and the gateway is unavailable, command resolution fails fast with actionable diagnostics.
+- Snapshot refresh after backend secret rotation is handled by `openclaw secrets reload`.
+- Gateway RPC method used by these command paths: `secrets.resolve`.

 ## Audit and configure workflow

-Use this default operator flow:
+Default operator flow:

 ```bash
 openclaw secrets audit --check
@@ -311,26 +353,22 @@ openclaw secrets configure
 openclaw secrets audit --check
 ```

-Migration completeness:
-
-- Include `skills.entries.<skillKey>.apiKey` targets when those skills use API keys.
-- If `audit --check` still reports plaintext findings after a partial migration, migrate the remaining reported paths and rerun audit.
-
 ### `secrets audit`

 Findings include:

 - plaintext values at rest (`openclaw.json`, `auth-profiles.json`, `.env`)
 - unresolved refs
-- precedence shadowing (`auth-profiles` taking priority over config refs)
-- legacy residues (`auth.json`, OAuth out-of-scope reminders)
+- precedence shadowing (`auth-profiles.json` taking priority over `openclaw.json` refs)
+- legacy residues (`auth.json`, OAuth reminders)

 ### `secrets configure`

 Interactive helper that:

 - configures `secrets.providers` first (`env`/`file`/`exec`, add/edit/remove)
-- lets you select secret-bearing fields in `openclaw.json`
+- lets you select supported secret-bearing fields in `openclaw.json` plus `auth-profiles.json` for one agent scope
+- can create a new `auth-profiles.json` mapping directly in the target picker
 - captures SecretRef details (`source`, `provider`, `id`)
 - runs preflight resolution
 - can apply immediately
@@ -339,10 +377,11 @@ Helpful modes:

 - `openclaw secrets configure --providers-only`
 - `openclaw secrets configure --skip-provider-setup`
+- `openclaw secrets configure --agent <id>`

-`configure` apply defaults to:
+`configure` apply defaults:

-- scrub matching static creds from `auth-profiles.json` for targeted providers
+- scrub matching static credentials from `auth-profiles.json` for targeted providers
 - scrub legacy static `api_key` entries from `auth.json`
 - scrub matching known secret lines from `<config-dir>/.env`

@@ -361,26 +400,31 @@ For strict target/path contract details and exact rejection rules, see:

 ## One-way safety policy

-OpenClaw intentionally does **not** write rollback backups that contain pre-migration plaintext secret values.
+OpenClaw intentionally does not write rollback backups containing historical plaintext secret values.

 Safety model:

 - preflight must succeed before write mode
 - runtime activation is validated before commit
-- apply updates files using atomic file replacement and best-effort in-memory restore on failure
+- apply updates files using atomic file replacement and best-effort restore on failure

-## `auth.json` compatibility notes
+## Legacy auth compatibility notes

-For static credentials, OpenClaw runtime no longer depends on plaintext `auth.json`.
+For static credentials, runtime no longer depends on plaintext legacy auth storage.

 - Runtime credential source is the resolved in-memory snapshot.
-- Legacy `auth.json` static `api_key` entries are scrubbed when discovered.
-- OAuth-related legacy compatibility behavior remains separate.
+- Legacy static `api_key` entries are scrubbed when discovered.
+- OAuth-related compatibility behavior remains separate.
+
+## Web UI note
+
+Some SecretInput unions are easier to configure in raw editor mode than in form mode.

 ## Related docs

 - CLI commands: [secrets](/cli/secrets)
 - Plan contract details: [Secrets Apply Plan Contract](/gateway/secrets-plan-contract)
+- Credential surface: [SecretRef Credential Surface](/reference/secretref-credential-surface)
 - Auth setup: [Authentication](/gateway/authentication)
 - Security posture: [Security](/gateway/security)
 - Environment precedence: [Environment Variables](/help/environment)
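
Illustrative note (not part of this commit): the exec provider hunk above shows a stdin request of the form `{ "protocolVersion": 1, "provider": ..., "ids": [...] }` and a stdout response of the form `{ "protocolVersion": 1, "values": {...} }`. The TypeScript sketch below shows one way a provider binary's core could map one onto the other. `handleExecRequest` and the in-memory `store` are hypothetical, and the optional per-id error shape is omitted because it is not visible in this diff.

```typescript
interface ExecRequest {
  protocolVersion: number;
  provider: string;
  ids: string[];
}

interface ExecResponse {
  protocolVersion: 1;
  values: Record<string, string>;
}

// Parses the request JSON read from stdin and builds the response object.
// `store` is a stand-in for whatever backend the provider actually queries.
function handleExecRequest(rawStdin: string, store: Map<string, string>): ExecResponse {
  const request = JSON.parse(rawStdin) as ExecRequest;
  if (request.protocolVersion !== 1) {
    throw new Error(`unsupported protocolVersion: ${request.protocolVersion}`);
  }
  const values: Record<string, string> = {};
  for (const id of request.ids) {
    const value = store.get(id);
    if (value === undefined) {
      // This sketch fails the whole request on an unknown id.
      throw new Error(`unknown secret id: ${id}`);
    }
    values[id] = value;
  }
  return { protocolVersion: 1, values };
}
```

A real provider would read the whole of stdin, call a function like this, and write `JSON.stringify(response)` to stdout. Keeping the mapping pure makes it testable without spawning a process.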