Agents: add system prompt safety guardrails (#5445)

* 🤖 agents: add system prompt safety guardrails

What:
- add safety guardrails to system prompt
- update system prompt docs
- update prompt tests

Why:
- discourage power-seeking or self-modification behavior
- clarify safety/oversight priority when conflicts arise

Tests:
- pnpm lint (pass)
- pnpm build (fails: DefaultResourceLoader missing in pi-coding-agent)
- pnpm test (not run; build failed)

* 🤖 agents: tighten safety wording for prompt guardrails

What:
- scope safety wording to system prompts/safety/tool policy changes
- document Safety inclusion in minimal prompt mode
- update safety prompt tests

Why:
- avoid blocking normal code changes or PR workflows
- keep prompt mode docs consistent with implementation

Tests:
- pnpm lint (pass)
- pnpm build (fails: DefaultResourceLoader missing in pi-coding-agent)
- pnpm test (not run; build failed)

* 🤖 docs: note safety guardrails are soft

What:
- document system prompt safety guardrails as advisory
- add security note on prompt guardrails vs hard controls

Why:
- clarify threat model and operator expectations
- avoid implying prompt text is an enforcement layer

Tests:
- pnpm lint (pass)
- pnpm build (fails: DefaultResourceLoader missing in pi-coding-agent)
- pnpm test (not run; build failed)
This commit is contained in:
Josh Palmer
2026-01-31 15:50:15 +01:00
committed by GitHub
parent 75093ebe1c
commit 7a6c40872d
5 changed files with 41 additions and 6 deletions

View File

@@ -260,7 +260,7 @@ This ensures OpenClaw's policy filtering, sandbox integration, and extended tool
## System Prompt Construction
The system prompt is built in `buildAgentSystemPrompt()` (`system-prompt.ts`). It assembles a full prompt with sections including Tooling, Tool Call Style, OpenClaw CLI reference, Skills, Docs, Workspace, Sandbox, Messaging, Reply Tags, Voice, Silent Replies, Heartbeats, Runtime metadata, plus Memory and Reactions when enabled, and optional context files and extra system prompt content. Sections are trimmed for minimal prompt mode used by subagents.
The system prompt is built in `buildAgentSystemPrompt()` (`system-prompt.ts`). It assembles a full prompt with sections including Tooling, Tool Call Style, Safety guardrails, OpenClaw CLI reference, Skills, Docs, Workspace, Sandbox, Messaging, Reply Tags, Voice, Silent Replies, Heartbeats, Runtime metadata, plus Memory and Reactions when enabled, and optional context files and extra system prompt content. Sections are trimmed for minimal prompt mode used by subagents.
The prompt is passed to pi via `systemPrompt` override: