mirror of
https://github.com/openclaw/openclaw.git
synced 2026-04-19 09:18:38 +00:00
Agents: add system prompt safety guardrails (#5445)
* 🤖 agents: add system prompt safety guardrails What: - add safety guardrails to system prompt - update system prompt docs - update prompt tests Why: - discourage power-seeking or self-modification behavior - clarify safety/oversight priority when conflicts arise Tests: - pnpm lint (pass) - pnpm build (fails: DefaultResourceLoader missing in pi-coding-agent) - pnpm test (not run; build failed) * 🤖 agents: tighten safety wording for prompt guardrails What: - scope safety wording to system prompts/safety/tool policy changes - document Safety inclusion in minimal prompt mode - update safety prompt tests Why: - avoid blocking normal code changes or PR workflows - keep prompt mode docs consistent with implementation Tests: - pnpm lint (pass) - pnpm build (fails: DefaultResourceLoader missing in pi-coding-agent) - pnpm test (not run; build failed) * 🤖 docs: note safety guardrails are soft What: - document system prompt safety guardrails as advisory - add security note on prompt guardrails vs hard controls Why: - clarify threat model and operator expectations - avoid implying prompt text is an enforcement layer Tests: - pnpm lint (pass) - pnpm build (fails: DefaultResourceLoader missing in pi-coding-agent) - pnpm test (not run; build failed)
This commit is contained in:
@@ -16,6 +16,7 @@ The prompt is assembled by OpenClaw and injected into each agent run.
|
||||
The prompt is intentionally compact and uses fixed sections:
|
||||
|
||||
- **Tooling**: current tool list + short descriptions.
|
||||
- **Safety**: short guardrail reminder to avoid power-seeking behavior or bypassing oversight.
|
||||
- **Skills** (when available): tells the model how to load skill instructions on demand.
|
||||
- **OpenClaw Self-Update**: how to run `config.apply` and `update.run`.
|
||||
- **Workspace**: working directory (`agents.defaults.workspace`).
|
||||
@@ -28,6 +29,8 @@ The prompt is intentionally compact and uses fixed sections:
|
||||
- **Runtime**: host, OS, node, model, repo root (when detected), thinking level (one line).
|
||||
- **Reasoning**: current visibility level + /reasoning toggle hint.
|
||||
|
||||
Safety guardrails in the system prompt are advisory. They guide model behavior but do not enforce policy. Use tool policy, exec approvals, sandboxing, and channel allowlists for hard enforcement; operators can disable these by design.
|
||||
|
||||
## Prompt modes
|
||||
|
||||
OpenClaw can render smaller system prompts for sub-agents. The runtime sets a
|
||||
@@ -36,9 +39,9 @@ OpenClaw can render smaller system prompts for sub-agents. The runtime sets a
|
||||
- `full` (default): includes all sections above.
|
||||
- `minimal`: used for sub-agents; omits **Skills**, **Memory Recall**, **OpenClaw
|
||||
Self-Update**, **Model Aliases**, **User Identity**, **Reply Tags**,
|
||||
**Messaging**, **Silent Replies**, and **Heartbeats**. Tooling, Workspace,
|
||||
Sandbox, Current Date & Time (when known), Runtime, and injected context stay
|
||||
available.
|
||||
**Messaging**, **Silent Replies**, and **Heartbeats**. Tooling, **Safety**,
|
||||
Workspace, Sandbox, Current Date & Time (when known), Runtime, and injected
|
||||
context stay available.
|
||||
- `none`: returns only the base identity line.
|
||||
|
||||
When `promptMode=minimal`, extra injected prompts are labeled **Subagent
|
||||
|
||||
Reference in New Issue
Block a user