When echoTranscript is enabled in tools.media.audio config, the
transcription text is sent back to the originating chat immediately
after successful audio transcription — before the agent processes it.
This lets users verify what was heard from their voice note.
Changes:
- config/types.tools.ts: add echoTranscript (bool) and echoFormat
(string template) to MediaUnderstandingConfig
- media-understanding/apply.ts: sendTranscriptEcho() helper that
resolves channel/to from ctx, guards on isDeliverableMessageChannel,
and calls deliverOutboundPayloads best-effort
- config/schema.help.ts: help text for both new fields
- config/schema.labels.ts: labels for both new fields
- media-understanding/apply.echo-transcript.test.ts: 10 vitest cases
covering disabled/enabled/custom-format/no-audio/failed-transcription/
non-deliverable-channel/missing-from/OriginatingTo/delivery-failure
Default echoFormat: '📝 "{transcript}"'
Closes#32102
Expose audio transcription through the PluginRuntime so external
plugins (e.g. marmot) can use openclaw's media-understanding provider
framework without importing unexported internal modules.
The new transcribeAudioFile() wraps runCapability({capability: "audio"})
and reads provider/model/apiKey from tools.media.audio in the config,
matching the pattern used by the Discord VC implementation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Increase test audio file sizes to meet MIN_AUDIO_FILE_BYTES (1024) threshold
introduced by the skip-empty-audio feature. Fix localPathRoots in skip-tiny-audio
tests so temp files pass path validation. Remove undefined loadApply() call
in apply.test.ts.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add a minimum file size guard (MIN_AUDIO_FILE_BYTES = 1024) before
sending audio to transcription APIs. Files below this threshold are
almost certainly empty or corrupt and would cause unhelpful errors
from Whisper/Deepgram/Groq providers.
Changes:
- Add 'tooSmall' skip reason to MediaUnderstandingSkipError
- Add MIN_AUDIO_FILE_BYTES constant (1024 bytes) to defaults
- Guard both provider and CLI audio paths in runner.ts
- Add comprehensive tests for tiny, empty, and valid audio files
- Update existing test fixtures to use audio files above threshold
runProviderEntry now calls resolveProxyFetchFromEnv() and passes the
result as fetchFn to transcribeAudio/describeVideo, so media provider
API calls respect HTTPS_PROXY/HTTP_PROXY behind corporate proxies.
The openai provider implements transcribeAudio via
transcribeOpenAiCompatibleAudio (Whisper API), but its capabilities
array only declared ["image"]. This caused the media-understanding
runner to skip the openai provider when processing inbound audio
messages, resulting in raw audio files being passed to agents
instead of transcribed text.
Fix: Add "audio" to the capabilities array so the runner correctly
selects the openai provider for audio transcription.
Co-authored-by: Cursor <cursoragent@cursor.com>