feat(audio): auto-echo transcription to chat before agent processing

When echoTranscript is enabled in the tools.media.audio config, the
transcript is sent back to the originating chat immediately after a
successful audio transcription, before the agent processes it. This
lets users verify what was heard in their voice note.
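
A minimal sketch of the opt-in, assuming the config object nests the same
way as the dotted key paths in the help text (tools.media.audio.*). The
AudioEchoConfig type name and the custom format string are illustrative
only and are not taken from the codebase:

```typescript
// Hypothetical type: only the two fields added by this commit are shown.
interface AudioEchoConfig {
  echoTranscript?: boolean; // default: false
  echoFormat?: string; // default: '📝 "{transcript}"'
}

// Assumed nesting, mirroring the key path tools.media.audio.echoTranscript.
const config: { tools: { media: { audio: AudioEchoConfig } } } = {
  tools: {
    media: {
      audio: {
        echoTranscript: true,
        // Custom template; {transcript} is replaced with the transcribed text.
        echoFormat: 'Heard: "{transcript}"',
      },
    },
  },
};
```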

Changes:
- config/types.tools.ts: add echoTranscript (bool) and echoFormat
  (string template) to MediaUnderstandingConfig
- media-understanding/apply.ts: add sendTranscriptEcho() helper that
  resolves channel/to from ctx, guards on isDeliverableMessageChannel,
  and calls deliverOutboundPayloads best-effort
- config/schema.help.ts: add help text for both new fields
- config/schema.labels.ts: add labels for both new fields
- media-understanding/apply.echo-transcript.test.ts: 10 vitest cases
  covering disabled/enabled/custom-format/no-audio/failed-transcription/
  non-deliverable-channel/missing-from/OriginatingTo/delivery-failure

Default echoFormat: '📝 "{transcript}"'
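
The template substitution and the best-effort delivery described above
could look roughly like the sketch below. It is not the code from
apply.ts: formatEcho, echoBestEffort, and the Deliver callback are
hypothetical names, the callback stands in for the real
deliverOutboundPayloads call (whose signature is not shown in this
commit message), and replacing every occurrence of the placeholder is an
assumption:

```typescript
const DEFAULT_ECHO_FORMAT = '📝 "{transcript}"';

// Hypothetical helper: substitute the {transcript} placeholder.
// split/join replaces every occurrence and treats the transcript as
// literal text, so "$" sequences in it are not interpreted.
function formatEcho(
  transcript: string,
  format: string = DEFAULT_ECHO_FORMAT,
): string {
  return format.split("{transcript}").join(transcript);
}

// Stand-in for the real outbound delivery function (assumption).
type Deliver = (text: string) => Promise<void>;

// Best-effort: a delivery failure is swallowed so that it never blocks
// the agent from processing the transcript.
async function echoBestEffort(
  transcript: string,
  deliver: Deliver,
  format?: string,
): Promise<boolean> {
  try {
    await deliver(formatEcho(transcript, format));
    return true;
  } catch {
    return false;
  }
}
```

With the default format, formatEcho("turn on the lights") returns
📝 "turn on the lights".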

Closes #32102
AytuncYildizli
2026-03-02 23:31:57 +03:00
committed by Peter Steinberger
parent ef89b48785
commit 1b61269eec
5 changed files with 442 additions and 0 deletions


@@ -545,6 +545,10 @@ export const FIELD_HELP: Record<string, string> = {
"Ordered model preferences specifically for audio understanding, used before shared media model fallback. Choose models optimized for transcription quality in your primary language/domain.",
"tools.media.audio.scope":
"Scope selector for when audio understanding runs across inbound messages and attachments. Keep focused scopes in high-volume channels to reduce cost and avoid accidental transcription.",
"tools.media.audio.echoTranscript":
"Echo the audio transcript back to the originating chat before agent processing. When enabled, users immediately see what was heard from their voice note, helping them verify transcription accuracy before the agent acts on it. Default: false.",
"tools.media.audio.echoFormat":
"Format string for the echoed transcript message. Use `{transcript}` as a placeholder for the transcribed text. Default: '📝 \"{transcript}\"'.",
"tools.media.video.enabled":
"Enable video understanding so clips can be summarized into text for downstream reasoning and responses. Disable when processing video is out of policy or too expensive for your deployment.",
"tools.media.video.maxBytes":