feat(voice-call): pre-cache inbound greeting for instant playback

Pre-generates TTS audio for the configured inboundGreeting at startup
and plays it as soon as an inbound call connects, avoiding the 500ms+
TTS synthesis delay before the caller hears the greeting.

Changes:
- twilio.ts: Add cachedGreetingAudio storage with getter/setter
- runtime.ts: Pre-synthesize greeting TTS after provider initialization
- webhook.ts: Play cached audio directly via media stream on inbound
  connect, falling back to the original TTS path for outbound calls
  or when no cached audio is available

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Author: JayMishra-github
Date: 2026-02-16 10:18:44 -08:00
Committed by: Peter Steinberger
Parent: 27a4868c2d
Commit: 2c6db57554
3 changed files with 75 additions and 7 deletions

@@ -62,6 +62,17 @@ export class TwilioProvider implements VoiceCallProvider {
  /** Map of call SID to stream SID for media streams */
  private callStreamMap = new Map<string, string>();

  /** Pre-generated greeting audio for instant inbound playback */
  private cachedGreetingAudio: Buffer | null = null;

  setCachedGreetingAudio(audio: Buffer): void {
    this.cachedGreetingAudio = audio;
    console.log(`[voice-call] Cached greeting audio: ${audio.length} bytes`);
  }

  getCachedGreetingAudio(): Buffer | null {
    return this.cachedGreetingAudio;
  }

  /** Per-call tokens for media stream authentication */
  private streamAuthTokens = new Map<string, string>();
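
For orientation, here is a minimal sketch of how the accessor pair in this hunk might be used by the startup and inbound-connect paths that the commit message describes. It is not the code from runtime.ts or webhook.ts. The names `GreetingCache`, `Synthesize`, `warmGreeting`, and `greetingAudioFor` are hypothetical, and `Uint8Array` stands in for `Buffer` so the sketch has no Node type dependency.

```typescript
// Hypothetical stand-in for the TTS call; the real provider API is not shown in this hunk.
type Synthesize = (text: string) => Promise<Uint8Array>;
type Direction = "inbound" | "outbound";

// Mirrors the accessor pair added to TwilioProvider (Uint8Array instead of Buffer).
class GreetingCache {
  private cachedGreetingAudio: Uint8Array | null = null;

  setCachedGreetingAudio(audio: Uint8Array): void {
    this.cachedGreetingAudio = audio;
  }

  getCachedGreetingAudio(): Uint8Array | null {
    return this.cachedGreetingAudio;
  }
}

// Startup step (assumed shape): synthesize the greeting once and store it.
// A synthesis failure leaves the cache empty so calls still use live TTS.
async function warmGreeting(
  cache: GreetingCache,
  greeting: string | undefined,
  synthesize: Synthesize,
): Promise<void> {
  if (!greeting) return;
  try {
    cache.setCachedGreetingAudio(await synthesize(greeting));
  } catch {
    // Intentionally ignored: the fallback path in greetingAudioFor covers this.
  }
}

// Connect step (assumed shape): inbound calls use cached audio when present;
// outbound calls, or a cache miss, fall back to synthesizing on demand.
async function greetingAudioFor(
  cache: GreetingCache,
  direction: Direction,
  greeting: string,
  synthesize: Synthesize,
): Promise<{ audio: Uint8Array; source: "cache" | "tts" }> {
  const cached = direction === "inbound" ? cache.getCachedGreetingAudio() : null;
  if (cached) return { audio: cached, source: "cache" };
  return { audio: await synthesize(greeting), source: "tts" };
}
```

The point of the design is that the cache is only an optimization: every path that reads it has to tolerate `null` and keep the original TTS route available.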