feat(agents): flush reply pipeline before compaction wait

Auto-compaction blocks the run pipeline after prompt() returns. Any
buffered block replies (the assistant's response) sit in the delivery
pipeline until compaction finishes, which can take 7+ minutes on large
contexts. The user sees no reply for that entire time even though the
response is already fully generated.

Two changes ensure the response reaches the channel immediately:

1. attempt.ts: call onBlockReplyFlush() before waitForCompactionRetry()
   so the pipeline drains while compaction is still running.
2. handleAgentEnd: call onBlockReplyFlush() after flushBlockReplyBuffer()
   (mirroring the pattern already used by handleToolExecutionStart) so
   coalesced blocks are dispatched as soon as the turn ends.
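
The ordering in change 1 can be sketched as follows. This is a minimal
illustration, not the code in attempt.ts: finishAttempt, AttemptParams,
demoFlushOrder, and the log array are hypothetical names, and the compaction
wait is a stand-in that resolves immediately. Only onBlockReplyFlush and
waitForCompactionRetry are names taken from this commit.

```typescript
// Hypothetical sketch: flush buffered replies, then wait for compaction.
type FlushFn = () => Promise<void>;

interface AttemptParams {
  // Optional, as in the commit: callers may not supply a flush callback.
  onBlockReplyFlush?: FlushFn;
}

async function finishAttempt(
  params: AttemptParams,
  waitForCompactionRetry: () => Promise<void>,
  log: string[],
): Promise<void> {
  // Drain the reply pipeline first so delivery does not depend on
  // how long compaction takes.
  if (params.onBlockReplyFlush) {
    await params.onBlockReplyFlush();
  }
  log.push("compaction-wait-start");
  await waitForCompactionRetry();
  log.push("compaction-wait-end");
}

async function demoFlushOrder(): Promise<string[]> {
  const log: string[] = [];
  const pending: string[] = ["assistant reply"]; // stand-in for buffered blocks
  await finishAttempt(
    {
      onBlockReplyFlush: async () => {
        while (pending.length > 0) {
          log.push(`delivered: ${pending.shift()}`);
        }
      },
    },
    // Stand-in for compaction; the real wait can take minutes.
    async () => {
      log.push("compacting");
    },
    log,
  );
  return log;
}
```

In this sketch the delivered entry is recorded before the compaction wait
starts, which is the property the change is meant to guarantee.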

Closes #35074
SidQin-cyber
2026-03-05 13:07:35 +08:00
committed by Josh Lehman
parent 6084c26d00
commit 155853c057
2 changed files with 13 additions and 0 deletions

@@ -1688,6 +1688,14 @@ export async function runEmbeddedAttempt(
const preCompactionSessionId = activeSession.sessionId;
try {
// Flush buffered block replies before waiting for compaction so the
// user receives the assistant response immediately. Without this,
// coalesced/buffered blocks stay in the pipeline until compaction
// finishes — which can take minutes on large contexts (#35074).
if (params.onBlockReplyFlush) {
await params.onBlockReplyFlush();
}
await abortable(waitForCompactionRetry());
} catch (err) {
if (isRunnerAbortError(err)) {

@@ -73,6 +73,11 @@ export function handleAgentEnd(ctx: EmbeddedPiSubscribeContext) {
}
ctx.flushBlockReplyBuffer();
// Flush the reply pipeline so the response reaches the channel before
// compaction wait blocks the run. This mirrors the pattern used by
// handleToolExecutionStart and ensures delivery is not held hostage to
// long-running compaction (#35074).
void ctx.params.onBlockReplyFlush?.();
ctx.state.blockState.thinking = false;
ctx.state.blockState.final = false;
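
The second hunk starts the flush from a synchronous handler, so the returned
promise is not awaited. A minimal sketch of that fire-and-forget pattern
follows. HandlerCtx, onTurnEnd, demoTurnEnd, and the errors array are
hypothetical names, and the .catch() handler is an addition for illustration
that is not part of this commit.

```typescript
// Hypothetical sketch of a fire-and-forget flush from a synchronous handler.
type FlushFn = () => Promise<void>;

interface HandlerCtx {
  params: { onBlockReplyFlush?: FlushFn };
  flushBlockReplyBuffer: () => void;
}

function onTurnEnd(ctx: HandlerCtx, errors: unknown[]): void {
  // Move coalesced blocks out of the buffer first.
  ctx.flushBlockReplyBuffer();
  // Optional call: the whole chain is skipped when no callback is set.
  // `void` marks the promise as intentionally not awaited. The catch is
  // an assumption of this sketch; it keeps a failed flush from surfacing
  // as an unhandled rejection.
  void ctx.params.onBlockReplyFlush?.().catch((err: unknown) => {
    errors.push(err);
  });
}

async function demoTurnEnd(): Promise<{ order: string[]; errors: unknown[] }> {
  const order: string[] = [];
  const errors: unknown[] = [];

  // No flush callback configured: only the buffer flush runs.
  onTurnEnd(
    { params: {}, flushBlockReplyBuffer: () => { order.push("buffer"); } },
    errors,
  );

  // Flush callback that fails: the handler still returns normally.
  onTurnEnd(
    {
      params: {
        onBlockReplyFlush: async () => {
          order.push("pipeline");
          throw new Error("channel down");
        },
      },
      flushBlockReplyBuffer: () => { order.push("buffer"); },
    },
    errors,
  );

  // Yield once so the rejection handler gets a chance to run.
  await Promise.resolve();
  return { order, errors };
}
```

Not awaiting keeps the handler synchronous, at the cost that the caller
cannot know when the flush has completed.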