Large images (2000px) consume excessive context tokens when sent to LLMs.
1200px provides sufficient detail for most use cases while significantly
reducing token usage.
The 5MB byte limit remains unchanged as JPEG compression at 1200px
naturally produces smaller files.
(cherry picked from commit 40182123dd)
- Use stricter regex: /^[A-Za-z0-9+/]*={0,2}$/ ensures = only at end
- Normalize URL-safe base64 to standard (- → +, _ → /)
- Added tests for padding in wrong position and URL-safe normalization
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Adds explicit base64 format validation in sanitizeContentBlocksImages()
to prevent invalid image data from being sent to the Anthropic API.
The Problem:
- Node's Buffer.from(str, "base64") silently ignores invalid characters
- Invalid base64 passes local validation but fails at Anthropic's stricter API
- Once corrupted data persists in session history, every API call fails
The Fix:
- Add validateAndNormalizeBase64() function that:
- Strips data URL prefixes (e.g., "data:image/png;base64,...")
- Validates base64 character set with regex
- Checks for valid padding (0-2 '=' chars)
- Validates length is proper for base64 encoding
- Invalid images are replaced with descriptive text blocks
- Prevents permanent session corruption
Tests:
- Rejects invalid base64 characters
- Strips data URL prefixes correctly
- Rejects invalid padding
- Rejects invalid length
- Handles empty data gracefully
Closes#18212
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>