Commit Graph

1208 Commits

Author SHA1 Message Date
papersnake
2c2dfea60f Merge branch 'QuantumNous:main' into fix-claude-haiku 2025-12-26 16:23:34 +08:00
Calcium-Ion
654bb10b45 Merge pull request #2460 from seefs001/feature/gemini-flash-minial
fix(gemini): handle minimal reasoning effort budget
2025-12-26 13:57:56 +08:00
Calcium-Ion
0b1a562df9 Merge pull request #2477 from 1420970597/fix/anthropic-cache-billing
fix: 修复 Anthropic 渠道缓存计费错误
2025-12-24 16:59:23 +08:00
Seefs
a0c3d37d66 Merge pull request #2493 from shikaiwei1/patch-1 2025-12-24 16:52:24 +08:00
feitianbubu
3652dfdbd5 fix: check claudeResponse delta StopReason nil point 2025-12-24 11:54:23 +08:00
John Chen
dbaba87c39 为Moonshot添加缓存tokens读取逻辑
为Moonshot添加缓存tokens读取逻辑。其与智普V4的逻辑相同,所以共用逻辑
2025-12-22 17:05:16 +08:00
Seefs
28f7a4feef fix: 在Vertex Adapter过滤content[].part[].functionResponse.id 2025-12-21 17:22:04 +08:00
Seefs
5a64ae2a29 fix: 模型设置增加针对Vertex渠道过滤content[].part[].functionResponse.id的选项,默认启用 2025-12-21 17:09:49 +08:00
长安
0a2f12c04e fix: 修复 Anthropic 渠道缓存计费错误
## 问题描述

当使用 Anthropic 渠道通过 `/v1/chat/completions` 端点调用且启用缓存功能时,
计费逻辑错误地减去了缓存 tokens,导致严重的收入损失(94.5%)。

## 根本原因

不同 API 的 `prompt_tokens` 定义不同:

- **Anthropic API**: `input_tokens` 字段已经是纯输入 tokens(不包含缓存)
- **OpenAI API**: `prompt_tokens` 字段包含所有 tokens(包含缓存)
- **OpenRouter API**: `prompt_tokens` 字段包含所有 tokens(包含缓存)

当前 `postConsumeQuota` 函数对所有渠道都减去缓存 tokens,这对 Anthropic
渠道是错误的,因为其 `input_tokens` 已经不包含缓存。

## 修复方案

在 `relay/compatible_handler.go` 的 `postConsumeQuota` 函数中,添加渠道类型判断:

```go
if relayInfo.ChannelType != constant.ChannelTypeAnthropic {
    baseTokens = baseTokens.Sub(dCacheTokens)
}
```

只对非 Anthropic 渠道减去缓存 tokens。

## 影响分析

###  不受影响的场景

1. **无缓存调用**(所有渠道)
   - cache_tokens = 0
   - 减去 0 = 不减去
   - 结果:完全一致

2. **OpenAI/OpenRouter 渠道 + 缓存**
   - 继续减去缓存(因为 ChannelType != Anthropic)
   - 结果:完全一致

3. **Anthropic 渠道 + /v1/messages 端点**
   - 使用 PostClaudeConsumeQuota(不修改)
   - 结果:完全不受影响

###  修复的场景

4. **Anthropic 渠道 + /v1/chat/completions + 缓存**
   - 修复前:错误地减去缓存,导致 94.5% 收入损失
   - 修复后:不减去缓存,计费正确

## 验证数据

以实际记录 143509 为例:

| 项目 | 修复前 | 修复后 | 差异 |
|------|--------|--------|------|
| Quota | 10,489 | 191,330 | +180,841 |
| 费用 | ¥0.020978 | ¥0.382660 | +¥0.361682 |
| 收入恢复 | - | - | **+1724.1%** |

## 测试建议

1. 测试 Anthropic 渠道 + 缓存场景
2. 测试 OpenAI 渠道 + 缓存场景(确保不受影响)
3. 测试无缓存场景(确保不受影响)

## 相关 Issue

修复 Anthropic 渠道使用 prompt caching 时的计费错误。
2025-12-20 14:17:12 +08:00
Seefs
da24a165d0 fix(gemini): handle minimal reasoning effort budget
- Add minimal case to clampThinkingBudgetByEffort to avoid defaulting to full thinking budget
2025-12-18 08:10:46 +08:00
t0ng7u
8e3f9b1faa 🛡️ fix: prevent OOM on large/decompressed requests; skip heavy prompt meta when token count is disabled
Clamp request body size (including post-decompression) to avoid memory exhaustion caused by huge payloads/zip bombs, especially with large-context Claude requests. Add a configurable `MAX_REQUEST_BODY_MB` (default `32`) and document it.

- Enforce max request body size after gzip/br decompression via `http.MaxBytesReader`
- Add a secondary size guard in `common.GetRequestBody` and cache-safe handling
- Return **413 Request Entity Too Large** on oversized bodies in relay entry
- Avoid building large `TokenCountMeta.CombineText` when both token counting and sensitive check are disabled (use lightweight meta for pricing)
- Update READMEs (CN/EN/FR/JA) with `MAX_REQUEST_BODY_MB`
- Fix a handful of vet/formatting issues encountered during the change
- `go test ./...` passes
2025-12-16 17:00:19 +08:00
CaIon
7cae4a640b fix(audio): correct TotalTokens calculation for accurate usage reporting 2025-12-13 17:49:57 +08:00
CaIon
e36e2e1b69 feat(audio): enhance audio request handling with token type detection and streaming support 2025-12-13 17:24:23 +08:00
CaIon
21fca238bf refactor(error): replace dto.OpenAIError with types.OpenAIError for consistency 2025-12-13 16:43:57 +08:00
CaIon
b58fa3debc fix(helper): improve error handling in FlushWriter and related functions 2025-12-13 13:29:21 +08:00
CaIon
1c167c1068 refactor(auth): replace direct token group setting with context key retrieval 2025-12-13 01:38:12 +08:00
Calcium-Ion
30cb224793 Merge pull request #2429 from QuantumNous/feat/xhigh
feat(adaptor): add '-xhigh' suffix to reasoning effort options
2025-12-12 22:06:19 +08:00
CaIon
ce6fb95f96 refactor(relay): update channel retrieval to use RelayInfo structure 2025-12-12 22:04:38 +08:00
CaIon
50854c17bb feat(adaptor): add '-xhigh' suffix to reasoning effort options for model parsing 2025-12-12 20:53:48 +08:00
Calcium-Ion
147659fb6e Merge pull request #2426 from QuantumNous/feat/auto-cross-group-retry
feat(token): add cross-group retry option for token processing
2025-12-12 20:45:54 +08:00
CaIon
01b4039e96 feat(token): add cross-group retry option for token processing 2025-12-12 17:59:21 +08:00
zdwy5
e1bee48152 fix: 支持aws 通过全局参数透传或者渠道参数透传来 调用 (#2423)
* fix: 支持aws 通过全局参数透传或者渠道参数透传来 调用

* fix(aws): replace json.Unmarshal with common.Unmarshal for request body processing

---------

Co-authored-by: r0 <liangchunlei@01.ai>
Co-authored-by: CaIon <i@caion.me>
2025-12-12 17:09:27 +08:00
Seefs
4e69c98b42 Merge pull request #2412 from seefs001/pr-2372
feat: add openai video remix endpoint
2025-12-11 23:35:23 +08:00
Calcium-Ion
e346f0bf16 Merge pull request #2398 from seefs001/fix/video-proxy
fix: Use channel proxy settings for task query scenarios
2025-12-09 14:05:30 +08:00
Calcium-Ion
5212fbd73d Merge pull request #2358 from seefs001/fix/regrex-repeat-compile
fix: regex repeat compile
2025-12-09 14:01:07 +08:00
Calcium-Ion
9561c7b50f Merge pull request #2356 from seefs001/feature/zhipiu_4v_image
feat: zhipu 4v image generations
2025-12-09 14:00:20 +08:00
Seefs
5889571108 fix: Use channel proxy settings for task query scenarios 2025-12-09 11:15:27 +08:00
Seefs
72d2a94b0d Merge pull request #2229 from HynoR/chore/v1
fix: Set default to unsupported value for gpt-5 model series requests
2025-12-08 20:59:30 +08:00
Seefs
5eae6a3874 Merge pull request #2375 from FlowerRealm/feat/add-claude-haiku-4-5
feat: add claude-haiku-4-5-20251001 model support
2025-12-08 20:46:02 +08:00
Papersnake
681b37d104 feat: support claude-haiku-4-5-20251001 on vertex 2025-12-08 17:28:36 +08:00
firstmelody
121746a79e fix(adaptor): fix reasoning suffix not processing in vertex adapter 2025-12-08 01:12:29 +08:00
FlowerRealm
c3c119a9b4 feat: add claude-haiku-4-5-20251001 model support
- Add model to Claude ModelList
- Add model ratio (0.5, $1/1M input tokens)
- Add completion ratio support (5x, $5/1M output tokens)
- Add cache read ratio (0.1, $0.10/1M tokens)
- Add cache write ratio (1.25, $1.25/1M tokens)

Model specs:
- Context window: 200K tokens
- Max output: 64K tokens
- Release date: October 1, 2025
2025-12-05 18:54:20 +08:00
Seefs
896e4ac671 fix: regex repeat compile 2025-12-03 00:41:47 +08:00
Seefs
2e37347851 feat: zhipu v4 image generations 2025-12-02 22:56:58 +08:00
CaIon
45556c961f fix(price): adjust pre-consume quota logic for free models based on group ratio 2025-12-02 22:09:48 +08:00
Calcium-Ion
ffc45a756e Merge pull request #2344 from seefs001/feature/gemini-thinking-level
feat: gemini 3 thinking level gemini-3-pro-preview-high
2025-12-02 21:55:43 +08:00
Calcium-Ion
48635360cd Merge pull request #2355 from QuantumNous/feat/optimize-token-counter
feat: refactor token estimation logic
2025-12-02 21:51:09 +08:00
CaIon
f5b409d74f feat: refactor token estimation logic
- Introduced new OpenAI text models in `common/model.go`.
- Added `IsOpenAITextModel` function to check for OpenAI text models.
- Refactored token estimation methods across various channels to use estimated prompt tokens instead of direct prompt token counts.
- Updated related functions and structures to accommodate the new token estimation approach, enhancing overall token management.
2025-12-02 21:34:39 +08:00
Papersnake
3954feb993 fix: set MaxIdleConnsPerHost to 100 2025-12-02 09:55:03 +08:00
CaIon
4dbdbdec1d feat(gemini): implement markdown image handling in text processing 2025-12-01 17:54:41 +08:00
Seefs
b6a02d8303 feat: gemini 3 thinking level gemini-3-pro-preview-high 2025-12-01 16:40:46 +08:00
CaIon
98f92f990a feat(gemini): add validation and conversion for imageConfig parameters in extra_body 2025-11-30 19:31:08 +08:00
CaIon
3f7ea1fd83 fix(vertex): ensure sampleCount is a positive integer and update OtherRatios 2025-11-30 19:05:33 +08:00
Seefs
3257723a55 Revert "OAI生图接口支持gemini 3 pro image preview" 2025-11-30 18:49:18 +08:00
Calcium-Ion
b19b2d62df Merge pull request #2339 from QuantumNous/revert-2330-pr/fix-nano-banana-err
Revert "fix: nano-banana not compatible imageSize"
2025-11-30 18:48:09 +08:00
Calcium-Ion
f9c8624f2c Merge pull request #2338 from QuantumNous/revert-2321-pr/gemini-image-edit
Revert "Gemini Image系列支持图像编辑"
2025-11-30 18:48:01 +08:00
Calcium-Ion
6c8253156b Merge pull request #2337 from QuantumNous/revert-2315-pr/gemini-veo3.1-i2v
Revert "Gemini Veo3.1[AI Studio]增加图生视频支持"
2025-11-30 18:47:50 +08:00
Seefs
e29ff0060d Revert "fix: nano-banana not compatible imageSize" 2025-11-30 18:46:10 +08:00
Seefs
d4a2c2ab54 Revert "Gemini Image系列支持图像编辑" 2025-11-30 18:45:54 +08:00
Seefs
ded463ee57 Revert "Gemini Veo3.1[AI Studio]增加图生视频支持" 2025-11-30 18:45:37 +08:00