feat(adaptor): 新适配百炼多种图片生成模型

- wan2.6系列生图与编辑，适配多图生成计费 - wan2.5系列生图与编辑 - z-image-turbo生图，适配prompt_extend计费
fix: glm 4.7 finish reason (#2545 )
2026-04-18 09:57:27 +00:00 · 2025-12-29 23:00:17 +08:00 · 2025-12-29 19:41:15 +08:00 · 2025-12-29 14:53:31 +08:00 · 2025-12-29 14:13:33 +08:00 · 2025-12-28 15:55:35 +08:00
143 changed files with 13790 additions and 1097 deletions
--- a/.dockerignore
+++ b/.dockerignore
@@ -6,4 +6,5 @@
 Makefile
 docs
 .eslintcache
-.gocache
+.gocache
+/web/node_modules
--- a/.env.example
+++ b/.env.example
@@ -9,6 +9,14 @@
 # ENABLE_PPROF=true
 # 启用调试模式
 # DEBUG=true
+# Pyroscope 配置
+# PYROSCOPE_URL=http://localhost:4040
+# PYROSCOPE_APP_NAME=new-api
+# PYROSCOPE_BASIC_AUTH_USER=your-user
+# PYROSCOPE_BASIC_AUTH_PASSWORD=your-password
+# PYROSCOPE_MUTEX_RATE=5
+# PYROSCOPE_BLOCK_RATE=5
+# HOSTNAME=your-hostname

 # 数据库相关配置
 # 数据库连接字符串
--- a/.gitignore
+++ b/.gitignore
@@ -16,9 +16,13 @@ new-api
 tiktoken_cache
 .eslintcache
 .gocache
+.gomodcache/
 .cache
 web/bun.lock

 electron/node_modules
 electron/dist
 data/
+.gomodcache/
+.gocache-temp
+.gopath
--- a/README.en.md
+++ b/README.en.md
@@ -146,7 +146,7 @@ docker run --name new-api -d --restart always \

 🎉 After deployment is complete, visit `http://localhost:3000` to start using!

-📖 For more deployment methods, please refer to [Deployment Guide](https://docs.newapi.pro/installation)
+📖 For more deployment methods, please refer to [Deployment Guide](https://docs.newapi.pro/en/docs/installation)

 ---

@@ -154,7 +154,7 @@ docker run --name new-api -d --restart always \

 <div align="center">

-### 📖 [Official Documentation](https://docs.newapi.pro/) | [![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/QuantumNous/new-api)
+### 📖 [Official Documentation](https://docs.newapi.pro/en/docs) | [![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/QuantumNous/new-api)

 </div>

@@ -162,17 +162,17 @@ docker run --name new-api -d --restart always \

 | Category | Link |
 |------|------|
-| 🚀 Deployment Guide | [Installation Documentation](https://docs.newapi.pro/installation) |
-| ⚙️ Environment Configuration | [Environment Variables](https://docs.newapi.pro/installation/environment-variables) |
-| 📡 API Documentation | [API Documentation](https://docs.newapi.pro/api) |
-| ❓ FAQ | [FAQ](https://docs.newapi.pro/support/faq) |
-| 💬 Community Interaction | [Communication Channels](https://docs.newapi.pro/support/community-interaction) |
+| 🚀 Deployment Guide | [Installation Documentation](https://docs.newapi.pro/en/docs/installation) |
+| ⚙️ Environment Configuration | [Environment Variables](https://docs.newapi.pro/en/docs/installation/config-maintenance/environment-variables) |
+| 📡 API Documentation | [API Documentation](https://docs.newapi.pro/en/docs/api) |
+| ❓ FAQ | [FAQ](https://docs.newapi.pro/en/docs/support/faq) |
+| 💬 Community Interaction | [Communication Channels](https://docs.newapi.pro/en/docs/support/community-interaction) |

 ---

 ## ✨ Key Features

-> For detailed features, please refer to [Features Introduction](https://docs.newapi.pro/wiki/features-introduction)
+> For detailed features, please refer to [Features Introduction](https://docs.newapi.pro/en/docs/guide/wiki/basic-concepts/features-introduction)

 ### 🎨 Core Functions

@@ -201,11 +201,11 @@ docker run --name new-api -d --restart always \
 ### 🚀 Advanced Features

 **API Format Support:**
- ⚡ [OpenAI Responses](https://docs.newapi.pro/api/openai-responses)
- ⚡ [OpenAI Realtime API](https://docs.newapi.pro/api/openai-realtime) (including Azure)
- ⚡ [Claude Messages](https://docs.newapi.pro/api/anthropic-chat)
- ⚡ [Google Gemini](https://docs.newapi.pro/api/google-gemini-chat/)
- 🔄 [Rerank Models](https://docs.newapi.pro/api/jinaai-rerank) (Cohere, Jina)
+- ⚡ [OpenAI Responses](https://docs.newapi.pro/en/docs/api/ai-model/chat/openai/create-response)
+- ⚡ [OpenAI Realtime API](https://docs.newapi.pro/en/docs/api/ai-model/realtime/create-realtime-session) (including Azure)
+- ⚡ [Claude Messages](https://docs.newapi.pro/en/docs/api/ai-model/chat/create-message)
+- ⚡ [Google Gemini](https://doc.newapi.pro/en/api/google-gemini-chat)
+- 🔄 [Rerank Models](https://docs.newapi.pro/en/docs/api/ai-model/rerank/create-rerank) (Cohere, Jina)

 **Intelligent Routing:**
 - ⚖️ Channel weighted random
@@ -246,16 +246,16 @@ docker run --name new-api -d --restart always \

 ## 🤖 Model Support

-> For details, please refer to [API Documentation - Relay Interface](https://docs.newapi.pro/api)
+> For details, please refer to [API Documentation - Relay Interface](https://docs.newapi.pro/en/docs/api)

 | Model Type | Description | Documentation |
 |---------|------|------|
 | 🤖 OpenAI GPTs | gpt-4-gizmo-* series | - |
-| 🎨 Midjourney-Proxy | [Midjourney-Proxy(Plus)](https://github.com/novicezk/midjourney-proxy) | [Documentation](https://docs.newapi.pro/api/midjourney-proxy-image) |
-| 🎵 Suno-API | [Suno API](https://github.com/Suno-API/Suno-API) | [Documentation](https://docs.newapi.pro/api/suno-music) |
-| 🔄 Rerank | Cohere, Jina | [Documentation](https://docs.newapi.pro/api/jinaai-rerank) |
-| 💬 Claude | Messages format | [Documentation](https://docs.newapi.pro/api/anthropic-chat) |
-| 🌐 Gemini | Google Gemini format | [Documentation](https://docs.newapi.pro/api/google-gemini-chat/) |
+| 🎨 Midjourney-Proxy | [Midjourney-Proxy(Plus)](https://github.com/novicezk/midjourney-proxy) | [Documentation](https://doc.newapi.pro/en/api/midjourney-proxy-image) |
+| 🎵 Suno-API | [Suno API](https://github.com/Suno-API/Suno-API) | [Documentation](https://doc.newapi.pro/en/api/suno-music) |
+| 🔄 Rerank | Cohere, Jina | [Documentation](https://docs.newapi.pro/en/docs/api/ai-model/rerank/create-rerank) |
+| 💬 Claude | Messages format | [Documentation](https://docs.newapi.pro/en/docs/api/ai-model/chat/create-message) |
+| 🌐 Gemini | Google Gemini format | [Documentation](https://doc.newapi.pro/en/api/google-gemini-chat) |
 | 🔧 Dify | ChatFlow mode | - |
 | 🎯 Custom | Supports complete call address | - |

@@ -264,16 +264,16 @@ docker run --name new-api -d --restart always \
 <details>
 <summary>View complete interface list</summary>

- [Chat Interface (Chat Completions)](https://docs.newapi.pro/api/openai-chat)
- [Response Interface (Responses)](https://docs.newapi.pro/api/openai-responses)
- [Image Interface (Image)](https://docs.newapi.pro/api/openai-image)
- [Audio Interface (Audio)](https://docs.newapi.pro/api/openai-audio)
- [Video Interface (Video)](https://docs.newapi.pro/api/openai-video)
- [Embedding Interface (Embeddings)](https://docs.newapi.pro/api/openai-embeddings)
- [Rerank Interface (Rerank)](https://docs.newapi.pro/api/jinaai-rerank)
- [Realtime Conversation (Realtime)](https://docs.newapi.pro/api/openai-realtime)
- [Claude Chat](https://docs.newapi.pro/api/anthropic-chat)
- [Google Gemini Chat](https://docs.newapi.pro/api/google-gemini-chat/)
+- [Chat Interface (Chat Completions)](https://docs.newapi.pro/en/docs/api/ai-model/chat/openai/create-chat-completion)
+- [Response Interface (Responses)](https://docs.newapi.pro/en/docs/api/ai-model/chat/openai/create-response)
+- [Image Interface (Image)](https://docs.newapi.pro/en/docs/api/ai-model/images/openai/v1-images-generations--post)
+- [Audio Interface (Audio)](https://docs.newapi.pro/en/docs/api/ai-model/audio/openai/create-transcription)
+- [Video Interface (Video)](https://docs.newapi.pro/en/docs/api/ai-model/videos/create-video-generation)
+- [Embedding Interface (Embeddings)](https://docs.newapi.pro/en/docs/api/ai-model/embeddings/create-embedding)
+- [Rerank Interface (Rerank)](https://docs.newapi.pro/en/docs/api/ai-model/rerank/create-rerank)
+- [Realtime Conversation (Realtime)](https://docs.newapi.pro/en/docs/api/ai-model/realtime/create-realtime-session)
+- [Claude Chat](https://docs.newapi.pro/en/docs/api/ai-model/chat/create-message)
+- [Google Gemini Chat](https://doc.newapi.pro/en/api/google-gemini-chat)

 </details>

@@ -305,10 +305,18 @@ docker run --name new-api -d --restart always \
 | `REDIS_CONN_STRING` | Redis connection string | - |
 | `STREAMING_TIMEOUT` | Streaming timeout (seconds) | `300` |
 | `STREAM_SCANNER_MAX_BUFFER_MB` | Max per-line buffer (MB) for the stream scanner; increase when upstream sends huge image/base64 payloads | `64` |
+| `MAX_REQUEST_BODY_MB` | Max request body size (MB, counted **after decompression**; prevents huge requests/zip bombs from exhausting memory). Exceeding it returns `413` | `32` |
 | `AZURE_DEFAULT_API_VERSION` | Azure API version | `2025-04-01-preview` |
 | `ERROR_LOG_ENABLED` | Error log switch | `false` |
+| `PYROSCOPE_URL` | Pyroscope server address | - |
+| `PYROSCOPE_APP_NAME` | Pyroscope application name | `new-api` |
+| `PYROSCOPE_BASIC_AUTH_USER` | Pyroscope basic auth user | - |
+| `PYROSCOPE_BASIC_AUTH_PASSWORD` | Pyroscope basic auth password | - |
+| `PYROSCOPE_MUTEX_RATE` | Pyroscope mutex sampling rate | `5` |
+| `PYROSCOPE_BLOCK_RATE` | Pyroscope block sampling rate | `5` |
+| `HOSTNAME` | Hostname tag for Pyroscope | `new-api` |

-📖 **Complete configuration:** [Environment Variables Documentation](https://docs.newapi.pro/installation/environment-variables)
+📖 **Complete configuration:** [Environment Variables Documentation](https://docs.newapi.pro/en/docs/installation/config-maintenance/environment-variables)

 </details>

@@ -410,10 +418,10 @@ docker run --name new-api -d --restart always \

 | Resource | Link |
 |------|------|
-| 📘 FAQ | [FAQ](https://docs.newapi.pro/support/faq) |
-| 💬 Community Interaction | [Communication Channels](https://docs.newapi.pro/support/community-interaction) |
-| 🐛 Issue Feedback | [Issue Feedback](https://docs.newapi.pro/support/feedback-issues) |
-| 📚 Complete Documentation | [Official Documentation](https://docs.newapi.pro/support) |
+| 📘 FAQ | [FAQ](https://docs.newapi.pro/en/docs/support/faq) |
+| 💬 Community Interaction | [Communication Channels](https://docs.newapi.pro/en/docs/support/community-interaction) |
+| 🐛 Issue Feedback | [Issue Feedback](https://docs.newapi.pro/en/docs/support/feedback-issues) |
+| 📚 Complete Documentation | [Official Documentation](https://docs.newapi.pro/en/docs) |

 ### 🤝 Contribution Guide

@@ -442,7 +450,7 @@ Welcome all forms of contribution!

 If this project is helpful to you, welcome to give us a ⭐️ Star！

-**[Official Documentation](https://docs.newapi.pro/)** • **[Issue Feedback](https://github.com/Calcium-Ion/new-api/issues)** • **[Latest Release](https://github.com/Calcium-Ion/new-api/releases)**
+**[Official Documentation](https://docs.newapi.pro/en/docs)** • **[Issue Feedback](https://github.com/Calcium-Ion/new-api/issues)** • **[Latest Release](https://github.com/Calcium-Ion/new-api/releases)**

 <sub>Built with ❤️ by QuantumNous</sub>

--- a/README.fr.md
+++ b/README.fr.md
@@ -146,7 +146,7 @@ docker run --name new-api -d --restart always \

 🎉 Après le déploiement, visitez `http://localhost:3000` pour commencer à utiliser!

-📖 Pour plus de méthodes de déploiement, veuillez vous référer à [Guide de déploiement](https://docs.newapi.pro/installation)
+📖 Pour plus de méthodes de déploiement, veuillez vous référer à [Guide de déploiement](https://docs.newapi.pro/en/docs/installation)

 ---

@@ -154,7 +154,7 @@ docker run --name new-api -d --restart always \

 <div align="center">

-### 📖 [Documentation officielle](https://docs.newapi.pro/) | [![Demander à DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/QuantumNous/new-api)
+### 📖 [Documentation officielle](https://docs.newapi.pro/en/docs) | [![Demander à DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/QuantumNous/new-api)

 </div>

@@ -162,17 +162,17 @@ docker run --name new-api -d --restart always \

 | Catégorie | Lien |
 |------|------|
-| 🚀 Guide de déploiement | [Documentation d'installation](https://docs.newapi.pro/installation) |
-| ⚙️ Configuration de l'environnement | [Variables d'environnement](https://docs.newapi.pro/installation/environment-variables) |
-| 📡 Documentation de l'API | [Documentation de l'API](https://docs.newapi.pro/api) |
-| ❓ FAQ | [FAQ](https://docs.newapi.pro/support/faq) |
-| 💬 Interaction avec la communauté | [Canaux de communication](https://docs.newapi.pro/support/community-interaction) |
+| 🚀 Guide de déploiement | [Documentation d'installation](https://docs.newapi.pro/en/docs/installation) |
+| ⚙️ Configuration de l'environnement | [Variables d'environnement](https://docs.newapi.pro/en/docs/installation/config-maintenance/environment-variables) |
+| 📡 Documentation de l'API | [Documentation de l'API](https://docs.newapi.pro/en/docs/api) |
+| ❓ FAQ | [FAQ](https://docs.newapi.pro/en/docs/support/faq) |
+| 💬 Interaction avec la communauté | [Canaux de communication](https://docs.newapi.pro/en/docs/support/community-interaction) |

 ---

 ## ✨ Fonctionnalités clés

-> Pour les fonctionnalités détaillées, veuillez vous référer à [Présentation des fonctionnalités](https://docs.newapi.pro/wiki/features-introduction) |
+> Pour les fonctionnalités détaillées, veuillez vous référer à [Présentation des fonctionnalités](https://docs.newapi.pro/en/docs/guide/wiki/basic-concepts/features-introduction) |

 ### 🎨 Fonctions principales

@@ -200,11 +200,11 @@ docker run --name new-api -d --restart always \
 ### 🚀 Fonctionnalités avancées

 **Prise en charge des formats d'API:**
- ⚡ [OpenAI Responses](https://docs.newapi.pro/api/openai-responses)
- ⚡ [OpenAI Realtime API](https://docs.newapi.pro/api/openai-realtime) (y compris Azure)
- ⚡ [Claude Messages](https://docs.newapi.pro/api/anthropic-chat)
- ⚡ [Google Gemini](https://docs.newapi.pro/api/google-gemini-chat/)
- 🔄 [Modèles Rerank](https://docs.newapi.pro/api/jinaai-rerank) (Cohere, Jina)
+- ⚡ [OpenAI Responses](https://docs.newapi.pro/en/docs/api/ai-model/chat/openai/create-response)
+- ⚡ [OpenAI Realtime API](https://docs.newapi.pro/en/docs/api/ai-model/realtime/create-realtime-session) (y compris Azure)
+- ⚡ [Claude Messages](https://docs.newapi.pro/en/docs/api/ai-model/chat/create-message)
+- ⚡ [Google Gemini](https://doc.newapi.pro/en/api/google-gemini-chat)
+- 🔄 [Modèles Rerank](https://docs.newapi.pro/en/docs/api/ai-model/rerank/create-rerank) (Cohere, Jina)

 **Routage intelligent:**
 - ⚖️ Sélection aléatoire pondérée des canaux
@@ -242,16 +242,16 @@ docker run --name new-api -d --restart always \

 ## 🤖 Prise en charge des modèles

-> Pour les détails, veuillez vous référer à [Documentation de l'API - Interface de relais](https://docs.newapi.pro/api)
+> Pour les détails, veuillez vous référer à [Documentation de l'API - Interface de relais](https://docs.newapi.pro/en/docs/api)

 | Type de modèle | Description | Documentation |
 |---------|------|------|
 | 🤖 OpenAI GPTs | série gpt-4-gizmo-* | - |
-| 🎨 Midjourney-Proxy | [Midjourney-Proxy(Plus)](https://github.com/novicezk/midjourney-proxy) | [Documentation](https://docs.newapi.pro/api/midjourney-proxy-image) |
-| 🎵 Suno-API | [Suno API](https://github.com/Suno-API/Suno-API) | [Documentation](https://docs.newapi.pro/api/suno-music) |
-| 🔄 Rerank | Cohere, Jina | [Documentation](https://docs.newapi.pro/api/jinaai-rerank) |
-| 💬 Claude | Format Messages | [Documentation](https://docs.newapi.pro/api/anthropic-chat) |
-| 🌐 Gemini | Format Google Gemini | [Documentation](https://docs.newapi.pro/api/google-gemini-chat/) |
+| 🎨 Midjourney-Proxy | [Midjourney-Proxy(Plus)](https://github.com/novicezk/midjourney-proxy) | [Documentation](https://doc.newapi.pro/en/api/midjourney-proxy-image) |
+| 🎵 Suno-API | [Suno API](https://github.com/Suno-API/Suno-API) | [Documentation](https://doc.newapi.pro/en/api/suno-music) |
+| 🔄 Rerank | Cohere, Jina | [Documentation](https://docs.newapi.pro/en/docs/api/ai-model/rerank/create-rerank) |
+| 💬 Claude | Format Messages | [Documentation](https://docs.newapi.pro/en/docs/api/ai-model/chat/create-message) |
+| 🌐 Gemini | Format Google Gemini | [Documentation](https://doc.newapi.pro/en/api/google-gemini-chat) |
 | 🔧 Dify | Mode ChatFlow | - |
 | 🎯 Personnalisé | Prise en charge de l'adresse d'appel complète | - |

@@ -260,16 +260,16 @@ docker run --name new-api -d --restart always \
 <details>
 <summary>Voir la liste complète des interfaces</summary>

- [Interface de discussion (Chat Completions)](https://docs.newapi.pro/api/openai-chat)
- [Interface de réponse (Responses)](https://docs.newapi.pro/api/openai-responses)
- [Interface d'image (Image)](https://docs.newapi.pro/api/openai-image)
- [Interface audio (Audio)](https://docs.newapi.pro/api/openai-audio)
- [Interface vidéo (Video)](https://docs.newapi.pro/api/openai-video)
- [Interface d'incorporation (Embeddings)](https://docs.newapi.pro/api/openai-embeddings)
- [Interface de rerank (Rerank)](https://docs.newapi.pro/api/jinaai-rerank)
- [Conversation en temps réel (Realtime)](https://docs.newapi.pro/api/openai-realtime)
- [Discussion Claude](https://docs.newapi.pro/api/anthropic-chat)
- [Discussion Google Gemini](https://docs.newapi.pro/api/google-gemini-chat/)
+- [Interface de discussion (Chat Completions)](https://docs.newapi.pro/en/docs/api/ai-model/chat/openai/create-chat-completion)
+- [Interface de réponse (Responses)](https://docs.newapi.pro/en/docs/api/ai-model/chat/openai/create-response)
+- [Interface d'image (Image)](https://docs.newapi.pro/en/docs/api/ai-model/images/openai/v1-images-generations--post)
+- [Interface audio (Audio)](https://docs.newapi.pro/en/docs/api/ai-model/audio/openai/create-transcription)
+- [Interface vidéo (Video)](https://docs.newapi.pro/en/docs/api/ai-model/videos/create-video-generation)
+- [Interface d'incorporation (Embeddings)](https://docs.newapi.pro/en/docs/api/ai-model/embeddings/create-embedding)
+- [Interface de rerank (Rerank)](https://docs.newapi.pro/en/docs/api/ai-model/rerank/create-rerank)
+- [Conversation en temps réel (Realtime)](https://docs.newapi.pro/en/docs/api/ai-model/realtime/create-realtime-session)
+- [Discussion Claude](https://docs.newapi.pro/en/docs/api/ai-model/chat/create-message)
+- [Discussion Google Gemini](https://doc.newapi.pro/en/api/google-gemini-chat)

 </details>

@@ -301,10 +301,18 @@ docker run --name new-api -d --restart always \
 | `REDIS_CONN_STRING` | Chaine de connexion Redis | - |
 | `STREAMING_TIMEOUT` | Délai d'expiration du streaming (secondes) | `300` |
 | `STREAM_SCANNER_MAX_BUFFER_MB` | Taille max du buffer par ligne (Mo) pour le scanner SSE ; à augmenter quand les sorties image/base64 sont très volumineuses (ex. images 4K) | `64` |
+| `MAX_REQUEST_BODY_MB` | Taille maximale du corps de requête (Mo, comptée **après décompression** ; évite les requêtes énormes/zip bombs qui saturent la mémoire). Dépassement ⇒ `413` | `32` |
 | `AZURE_DEFAULT_API_VERSION` | Version de l'API Azure | `2025-04-01-preview` |
 | `ERROR_LOG_ENABLED` | Interrupteur du journal d'erreurs | `false` |
+| `PYROSCOPE_URL` | Adresse du serveur Pyroscope | - |
+| `PYROSCOPE_APP_NAME` | Nom de l'application Pyroscope | `new-api` |
+| `PYROSCOPE_BASIC_AUTH_USER` | Utilisateur Basic Auth Pyroscope | - |
+| `PYROSCOPE_BASIC_AUTH_PASSWORD` | Mot de passe Basic Auth Pyroscope | - |
+| `PYROSCOPE_MUTEX_RATE` | Taux d'échantillonnage mutex Pyroscope | `5` |
+| `PYROSCOPE_BLOCK_RATE` | Taux d'échantillonnage block Pyroscope | `5` |
+| `HOSTNAME` | Nom d'hôte tagué pour Pyroscope | `new-api` |

-📖 **Configuration complète:** [Documentation des variables d'environnement](https://docs.newapi.pro/installation/environment-variables)
+📖 **Configuration complète:** [Documentation des variables d'environnement](https://docs.newapi.pro/en/docs/installation/config-maintenance/environment-variables)

 </details>

@@ -404,10 +412,10 @@ docker run --name new-api -d --restart always \

 | Ressource | Lien |
 |------|------|
-| 📘 FAQ | [FAQ](https://docs.newapi.pro/support/faq) |
-| 💬 Interaction avec la communauté | [Canaux de communication](https://docs.newapi.pro/support/community-interaction) |
-| 🐛 Commentaires sur les problèmes | [Commentaires sur les problèmes](https://docs.newapi.pro/support/feedback-issues) |
-| 📚 Documentation complète | [Documentation officielle](https://docs.newapi.pro/support) |
+| 📘 FAQ | [FAQ](https://docs.newapi.pro/en/docs/support/faq) |
+| 💬 Interaction avec la communauté | [Canaux de communication](https://docs.newapi.pro/en/docs/support/community-interaction) |
+| 🐛 Commentaires sur les problèmes | [Commentaires sur les problèmes](https://docs.newapi.pro/en/docs/support/feedback-issues) |
+| 📚 Documentation complète | [Documentation officielle](https://docs.newapi.pro/en/docs) |

 ### 🤝 Guide de contribution

@@ -436,7 +444,7 @@ Bienvenue à toutes les formes de contribution!

 Si ce projet vous est utile, bienvenue à nous donner une ⭐️ Étoile！

-**[Documentation officielle](https://docs.newapi.pro/)** • **[Commentaires sur les problèmes](https://github.com/Calcium-Ion/new-api/issues)** • **[Dernière version](https://github.com/Calcium-Ion/new-api/releases)**
+**[Documentation officielle](https://docs.newapi.pro/en/docs)** • **[Commentaires sur les problèmes](https://github.com/Calcium-Ion/new-api/issues)** • **[Dernière version](https://github.com/Calcium-Ion/new-api/releases)**

 <sub>Construit avec ❤️ par QuantumNous</sub>

--- a/README.ja.md
+++ b/README.ja.md
@@ -146,7 +146,7 @@ docker run --name new-api -d --restart always \

 🎉 デプロイが完了したら、`http://localhost:3000` にアクセスして使用を開始してください！

-📖 その他のデプロイ方法については[デプロイガイド](https://docs.newapi.pro/installation)を参照してください。
+📖 その他のデプロイ方法については[デプロイガイド](https://docs.newapi.pro/ja/docs/installation)を参照してください。

 ---

@@ -154,7 +154,7 @@ docker run --name new-api -d --restart always \

 <div align="center">

-### 📖 [公式ドキュメント](https://docs.newapi.pro/) | [![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/QuantumNous/new-api)
+### 📖 [公式ドキュメント](https://docs.newapi.pro/ja/docs) | [![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/QuantumNous/new-api)

 </div>

@@ -162,17 +162,17 @@ docker run --name new-api -d --restart always \

 | カテゴリ | リンク |
 |------|------|
-| 🚀 デプロイガイド | [インストールドキュメント](https://docs.newapi.pro/installation) |
-| ⚙️ 環境設定 | [環境変数](https://docs.newapi.pro/installation/environment-variables) |
-| 📡 APIドキュメント | [APIドキュメント](https://docs.newapi.pro/api) |
-| ❓ よくある質問 | [FAQ](https://docs.newapi.pro/support/faq) |
-| 💬 コミュニティ交流 | [交流チャネル](https://docs.newapi.pro/support/community-interaction) |
+| 🚀 デプロイガイド | [インストールドキュメント](https://docs.newapi.pro/ja/docs/installation) |
+| ⚙️ 環境設定 | [環境変数](https://docs.newapi.pro/ja/docs/installation/config-maintenance/environment-variables) |
+| 📡 APIドキュメント | [APIドキュメント](https://docs.newapi.pro/ja/docs/api) |
+| ❓ よくある質問 | [FAQ](https://docs.newapi.pro/ja/docs/support/faq) |
+| 💬 コミュニティ交流 | [交流チャネル](https://docs.newapi.pro/ja/docs/support/community-interaction) |

 ---

 ## ✨ 主な機能

-> 詳細な機能については[機能説明](https://docs.newapi.pro/wiki/features-introduction)を参照してください。
+> 詳細な機能については[機能説明](https://docs.newapi.pro/ja/docs/guide/wiki/basic-concepts/features-introduction)を参照してください。

 ### 🎨 コア機能

@@ -202,15 +202,15 @@ docker run --name new-api -d --restart always \
 ### 🚀 高度な機能

 **APIフォーマットサポート:**
- ⚡ [OpenAI Responses](https://docs.newapi.pro/api/openai-responses)
- ⚡ [OpenAI Realtime API](https://docs.newapi.pro/api/openai-realtime)（Azureを含む）
- ⚡ [Claude Messages](https://docs.newapi.pro/api/anthropic-chat)
- ⚡ [Google Gemini](https://docs.newapi.pro/api/google-gemini-chat/)
- 🔄 [Rerankモデル](https://docs.newapi.pro/api/jinaai-rerank)
- ⚡ [OpenAI Realtime API](https://docs.newapi.pro/api/openai-realtime)
- ⚡ [Claude Messages](https://docs.newapi.pro/api/anthropic-chat)
- ⚡ [Google Gemini](https://docs.newapi.pro/api/google-gemini-chat/)
- 🔄 [Rerankモデル](https://docs.newapi.pro/api/jinaai-rerank)（Cohere、Jina）
+- ⚡ [OpenAI Responses](https://docs.newapi.pro/ja/docs/api/ai-model/chat/openai/create-response)
+- ⚡ [OpenAI Realtime API](https://docs.newapi.pro/ja/docs/api/ai-model/realtime/create-realtime-session)（Azureを含む）
+- ⚡ [Claude Messages](https://docs.newapi.pro/ja/docs/api/ai-model/chat/create-message)
+- ⚡ [Google Gemini](https://doc.newapi.pro/ja/api/google-gemini-chat)
+- 🔄 [Rerankモデル](https://docs.newapi.pro/ja/docs/api/ai-model/rerank/create-rerank)
+- ⚡ [OpenAI Realtime API](https://docs.newapi.pro/ja/docs/api/ai-model/realtime/create-realtime-session)
+- ⚡ [Claude Messages](https://docs.newapi.pro/ja/docs/api/ai-model/chat/create-message)
+- ⚡ [Google Gemini](https://doc.newapi.pro/ja/api/google-gemini-chat)
+- 🔄 [Rerankモデル](https://docs.newapi.pro/ja/docs/api/ai-model/rerank/create-rerank)（Cohere、Jina）

 **インテリジェントルーティング:**
 - ⚖️ チャネル重み付けランダム
@@ -251,16 +251,16 @@ docker run --name new-api -d --restart always \

 ## 🤖 モデルサポート

-> 詳細については[APIドキュメント - 中継インターフェース](https://docs.newapi.pro/api)
+> 詳細については[APIドキュメント - 中継インターフェース](https://docs.newapi.pro/ja/docs/api)

 | モデルタイプ | 説明 | ドキュメント |
 |---------|------|------|
 | 🤖 OpenAI GPTs | gpt-4-gizmo-* シリーズ | - |
-| 🎨 Midjourney-Proxy | [Midjourney-Proxy(Plus)](https://github.com/novicezk/midjourney-proxy) | [ドキュメント](https://docs.newapi.pro/api/midjourney-proxy-image) |
-| 🎵 Suno-API | [Suno API](https://github.com/Suno-API/Suno-API) | [ドキュメント](https://docs.newapi.pro/api/suno-music) |
-| 🔄 Rerank | Cohere、Jina | [ドキュメント](https://docs.newapi.pro/api/jinaai-rerank) |
-| 💬 Claude | Messagesフォーマット | [ドキュメント](https://docs.newapi.pro/api/suno-music) |
-| 🌐 Gemini | Google Geminiフォーマット | [ドキュメント](https://docs.newapi.pro/api/google-gemini-chat/) |
+| 🎨 Midjourney-Proxy | [Midjourney-Proxy(Plus)](https://github.com/novicezk/midjourney-proxy) | [ドキュメント](https://doc.newapi.pro/ja/api/midjourney-proxy-image) |
+| 🎵 Suno-API | [Suno API](https://github.com/Suno-API/Suno-API) | [ドキュメント](https://doc.newapi.pro/ja/api/suno-music) |
+| 🔄 Rerank | Cohere、Jina | [ドキュメント](https://docs.newapi.pro/ja/docs/api/ai-model/rerank/create-rerank) |
+| 💬 Claude | Messagesフォーマット | [ドキュメント](https://docs.newapi.pro/ja/docs/api/ai-model/chat/create-message) |
+| 🌐 Gemini | Google Geminiフォーマット | [ドキュメント](https://doc.newapi.pro/ja/api/google-gemini-chat) |
 | 🔧 Dify | ChatFlowモード | - |
 | 🎯 カスタム | 完全な呼び出しアドレスの入力をサポート | - |

@@ -269,16 +269,16 @@ docker run --name new-api -d --restart always \
 <details>
 <summary>完全なインターフェースリストを表示</summary>

- [チャットインターフェース (Chat Completions)](https://docs.newapi.pro/api/openai-chat)
- [レスポンスインターフェース (Responses)](https://docs.newapi.pro/api/openai-responses)
- [イメージインターフェース (Image)](https://docs.newapi.pro/api/openai-image)
- [オーディオインターフェース (Audio)](https://docs.newapi.pro/api/openai-audio)
- [ビデオインターフェース (Video)](https://docs.newapi.pro/api/openai-video)
- [エンベッドインターフェース (Embeddings)](https://docs.newapi.pro/api/openai-embeddings)
- [再ランク付けインターフェース (Rerank)](https://docs.newapi.pro/api/jinaai-rerank)
- [リアルタイム対話インターフェース (Realtime)](https://docs.newapi.pro/api/openai-realtime)
- [Claudeチャット](https://docs.newapi.pro/api/anthropic-chat)
- [Google Geminiチャット](https://docs.newapi.pro/api/google-gemini-chat/)
+- [チャットインターフェース (Chat Completions)](https://docs.newapi.pro/ja/docs/api/ai-model/chat/openai/create-chat-completion)
+- [レスポンスインターフェース (Responses)](https://docs.newapi.pro/ja/docs/api/ai-model/chat/openai/create-response)
+- [イメージインターフェース (Image)](https://docs.newapi.pro/ja/docs/api/ai-model/images/openai/v1-images-generations--post)
+- [オーディオインターフェース (Audio)](https://docs.newapi.pro/ja/docs/api/ai-model/audio/openai/create-transcription)
+- [ビデオインターフェース (Video)](https://docs.newapi.pro/ja/docs/api/ai-model/videos/create-video-generation)
+- [エンベッドインターフェース (Embeddings)](https://docs.newapi.pro/ja/docs/api/ai-model/embeddings/create-embedding)
+- [再ランク付けインターフェース (Rerank)](https://docs.newapi.pro/ja/docs/api/ai-model/rerank/create-rerank)
+- [リアルタイム対話インターフェース (Realtime)](https://docs.newapi.pro/ja/docs/api/ai-model/realtime/create-realtime-session)
+- [Claudeチャット](https://docs.newapi.pro/ja/docs/api/ai-model/chat/create-message)
+- [Google Geminiチャット](https://doc.newapi.pro/ja/api/google-gemini-chat)

 </details>

@@ -310,10 +310,18 @@ docker run --name new-api -d --restart always \
 | `REDIS_CONN_STRING` | Redis接続文字列 | - |
 | `STREAMING_TIMEOUT` | ストリーミング応答のタイムアウト時間（秒） | `300` |
 | `STREAM_SCANNER_MAX_BUFFER_MB` | ストリームスキャナの1行あたりバッファ上限（MB）。4K画像など巨大なbase64 `data:` ペイロードを扱う場合は値を増加させてください | `64` |
+| `MAX_REQUEST_BODY_MB` | リクエストボディ最大サイズ（MB、**解凍後**に計測。巨大リクエスト/zip bomb によるメモリ枯渇を防止）。超過時は `413` | `32` |
 | `AZURE_DEFAULT_API_VERSION` | Azure APIバージョン | `2025-04-01-preview` |
 | `ERROR_LOG_ENABLED` | エラーログスイッチ | `false` |
+| `PYROSCOPE_URL` | Pyroscopeサーバーのアドレス | - |
+| `PYROSCOPE_APP_NAME` | Pyroscopeアプリ名 | `new-api` |
+| `PYROSCOPE_BASIC_AUTH_USER` | Pyroscope Basic Authユーザー | - |
+| `PYROSCOPE_BASIC_AUTH_PASSWORD` | Pyroscope Basic Authパスワード | - |
+| `PYROSCOPE_MUTEX_RATE` | Pyroscope mutexサンプリング率 | `5` |
+| `PYROSCOPE_BLOCK_RATE` | Pyroscope blockサンプリング率 | `5` |
+| `HOSTNAME` | Pyroscope用のホスト名タグ | `new-api` |

-📖 **完全な設定:** [環境変数ドキュメント](https://docs.newapi.pro/installation/environment-variables)
+📖 **完全な設定:** [環境変数ドキュメント](https://docs.newapi.pro/ja/docs/installation/config-maintenance/environment-variables)

 </details>

@@ -413,10 +421,10 @@ docker run --name new-api -d --restart always \

 | リソース | リンク |
 |------|------|
-| 📘 よくある質問 | [FAQ](https://docs.newapi.pro/support/faq) |
-| 💬 コミュニティ交流 | [交流チャネル](https://docs.newapi.pro/support/community-interaction) |
-| 🐛 問題のフィードバック | [問題フィードバック](https://docs.newapi.pro/support/feedback-issues) |
-| 📚 完全なドキュメント | [公式ドキュメント](https://docs.newapi.pro/support) |
+| 📘 よくある質問 | [FAQ](https://docs.newapi.pro/ja/docs/support/faq) |
+| 💬 コミュニティ交流 | [交流チャネル](https://docs.newapi.pro/ja/docs/support/community-interaction) |
+| 🐛 問題のフィードバック | [問題フィードバック](https://docs.newapi.pro/ja/docs/support/feedback-issues) |
+| 📚 完全なドキュメント | [公式ドキュメント](https://docs.newapi.pro/ja/docs) |

 ### 🤝 貢献ガイド

@@ -445,7 +453,7 @@ docker run --name new-api -d --restart always \

 このプロジェクトがあなたのお役に立てたなら、ぜひ ⭐️ スターをください！

-**[公式ドキュメント](https://docs.newapi.pro/)** • **[問題フィードバック](https://github.com/Calcium-Ion/new-api/issues)** • **[最新リリース](https://github.com/Calcium-Ion/new-api/releases)**
+**[公式ドキュメント](https://docs.newapi.pro/ja/docs)** • **[問題フィードバック](https://github.com/Calcium-Ion/new-api/issues)** • **[最新リリース](https://github.com/Calcium-Ion/new-api/releases)**

 <sub>❤️ で構築された QuantumNous</sub>

--- a/README.md
+++ b/README.md
@@ -146,7 +146,7 @@ docker run --name new-api -d --restart always \

 🎉 部署完成后，访问 `http://localhost:3000` 即可使用！

-📖 更多部署方式请参考 [部署指南](https://docs.newapi.pro/installation)
+📖 更多部署方式请参考 [部署指南](https://docs.newapi.pro/zh/docs/installation)

 ---

@@ -154,7 +154,7 @@ docker run --name new-api -d --restart always \

 <div align="center">

-### 📖 [官方文档](https://docs.newapi.pro/) | [![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/QuantumNous/new-api)
+### 📖 [官方文档](https://docs.newapi.pro/zh/docs) | [![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/QuantumNous/new-api)

 </div>

@@ -162,17 +162,17 @@ docker run --name new-api -d --restart always \

 | 分类 | 链接 |
 |------|------|
-| 🚀 部署指南 | [安装文档](https://docs.newapi.pro/installation) |
-| ⚙️ 环境配置 | [环境变量](https://docs.newapi.pro/installation/environment-variables) |
-| 📡 接口文档 | [API 文档](https://docs.newapi.pro/api) |
-| ❓ 常见问题 | [FAQ](https://docs.newapi.pro/support/faq) |
-| 💬 社区交流 | [交流渠道](https://docs.newapi.pro/support/community-interaction) |
+| 🚀 部署指南 | [安装文档](https://docs.newapi.pro/zh/docs/installation) |
+| ⚙️ 环境配置 | [环境变量](https://docs.newapi.pro/zh/docs/installation/config-maintenance/environment-variables) |
+| 📡 接口文档 | [API 文档](https://docs.newapi.pro/zh/docs/api) |
+| ❓ 常见问题 | [FAQ](https://docs.newapi.pro/zh/docs/support/faq) |
+| 💬 社区交流 | [交流渠道](https://docs.newapi.pro/zh/docs/support/community-interaction) |

 ---

 ## ✨ 主要特性

-> 详细特性请参考 [特性说明](https://docs.newapi.pro/wiki/features-introduction)
+> 详细特性请参考 [特性说明](https://docs.newapi.pro/zh/docs/guide/wiki/basic-concepts/features-introduction)

 ### 🎨 核心功能

@@ -202,11 +202,11 @@ docker run --name new-api -d --restart always \
 ### 🚀 高级功能

 **API 格式支持：**
- ⚡ [OpenAI Responses](https://docs.newapi.pro/api/openai-responses)
- ⚡ [OpenAI Realtime API](https://docs.newapi.pro/api/openai-realtime)（含 Azure）
- ⚡ [Claude Messages](https://docs.newapi.pro/api/anthropic-chat)
- ⚡ [Google Gemini](https://docs.newapi.pro/api/google-gemini-chat/)
- 🔄 [Rerank 模型](https://docs.newapi.pro/api/jinaai-rerank)（Cohere、Jina）
+- ⚡ [OpenAI Responses](https://docs.newapi.pro/zh/docs/api/ai-model/chat/openai/create-response)
+- ⚡ [OpenAI Realtime API](https://docs.newapi.pro/zh/docs/api/ai-model/realtime/create-realtime-session)（含 Azure）
+- ⚡ [Claude Messages](https://docs.newapi.pro/zh/docs/api/ai-model/chat/create-message)
+- ⚡ [Google Gemini](https://doc.newapi.pro/api/google-gemini-chat)
+- 🔄 [Rerank 模型](https://docs.newapi.pro/zh/docs/api/ai-model/rerank/create-rerank)（Cohere、Jina）

 **智能路由：**
 - ⚖️ 渠道加权随机
@@ -247,16 +247,16 @@ docker run --name new-api -d --restart always \

 ## 🤖 模型支持

-> 详情请参考 [接口文档 - 中继接口](https://docs.newapi.pro/api)
+> 详情请参考 [接口文档 - 中继接口](https://docs.newapi.pro/zh/docs/api)

 | 模型类型 | 说明 | 文档 |
 |---------|------|------|
 | 🤖 OpenAI GPTs | gpt-4-gizmo-* 系列 | - |
-| 🎨 Midjourney-Proxy | [Midjourney-Proxy(Plus)](https://github.com/novicezk/midjourney-proxy) | [文档](https://docs.newapi.pro/api/midjourney-proxy-image) |
-| 🎵 Suno-API | [Suno API](https://github.com/Suno-API/Suno-API) | [文档](https://docs.newapi.pro/api/suno-music) |
-| 🔄 Rerank | Cohere、Jina | [文档](https://docs.newapi.pro/api/jinaai-rerank) |
-| 💬 Claude | Messages 格式 | [文档](https://docs.newapi.pro/api/anthropic-chat) |
-| 🌐 Gemini | Google Gemini 格式 | [文档](https://docs.newapi.pro/api/google-gemini-chat/) |
+| 🎨 Midjourney-Proxy | [Midjourney-Proxy(Plus)](https://github.com/novicezk/midjourney-proxy) | [文档](https://doc.newapi.pro/api/midjourney-proxy-image) |
+| 🎵 Suno-API | [Suno API](https://github.com/Suno-API/Suno-API) | [文档](https://doc.newapi.pro/api/suno-music) |
+| 🔄 Rerank | Cohere、Jina | [文档](https://docs.newapi.pro/zh/docs/api/ai-model/rerank/create-rerank) |
+| 💬 Claude | Messages 格式 | [文档](https://docs.newapi.pro/zh/docs/api/ai-model/chat/create-message) |
+| 🌐 Gemini | Google Gemini 格式 | [文档](https://doc.newapi.pro/api/google-gemini-chat) |
 | 🔧 Dify | ChatFlow 模式 | - |
 | 🎯 自定义 | 支持完整调用地址 | - |

@@ -265,16 +265,16 @@ docker run --name new-api -d --restart always \
 <details>
 <summary>查看完整接口列表</summary>

- [聊天接口 (Chat Completions)](https://docs.newapi.pro/api/openai-chat)
- [响应接口 (Responses)](https://docs.newapi.pro/api/openai-responses)
- [图像接口 (Image)](https://docs.newapi.pro/api/openai-image)
- [音频接口 (Audio)](https://docs.newapi.pro/api/openai-audio)
- [视频接口 (Video)](https://docs.newapi.pro/api/openai-video)
- [嵌入接口 (Embeddings)](https://docs.newapi.pro/api/openai-embeddings)
- [重排序接口 (Rerank)](https://docs.newapi.pro/api/jinaai-rerank)
- [实时对话 (Realtime)](https://docs.newapi.pro/api/openai-realtime)
- [Claude 聊天](https://docs.newapi.pro/api/anthropic-chat)
- [Google Gemini 聊天](https://docs.newapi.pro/api/google-gemini-chat)
+- [聊天接口 (Chat Completions)](https://docs.newapi.pro/zh/docs/api/ai-model/chat/openai/create-chat-completion)
+- [响应接口 (Responses)](https://docs.newapi.pro/zh/docs/api/ai-model/chat/openai/create-response)
+- [图像接口 (Image)](https://docs.newapi.pro/zh/docs/api/ai-model/images/openai/v1-images-generations--post)
+- [音频接口 (Audio)](https://docs.newapi.pro/zh/docs/api/ai-model/audio/openai/create-transcription)
+- [视频接口 (Video)](https://docs.newapi.pro/zh/docs/api/ai-model/videos/create-video-generation)
+- [嵌入接口 (Embeddings)](https://docs.newapi.pro/zh/docs/api/ai-model/embeddings/create-embedding)
+- [重排序接口 (Rerank)](https://docs.newapi.pro/zh/docs/api/ai-model/rerank/create-rerank)
+- [实时对话 (Realtime)](https://docs.newapi.pro/zh/docs/api/ai-model/realtime/create-realtime-session)
+- [Claude 聊天](https://docs.newapi.pro/zh/docs/api/ai-model/chat/create-message)
+- [Google Gemini 聊天](https://doc.newapi.pro/api/google-gemini-chat)

 </details>

@@ -306,10 +306,18 @@ docker run --name new-api -d --restart always \
 | `REDIS_CONN_STRING` | Redis 连接字符串                                                  | - |
 | `STREAMING_TIMEOUT` | 流式超时时间（秒）                                                    | `300` |
 | `STREAM_SCANNER_MAX_BUFFER_MB` | 流式扫描器单行最大缓冲（MB），图像生成等超大 `data:` 片段（如 4K 图片 base64）需适当调大 | `64` |
+| `MAX_REQUEST_BODY_MB` | 请求体最大大小（MB，**解压后**计；防止超大请求/zip bomb 导致内存暴涨），超过将返回 `413` | `32` |
 | `AZURE_DEFAULT_API_VERSION` | Azure API 版本                                                 | `2025-04-01-preview` |
 | `ERROR_LOG_ENABLED` | 错误日志开关                                                       | `false` |
+| `PYROSCOPE_URL` | Pyroscope 服务地址                                            | - |
+| `PYROSCOPE_APP_NAME` | Pyroscope 应用名                                        | `new-api` |
+| `PYROSCOPE_BASIC_AUTH_USER` | Pyroscope Basic Auth 用户名                        | - |
+| `PYROSCOPE_BASIC_AUTH_PASSWORD` | Pyroscope Basic Auth 密码                  | - |
+| `PYROSCOPE_MUTEX_RATE` | Pyroscope mutex 采样率                               | `5` |
+| `PYROSCOPE_BLOCK_RATE` | Pyroscope block 采样率                               | `5` |
+| `HOSTNAME` | Pyroscope 标签里的主机名                                          | `new-api` |

-📖 **完整配置：** [环境变量文档](https://docs.newapi.pro/installation/environment-variables)
+📖 **完整配置：** [环境变量文档](https://docs.newapi.pro/zh/docs/installation/config-maintenance/environment-variables)

 </details>

@@ -411,10 +419,10 @@ docker run --name new-api -d --restart always \

 | 资源 | 链接 |
 |------|------|
-| 📘 常见问题 | [FAQ](https://docs.newapi.pro/support/faq) |
-| 💬 社区交流 | [交流渠道](https://docs.newapi.pro/support/community-interaction) |
-| 🐛 反馈问题 | [问题反馈](https://docs.newapi.pro/support/feedback-issues) |
-| 📚 完整文档 | [官方文档](https://docs.newapi.pro/support) |
+| 📘 常见问题 | [FAQ](https://docs.newapi.pro/zh/docs/support/faq) |
+| 💬 社区交流 | [交流渠道](https://docs.newapi.pro/zh/docs/support/community-interaction) |
+| 🐛 反馈问题 | [问题反馈](https://docs.newapi.pro/zh/docs/support/feedback-issues) |
+| 📚 完整文档 | [官方文档](https://docs.newapi.pro/zh/docs) |

 ### 🤝 贡献指南

@@ -443,7 +451,7 @@ docker run --name new-api -d --restart always \

 如果这个项目对你有帮助，欢迎给我们一个 ⭐️ Star！

-**[官方文档](https://docs.newapi.pro/)** • **[问题反馈](https://github.com/Calcium-Ion/new-api/issues)** • **[最新发布](https://github.com/Calcium-Ion/new-api/releases)**
+**[官方文档](https://docs.newapi.pro/zh/docs)** • **[问题反馈](https://github.com/Calcium-Ion/new-api/issues)** • **[最新发布](https://github.com/Calcium-Ion/new-api/releases)**

 <sub>Built with ❤️ by QuantumNous</sub>

--- a/common/audio.go
+++ b/common/audio.go
@@ -71,15 +71,66 @@ func getMP3Duration(r io.Reader) (float64, error) {

 // getWAVDuration 解析 WAV 文件头以获取时长。
 func getWAVDuration(r io.ReadSeeker) (float64, error) {
+	// 1. 强制复位指针
+	r.Seek(0, io.SeekStart)
+
 	dec := wav.NewDecoder(r)
+
+	// IsValidFile 会读取 fmt 块
 	if !dec.IsValidFile() {
 		return 0, errors.New("invalid wav file")
 	}
-	d, err := dec.Duration()
-	if err != nil {
-		return 0, errors.Wrap(err, "failed to get wav duration")
+
+	// 尝试寻找 data 块
+	if err := dec.FwdToPCM(); err != nil {
+		return 0, errors.Wrap(err, "failed to find PCM data chunk")
 	}
-	return d.Seconds(), nil
+
+	pcmSize := int64(dec.PCMSize)
+
+	// 如果读出来的 Size 是 0，尝试用文件大小反推
+	if pcmSize == 0 {
+		// 获取文件总大小
+		currentPos, _ := r.Seek(0, io.SeekCurrent) // 当前通常在 data chunk header 之后
+		endPos, _ := r.Seek(0, io.SeekEnd)
+		fileSize := endPos
+
+		// 恢复位置（虽然如果不继续读也没关系）
+		r.Seek(currentPos, io.SeekStart)
+
+		// 数据区大小 ≈ 文件总大小 - 当前指针位置(即Header大小)
+		// 注意：FwdToPCM 成功后，CurrentPos 应该刚好指向 Data 区数据的开始
+		// 或者是 Data Chunk ID + Size 之后。
+		// WAV Header 一般 44 字节。
+		if fileSize > 44 {
+			// 如果 FwdToPCM 成功，Reader 应该位于 data 块的数据起始处
+			// 所以剩余的所有字节理论上都是音频数据
+			pcmSize = fileSize - currentPos
+
+			// 简单的兜底：如果算出来还是负数或0，强制按文件大小-44计算
+			if pcmSize <= 0 {
+				pcmSize = fileSize - 44
+			}
+		}
+	}
+
+	numChans := int64(dec.NumChans)
+	bitDepth := int64(dec.BitDepth)
+	sampleRate := float64(dec.SampleRate)
+
+	if sampleRate == 0 || numChans == 0 || bitDepth == 0 {
+		return 0, errors.New("invalid wav header metadata")
+	}
+
+	bytesPerFrame := numChans * (bitDepth / 8)
+	if bytesPerFrame == 0 {
+		return 0, errors.New("invalid byte depth calculation")
+	}
+
+	totalFrames := pcmSize / bytesPerFrame
+
+	durationSeconds := float64(totalFrames) / sampleRate
+	return durationSeconds, nil
 }

 // getFLACDuration 解析 FLAC 文件的 STREAMINFO 块。
--- a/common/gin.go
+++ b/common/gin.go
@@ -2,7 +2,7 @@ package common

 import (
 	"bytes"
-	"errors"
+	"fmt"
 	"io"
 	"mime"
 	"mime/multipart"
@@ -12,24 +12,61 @@ import (
 	"time"

 	"github.com/QuantumNous/new-api/constant"
+	"github.com/pkg/errors"

 	"github.com/gin-gonic/gin"
 )

 const KeyRequestBody = "key_request_body"

-func GetRequestBody(c *gin.Context) ([]byte, error) {
-	requestBody, _ := c.Get(KeyRequestBody)
-	if requestBody != nil {
-		return requestBody.([]byte), nil
+var ErrRequestBodyTooLarge = errors.New("request body too large")
+
+func IsRequestBodyTooLargeError(err error) bool {
+	if err == nil {
+		return false
 	}
-	requestBody, err := io.ReadAll(c.Request.Body)
+	if errors.Is(err, ErrRequestBodyTooLarge) {
+		return true
+	}
+	var mbe *http.MaxBytesError
+	return errors.As(err, &mbe)
+}
+
+func GetRequestBody(c *gin.Context) ([]byte, error) {
+	cached, exists := c.Get(KeyRequestBody)
+	if exists && cached != nil {
+		if b, ok := cached.([]byte); ok {
+			return b, nil
+		}
+	}
+	maxMB := constant.MaxRequestBodyMB
+	if maxMB < 0 {
+		// no limit
+		body, err := io.ReadAll(c.Request.Body)
+		_ = c.Request.Body.Close()
+		if err != nil {
+			return nil, err
+		}
+		c.Set(KeyRequestBody, body)
+		return body, nil
+	}
+	maxBytes := int64(maxMB) << 20
+
+	limited := io.LimitReader(c.Request.Body, maxBytes+1)
+	body, err := io.ReadAll(limited)
 	if err != nil {
+		_ = c.Request.Body.Close()
+		if IsRequestBodyTooLargeError(err) {
+			return nil, errors.Wrap(ErrRequestBodyTooLarge, fmt.Sprintf("request body exceeds %d MB", maxMB))
+		}
 		return nil, err
 	}
 	_ = c.Request.Body.Close()
-	c.Set(KeyRequestBody, requestBody)
-	return requestBody.([]byte), nil
+	if int64(len(body)) > maxBytes {
+		return nil, errors.Wrap(ErrRequestBodyTooLarge, fmt.Sprintf("request body exceeds %d MB", maxMB))
+	}
+	c.Set(KeyRequestBody, body)
+	return body, nil
 }

 func UnmarshalBodyReusable(c *gin.Context, v any) error {
--- a/common/init.go
+++ b/common/init.go
@@ -117,6 +117,8 @@ func initConstantEnv() {
 	constant.DifyDebug = GetEnvOrDefaultBool("DIFY_DEBUG", true)
 	constant.MaxFileDownloadMB = GetEnvOrDefault("MAX_FILE_DOWNLOAD_MB", 20)
 	constant.StreamScannerMaxBufferMB = GetEnvOrDefault("STREAM_SCANNER_MAX_BUFFER_MB", 64)
+	// MaxRequestBodyMB 请求体最大大小（解压后），用于防止超大请求/zip bomb导致内存暴涨
+	constant.MaxRequestBodyMB = GetEnvOrDefault("MAX_REQUEST_BODY_MB", 64)
 	// ForceStreamOption 覆盖请求参数，强制返回usage信息
 	constant.ForceStreamOption = GetEnvOrDefaultBool("FORCE_STREAM_OPTION", true)
 	constant.CountToken = GetEnvOrDefaultBool("CountToken", true)
--- a/common/ip.go
+++ b/common/ip.go
@@ -2,6 +2,15 @@ package common

 import "net"

+func IsIP(s string) bool {
+	ip := net.ParseIP(s)
+	return ip != nil
+}
+
+func ParseIP(s string) net.IP {
+	return net.ParseIP(s)
+}
+
 func IsPrivateIP(ip net.IP) bool {
 	if ip.IsLoopback() || ip.IsLinkLocalUnicast() || ip.IsLinkLocalMulticast() {
 		return true
@@ -20,3 +29,23 @@ func IsPrivateIP(ip net.IP) bool {
 	}
 	return false
 }
+
+func IsIpInCIDRList(ip net.IP, cidrList []string) bool {
+	for _, cidr := range cidrList {
+		_, network, err := net.ParseCIDR(cidr)
+		if err != nil {
+			// 尝试作为单个IP处理
+			if whitelistIP := net.ParseIP(cidr); whitelistIP != nil {
+				if ip.Equal(whitelistIP) {
+					return true
+				}
+			}
+			continue
+		}
+
+		if network.Contains(ip) {
+			return true
+		}
+	}
+	return false
+}
--- a/common/pyro.go
+++ b/common/pyro.go
@@ -0,0 +1,56 @@
+package common
+
+import (
+	"runtime"
+
+	"github.com/grafana/pyroscope-go"
+)
+
+func StartPyroScope() error {
+
+	pyroscopeUrl := GetEnvOrDefaultString("PYROSCOPE_URL", "")
+	if pyroscopeUrl == "" {
+		return nil
+	}
+
+	pyroscopeAppName := GetEnvOrDefaultString("PYROSCOPE_APP_NAME", "new-api")
+	pyroscopeBasicAuthUser := GetEnvOrDefaultString("PYROSCOPE_BASIC_AUTH_USER", "")
+	pyroscopeBasicAuthPassword := GetEnvOrDefaultString("PYROSCOPE_BASIC_AUTH_PASSWORD", "")
+	pyroscopeHostname := GetEnvOrDefaultString("HOSTNAME", "new-api")
+
+	mutexRate := GetEnvOrDefault("PYROSCOPE_MUTEX_RATE", 5)
+	blockRate := GetEnvOrDefault("PYROSCOPE_BLOCK_RATE", 5)
+
+	runtime.SetMutexProfileFraction(mutexRate)
+	runtime.SetBlockProfileRate(blockRate)
+
+	_, err := pyroscope.Start(pyroscope.Config{
+		ApplicationName: pyroscopeAppName,
+
+		ServerAddress:     pyroscopeUrl,
+		BasicAuthUser:     pyroscopeBasicAuthUser,
+		BasicAuthPassword: pyroscopeBasicAuthPassword,
+
+		Logger: nil,
+
+		Tags: map[string]string{"hostname": pyroscopeHostname},
+
+		ProfileTypes: []pyroscope.ProfileType{
+			pyroscope.ProfileCPU,
+			pyroscope.ProfileAllocObjects,
+			pyroscope.ProfileAllocSpace,
+			pyroscope.ProfileInuseObjects,
+			pyroscope.ProfileInuseSpace,
+
+			pyroscope.ProfileGoroutines,
+			pyroscope.ProfileMutexCount,
+			pyroscope.ProfileMutexDuration,
+			pyroscope.ProfileBlockCount,
+			pyroscope.ProfileBlockDuration,
+		},
+	})
+	if err != nil {
+		return err
+	}
+	return nil
+}
--- a/common/ssrf_protection.go
+++ b/common/ssrf_protection.go
@@ -186,23 +186,7 @@ func isIPListed(ip net.IP, list []string) bool {
 		return false
 	}

-	for _, whitelistCIDR := range list {
-		_, network, err := net.ParseCIDR(whitelistCIDR)
-		if err != nil {
-			// 尝试作为单个IP处理
-			if whitelistIP := net.ParseIP(whitelistCIDR); whitelistIP != nil {
-				if ip.Equal(whitelistIP) {
-					return true
-				}
-			}
-			continue
-		}
-
-		if network.Contains(ip) {
-			return true
-		}
-	}
-	return false
+	return IsIpInCIDRList(ip, list)
 }

 // IsIPAccessAllowed 检查IP是否允许访问
--- a/common/utils.go
+++ b/common/utils.go
@@ -217,11 +217,6 @@ func IntMax(a int, b int) int {
 	}
 }

-func IsIP(s string) bool {
-	ip := net.ParseIP(s)
-	return ip != nil
-}
-
 func GetUUID() string {
 	code := uuid.New().String()
 	code = strings.Replace(code, "-", "", -1)
--- a/constant/context_key.go
+++ b/constant/context_key.go
@@ -21,7 +21,6 @@ const (
 	ContextKeyTokenCrossGroupRetry   ContextKey = "token_cross_group_retry"

 	/* channel related keys */
-	ContextKeyAutoGroupIndex           ContextKey = "auto_group_index"
 	ContextKeyChannelId                ContextKey = "channel_id"
 	ContextKeyChannelName              ContextKey = "channel_name"
 	ContextKeyChannelCreateTime        ContextKey = "channel_create_time"
@@ -39,6 +38,10 @@ const (
 	ContextKeyChannelMultiKeyIndex     ContextKey = "channel_multi_key_index"
 	ContextKeyChannelKey               ContextKey = "channel_key"

+	ContextKeyAutoGroup           ContextKey = "auto_group"
+	ContextKeyAutoGroupIndex      ContextKey = "auto_group_index"
+	ContextKeyAutoGroupRetryIndex ContextKey = "auto_group_retry_index"
+
 	/* user related keys */
 	ContextKeyUserId      ContextKey = "id"
 	ContextKeyUserSetting ContextKey = "user_setting"
--- a/constant/env.go
+++ b/constant/env.go
@@ -9,6 +9,7 @@ var CountToken bool
 var GetMediaToken bool
 var GetMediaTokenNotStream bool
 var UpdateTask bool
+var MaxRequestBodyMB int
 var AzureDefaultAPIVersion string
 var GeminiVisionMaxImageNum int
 var NotifyLimitCount int
--- a/controller/billing.go
+++ b/controller/billing.go
@@ -2,9 +2,9 @@ package controller

 import (
 	"github.com/QuantumNous/new-api/common"
-	"github.com/QuantumNous/new-api/dto"
 	"github.com/QuantumNous/new-api/model"
 	"github.com/QuantumNous/new-api/setting/operation_setting"
+	"github.com/QuantumNous/new-api/types"
 	"github.com/gin-gonic/gin"
 )

@@ -29,7 +29,7 @@ func GetSubscription(c *gin.Context) {
 		expiredTime = 0
 	}
 	if err != nil {
-		openAIError := dto.OpenAIError{
+		openAIError := types.OpenAIError{
 			Message: err.Error(),
 			Type:    "upstream_error",
 		}
@@ -81,7 +81,7 @@ func GetUsage(c *gin.Context) {
 		quota, err = model.GetUserUsedQuota(userId)
 	}
 	if err != nil {
-		openAIError := dto.OpenAIError{
+		openAIError := types.OpenAIError{
 			Message: err.Error(),
 			Type:    "new_api_error",
 		}
--- a/controller/channel-test.go
+++ b/controller/channel-test.go
@@ -97,6 +97,11 @@ func testChannel(channel *model.Channel, testModel string, endpointType string)
 		if channel.Type == constant.ChannelTypeVolcEngine && strings.Contains(testModel, "seedream") {
 			requestPath = "/v1/images/generations"
 		}
+
+		// responses-only models
+		if strings.Contains(strings.ToLower(testModel), "codex") {
+			requestPath = "/v1/responses"
+		}
 	}

 	c.Request = &http.Request{
@@ -176,7 +181,7 @@ func testChannel(channel *model.Channel, testModel string, endpointType string)
 		}
 	}

-	request := buildTestRequest(testModel, endpointType)
+	request := buildTestRequest(testModel, endpointType, channel)

 	info, err := relaycommon.GenRelayInfo(c, relayFormat, request, nil)

@@ -319,6 +324,16 @@ func testChannel(channel *model.Channel, testModel string, endpointType string)
 		httpResp = resp.(*http.Response)
 		if httpResp.StatusCode != http.StatusOK {
 			err := service.RelayErrorHandler(c.Request.Context(), httpResp, true)
+			common.SysError(fmt.Sprintf(
+				"channel test bad response: channel_id=%d name=%s type=%d model=%s endpoint_type=%s status=%d err=%v",
+				channel.Id,
+				channel.Name,
+				channel.Type,
+				testModel,
+				endpointType,
+				httpResp.StatusCode,
+				err,
+			))
 			return testResult{
 				context:     c,
 				localErr:    err,
@@ -389,7 +404,7 @@ func testChannel(channel *model.Channel, testModel string, endpointType string)
 	}
 }

-func buildTestRequest(model string, endpointType string) dto.Request {
+func buildTestRequest(model string, endpointType string, channel *model.Channel) dto.Request {
 	// 根据端点类型构建不同的测试请求
 	if endpointType != "" {
 		switch constant.EndpointType(endpointType) {
@@ -423,7 +438,7 @@ func buildTestRequest(model string, endpointType string) dto.Request {
 			}
 		case constant.EndpointTypeAnthropic, constant.EndpointTypeGemini, constant.EndpointTypeOpenAI:
 			// 返回 GeneralOpenAIRequest
-			maxTokens := uint(10)
+			maxTokens := uint(16)
 			if constant.EndpointType(endpointType) == constant.EndpointTypeGemini {
 				maxTokens = 3000
 			}
@@ -453,6 +468,14 @@ func buildTestRequest(model string, endpointType string) dto.Request {
 		}
 	}

+	// Responses-only models (e.g. codex series)
+	if strings.Contains(strings.ToLower(model), "codex") {
+		return &dto.OpenAIResponsesRequest{
+			Model: model,
+			Input: json.RawMessage("\"hi\""),
+		}
+	}
+
 	// Chat/Completion 请求 - 返回 GeneralOpenAIRequest
 	testRequest := &dto.GeneralOpenAIRequest{
 		Model:  model,
@@ -466,7 +489,7 @@ func buildTestRequest(model string, endpointType string) dto.Request {
 	}

 	if strings.HasPrefix(model, "o") {
-		testRequest.MaxCompletionTokens = 10
+		testRequest.MaxCompletionTokens = 16
 	} else if strings.Contains(model, "thinking") {
 		if !strings.Contains(model, "claude") {
 			testRequest.MaxTokens = 50
@@ -474,7 +497,7 @@ func buildTestRequest(model string, endpointType string) dto.Request {
 	} else if strings.Contains(model, "gemini") {
 		testRequest.MaxTokens = 3000
 	} else {
-		testRequest.MaxTokens = 10
+		testRequest.MaxTokens = 16
 	}

 	return testRequest
--- a/controller/channel.go
+++ b/controller/channel.go
@@ -11,16 +11,18 @@ import (
 	"github.com/QuantumNous/new-api/constant"
 	"github.com/QuantumNous/new-api/dto"
 	"github.com/QuantumNous/new-api/model"
+	"github.com/QuantumNous/new-api/relay/channel/ollama"
 	"github.com/QuantumNous/new-api/service"

 	"github.com/gin-gonic/gin"
 )

 type OpenAIModel struct {
-	ID         string `json:"id"`
-	Object     string `json:"object"`
-	Created    int64  `json:"created"`
-	OwnedBy    string `json:"owned_by"`
+	ID         string         `json:"id"`
+	Object     string         `json:"object"`
+	Created    int64          `json:"created"`
+	OwnedBy    string         `json:"owned_by"`
+	Metadata   map[string]any `json:"metadata,omitempty"`
 	Permission []struct {
 		ID                 string `json:"id"`
 		Object             string `json:"object"`
@@ -207,6 +209,57 @@ func FetchUpstreamModels(c *gin.Context) {
 		baseURL = channel.GetBaseURL()
 	}

+	// 对于 Ollama 渠道，使用特殊处理
+	if channel.Type == constant.ChannelTypeOllama {
+		key := strings.Split(channel.Key, "\n")[0]
+		models, err := ollama.FetchOllamaModels(baseURL, key)
+		if err != nil {
+			c.JSON(http.StatusOK, gin.H{
+				"success": false,
+				"message": fmt.Sprintf("获取Ollama模型失败: %s", err.Error()),
+			})
+			return
+		}
+
+		result := OpenAIModelsResponse{
+			Data: make([]OpenAIModel, 0, len(models)),
+		}
+
+		for _, modelInfo := range models {
+			metadata := map[string]any{}
+			if modelInfo.Size > 0 {
+				metadata["size"] = modelInfo.Size
+			}
+			if modelInfo.Digest != "" {
+				metadata["digest"] = modelInfo.Digest
+			}
+			if modelInfo.ModifiedAt != "" {
+				metadata["modified_at"] = modelInfo.ModifiedAt
+			}
+			details := modelInfo.Details
+			if details.ParentModel != "" || details.Format != "" || details.Family != "" || len(details.Families) > 0 || details.ParameterSize != "" || details.QuantizationLevel != "" {
+				metadata["details"] = modelInfo.Details
+			}
+			if len(metadata) == 0 {
+				metadata = nil
+			}
+
+			result.Data = append(result.Data, OpenAIModel{
+				ID:       modelInfo.Name,
+				Object:   "model",
+				Created:  0,
+				OwnedBy:  "ollama",
+				Metadata: metadata,
+			})
+		}
+
+		c.JSON(http.StatusOK, gin.H{
+			"success": true,
+			"data":    result.Data,
+		})
+		return
+	}
+
 	var url string
 	switch channel.Type {
 	case constant.ChannelTypeGemini:
@@ -975,6 +1028,32 @@ func FetchModels(c *gin.Context) {
 		baseURL = constant.ChannelBaseURLs[req.Type]
 	}

+	// remove line breaks and extra spaces.
+	key := strings.TrimSpace(req.Key)
+	key = strings.Split(key, "\n")[0]
+
+	if req.Type == constant.ChannelTypeOllama {
+		models, err := ollama.FetchOllamaModels(baseURL, key)
+		if err != nil {
+			c.JSON(http.StatusOK, gin.H{
+				"success": false,
+				"message": fmt.Sprintf("获取Ollama模型失败: %s", err.Error()),
+			})
+			return
+		}
+
+		names := make([]string, 0, len(models))
+		for _, modelInfo := range models {
+			names = append(names, modelInfo.Name)
+		}
+
+		c.JSON(http.StatusOK, gin.H{
+			"success": true,
+			"data":    names,
+		})
+		return
+	}
+
 	client := &http.Client{}
 	url := fmt.Sprintf("%s/v1/models", baseURL)

@@ -987,10 +1066,6 @@ func FetchModels(c *gin.Context) {
 		return
 	}

-	// remove line breaks and extra spaces.
-	key := strings.TrimSpace(req.Key)
-	// If the key contains a line break, only take the first part.
-	key = strings.Split(key, "\n")[0]
 	request.Header.Set("Authorization", "Bearer "+key)

 	response, err := client.Do(request)
@@ -1640,3 +1715,262 @@ func ManageMultiKeys(c *gin.Context) {
 		return
 	}
 }
+
+// OllamaPullModel 拉取 Ollama 模型
+func OllamaPullModel(c *gin.Context) {
+	var req struct {
+		ChannelID int    `json:"channel_id"`
+		ModelName string `json:"model_name"`
+	}
+
+	if err := c.ShouldBindJSON(&req); err != nil {
+		c.JSON(http.StatusBadRequest, gin.H{
+			"success": false,
+			"message": "Invalid request parameters",
+		})
+		return
+	}
+
+	if req.ChannelID == 0 || req.ModelName == "" {
+		c.JSON(http.StatusBadRequest, gin.H{
+			"success": false,
+			"message": "Channel ID and model name are required",
+		})
+		return
+	}
+
+	// 获取渠道信息
+	channel, err := model.GetChannelById(req.ChannelID, true)
+	if err != nil {
+		c.JSON(http.StatusNotFound, gin.H{
+			"success": false,
+			"message": "Channel not found",
+		})
+		return
+	}
+
+	// 检查是否是 Ollama 渠道
+	if channel.Type != constant.ChannelTypeOllama {
+		c.JSON(http.StatusBadRequest, gin.H{
+			"success": false,
+			"message": "This operation is only supported for Ollama channels",
+		})
+		return
+	}
+
+	baseURL := constant.ChannelBaseURLs[channel.Type]
+	if channel.GetBaseURL() != "" {
+		baseURL = channel.GetBaseURL()
+	}
+
+	key := strings.Split(channel.Key, "\n")[0]
+	err = ollama.PullOllamaModel(baseURL, key, req.ModelName)
+	if err != nil {
+		c.JSON(http.StatusInternalServerError, gin.H{
+			"success": false,
+			"message": fmt.Sprintf("Failed to pull model: %s", err.Error()),
+		})
+		return
+	}
+
+	c.JSON(http.StatusOK, gin.H{
+		"success": true,
+		"message": fmt.Sprintf("Model %s pulled successfully", req.ModelName),
+	})
+}
+
+// OllamaPullModelStream 流式拉取 Ollama 模型
+func OllamaPullModelStream(c *gin.Context) {
+	var req struct {
+		ChannelID int    `json:"channel_id"`
+		ModelName string `json:"model_name"`
+	}
+
+	if err := c.ShouldBindJSON(&req); err != nil {
+		c.JSON(http.StatusBadRequest, gin.H{
+			"success": false,
+			"message": "Invalid request parameters",
+		})
+		return
+	}
+
+	if req.ChannelID == 0 || req.ModelName == "" {
+		c.JSON(http.StatusBadRequest, gin.H{
+			"success": false,
+			"message": "Channel ID and model name are required",
+		})
+		return
+	}
+
+	// 获取渠道信息
+	channel, err := model.GetChannelById(req.ChannelID, true)
+	if err != nil {
+		c.JSON(http.StatusNotFound, gin.H{
+			"success": false,
+			"message": "Channel not found",
+		})
+		return
+	}
+
+	// 检查是否是 Ollama 渠道
+	if channel.Type != constant.ChannelTypeOllama {
+		c.JSON(http.StatusBadRequest, gin.H{
+			"success": false,
+			"message": "This operation is only supported for Ollama channels",
+		})
+		return
+	}
+
+	baseURL := constant.ChannelBaseURLs[channel.Type]
+	if channel.GetBaseURL() != "" {
+		baseURL = channel.GetBaseURL()
+	}
+
+	// 设置 SSE 头部
+	c.Header("Content-Type", "text/event-stream")
+	c.Header("Cache-Control", "no-cache")
+	c.Header("Connection", "keep-alive")
+	c.Header("Access-Control-Allow-Origin", "*")
+
+	key := strings.Split(channel.Key, "\n")[0]
+
+	// 创建进度回调函数
+	progressCallback := func(progress ollama.OllamaPullResponse) {
+		data, _ := json.Marshal(progress)
+		fmt.Fprintf(c.Writer, "data: %s\n\n", string(data))
+		c.Writer.Flush()
+	}
+
+	// 执行拉取
+	err = ollama.PullOllamaModelStream(baseURL, key, req.ModelName, progressCallback)
+
+	if err != nil {
+		errorData, _ := json.Marshal(gin.H{
+			"error": err.Error(),
+		})
+		fmt.Fprintf(c.Writer, "data: %s\n\n", string(errorData))
+	} else {
+		successData, _ := json.Marshal(gin.H{
+			"message": fmt.Sprintf("Model %s pulled successfully", req.ModelName),
+		})
+		fmt.Fprintf(c.Writer, "data: %s\n\n", string(successData))
+	}
+
+	// 发送结束标志
+	fmt.Fprintf(c.Writer, "data: [DONE]\n\n")
+	c.Writer.Flush()
+}
+
+// OllamaDeleteModel 删除 Ollama 模型
+func OllamaDeleteModel(c *gin.Context) {
+	var req struct {
+		ChannelID int    `json:"channel_id"`
+		ModelName string `json:"model_name"`
+	}
+
+	if err := c.ShouldBindJSON(&req); err != nil {
+		c.JSON(http.StatusBadRequest, gin.H{
+			"success": false,
+			"message": "Invalid request parameters",
+		})
+		return
+	}
+
+	if req.ChannelID == 0 || req.ModelName == "" {
+		c.JSON(http.StatusBadRequest, gin.H{
+			"success": false,
+			"message": "Channel ID and model name are required",
+		})
+		return
+	}
+
+	// 获取渠道信息
+	channel, err := model.GetChannelById(req.ChannelID, true)
+	if err != nil {
+		c.JSON(http.StatusNotFound, gin.H{
+			"success": false,
+			"message": "Channel not found",
+		})
+		return
+	}
+
+	// 检查是否是 Ollama 渠道
+	if channel.Type != constant.ChannelTypeOllama {
+		c.JSON(http.StatusBadRequest, gin.H{
+			"success": false,
+			"message": "This operation is only supported for Ollama channels",
+		})
+		return
+	}
+
+	baseURL := constant.ChannelBaseURLs[channel.Type]
+	if channel.GetBaseURL() != "" {
+		baseURL = channel.GetBaseURL()
+	}
+
+	key := strings.Split(channel.Key, "\n")[0]
+	err = ollama.DeleteOllamaModel(baseURL, key, req.ModelName)
+	if err != nil {
+		c.JSON(http.StatusInternalServerError, gin.H{
+			"success": false,
+			"message": fmt.Sprintf("Failed to delete model: %s", err.Error()),
+		})
+		return
+	}
+
+	c.JSON(http.StatusOK, gin.H{
+		"success": true,
+		"message": fmt.Sprintf("Model %s deleted successfully", req.ModelName),
+	})
+}
+
+// OllamaVersion 获取 Ollama 服务版本信息
+func OllamaVersion(c *gin.Context) {
+	id, err := strconv.Atoi(c.Param("id"))
+	if err != nil {
+		c.JSON(http.StatusBadRequest, gin.H{
+			"success": false,
+			"message": "Invalid channel id",
+		})
+		return
+	}
+
+	channel, err := model.GetChannelById(id, true)
+	if err != nil {
+		c.JSON(http.StatusNotFound, gin.H{
+			"success": false,
+			"message": "Channel not found",
+		})
+		return
+	}
+
+	if channel.Type != constant.ChannelTypeOllama {
+		c.JSON(http.StatusBadRequest, gin.H{
+			"success": false,
+			"message": "This operation is only supported for Ollama channels",
+		})
+		return
+	}
+
+	baseURL := constant.ChannelBaseURLs[channel.Type]
+	if channel.GetBaseURL() != "" {
+		baseURL = channel.GetBaseURL()
+	}
+
+	key := strings.Split(channel.Key, "\n")[0]
+	version, err := ollama.FetchOllamaVersion(baseURL, key)
+	if err != nil {
+		c.JSON(http.StatusOK, gin.H{
+			"success": false,
+			"message": fmt.Sprintf("获取Ollama版本失败: %s", err.Error()),
+		})
+		return
+	}
+
+	c.JSON(http.StatusOK, gin.H{
+		"success": true,
+		"data": gin.H{
+			"version": version,
+		},
+	})
+}
--- a/controller/deployment.go
+++ b/controller/deployment.go
@@ -0,0 +1,781 @@
+package controller
+
+import (
+	"fmt"
+	"strconv"
+	"strings"
+	"time"
+
+	"github.com/QuantumNous/new-api/common"
+	"github.com/QuantumNous/new-api/pkg/ionet"
+	"github.com/gin-gonic/gin"
+)
+
+func getIoAPIKey(c *gin.Context) (string, bool) {
+	common.OptionMapRWMutex.RLock()
+	enabled := common.OptionMap["model_deployment.ionet.enabled"] == "true"
+	apiKey := common.OptionMap["model_deployment.ionet.api_key"]
+	common.OptionMapRWMutex.RUnlock()
+	if !enabled || strings.TrimSpace(apiKey) == "" {
+		common.ApiErrorMsg(c, "io.net model deployment is not enabled or api key missing")
+		return "", false
+	}
+	return apiKey, true
+}
+
+func getIoClient(c *gin.Context) (*ionet.Client, bool) {
+	apiKey, ok := getIoAPIKey(c)
+	if !ok {
+		return nil, false
+	}
+	return ionet.NewClient(apiKey), true
+}
+
+func getIoEnterpriseClient(c *gin.Context) (*ionet.Client, bool) {
+	apiKey, ok := getIoAPIKey(c)
+	if !ok {
+		return nil, false
+	}
+	return ionet.NewEnterpriseClient(apiKey), true
+}
+
+func TestIoNetConnection(c *gin.Context) {
+	var req struct {
+		APIKey string `json:"api_key"`
+	}
+
+	if err := c.ShouldBindJSON(&req); err != nil {
+		common.ApiErrorMsg(c, "invalid request payload")
+		return
+	}
+
+	apiKey := strings.TrimSpace(req.APIKey)
+	if apiKey == "" {
+		common.ApiErrorMsg(c, "api_key is required")
+		return
+	}
+
+	client := ionet.NewEnterpriseClient(apiKey)
+	result, err := client.GetMaxGPUsPerContainer()
+	if err != nil {
+		if apiErr, ok := err.(*ionet.APIError); ok {
+			message := strings.TrimSpace(apiErr.Message)
+			if message == "" {
+				message = "failed to validate api key"
+			}
+			common.ApiErrorMsg(c, message)
+			return
+		}
+		common.ApiError(c, err)
+		return
+	}
+
+	totalHardware := 0
+	totalAvailable := 0
+	if result != nil {
+		totalHardware = len(result.Hardware)
+		totalAvailable = result.Total
+		if totalAvailable == 0 {
+			for _, hw := range result.Hardware {
+				totalAvailable += hw.Available
+			}
+		}
+	}
+
+	common.ApiSuccess(c, gin.H{
+		"hardware_count":  totalHardware,
+		"total_available": totalAvailable,
+	})
+}
+
+func requireDeploymentID(c *gin.Context) (string, bool) {
+	deploymentID := strings.TrimSpace(c.Param("id"))
+	if deploymentID == "" {
+		common.ApiErrorMsg(c, "deployment ID is required")
+		return "", false
+	}
+	return deploymentID, true
+}
+
+func requireContainerID(c *gin.Context) (string, bool) {
+	containerID := strings.TrimSpace(c.Param("container_id"))
+	if containerID == "" {
+		common.ApiErrorMsg(c, "container ID is required")
+		return "", false
+	}
+	return containerID, true
+}
+
+func mapIoNetDeployment(d ionet.Deployment) map[string]interface{} {
+	var created int64
+	if d.CreatedAt.IsZero() {
+		created = time.Now().Unix()
+	} else {
+		created = d.CreatedAt.Unix()
+	}
+
+	timeRemainingHours := d.ComputeMinutesRemaining / 60
+	timeRemainingMins := d.ComputeMinutesRemaining % 60
+	var timeRemaining string
+	if timeRemainingHours > 0 {
+		timeRemaining = fmt.Sprintf("%d hour %d minutes", timeRemainingHours, timeRemainingMins)
+	} else if timeRemainingMins > 0 {
+		timeRemaining = fmt.Sprintf("%d minutes", timeRemainingMins)
+	} else {
+		timeRemaining = "completed"
+	}
+
+	hardwareInfo := fmt.Sprintf("%s %s x%d", d.BrandName, d.HardwareName, d.HardwareQuantity)
+
+	return map[string]interface{}{
+		"id":                        d.ID,
+		"deployment_name":           d.Name,
+		"container_name":            d.Name,
+		"status":                    strings.ToLower(d.Status),
+		"type":                      "Container",
+		"time_remaining":            timeRemaining,
+		"time_remaining_minutes":    d.ComputeMinutesRemaining,
+		"hardware_info":             hardwareInfo,
+		"hardware_name":             d.HardwareName,
+		"brand_name":                d.BrandName,
+		"hardware_quantity":         d.HardwareQuantity,
+		"completed_percent":         d.CompletedPercent,
+		"compute_minutes_served":    d.ComputeMinutesServed,
+		"compute_minutes_remaining": d.ComputeMinutesRemaining,
+		"created_at":                created,
+		"updated_at":                created,
+		"model_name":                "",
+		"model_version":             "",
+		"instance_count":            d.HardwareQuantity,
+		"resource_config": map[string]interface{}{
+			"cpu":    "",
+			"memory": "",
+			"gpu":    strconv.Itoa(d.HardwareQuantity),
+		},
+		"description": "",
+		"provider":    "io.net",
+	}
+}
+
+func computeStatusCounts(total int, deployments []ionet.Deployment) map[string]int64 {
+	counts := map[string]int64{
+		"all": int64(total),
+	}
+
+	for _, status := range []string{"running", "completed", "failed", "deployment requested", "termination requested", "destroyed"} {
+		counts[status] = 0
+	}
+
+	for _, d := range deployments {
+		status := strings.ToLower(strings.TrimSpace(d.Status))
+		counts[status] = counts[status] + 1
+	}
+
+	return counts
+}
+
+func GetAllDeployments(c *gin.Context) {
+	pageInfo := common.GetPageQuery(c)
+	client, ok := getIoEnterpriseClient(c)
+	if !ok {
+		return
+	}
+
+	status := c.Query("status")
+	opts := &ionet.ListDeploymentsOptions{
+		Status:    strings.ToLower(strings.TrimSpace(status)),
+		Page:      pageInfo.GetPage(),
+		PageSize:  pageInfo.GetPageSize(),
+		SortBy:    "created_at",
+		SortOrder: "desc",
+	}
+
+	dl, err := client.ListDeployments(opts)
+	if err != nil {
+		common.ApiError(c, err)
+		return
+	}
+
+	items := make([]map[string]interface{}, 0, len(dl.Deployments))
+	for _, d := range dl.Deployments {
+		items = append(items, mapIoNetDeployment(d))
+	}
+
+	data := gin.H{
+		"page":          pageInfo.GetPage(),
+		"page_size":     pageInfo.GetPageSize(),
+		"total":         dl.Total,
+		"items":         items,
+		"status_counts": computeStatusCounts(dl.Total, dl.Deployments),
+	}
+	common.ApiSuccess(c, data)
+}
+
+func SearchDeployments(c *gin.Context) {
+	pageInfo := common.GetPageQuery(c)
+	client, ok := getIoEnterpriseClient(c)
+	if !ok {
+		return
+	}
+
+	status := strings.ToLower(strings.TrimSpace(c.Query("status")))
+	keyword := strings.TrimSpace(c.Query("keyword"))
+
+	dl, err := client.ListDeployments(&ionet.ListDeploymentsOptions{
+		Status:    status,
+		Page:      pageInfo.GetPage(),
+		PageSize:  pageInfo.GetPageSize(),
+		SortBy:    "created_at",
+		SortOrder: "desc",
+	})
+	if err != nil {
+		common.ApiError(c, err)
+		return
+	}
+
+	filtered := make([]ionet.Deployment, 0, len(dl.Deployments))
+	if keyword == "" {
+		filtered = dl.Deployments
+	} else {
+		kw := strings.ToLower(keyword)
+		for _, d := range dl.Deployments {
+			if strings.Contains(strings.ToLower(d.Name), kw) {
+				filtered = append(filtered, d)
+			}
+		}
+	}
+
+	items := make([]map[string]interface{}, 0, len(filtered))
+	for _, d := range filtered {
+		items = append(items, mapIoNetDeployment(d))
+	}
+
+	total := dl.Total
+	if keyword != "" {
+		total = len(filtered)
+	}
+
+	data := gin.H{
+		"page":      pageInfo.GetPage(),
+		"page_size": pageInfo.GetPageSize(),
+		"total":     total,
+		"items":     items,
+	}
+	common.ApiSuccess(c, data)
+}
+
+func GetDeployment(c *gin.Context) {
+	client, ok := getIoEnterpriseClient(c)
+	if !ok {
+		return
+	}
+
+	deploymentID, ok := requireDeploymentID(c)
+	if !ok {
+		return
+	}
+
+	details, err := client.GetDeployment(deploymentID)
+	if err != nil {
+		common.ApiError(c, err)
+		return
+	}
+
+	data := map[string]interface{}{
+		"id":              details.ID,
+		"deployment_name": details.ID,
+		"model_name":      "",
+		"model_version":   "",
+		"status":          strings.ToLower(details.Status),
+		"instance_count":  details.TotalContainers,
+		"hardware_id":     details.HardwareID,
+		"resource_config": map[string]interface{}{
+			"cpu":    "",
+			"memory": "",
+			"gpu":    strconv.Itoa(details.TotalGPUs),
+		},
+		"created_at":                details.CreatedAt.Unix(),
+		"updated_at":                details.CreatedAt.Unix(),
+		"description":               "",
+		"amount_paid":               details.AmountPaid,
+		"completed_percent":         details.CompletedPercent,
+		"gpus_per_container":        details.GPUsPerContainer,
+		"total_gpus":                details.TotalGPUs,
+		"total_containers":          details.TotalContainers,
+		"hardware_name":             details.HardwareName,
+		"brand_name":                details.BrandName,
+		"compute_minutes_served":    details.ComputeMinutesServed,
+		"compute_minutes_remaining": details.ComputeMinutesRemaining,
+		"locations":                 details.Locations,
+		"container_config":          details.ContainerConfig,
+	}
+
+	common.ApiSuccess(c, data)
+}
+
+func UpdateDeploymentName(c *gin.Context) {
+	client, ok := getIoEnterpriseClient(c)
+	if !ok {
+		return
+	}
+
+	deploymentID, ok := requireDeploymentID(c)
+	if !ok {
+		return
+	}
+
+	var req struct {
+		Name string `json:"name" binding:"required"`
+	}
+
+	if err := c.ShouldBindJSON(&req); err != nil {
+		common.ApiError(c, err)
+		return
+	}
+
+	updateReq := &ionet.UpdateClusterNameRequest{
+		Name: strings.TrimSpace(req.Name),
+	}
+
+	if updateReq.Name == "" {
+		common.ApiErrorMsg(c, "deployment name cannot be empty")
+		return
+	}
+
+	available, err := client.CheckClusterNameAvailability(updateReq.Name)
+	if err != nil {
+		common.ApiError(c, fmt.Errorf("failed to check name availability: %w", err))
+		return
+	}
+
+	if !available {
+		common.ApiErrorMsg(c, "deployment name is not available, please choose a different name")
+		return
+	}
+
+	resp, err := client.UpdateClusterName(deploymentID, updateReq)
+	if err != nil {
+		common.ApiError(c, err)
+		return
+	}
+
+	data := gin.H{
+		"status":  resp.Status,
+		"message": resp.Message,
+		"id":      deploymentID,
+		"name":    updateReq.Name,
+	}
+	common.ApiSuccess(c, data)
+}
+
+func UpdateDeployment(c *gin.Context) {
+	client, ok := getIoEnterpriseClient(c)
+	if !ok {
+		return
+	}
+
+	deploymentID, ok := requireDeploymentID(c)
+	if !ok {
+		return
+	}
+
+	var req ionet.UpdateDeploymentRequest
+	if err := c.ShouldBindJSON(&req); err != nil {
+		common.ApiError(c, err)
+		return
+	}
+
+	resp, err := client.UpdateDeployment(deploymentID, &req)
+	if err != nil {
+		common.ApiError(c, err)
+		return
+	}
+
+	data := gin.H{
+		"status":        resp.Status,
+		"deployment_id": resp.DeploymentID,
+	}
+	common.ApiSuccess(c, data)
+}
+
+func ExtendDeployment(c *gin.Context) {
+	client, ok := getIoEnterpriseClient(c)
+	if !ok {
+		return
+	}
+
+	deploymentID, ok := requireDeploymentID(c)
+	if !ok {
+		return
+	}
+
+	var req ionet.ExtendDurationRequest
+	if err := c.ShouldBindJSON(&req); err != nil {
+		common.ApiError(c, err)
+		return
+	}
+
+	details, err := client.ExtendDeployment(deploymentID, &req)
+	if err != nil {
+		common.ApiError(c, err)
+		return
+	}
+
+	data := mapIoNetDeployment(ionet.Deployment{
+		ID:                      details.ID,
+		Status:                  details.Status,
+		Name:                    deploymentID,
+		CompletedPercent:        float64(details.CompletedPercent),
+		HardwareQuantity:        details.TotalGPUs,
+		BrandName:               details.BrandName,
+		HardwareName:            details.HardwareName,
+		ComputeMinutesServed:    details.ComputeMinutesServed,
+		ComputeMinutesRemaining: details.ComputeMinutesRemaining,
+		CreatedAt:               details.CreatedAt,
+	})
+
+	common.ApiSuccess(c, data)
+}
+
+func DeleteDeployment(c *gin.Context) {
+	client, ok := getIoEnterpriseClient(c)
+	if !ok {
+		return
+	}
+
+	deploymentID, ok := requireDeploymentID(c)
+	if !ok {
+		return
+	}
+
+	resp, err := client.DeleteDeployment(deploymentID)
+	if err != nil {
+		common.ApiError(c, err)
+		return
+	}
+
+	data := gin.H{
+		"status":        resp.Status,
+		"deployment_id": resp.DeploymentID,
+		"message":       "Deployment termination requested successfully",
+	}
+	common.ApiSuccess(c, data)
+}
+
+func CreateDeployment(c *gin.Context) {
+	client, ok := getIoEnterpriseClient(c)
+	if !ok {
+		return
+	}
+
+	var req ionet.DeploymentRequest
+	if err := c.ShouldBindJSON(&req); err != nil {
+		common.ApiError(c, err)
+		return
+	}
+
+	resp, err := client.DeployContainer(&req)
+	if err != nil {
+		common.ApiError(c, err)
+		return
+	}
+
+	data := gin.H{
+		"deployment_id": resp.DeploymentID,
+		"status":        resp.Status,
+		"message":       "Deployment created successfully",
+	}
+	common.ApiSuccess(c, data)
+}
+
+func GetHardwareTypes(c *gin.Context) {
+	client, ok := getIoEnterpriseClient(c)
+	if !ok {
+		return
+	}
+
+	hardwareTypes, totalAvailable, err := client.ListHardwareTypes()
+	if err != nil {
+		common.ApiError(c, err)
+		return
+	}
+
+	data := gin.H{
+		"hardware_types":  hardwareTypes,
+		"total":           len(hardwareTypes),
+		"total_available": totalAvailable,
+	}
+	common.ApiSuccess(c, data)
+}
+
+func GetLocations(c *gin.Context) {
+	client, ok := getIoClient(c)
+	if !ok {
+		return
+	}
+
+	locationsResp, err := client.ListLocations()
+	if err != nil {
+		common.ApiError(c, err)
+		return
+	}
+
+	total := locationsResp.Total
+	if total == 0 {
+		total = len(locationsResp.Locations)
+	}
+
+	data := gin.H{
+		"locations": locationsResp.Locations,
+		"total":     total,
+	}
+	common.ApiSuccess(c, data)
+}
+
+func GetAvailableReplicas(c *gin.Context) {
+	client, ok := getIoEnterpriseClient(c)
+	if !ok {
+		return
+	}
+
+	hardwareIDStr := c.Query("hardware_id")
+	gpuCountStr := c.Query("gpu_count")
+
+	if hardwareIDStr == "" {
+		common.ApiErrorMsg(c, "hardware_id parameter is required")
+		return
+	}
+
+	hardwareID, err := strconv.Atoi(hardwareIDStr)
+	if err != nil || hardwareID <= 0 {
+		common.ApiErrorMsg(c, "invalid hardware_id parameter")
+		return
+	}
+
+	gpuCount := 1
+	if gpuCountStr != "" {
+		if parsed, err := strconv.Atoi(gpuCountStr); err == nil && parsed > 0 {
+			gpuCount = parsed
+		}
+	}
+
+	replicas, err := client.GetAvailableReplicas(hardwareID, gpuCount)
+	if err != nil {
+		common.ApiError(c, err)
+		return
+	}
+
+	common.ApiSuccess(c, replicas)
+}
+
+func GetPriceEstimation(c *gin.Context) {
+	client, ok := getIoEnterpriseClient(c)
+	if !ok {
+		return
+	}
+
+	var req ionet.PriceEstimationRequest
+	if err := c.ShouldBindJSON(&req); err != nil {
+		common.ApiError(c, err)
+		return
+	}
+
+	priceResp, err := client.GetPriceEstimation(&req)
+	if err != nil {
+		common.ApiError(c, err)
+		return
+	}
+
+	common.ApiSuccess(c, priceResp)
+}
+
+func CheckClusterNameAvailability(c *gin.Context) {
+	client, ok := getIoEnterpriseClient(c)
+	if !ok {
+		return
+	}
+
+	clusterName := strings.TrimSpace(c.Query("name"))
+	if clusterName == "" {
+		common.ApiErrorMsg(c, "name parameter is required")
+		return
+	}
+
+	available, err := client.CheckClusterNameAvailability(clusterName)
+	if err != nil {
+		common.ApiError(c, err)
+		return
+	}
+
+	data := gin.H{
+		"available": available,
+		"name":      clusterName,
+	}
+	common.ApiSuccess(c, data)
+}
+
+func GetDeploymentLogs(c *gin.Context) {
+	client, ok := getIoClient(c)
+	if !ok {
+		return
+	}
+
+	deploymentID, ok := requireDeploymentID(c)
+	if !ok {
+		return
+	}
+
+	containerID := c.Query("container_id")
+	if containerID == "" {
+		common.ApiErrorMsg(c, "container_id parameter is required")
+		return
+	}
+	level := c.Query("level")
+	stream := c.Query("stream")
+	cursor := c.Query("cursor")
+	limitStr := c.Query("limit")
+	follow := c.Query("follow") == "true"
+
+	var limit int = 100
+	if limitStr != "" {
+		if parsedLimit, err := strconv.Atoi(limitStr); err == nil && parsedLimit > 0 {
+			limit = parsedLimit
+			if limit > 1000 {
+				limit = 1000
+			}
+		}
+	}
+
+	opts := &ionet.GetLogsOptions{
+		Level:  level,
+		Stream: stream,
+		Limit:  limit,
+		Cursor: cursor,
+		Follow: follow,
+	}
+
+	if startTime := c.Query("start_time"); startTime != "" {
+		if t, err := time.Parse(time.RFC3339, startTime); err == nil {
+			opts.StartTime = &t
+		}
+	}
+	if endTime := c.Query("end_time"); endTime != "" {
+		if t, err := time.Parse(time.RFC3339, endTime); err == nil {
+			opts.EndTime = &t
+		}
+	}
+
+	rawLogs, err := client.GetContainerLogsRaw(deploymentID, containerID, opts)
+	if err != nil {
+		common.ApiError(c, err)
+		return
+	}
+
+	common.ApiSuccess(c, rawLogs)
+}
+
+func ListDeploymentContainers(c *gin.Context) {
+	client, ok := getIoEnterpriseClient(c)
+	if !ok {
+		return
+	}
+
+	deploymentID, ok := requireDeploymentID(c)
+	if !ok {
+		return
+	}
+
+	containers, err := client.ListContainers(deploymentID)
+	if err != nil {
+		common.ApiError(c, err)
+		return
+	}
+
+	items := make([]map[string]interface{}, 0)
+	if containers != nil {
+		items = make([]map[string]interface{}, 0, len(containers.Workers))
+		for _, ctr := range containers.Workers {
+			events := make([]map[string]interface{}, 0, len(ctr.ContainerEvents))
+			for _, event := range ctr.ContainerEvents {
+				events = append(events, map[string]interface{}{
+					"time":    event.Time.Unix(),
+					"message": event.Message,
+				})
+			}
+
+			items = append(items, map[string]interface{}{
+				"container_id":       ctr.ContainerID,
+				"device_id":          ctr.DeviceID,
+				"status":             strings.ToLower(strings.TrimSpace(ctr.Status)),
+				"hardware":           ctr.Hardware,
+				"brand_name":         ctr.BrandName,
+				"created_at":         ctr.CreatedAt.Unix(),
+				"uptime_percent":     ctr.UptimePercent,
+				"gpus_per_container": ctr.GPUsPerContainer,
+				"public_url":         ctr.PublicURL,
+				"events":             events,
+			})
+		}
+	}
+
+	response := gin.H{
+		"total":      0,
+		"containers": items,
+	}
+	if containers != nil {
+		response["total"] = containers.Total
+	}
+
+	common.ApiSuccess(c, response)
+}
+
+func GetContainerDetails(c *gin.Context) {
+	client, ok := getIoEnterpriseClient(c)
+	if !ok {
+		return
+	}
+
+	deploymentID, ok := requireDeploymentID(c)
+	if !ok {
+		return
+	}
+
+	containerID, ok := requireContainerID(c)
+	if !ok {
+		return
+	}
+
+	details, err := client.GetContainerDetails(deploymentID, containerID)
+	if err != nil {
+		common.ApiError(c, err)
+		return
+	}
+	if details == nil {
+		common.ApiErrorMsg(c, "container details not found")
+		return
+	}
+
+	events := make([]map[string]interface{}, 0, len(details.ContainerEvents))
+	for _, event := range details.ContainerEvents {
+		events = append(events, map[string]interface{}{
+			"time":    event.Time.Unix(),
+			"message": event.Message,
+		})
+	}
+
+	data := gin.H{
+		"deployment_id":      deploymentID,
+		"container_id":       details.ContainerID,
+		"device_id":          details.DeviceID,
+		"status":             strings.ToLower(strings.TrimSpace(details.Status)),
+		"hardware":           details.Hardware,
+		"brand_name":         details.BrandName,
+		"created_at":         details.CreatedAt.Unix(),
+		"uptime_percent":     details.UptimePercent,
+		"gpus_per_container": details.GPUsPerContainer,
+		"public_url":         details.PublicURL,
+		"events":             events,
+	}
+
+	common.ApiSuccess(c, data)
+}
--- a/controller/discord.go
+++ b/controller/discord.go
@@ -114,7 +114,7 @@ func DiscordOAuth(c *gin.Context) {
 		DiscordBind(c)
 		return
 	}
-		if !system_setting.GetDiscordSettings().Enabled {
+	if !system_setting.GetDiscordSettings().Enabled {
 		c.JSON(http.StatusOK, gin.H{
 			"success": false,
 			"message": "管理员未开启通过 Discord 登录以及注册",
--- a/controller/model.go
+++ b/controller/model.go
@@ -18,6 +18,7 @@ import (
 	"github.com/QuantumNous/new-api/service"
 	"github.com/QuantumNous/new-api/setting/operation_setting"
 	"github.com/QuantumNous/new-api/setting/ratio_setting"
+	"github.com/QuantumNous/new-api/types"
 	"github.com/gin-gonic/gin"
 	"github.com/samber/lo"
 )
@@ -275,7 +276,7 @@ func RetrieveModel(c *gin.Context, modelType int) {
 			c.JSON(200, aiModel)
 		}
 	} else {
-		openAIError := dto.OpenAIError{
+		openAIError := types.OpenAIError{
 			Message: fmt.Sprintf("The model '%s' does not exist", modelId),
 			Type:    "invalid_request_error",
 			Param:   "model",
--- a/controller/model_sync.go
+++ b/controller/model_sync.go
@@ -249,7 +249,9 @@ func ensureVendorID(vendorName string, vendorByName map[string]upstreamVendor, v
 	return 0
 }

-// SyncUpstreamModels 同步上游模型与供应商，仅对「未配置模型」生效
+// SyncUpstreamModels 同步上游模型与供应商：
+// - 默认仅创建「未配置模型」
+// - 可通过 overwrite 选择性覆盖更新本地已有模型的字段（前提：sync_official <> 0）
 func SyncUpstreamModels(c *gin.Context) {
 	var req syncRequest
 	// 允许空体
@@ -260,12 +262,26 @@ func SyncUpstreamModels(c *gin.Context) {
 		c.JSON(http.StatusOK, gin.H{"success": false, "message": err.Error()})
 		return
 	}
-	if len(missing) == 0 {
-		c.JSON(http.StatusOK, gin.H{"success": true, "data": gin.H{
-			"created_models":  0,
-			"created_vendors": 0,
-			"skipped_models":  []string{},
-		}})
+
+	// 若既无缺失模型需要创建，也未指定覆盖更新字段，则无需请求上游数据，直接返回
+	if len(missing) == 0 && len(req.Overwrite) == 0 {
+		modelsURL, vendorsURL := getUpstreamURLs(req.Locale)
+		c.JSON(http.StatusOK, gin.H{
+			"success": true,
+			"data": gin.H{
+				"created_models":  0,
+				"created_vendors": 0,
+				"updated_models":  0,
+				"skipped_models":  []string{},
+				"created_list":    []string{},
+				"updated_list":    []string{},
+				"source": gin.H{
+					"locale":      req.Locale,
+					"models_url":  modelsURL,
+					"vendors_url": vendorsURL,
+				},
+			},
+		})
 		return
 	}

@@ -315,9 +331,9 @@ func SyncUpstreamModels(c *gin.Context) {
 	createdModels := 0
 	createdVendors := 0
 	updatedModels := 0
-	var skipped []string
-	var createdList []string
-	var updatedList []string
+	skipped := make([]string, 0)
+	createdList := make([]string, 0)
+	updatedList := make([]string, 0)

 	// 本地缓存：vendorName -> id
 	vendorIDCache := make(map[string]int)
--- a/controller/playground.go
+++ b/controller/playground.go
@@ -3,10 +3,7 @@ package controller
 import (
 	"errors"
 	"fmt"
-	"time"

-	"github.com/QuantumNous/new-api/common"
-	"github.com/QuantumNous/new-api/constant"
 	"github.com/QuantumNous/new-api/middleware"
 	"github.com/QuantumNous/new-api/model"
 	relaycommon "github.com/QuantumNous/new-api/relay/common"
@@ -54,12 +51,6 @@ func Playground(c *gin.Context) {
 		Group:  relayInfo.UsingGroup,
 	}
 	_ = middleware.SetupContextForToken(c, tempToken)
-	_, newAPIError = getChannel(c, relayInfo, 0)
-	if newAPIError != nil {
-		return
-	}
-	//middleware.SetupContextForSelectedChannel(c, channel, playgroundRequest.Model)
-	common.SetContextKey(c, constant.ContextKeyRequestStartTime, time.Now())

 	Relay(c, types.RelayFormatOpenAI)
 }
--- a/controller/relay.go
+++ b/controller/relay.go
@@ -2,6 +2,7 @@ package controller

 import (
 	"bytes"
+	"errors"
 	"fmt"
 	"io"
 	"log"
@@ -104,7 +105,12 @@ func Relay(c *gin.Context, relayFormat types.RelayFormat) {

 	request, err := helper.GetAndValidateRequest(c, relayFormat)
 	if err != nil {
-		newAPIError = types.NewError(err, types.ErrorCodeInvalidRequest)
+		// Map "request body too large" to 413 so clients can handle it correctly
+		if common.IsRequestBodyTooLargeError(err) || errors.Is(err, common.ErrRequestBodyTooLarge) {
+			newAPIError = types.NewErrorWithStatusCode(err, types.ErrorCodeReadRequestBodyFailed, http.StatusRequestEntityTooLarge, types.ErrOptionWithSkipRetry())
+		} else {
+			newAPIError = types.NewError(err, types.ErrorCodeInvalidRequest)
+		}
 		return
 	}

@@ -114,9 +120,17 @@ func Relay(c *gin.Context, relayFormat types.RelayFormat) {
 		return
 	}

-	meta := request.GetTokenCountMeta()
+	needSensitiveCheck := setting.ShouldCheckPromptSensitive()
+	needCountToken := constant.CountToken
+	// Avoid building huge CombineText (strings.Join) when token counting and sensitive check are both disabled.
+	var meta *types.TokenCountMeta
+	if needSensitiveCheck || needCountToken {
+		meta = request.GetTokenCountMeta()
+	} else {
+		meta = fastTokenCountMetaForPricing(request)
+	}

-	if setting.ShouldCheckPromptSensitive() {
+	if needSensitiveCheck && meta != nil {
 		contains, words := service.CheckSensitiveText(meta.CombineText)
 		if contains {
 			logger.LogWarn(c, fmt.Sprintf("user sensitive words detected: %s", strings.Join(words, ", ")))
@@ -157,16 +171,32 @@ func Relay(c *gin.Context, relayFormat types.RelayFormat) {
 		}
 	}()

-	for i := 0; i <= common.RetryTimes; i++ {
-		channel, err := getChannel(c, relayInfo, i)
-		if err != nil {
-			logger.LogError(c, err.Error())
-			newAPIError = err
+	retryParam := &service.RetryParam{
+		Ctx:        c,
+		TokenGroup: relayInfo.TokenGroup,
+		ModelName:  relayInfo.OriginModelName,
+		Retry:      common.GetPointer(0),
+	}
+
+	for ; retryParam.GetRetry() <= common.RetryTimes; retryParam.IncreaseRetry() {
+		channel, channelErr := getChannel(c, relayInfo, retryParam)
+		if channelErr != nil {
+			logger.LogError(c, channelErr.Error())
+			newAPIError = channelErr
 			break
 		}

 		addUsedChannel(c, channel.Id)
-		requestBody, _ := common.GetRequestBody(c)
+		requestBody, bodyErr := common.GetRequestBody(c)
+		if bodyErr != nil {
+			// Ensure consistent 413 for oversized bodies even when error occurs later (e.g., retry path)
+			if common.IsRequestBodyTooLargeError(bodyErr) || errors.Is(bodyErr, common.ErrRequestBodyTooLarge) {
+				newAPIError = types.NewErrorWithStatusCode(bodyErr, types.ErrorCodeReadRequestBodyFailed, http.StatusRequestEntityTooLarge, types.ErrOptionWithSkipRetry())
+			} else {
+				newAPIError = types.NewErrorWithStatusCode(bodyErr, types.ErrorCodeReadRequestBodyFailed, http.StatusBadRequest, types.ErrOptionWithSkipRetry())
+			}
+			break
+		}
 		c.Request.Body = io.NopCloser(bytes.NewBuffer(requestBody))

 		switch relayFormat {
@@ -186,7 +216,7 @@ func Relay(c *gin.Context, relayFormat types.RelayFormat) {

 		processChannelError(c, *types.NewChannelError(channel.Id, channel.Type, channel.Name, channel.ChannelInfo.IsMultiKey, common.GetContextKeyString(c, constant.ContextKeyChannelKey), channel.GetAutoBan()), newAPIError)

-		if !shouldRetry(c, newAPIError, common.RetryTimes-i) {
+		if !shouldRetry(c, newAPIError, common.RetryTimes-retryParam.GetRetry()) {
 			break
 		}
 	}
@@ -211,8 +241,35 @@ func addUsedChannel(c *gin.Context, channelId int) {
 	c.Set("use_channel", useChannel)
 }

-func getChannel(c *gin.Context, info *relaycommon.RelayInfo, retryCount int) (*model.Channel, *types.NewAPIError) {
-	if retryCount == 0 {
+func fastTokenCountMetaForPricing(request dto.Request) *types.TokenCountMeta {
+	if request == nil {
+		return &types.TokenCountMeta{}
+	}
+	meta := &types.TokenCountMeta{
+		TokenType: types.TokenTypeTokenizer,
+	}
+	switch r := request.(type) {
+	case *dto.GeneralOpenAIRequest:
+		if r.MaxCompletionTokens > r.MaxTokens {
+			meta.MaxTokens = int(r.MaxCompletionTokens)
+		} else {
+			meta.MaxTokens = int(r.MaxTokens)
+		}
+	case *dto.OpenAIResponsesRequest:
+		meta.MaxTokens = int(r.MaxOutputTokens)
+	case *dto.ClaudeRequest:
+		meta.MaxTokens = int(r.MaxTokens)
+	case *dto.ImageRequest:
+		// Pricing for image requests depends on ImagePriceRatio; safe to compute even when CountToken is disabled.
+		return r.GetTokenCountMeta()
+	default:
+		// Best-effort: leave CombineText empty to avoid large allocations.
+	}
+	return meta
+}
+
+func getChannel(c *gin.Context, info *relaycommon.RelayInfo, retryParam *service.RetryParam) (*model.Channel, *types.NewAPIError) {
+	if info.ChannelMeta == nil {
 		autoBan := c.GetBool("auto_ban")
 		autoBanInt := 1
 		if !autoBan {
@@ -225,7 +282,7 @@ func getChannel(c *gin.Context, info *relaycommon.RelayInfo, retryCount int) (*m
 			AutoBan: &autoBanInt,
 		}, nil
 	}
-	channel, selectGroup, err := service.CacheGetRandomSatisfiedChannel(c, info.TokenGroup, info.OriginModelName, retryCount)
+	channel, selectGroup, err := service.CacheGetRandomSatisfiedChannel(retryParam)

 	info.PriceData.GroupRatioInfo = helper.HandleGroupRatio(c, info)

@@ -370,7 +427,7 @@ func RelayMidjourney(c *gin.Context) {
 }

 func RelayNotImplemented(c *gin.Context) {
-	err := dto.OpenAIError{
+	err := types.OpenAIError{
 		Message: "API not implemented",
 		Type:    "new_api_error",
 		Param:   "",
@@ -382,7 +439,7 @@ func RelayNotImplemented(c *gin.Context) {
 }

 func RelayNotFound(c *gin.Context) {
-	err := dto.OpenAIError{
+	err := types.OpenAIError{
 		Message: fmt.Sprintf("Invalid URL (%s %s)", c.Request.Method, c.Request.URL.Path),
 		Type:    "invalid_request_error",
 		Param:   "",
@@ -405,8 +462,14 @@ func RelayTask(c *gin.Context) {
 	if taskErr == nil {
 		retryTimes = 0
 	}
-	for i := 0; shouldRetryTaskRelay(c, channelId, taskErr, retryTimes) && i < retryTimes; i++ {
-		channel, newAPIError := getChannel(c, relayInfo, i)
+	retryParam := &service.RetryParam{
+		Ctx:        c,
+		TokenGroup: relayInfo.TokenGroup,
+		ModelName:  relayInfo.OriginModelName,
+		Retry:      common.GetPointer(0),
+	}
+	for ; shouldRetryTaskRelay(c, channelId, taskErr, retryTimes) && retryParam.GetRetry() < retryTimes; retryParam.IncreaseRetry() {
+		channel, newAPIError := getChannel(c, relayInfo, retryParam)
 		if newAPIError != nil {
 			logger.LogError(c, fmt.Sprintf("CacheGetRandomSatisfiedChannel failed: %s", newAPIError.Error()))
 			taskErr = service.TaskErrorWrapperLocal(newAPIError.Err, "get_channel_failed", http.StatusInternalServerError)
@@ -416,10 +479,18 @@ func RelayTask(c *gin.Context) {
 		useChannel := c.GetStringSlice("use_channel")
 		useChannel = append(useChannel, fmt.Sprintf("%d", channelId))
 		c.Set("use_channel", useChannel)
-		logger.LogInfo(c, fmt.Sprintf("using channel #%d to retry (remain times %d)", channel.Id, i))
+		logger.LogInfo(c, fmt.Sprintf("using channel #%d to retry (remain times %d)", channel.Id, retryParam.GetRetry()))
 		//middleware.SetupContextForSelectedChannel(c, channel, originalModel)

-		requestBody, _ := common.GetRequestBody(c)
+		requestBody, err := common.GetRequestBody(c)
+		if err != nil {
+			if common.IsRequestBodyTooLargeError(err) || errors.Is(err, common.ErrRequestBodyTooLarge) {
+				taskErr = service.TaskErrorWrapperLocal(err, "read_request_body_failed", http.StatusRequestEntityTooLarge)
+			} else {
+				taskErr = service.TaskErrorWrapperLocal(err, "read_request_body_failed", http.StatusBadRequest)
+			}
+			break
+		}
 		c.Request.Body = io.NopCloser(bytes.NewBuffer(requestBody))
 		taskErr = taskRelayHandler(c, relayInfo)
 	}
--- a/controller/task.go
+++ b/controller/task.go
@@ -88,7 +88,7 @@ func UpdateSunoTaskAll(ctx context.Context, taskChannelM map[int][]string, taskM
 	for channelId, taskIds := range taskChannelM {
 		err := updateSunoTaskAll(ctx, channelId, taskIds, taskM)
 		if err != nil {
-			logger.LogError(ctx, fmt.Sprintf("渠道 #%d 更新异步任务失败: %d", channelId, err.Error()))
+			logger.LogError(ctx, fmt.Sprintf("渠道 #%d 更新异步任务失败: %s", channelId, err.Error()))
 		}
 	}
 	return nil
@@ -141,7 +141,7 @@ func updateSunoTaskAll(ctx context.Context, channelId int, taskIds []string, tas
 		return err
 	}
 	if !responseItems.IsSuccess() {
-		common.SysLog(fmt.Sprintf("渠道 #%d 未完成的任务有: %d, 成功获取到任务数: %d", channelId, len(taskIds), string(responseBody)))
+		common.SysLog(fmt.Sprintf("渠道 #%d 未完成的任务有: %d, 成功获取到任务数: %s", channelId, len(taskIds), string(responseBody)))
 		return err
 	}

--- a/controller/token.go
+++ b/controller/token.go
@@ -1,6 +1,7 @@
 package controller

 import (
+	"fmt"
 	"net/http"
 	"strconv"
 	"strings"
@@ -149,6 +150,24 @@ func AddToken(c *gin.Context) {
 		})
 		return
 	}
+	// 非无限额度时，检查额度值是否超出有效范围
+	if !token.UnlimitedQuota {
+		if token.RemainQuota < 0 {
+			c.JSON(http.StatusOK, gin.H{
+				"success": false,
+				"message": "额度值不能为负数",
+			})
+			return
+		}
+		maxQuotaValue := int((1000000000 * common.QuotaPerUnit))
+		if token.RemainQuota > maxQuotaValue {
+			c.JSON(http.StatusOK, gin.H{
+				"success": false,
+				"message": fmt.Sprintf("额度值超出有效范围，最大值为 %d", maxQuotaValue),
+			})
+			return
+		}
+	}
 	key, err := common.GenerateKey()
 	if err != nil {
 		c.JSON(http.StatusOK, gin.H{
@@ -171,6 +190,7 @@ func AddToken(c *gin.Context) {
 		ModelLimits:        token.ModelLimits,
 		AllowIps:           token.AllowIps,
 		Group:              token.Group,
+		CrossGroupRetry:    token.CrossGroupRetry,
 	}
 	err = cleanToken.Insert()
 	if err != nil {
@@ -215,6 +235,23 @@ func UpdateToken(c *gin.Context) {
 		})
 		return
 	}
+	if !token.UnlimitedQuota {
+		if token.RemainQuota < 0 {
+			c.JSON(http.StatusOK, gin.H{
+				"success": false,
+				"message": "额度值不能为负数",
+			})
+			return
+		}
+		maxQuotaValue := int((1000000000 * common.QuotaPerUnit))
+		if token.RemainQuota > maxQuotaValue {
+			c.JSON(http.StatusOK, gin.H{
+				"success": false,
+				"message": fmt.Sprintf("额度值超出有效范围，最大值为 %d", maxQuotaValue),
+			})
+			return
+		}
+	}
 	cleanToken, err := model.GetTokenByIds(token.Id, userId)
 	if err != nil {
 		common.ApiError(c, err)
@@ -260,7 +297,6 @@ func UpdateToken(c *gin.Context) {
 		"message": "",
 		"data":    cleanToken,
 	})
-	return
 }

 type TokenBatch struct {
--- a/controller/topup_creem.go
+++ b/controller/topup_creem.go
@@ -7,12 +7,12 @@ import (
 	"encoding/hex"
 	"encoding/json"
 	"fmt"
-	"io"
-	"log"
-	"net/http"
 	"github.com/QuantumNous/new-api/common"
 	"github.com/QuantumNous/new-api/model"
 	"github.com/QuantumNous/new-api/setting"
+	"io"
+	"log"
+	"net/http"
 	"time"

 	"github.com/gin-gonic/gin"
--- a/controller/user.go
+++ b/controller/user.go
@@ -110,18 +110,17 @@ func setupLogin(user *model.User, c *gin.Context) {
 		})
 		return
 	}
-	cleanUser := model.User{
-		Id:          user.Id,
-		Username:    user.Username,
-		DisplayName: user.DisplayName,
-		Role:        user.Role,
-		Status:      user.Status,
-		Group:       user.Group,
-	}
 	c.JSON(http.StatusOK, gin.H{
 		"message": "",
 		"success": true,
-		"data":    cleanUser,
+		"data": map[string]any{
+			"id":           user.Id,
+			"username":     user.Username,
+			"display_name": user.DisplayName,
+			"role":         user.Role,
+			"status":       user.Status,
+			"group":        user.Group,
+		},
 	})
 }

@@ -764,7 +763,10 @@ func checkUpdatePassword(originalPassword string, newPassword string, userId int
 	if err != nil {
 		return
 	}
-	if !common.ValidatePasswordAndHash(originalPassword, currentUser.Password) {
+
+	// 密码不为空,需要验证原密码
+	// 支持第一次账号绑定时原密码为空的情况
+	if !common.ValidatePasswordAndHash(originalPassword, currentUser.Password) && currentUser.Password != "" {
 		err = fmt.Errorf("原密码错误")
 		return
 	}
--- a/docs/ionet-client.md
+++ b/docs/ionet-client.md
@@ -0,0 +1,7 @@
+Request URL
+https://api.io.solutions/v1/io-cloud/clusters/654fc0a9-0d4a-4db4-9b95-3f56189348a2/update-name
+Request Method
+PUT
+
+{"status":"succeeded","message":"Cluster name updated successfully"}
+
--- a/dto/audio.go
+++ b/dto/audio.go
@@ -2,6 +2,7 @@ package dto

 import (
 	"encoding/json"
+	"strings"

 	"github.com/QuantumNous/new-api/types"

@@ -24,11 +25,14 @@ func (r *AudioRequest) GetTokenCountMeta() *types.TokenCountMeta {
 		CombineText: r.Input,
 		TokenType:   types.TokenTypeTextNumber,
 	}
+	if strings.Contains(r.Model, "gpt") {
+		meta.TokenType = types.TokenTypeTokenizer
+	}
 	return meta
 }

 func (r *AudioRequest) IsStream(c *gin.Context) bool {
-	return false
+	return r.StreamFormat == "sse"
 }

 func (r *AudioRequest) SetModelName(modelName string) {
--- a/dto/error.go
+++ b/dto/error.go
@@ -1,26 +1,32 @@
 package dto

-import "github.com/QuantumNous/new-api/types"
+import (
+	"encoding/json"

-type OpenAIError struct {
-	Message string `json:"message"`
-	Type    string `json:"type"`
-	Param   string `json:"param"`
-	Code    any    `json:"code"`
-}
+	"github.com/QuantumNous/new-api/common"
+	"github.com/QuantumNous/new-api/types"
+)
+
+//type OpenAIError struct {
+//	Message string `json:"message"`
+//	Type    string `json:"type"`
+//	Param   string `json:"param"`
+//	Code    any    `json:"code"`
+//}

 type OpenAIErrorWithStatusCode struct {
-	Error      OpenAIError `json:"error"`
-	StatusCode int         `json:"status_code"`
+	Error      types.OpenAIError `json:"error"`
+	StatusCode int               `json:"status_code"`
 	LocalError bool
 }

 type GeneralErrorResponse struct {
-	Error    types.OpenAIError `json:"error"`
-	Message  string            `json:"message"`
-	Msg      string            `json:"msg"`
-	Err      string            `json:"err"`
-	ErrorMsg string            `json:"error_msg"`
+	Error    json.RawMessage `json:"error"`
+	Message  string          `json:"message"`
+	Msg      string          `json:"msg"`
+	Err      string          `json:"err"`
+	ErrorMsg string          `json:"error_msg"`
+	Metadata json.RawMessage   `json:"metadata,omitempty"`
 	Header   struct {
 		Message string `json:"message"`
 	} `json:"header"`
@@ -31,9 +37,35 @@ type GeneralErrorResponse struct {
 	} `json:"response"`
 }

+func (e GeneralErrorResponse) TryToOpenAIError() *types.OpenAIError {
+	var openAIError types.OpenAIError
+	if len(e.Error) > 0 {
+		err := common.Unmarshal(e.Error, &openAIError)
+		if err == nil && openAIError.Message != "" {
+			return &openAIError
+		}
+	}
+	return nil
+}
+
 func (e GeneralErrorResponse) ToMessage() string {
-	if e.Error.Message != "" {
-		return e.Error.Message
+	if len(e.Error) > 0 {
+		switch common.GetJsonType(e.Error) {
+		case "object":
+			var openAIError types.OpenAIError
+			err := common.Unmarshal(e.Error, &openAIError)
+			if err == nil && openAIError.Message != "" {
+				return openAIError.Message
+			}
+		case "string":
+			var msg string
+			err := common.Unmarshal(e.Error, &msg)
+			if err == nil && msg != "" {
+				return msg
+			}
+		default:
+			return string(e.Error)
+		}
 	}
 	if e.Message != "" {
 		return e.Message
--- a/dto/gemini.go
+++ b/dto/gemini.go
@@ -22,6 +22,27 @@ type GeminiChatRequest struct {
 	CachedContent      string                     `json:"cachedContent,omitempty"`
 }

+// UnmarshalJSON allows GeminiChatRequest to accept both snake_case and camelCase fields.
+func (r *GeminiChatRequest) UnmarshalJSON(data []byte) error {
+	type Alias GeminiChatRequest
+	var aux struct {
+		Alias
+		SystemInstructionSnake *GeminiChatContent `json:"system_instruction,omitempty"`
+	}
+
+	if err := common.Unmarshal(data, &aux); err != nil {
+		return err
+	}
+
+	*r = GeminiChatRequest(aux.Alias)
+
+	if aux.SystemInstructionSnake != nil {
+		r.SystemInstructions = aux.SystemInstructionSnake
+	}
+
+	return nil
+}
+
 type ToolConfig struct {
 	FunctionCallingConfig *FunctionCallingConfig `json:"functionCallingConfig,omitempty"`
 	RetrievalConfig       *RetrievalConfig       `json:"retrievalConfig,omitempty"`
--- a/dto/openai_image.go
+++ b/dto/openai_image.go
@@ -167,9 +167,9 @@ func (i *ImageRequest) SetModelName(modelName string) {
 }

 type ImageResponse struct {
-	Data    []ImageData `json:"data"`
-	Created int64       `json:"created"`
-	Extra   any         `json:"extra,omitempty"`
+	Data     []ImageData     `json:"data"`
+	Created  int64           `json:"created"`
+	Metadata json.RawMessage `json:"metadata,omitempty"`
 }
 type ImageData struct {
 	Url           string `json:"url"`
--- a/dto/openai_request.go
+++ b/dto/openai_request.go
@@ -23,6 +23,8 @@ type FormatJsonSchema struct {
 	Strict      json.RawMessage `json:"strict,omitempty"`
 }

+// GeneralOpenAIRequest represents a general request structure for OpenAI-compatible APIs.
+// 参数增加规范：无引用的参数必须使用json.RawMessage类型，并添加omitempty标签
 type GeneralOpenAIRequest struct {
 	Model               string            `json:"model,omitempty"`
 	Messages            []Message         `json:"messages,omitempty"`
@@ -82,8 +84,9 @@ type GeneralOpenAIRequest struct {
 	Reasoning json.RawMessage `json:"reasoning,omitempty"`
 	// Ali Qwen Params
 	VlHighResolutionImages json.RawMessage `json:"vl_high_resolution_images,omitempty"`
-	EnableThinking         any             `json:"enable_thinking,omitempty"`
+	EnableThinking         json.RawMessage `json:"enable_thinking,omitempty"`
 	ChatTemplateKwargs     json.RawMessage `json:"chat_template_kwargs,omitempty"`
+	EnableSearch           json.RawMessage `json:"enable_search,omitempty"`
 	// ollama Params
 	Think json.RawMessage `json:"think,omitempty"`
 	// baidu v2
--- a/go.mod
+++ b/go.mod
@@ -27,6 +27,7 @@ require (
 	github.com/golang-jwt/jwt/v5 v5.3.0
 	github.com/google/uuid v1.6.0
 	github.com/gorilla/websocket v1.5.0
+	github.com/grafana/pyroscope-go v1.2.7
 	github.com/jfreymuth/oggvorbis v1.0.5
 	github.com/jinzhu/copier v0.4.0
 	github.com/joho/godotenv v1.5.1
@@ -36,6 +37,7 @@ require (
 	github.com/samber/lo v1.52.0
 	github.com/shirou/gopsutil v3.21.11+incompatible
 	github.com/shopspring/decimal v1.4.0
+	github.com/stretchr/testify v1.11.1
 	github.com/stripe/stripe-go/v81 v81.4.0
 	github.com/tcolgate/mp3 v0.0.0-20170426193717-e79c5a46d300
 	github.com/thanhpk/randstr v1.0.6
@@ -62,6 +64,7 @@ require (
 	github.com/bytedance/sonic/loader v0.3.0 // indirect
 	github.com/cespare/xxhash/v2 v2.3.0 // indirect
 	github.com/cloudwego/base64x v0.1.6 // indirect
+	github.com/davecgh/go-spew v1.1.1 // indirect
 	github.com/dgryski/go-rendezvous v0.0.0-20200823014737-9f7001d12a5f // indirect
 	github.com/dlclark/regexp2 v1.11.5 // indirect
 	github.com/dustin/go-humanize v1.0.1 // indirect
@@ -77,11 +80,11 @@ require (
 	github.com/go-sql-driver/mysql v1.7.0 // indirect
 	github.com/go-webauthn/x v0.1.25 // indirect
 	github.com/goccy/go-json v0.10.2 // indirect
-	github.com/google/go-cmp v0.6.0 // indirect
 	github.com/google/go-tpm v0.9.5 // indirect
 	github.com/gorilla/context v1.1.1 // indirect
 	github.com/gorilla/securecookie v1.1.1 // indirect
 	github.com/gorilla/sessions v1.2.1 // indirect
+	github.com/grafana/pyroscope-go/godeltaprof v0.1.9 // indirect
 	github.com/icza/bitio v1.1.0 // indirect
 	github.com/jackc/pgpassfile v1.0.0 // indirect
 	github.com/jackc/pgservicefile v0.0.0-20240606120523-5a60cdf6a761 // indirect
@@ -91,6 +94,7 @@ require (
 	github.com/jinzhu/inflection v1.0.0 // indirect
 	github.com/jinzhu/now v1.1.5 // indirect
 	github.com/json-iterator/go v1.1.12 // indirect
+	github.com/klauspost/compress v1.17.8 // indirect
 	github.com/klauspost/cpuid/v2 v2.3.0 // indirect
 	github.com/leodido/go-urn v1.4.0 // indirect
 	github.com/mattn/go-isatty v0.0.20 // indirect
@@ -101,7 +105,9 @@ require (
 	github.com/modern-go/reflect2 v1.0.2 // indirect
 	github.com/ncruces/go-strftime v0.1.9 // indirect
 	github.com/pelletier/go-toml/v2 v2.2.1 // indirect
+	github.com/pmezard/go-difflib v1.0.0 // indirect
 	github.com/remyoudompheng/bigfft v0.0.0-20230129092748-24d4a6f8daec // indirect
+	github.com/stretchr/objx v0.5.2 // indirect
 	github.com/tidwall/match v1.1.1 // indirect
 	github.com/tidwall/pretty v1.2.0 // indirect
 	github.com/tklauser/go-sysconf v0.3.12 // indirect
--- a/go.sum
+++ b/go.sum
@@ -118,9 +118,8 @@ github.com/google/go-cmp v0.6.0/go.mod h1:17dUlkBOakJ0+DkrSSNjCkIjxS6bF9zb3elmeN
 github.com/google/go-tpm v0.9.5 h1:ocUmnDebX54dnW+MQWGQRbdaAcJELsa6PqZhJ48KwVU=
 github.com/google/go-tpm v0.9.5/go.mod h1:h9jEsEECg7gtLis0upRBQU+GhYVH6jMjrFxI8u6bVUY=
 github.com/google/gofuzz v1.0.0/go.mod h1:dBl0BpW6vV/+mYPU4Po3pmUjxk6FQPldtuIdl/M65Eg=
-github.com/google/pprof v0.0.0-20221118152302-e6195bd50e26 h1:Xim43kblpZXfIBQsbuBVKCudVG457BR2GZFIz3uw3hQ=
-github.com/google/pprof v0.0.0-20221118152302-e6195bd50e26/go.mod h1:dDKJzRmX4S37WGHujM7tX//fmj1uioxKzKxz3lo4HJo=
 github.com/google/pprof v0.0.0-20250317173921-a4b03ec1a45e h1:ijClszYn+mADRFY17kjQEVQ1XRhq2/JR1M3sGqeJoxs=
+github.com/google/pprof v0.0.0-20250317173921-a4b03ec1a45e/go.mod h1:boTsfXsheKC2y+lKOCMpSfarhxDeIzfZG1jqGcPl3cA=
 github.com/google/uuid v1.1.2/go.mod h1:TIyPZe4MgqvfeYDBFedMoGGpEw/LqOeaOT+nhxU+yHo=
 github.com/google/uuid v1.6.0 h1:NIvaJDMOsjHA8n1jAhLSgzrAzy1Hgr+hNrb57e+94F0=
 github.com/google/uuid v1.6.0/go.mod h1:TIyPZe4MgqvfeYDBFedMoGGpEw/LqOeaOT+nhxU+yHo=
@@ -132,6 +131,10 @@ github.com/gorilla/sessions v1.2.1 h1:DHd3rPN5lE3Ts3D8rKkQ8x/0kqfeNmBAaiSi+o7Fsg
 github.com/gorilla/sessions v1.2.1/go.mod h1:dk2InVEVJ0sfLlnXv9EAgkf6ecYs/i80K/zI+bUmuGM=
 github.com/gorilla/websocket v1.5.0 h1:PPwGk2jz7EePpoHN/+ClbZu8SPxiqlu12wZP/3sWmnc=
 github.com/gorilla/websocket v1.5.0/go.mod h1:YR8l580nyteQvAITg2hZ9XVh4b55+EU/adAjf1fMHhE=
+github.com/grafana/pyroscope-go v1.2.7 h1:VWBBlqxjyR0Cwk2W6UrE8CdcdD80GOFNutj0Kb1T8ac=
+github.com/grafana/pyroscope-go v1.2.7/go.mod h1:o/bpSLiJYYP6HQtvcoVKiE9s5RiNgjYTj1DhiddP2Pc=
+github.com/grafana/pyroscope-go/godeltaprof v0.1.9 h1:c1Us8i6eSmkW+Ez05d3co8kasnuOY813tbMN8i/a3Og=
+github.com/grafana/pyroscope-go/godeltaprof v0.1.9/go.mod h1:2+l7K7twW49Ct4wFluZD3tZ6e0SjanjcUUBPVD/UuGU=
 github.com/icza/bitio v1.1.0 h1:ysX4vtldjdi3Ygai5m1cWy4oLkhWTAi+SyO6HC8L9T0=
 github.com/icza/bitio v1.1.0/go.mod h1:0jGnlLAx8MKMr9VGnn/4YrvZiprkvBelsVIbA9Jjr9A=
 github.com/icza/mighty v0.0.0-20180919140131-cfd07d671de6 h1:8UsGZ2rr2ksmEru6lToqnXgA8Mz1DP11X4zSJ159C3k=
@@ -160,12 +163,15 @@ github.com/joho/godotenv v1.5.1/go.mod h1:f4LDr5Voq0i2e/R5DDNOoa2zzDfwtkZa6DnEwA
 github.com/json-iterator/go v1.1.9/go.mod h1:KdQUCv79m/52Kvf8AW2vK1V8akMuk1QjK/uOdHXbAo4=
 github.com/json-iterator/go v1.1.12 h1:PV8peI4a0ysnczrg+LtxykD8LfKY9ML6u2jnxaEnrnM=
 github.com/json-iterator/go v1.1.12/go.mod h1:e30LSqwooZae/UwlEbR2852Gd8hjQvJoHmT4TnhNGBo=
+github.com/klauspost/compress v1.17.8 h1:YcnTYrq7MikUT7k0Yb5eceMmALQPYBW/Xltxn0NAMnU=
+github.com/klauspost/compress v1.17.8/go.mod h1:Di0epgTjJY877eYKx5yC51cX2A2Vl2ibi7bDH9ttBbw=
 github.com/klauspost/cpuid/v2 v2.3.0 h1:S4CRMLnYUhGeDFDqkGriYKdfoFlDnMtqTiI/sFzhA9Y=
 github.com/klauspost/cpuid/v2 v2.3.0/go.mod h1:hqwkgyIinND0mEev00jJYCxPNVRVXFQeu1XKlok6oO0=
 github.com/kr/pretty v0.1.0/go.mod h1:dAy3ld7l9f0ibDNOQOHHMYYIIbhfbHSm3C4ZsoJORNo=
 github.com/kr/pretty v0.2.1/go.mod h1:ipq/a2n7PKx3OHsz4KJII5eveXtPO4qwEXGdVfWzfnI=
-github.com/kr/pretty v0.3.0 h1:WgNl7dwNpEZ6jJ9k1snq4pZsg7DOEN8hP9Xw0Tsjwk0=
 github.com/kr/pretty v0.3.0/go.mod h1:640gp4NfQd8pI5XOwp5fnNeVWj67G7CFk/SaSQn7NBk=
+github.com/kr/pretty v0.3.1 h1:flRD4NNwYAUpkphVc1HcthR4KEIFJ65n8Mw5qdRn3LE=
+github.com/kr/pretty v0.3.1/go.mod h1:hoEshYVHaxMs3cyo3Yncou5ZscifuDolrwPKZanG3xk=
 github.com/kr/pty v1.1.1/go.mod h1:pFQYn66WHrOpPYNljwOMqo10TkYh1fy3cYio2l3bCsQ=
 github.com/kr/pty v1.1.8/go.mod h1:O1sed60cT9XZ5uDucP5qwvh+TE3NnUj51EiZO/lmSfw=
 github.com/kr/text v0.1.0/go.mod h1:4Jbv+DJW3UT/LiOwJeYQe1efqtUx/iVham/4vfdArNI=
@@ -214,14 +220,11 @@ github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZb
 github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
 github.com/pquerna/otp v1.5.0 h1:NMMR+WrmaqXU4EzdGJEE1aUUI0AMRzsp96fFFWNPwxs=
 github.com/pquerna/otp v1.5.0/go.mod h1:dkJfzwRKNiegxyNb54X/3fLwhCynbMspSyWKnvi1AEg=
-github.com/remyoudompheng/bigfft v0.0.0-20200410134404-eec4a21b6bb0/go.mod h1:qqbHyh8v60DhA7CoWK5oRCqLrMHRGoxYCSS9EjAz6Eo=
 github.com/remyoudompheng/bigfft v0.0.0-20230129092748-24d4a6f8daec h1:W09IVJc94icq4NjY3clb7Lk8O1qJ8BdBEF8z0ibU0rE=
 github.com/remyoudompheng/bigfft v0.0.0-20230129092748-24d4a6f8daec/go.mod h1:qqbHyh8v60DhA7CoWK5oRCqLrMHRGoxYCSS9EjAz6Eo=
 github.com/rogpeppe/go-internal v1.6.1/go.mod h1:xXDCJY+GAPziupqXw64V24skbSoqbTEfhy4qGm1nDQc=
 github.com/rogpeppe/go-internal v1.8.0 h1:FCbCCtXNOY3UtUuHUYaghJg4y7Fd14rXifAYUAtL9R8=
 github.com/rogpeppe/go-internal v1.8.0/go.mod h1:WmiCO8CzOY8rg0OYDC4/i/2WRWAB6poM+XZ2dLUbcbE=
-github.com/samber/lo v1.39.0 h1:4gTz1wUhNYLhFSKl6O+8peW0v2F4BCY034GRpU9WnuA=
-github.com/samber/lo v1.39.0/go.mod h1:+m/ZKRl6ClXCE2Lgf3MsQlWfh4bn1bz6CXEOxnEXnEA=
 github.com/samber/lo v1.52.0 h1:Rvi+3BFHES3A8meP33VPAxiBZX/Aws5RxrschYGjomw=
 github.com/samber/lo v1.52.0/go.mod h1:4+MXEGsJzbKGaUEQFKBq2xtfuznW9oz/WrgyzMzRoM0=
 github.com/shirou/gopsutil v3.21.11+incompatible h1:+1+c1VGhc88SSonWP6foOcLhvnKlUeu/erjjvaPEYiI=
@@ -231,6 +234,7 @@ github.com/shopspring/decimal v1.4.0/go.mod h1:gawqmDU56v4yIKSwfBSFip1HdCCXN8/+D
 github.com/stretchr/objx v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME=
 github.com/stretchr/objx v0.4.0/go.mod h1:YvHI0jy2hoMjB+UWwv71VJQ9isScKT/TqJzVSSt89Yw=
 github.com/stretchr/objx v0.5.0/go.mod h1:Yh+to48EsGEfYuaHDzXPcE3xhTkx73EhmCGUpEOglKo=
+github.com/stretchr/objx v0.5.2 h1:xuMeJ0Sdp5ZMRXx/aWO6RZxdr3beISkG5/G/aIRr3pY=
 github.com/stretchr/objx v0.5.2/go.mod h1:FRsXN1f5AsAjCGJKqEizvkpNtU+EGNCLh3NxZ/8L+MA=
 github.com/stretchr/testify v1.3.0/go.mod h1:M5WIy9Dh21IEIfnGCwXGc5bZfKNJtfHm1UVUgZn+9EI=
 github.com/stretchr/testify v1.4.0/go.mod h1:j7eGeouHqKxXV5pUuKE4zz7dFj8WfuZ+81PSLYec5m4=
@@ -288,12 +292,12 @@ golang.org/x/arch v0.21.0/go.mod h1:dNHoOeKiyja7GTvF9NJS1l3Z2yntpQNzgrjh1cU103A=
 golang.org/x/crypto v0.0.0-20210711020723-a769d52b0f97/go.mod h1:GvvjBRRGRdwPK5ydBHafDWAxML/pGHZbMvKqRZ5+Abc=
 golang.org/x/crypto v0.45.0 h1:jMBrvKuj23MTlT0bQEOBcAE0mjg8mK9RXFhRH6nyF3Q=
 golang.org/x/crypto v0.45.0/go.mod h1:XTGrrkGJve7CYK7J8PEww4aY7gM3qMCElcJQ8n8JdX4=
-golang.org/x/exp v0.0.0-20240404231335-c0f41cb1a7a0 h1:985EYyeCOxTpcgOTJpflJUwOeEz0CQOdPt73OzpE9F8=
-golang.org/x/exp v0.0.0-20240404231335-c0f41cb1a7a0/go.mod h1:/lliqkxwWAhPjf5oSOIJup2XcqJaw8RGS6k3TGEc7GI=
 golang.org/x/exp v0.0.0-20250620022241-b7579e27df2b h1:M2rDM6z3Fhozi9O7NWsxAkg/yqS/lQJ6PmkyIV3YP+o=
 golang.org/x/exp v0.0.0-20250620022241-b7579e27df2b/go.mod h1:3//PLf8L/X+8b4vuAfHzxeRUl04Adcb341+IGKfnqS8=
 golang.org/x/image v0.23.0 h1:HseQ7c2OpPKTPVzNjG5fwJsOTCiiwS4QdsYi5XU6H68=
 golang.org/x/image v0.23.0/go.mod h1:wJJBTdLfCCf3tiHa1fNxpZmUI4mmoZvwMCPP0ddoNKY=
+golang.org/x/mod v0.29.0 h1:HV8lRxZC4l2cr3Zq1LvtOsi/ThTgWnUk/y64QSs8GwA=
+golang.org/x/mod v0.29.0/go.mod h1:NyhrlYXJ2H4eJiRy/WDBO6HMqZQ6q9nk4JzS3NuCK+w=
 golang.org/x/net v0.0.0-20210226172049-e18ecbb05110/go.mod h1:m0MpNAwzfU5UDzcl9v0D8zg8gWTRqZa9RBIspLL5mdg=
 golang.org/x/net v0.0.0-20210520170846-37e1c6afe023/go.mod h1:9nx3DQGgdP8bBQD5qxJ1jj9UTztislL4KSBs9R2vV5Y=
 golang.org/x/net v0.47.0 h1:Mx+4dIFzqraBXUugkia1OOvlD6LemFo1ALMHjrXDOhY=
@@ -321,6 +325,8 @@ golang.org/x/text v0.3.6/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ=
 golang.org/x/text v0.31.0 h1:aC8ghyu4JhP8VojJ2lEHBnochRno1sgL6nEi9WGFGMM=
 golang.org/x/text v0.31.0/go.mod h1:tKRAlv61yKIjGGHX/4tP1LTbc13YSec1pxVEWXzfoeM=
 golang.org/x/tools v0.0.0-20180917221912-90fa682c2a6e/go.mod h1:n7NCudcB/nEzxVGmLbDWY5pfWTLqBcC2KZ6jyYvM4mQ=
+golang.org/x/tools v0.38.0 h1:Hx2Xv8hISq8Lm16jvBZ2VQf+RLmbd7wVUsALibYI/IQ=
+golang.org/x/tools v0.38.0/go.mod h1:yEsQ/d/YK8cjh0L6rZlY8tgtlKiBNTL14pGDJPJpYQs=
 golang.org/x/xerrors v0.0.0-20191204190536-9bdfabe68543/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
 google.golang.org/protobuf v1.26.0-rc.1/go.mod h1:jlhhOSvTdKEhbULTjvd4ARK9grFBp09yW+WbY/TyQbw=
 google.golang.org/protobuf v1.28.0/go.mod h1:HV8QOd/L58Z+nl8r43ehVNZIU/HEI6OcFqwMG9pJV4I=
@@ -350,19 +356,29 @@ gorm.io/driver/postgres v1.5.2/go.mod h1:fmpX0m2I1PKuR7mKZiEluwrP3hbs+ps7JIGMUBp
 gorm.io/gorm v1.23.8/go.mod h1:l2lP/RyAtc1ynaTjFksBde/O8v9oOGIApu2/xRitmZk=
 gorm.io/gorm v1.25.2 h1:gs1o6Vsa+oVKG/a9ElL3XgyGfghFfkKA2SInQaCyMho=
 gorm.io/gorm v1.25.2/go.mod h1:L4uxeKpfBml98NYqVqwAdmV1a2nBtAec/cf3fpucW/k=
-modernc.org/libc v1.22.5 h1:91BNch/e5B0uPbJFgqbxXuOnxBQjlS//icfQEGmvyjE=
-modernc.org/libc v1.22.5/go.mod h1:jj+Z7dTNX8fBScMVNRAYZ/jF91K8fdT2hYMThc3YjBY=
+modernc.org/cc/v4 v4.26.5 h1:xM3bX7Mve6G8K8b+T11ReenJOT+BmVqQj0FY5T4+5Y4=
+modernc.org/cc/v4 v4.26.5/go.mod h1:uVtb5OGqUKpoLWhqwNQo/8LwvoiEBLvZXIQ/SmO6mL0=
+modernc.org/ccgo/v4 v4.28.1 h1:wPKYn5EC/mYTqBO373jKjvX2n+3+aK7+sICCv4Fjy1A=
+modernc.org/ccgo/v4 v4.28.1/go.mod h1:uD+4RnfrVgE6ec9NGguUNdhqzNIeeomeXf6CL0GTE5Q=
+modernc.org/fileutil v1.3.40 h1:ZGMswMNc9JOCrcrakF1HrvmergNLAmxOPjizirpfqBA=
+modernc.org/fileutil v1.3.40/go.mod h1:HxmghZSZVAz/LXcMNwZPA/DRrQZEVP9VX0V4LQGQFOc=
+modernc.org/gc/v2 v2.6.5 h1:nyqdV8q46KvTpZlsw66kWqwXRHdjIlJOhG6kxiV/9xI=
+modernc.org/gc/v2 v2.6.5/go.mod h1:YgIahr1ypgfe7chRuJi2gD7DBQiKSLMPgBQe9oIiito=
+modernc.org/goabi0 v0.2.0 h1:HvEowk7LxcPd0eq6mVOAEMai46V+i7Jrj13t4AzuNks=
+modernc.org/goabi0 v0.2.0/go.mod h1:CEFRnnJhKvWT1c1JTI3Avm+tgOWbkOu5oPA8eH8LnMI=
 modernc.org/libc v1.66.10 h1:yZkb3YeLx4oynyR+iUsXsybsX4Ubx7MQlSYEw4yj59A=
 modernc.org/libc v1.66.10/go.mod h1:8vGSEwvoUoltr4dlywvHqjtAqHBaw0j1jI7iFBTAr2I=
-modernc.org/mathutil v1.5.0 h1:rV0Ko/6SfM+8G+yKiyI830l3Wuz1zRutdslNoQ0kfiQ=
-modernc.org/mathutil v1.5.0/go.mod h1:mZW8CKdRPY1v87qxC/wUdX5O1qDzXMP5TH3wjfpga6E=
 modernc.org/mathutil v1.7.1 h1:GCZVGXdaN8gTqB1Mf/usp1Y/hSqgI2vAGGP4jZMCxOU=
 modernc.org/mathutil v1.7.1/go.mod h1:4p5IwJITfppl0G4sUEDtCr4DthTaT47/N3aT6MhfgJg=
-modernc.org/memory v1.5.0 h1:N+/8c5rE6EqugZwHii4IFsaJ7MUhoWX07J5tC/iI5Ds=
-modernc.org/memory v1.5.0/go.mod h1:PkUhL0Mugw21sHPeskwZW4D6VscE/GQJOnIpCnW6pSU=
 modernc.org/memory v1.11.0 h1:o4QC8aMQzmcwCK3t3Ux/ZHmwFPzE6hf2Y5LbkRs+hbI=
 modernc.org/memory v1.11.0/go.mod h1:/JP4VbVC+K5sU2wZi9bHoq2MAkCnrt2r98UGeSK7Mjw=
-modernc.org/sqlite v1.23.1 h1:nrSBg4aRQQwq59JpvGEQ15tNxoO5pX/kUjcRNwSAGQM=
-modernc.org/sqlite v1.23.1/go.mod h1:OrDj17Mggn6MhE+iPbBNf7RGKODDE9NFT0f3EwDzJqk=
+modernc.org/opt v0.1.4 h1:2kNGMRiUjrp4LcaPuLY2PzUfqM/w9N23quVwhKt5Qm8=
+modernc.org/opt v0.1.4/go.mod h1:03fq9lsNfvkYSfxrfUhZCWPk1lm4cq4N+Bh//bEtgns=
+modernc.org/sortutil v1.2.1 h1:+xyoGf15mM3NMlPDnFqrteY07klSFxLElE2PVuWIJ7w=
+modernc.org/sortutil v1.2.1/go.mod h1:7ZI3a3REbai7gzCLcotuw9AC4VZVpYMjDzETGsSMqJE=
 modernc.org/sqlite v1.40.1 h1:VfuXcxcUWWKRBuP8+BR9L7VnmusMgBNNnBYGEe9w/iY=
 modernc.org/sqlite v1.40.1/go.mod h1:9fjQZ0mB1LLP0GYrp39oOJXx/I2sxEnZtzCmEQIKvGE=
+modernc.org/strutil v1.2.1 h1:UneZBkQA+DX2Rp35KcM69cSsNES9ly8mQWD71HKlOA0=
+modernc.org/strutil v1.2.1/go.mod h1:EHkiggD70koQxjVdSBM3JKM7k6L0FbGE5eymy9i3B9A=
+modernc.org/token v1.1.0 h1:Xl7Ap9dKaEs5kLoOQeQmPWevfnk/DM5qcLcYlA8ys6Y=
+modernc.org/token v1.1.0/go.mod h1:UGzOrNV1mAFSEB63lOFHIpNRUVMvYTc6yu1SMY/XTDM=
--- a/main.go
+++ b/main.go
@@ -124,6 +124,11 @@ func main() {
 		common.SysLog("pprof enabled")
 	}

+	err = common.StartPyroScope()
+	if err != nil {
+		common.SysError(fmt.Sprintf("start pyroscope error : %v", err))
+	}
+
 	// Initialize HTTP server
 	server := gin.New()
 	server.Use(gin.CustomRecovery(func(c *gin.Context, err any) {
@@ -183,6 +188,7 @@ func InjectUmamiAnalytics() {
 		analyticsInjectBuilder.WriteString(umamiSiteID)
 		analyticsInjectBuilder.WriteString("\"></script>")
 	}
+	analyticsInjectBuilder.WriteString("<!--Umami QuantumNous-->\n")
 	analyticsInject := analyticsInjectBuilder.String()
 	indexPage = bytes.ReplaceAll(indexPage, []byte("<!--umami-->\n"), []byte(analyticsInject))
 }
@@ -204,6 +210,7 @@ func InjectGoogleAnalytics() {
 		analyticsInjectBuilder.WriteString("');")
 		analyticsInjectBuilder.WriteString("</script>")
 	}
+	analyticsInjectBuilder.WriteString("<!--Google Analytics QuantumNous-->\n")
 	analyticsInject := analyticsInjectBuilder.String()
 	indexPage = bytes.ReplaceAll(indexPage, []byte("<!--Google Analytics-->\n"), []byte(analyticsInject))
 }
--- a/middleware/auth.go
+++ b/middleware/auth.go
@@ -2,12 +2,14 @@ package middleware

 import (
 	"fmt"
+	"net"
 	"net/http"
 	"strconv"
 	"strings"

 	"github.com/QuantumNous/new-api/common"
 	"github.com/QuantumNous/new-api/constant"
+	"github.com/QuantumNous/new-api/logger"
 	"github.com/QuantumNous/new-api/model"
 	"github.com/QuantumNous/new-api/service"
 	"github.com/QuantumNous/new-api/setting/ratio_setting"
@@ -216,10 +218,14 @@ func TokenAuth() func(c *gin.Context) {
 		}
 		key := c.Request.Header.Get("Authorization")
 		parts := make([]string, 0)
-		key = strings.TrimPrefix(key, "Bearer ")
+		if strings.HasPrefix(key, "Bearer ") || strings.HasPrefix(key, "bearer ") {
+			key = strings.TrimSpace(key[7:])
+		}
 		if key == "" || key == "midjourney-proxy" {
 			key = c.Request.Header.Get("mj-api-secret")
-			key = strings.TrimPrefix(key, "Bearer ")
+			if strings.HasPrefix(key, "Bearer ") || strings.HasPrefix(key, "bearer ") {
+				key = strings.TrimSpace(key[7:])
+			}
 			key = strings.TrimPrefix(key, "sk-")
 			parts = strings.Split(key, "-")
 			key = parts[0]
@@ -240,13 +246,20 @@ func TokenAuth() func(c *gin.Context) {
 			return
 		}

-		allowIpsMap := token.GetIpLimitsMap()
-		if len(allowIpsMap) != 0 {
+		allowIps := token.GetIpLimits()
+		if len(allowIps) > 0 {
 			clientIp := c.ClientIP()
-			if _, ok := allowIpsMap[clientIp]; !ok {
+			logger.LogDebug(c, "Token has IP restrictions, checking client IP %s", clientIp)
+			ip := net.ParseIP(clientIp)
+			if ip == nil {
+				abortWithOpenAiMessage(c, http.StatusForbidden, "无法解析客户端 IP 地址")
+				return
+			}
+			if common.IsIpInCIDRList(ip, allowIps) == false {
 				abortWithOpenAiMessage(c, http.StatusForbidden, "您的 IP 不在令牌允许访问的列表中")
 				return
 			}
+			logger.LogDebug(c, "Client IP %s passed the token IP restrictions check", clientIp)
 		}

 		userCache, err := model.GetUserCache(token.UserId)
@@ -307,8 +320,8 @@ func SetupContextForToken(c *gin.Context, token *model.Token, parts ...string) e
 	} else {
 		c.Set("token_model_limit_enabled", false)
 	}
-	c.Set("token_group", token.Group)
-	c.Set("token_cross_group_retry", token.CrossGroupRetry)
+	common.SetContextKey(c, constant.ContextKeyTokenGroup, token.Group)
+	common.SetContextKey(c, constant.ContextKeyTokenCrossGroupRetry, token.CrossGroupRetry)
 	if len(parts) > 1 {
 		if model.IsAdmin(token.UserId) {
 			c.Set("specific_channel_id", parts[1])
--- a/middleware/distributor.go
+++ b/middleware/distributor.go
@@ -97,7 +97,12 @@ func Distribute() func(c *gin.Context) {
 						common.SetContextKey(c, constant.ContextKeyUsingGroup, usingGroup)
 					}
 				}
-				channel, selectGroup, err = service.CacheGetRandomSatisfiedChannel(c, usingGroup, modelRequest.Model, 0)
+				channel, selectGroup, err = service.CacheGetRandomSatisfiedChannel(&service.RetryParam{
+					Ctx:        c,
+					ModelName:  modelRequest.Model,
+					TokenGroup: usingGroup,
+					Retry:      common.GetPointer(0),
+				})
 				if err != nil {
 					showGroup := usingGroup
 					if usingGroup == "auto" {
@@ -157,7 +162,7 @@ func getModelRequest(c *gin.Context) (*ModelRequest, bool, error) {
 			}
 			midjourneyModel, mjErr, success := service.GetMjRequestModel(relayMode, &midjourneyRequest)
 			if mjErr != nil {
-				return nil, false, fmt.Errorf(mjErr.Description)
+				return nil, false, fmt.Errorf("%s", mjErr.Description)
 			}
 			if midjourneyModel == "" {
 				if !success {
--- a/middleware/gzip.go
+++ b/middleware/gzip.go
@@ -5,32 +5,69 @@ import (
 	"io"
 	"net/http"

+	"github.com/QuantumNous/new-api/constant"
 	"github.com/andybalholm/brotli"
 	"github.com/gin-gonic/gin"
 )

+type readCloser struct {
+	io.Reader
+	closeFn func() error
+}
+
+func (rc *readCloser) Close() error {
+	if rc.closeFn != nil {
+		return rc.closeFn()
+	}
+	return nil
+}
+
 func DecompressRequestMiddleware() gin.HandlerFunc {
 	return func(c *gin.Context) {
 		if c.Request.Body == nil || c.Request.Method == http.MethodGet {
 			c.Next()
 			return
 		}
+		maxMB := constant.MaxRequestBodyMB
+		if maxMB <= 0 {
+			maxMB = 32
+		}
+		maxBytes := int64(maxMB) << 20
+
+		origBody := c.Request.Body
+		wrapMaxBytes := func(body io.ReadCloser) io.ReadCloser {
+			return http.MaxBytesReader(c.Writer, body, maxBytes)
+		}
+
 		switch c.GetHeader("Content-Encoding") {
 		case "gzip":
-			gzipReader, err := gzip.NewReader(c.Request.Body)
+			gzipReader, err := gzip.NewReader(origBody)
 			if err != nil {
+				_ = origBody.Close()
 				c.AbortWithStatus(http.StatusBadRequest)
 				return
 			}
-			defer gzipReader.Close()
-
-			// Replace the request body with the decompressed data
-			c.Request.Body = io.NopCloser(gzipReader)
+			// Replace the request body with the decompressed data, and enforce a max size (post-decompression).
+			c.Request.Body = wrapMaxBytes(&readCloser{
+				Reader: gzipReader,
+				closeFn: func() error {
+					_ = gzipReader.Close()
+					return origBody.Close()
+				},
+			})
 			c.Request.Header.Del("Content-Encoding")
 		case "br":
-			reader := brotli.NewReader(c.Request.Body)
-			c.Request.Body = io.NopCloser(reader)
+			reader := brotli.NewReader(origBody)
+			c.Request.Body = wrapMaxBytes(&readCloser{
+				Reader: reader,
+				closeFn: func() error {
+					return origBody.Close()
+				},
+			})
 			c.Request.Header.Del("Content-Encoding")
+		default:
+			// Even for uncompressed bodies, enforce a max size to avoid huge request allocations.
+			c.Request.Body = wrapMaxBytes(origBody)
 		}

 		// Continue processing the request
--- a/model/channel.go
+++ b/model/channel.go
@@ -254,6 +254,9 @@ func (channel *Channel) Save() error {
 }

 func (channel *Channel) SaveWithoutKey() error {
+	if channel.Id == 0 {
+		return errors.New("channel ID is 0")
+	}
 	return DB.Omit("key").Save(channel).Error
 }

--- a/model/main.go
+++ b/model/main.go
@@ -248,26 +248,26 @@ func InitLogDB() (err error) {
 }

 func migrateDB() error {
-	err := DB.AutoMigrate(
-		&Channel{},
-		&Token{},
-		&User{},
-		&PasskeyCredential{},
+    err := DB.AutoMigrate(
+        &Channel{},
+        &Token{},
+        &User{},
+        &PasskeyCredential{},
 		&Option{},
-		&Redemption{},
-		&Ability{},
-		&Log{},
-		&Midjourney{},
-		&TopUp{},
-		&QuotaData{},
-		&Task{},
-		&Model{},
-		&Vendor{},
-		&PrefillGroup{},
-		&Setup{},
-		&TwoFA{},
-		&TwoFABackupCode{},
-	)
+        &Redemption{},
+        &Ability{},
+        &Log{},
+        &Midjourney{},
+        &TopUp{},
+        &QuotaData{},
+        &Task{},
+        &Model{},
+        &Vendor{},
+        &PrefillGroup{},
+        &Setup{},
+        &TwoFA{},
+        &TwoFABackupCode{},
+    )
 	if err != nil {
 		return err
 	}
@@ -278,29 +278,29 @@ func migrateDBFast() error {

 	var wg sync.WaitGroup

-	migrations := []struct {
-		model interface{}
-		name  string
-	}{
-		{&Channel{}, "Channel"},
-		{&Token{}, "Token"},
-		{&User{}, "User"},
-		{&PasskeyCredential{}, "PasskeyCredential"},
+    migrations := []struct {
+        model interface{}
+        name  string
+    }{
+        {&Channel{}, "Channel"},
+        {&Token{}, "Token"},
+        {&User{}, "User"},
+        {&PasskeyCredential{}, "PasskeyCredential"},
 		{&Option{}, "Option"},
-		{&Redemption{}, "Redemption"},
-		{&Ability{}, "Ability"},
-		{&Log{}, "Log"},
-		{&Midjourney{}, "Midjourney"},
-		{&TopUp{}, "TopUp"},
-		{&QuotaData{}, "QuotaData"},
-		{&Task{}, "Task"},
-		{&Model{}, "Model"},
-		{&Vendor{}, "Vendor"},
-		{&PrefillGroup{}, "PrefillGroup"},
-		{&Setup{}, "Setup"},
-		{&TwoFA{}, "TwoFA"},
-		{&TwoFABackupCode{}, "TwoFABackupCode"},
-	}
+        {&Redemption{}, "Redemption"},
+        {&Ability{}, "Ability"},
+        {&Log{}, "Log"},
+        {&Midjourney{}, "Midjourney"},
+        {&TopUp{}, "TopUp"},
+        {&QuotaData{}, "QuotaData"},
+        {&Task{}, "Task"},
+        {&Model{}, "Model"},
+        {&Vendor{}, "Vendor"},
+        {&PrefillGroup{}, "PrefillGroup"},
+        {&Setup{}, "Setup"},
+        {&TwoFA{}, "TwoFA"},
+        {&TwoFABackupCode{}, "TwoFABackupCode"},
+    }
 	// 动态计算migration数量，确保errChan缓冲区足够大
 	errChan := make(chan error, len(migrations))

--- a/model/token.go
+++ b/model/token.go
@@ -6,7 +6,6 @@ import (
 	"strings"

 	"github.com/QuantumNous/new-api/common"
-
 	"github.com/bytedance/gopkg/util/gopool"
 	"gorm.io/gorm"
 )
@@ -35,26 +34,26 @@ func (token *Token) Clean() {
 	token.Key = ""
 }

-func (token *Token) GetIpLimitsMap() map[string]any {
+func (token *Token) GetIpLimits() []string {
 	// delete empty spaces
 	//split with \n
-	ipLimitsMap := make(map[string]any)
+	ipLimits := make([]string, 0)
 	if token.AllowIps == nil {
-		return ipLimitsMap
+		return ipLimits
 	}
 	cleanIps := strings.ReplaceAll(*token.AllowIps, " ", "")
 	if cleanIps == "" {
-		return ipLimitsMap
+		return ipLimits
 	}
 	ips := strings.Split(cleanIps, "\n")
 	for _, ip := range ips {
 		ip = strings.TrimSpace(ip)
 		ip = strings.ReplaceAll(ip, ",", "")
-		if common.IsIP(ip) {
-			ipLimitsMap[ip] = true
+		if ip != "" {
+			ipLimits = append(ipLimits, ip)
 		}
 	}
-	return ipLimitsMap
+	return ipLimits
 }

 func GetAllUserTokens(userId int, startIdx int, num int) ([]*Token, error) {
@@ -113,7 +112,12 @@ func ValidateUserToken(key string) (token *Token, err error) {
 		}
 		return token, nil
 	}
-	return nil, errors.New("无效的令牌")
+	common.SysLog("ValidateUserToken: failed to get token: " + err.Error())
+	if errors.Is(err, gorm.ErrRecordNotFound) {
+		return nil, errors.New("无效的令牌")
+	} else {
+		return nil, errors.New("无效的令牌，数据库查询出错，请联系管理员")
+	}
 }

 func GetTokenByIds(id int, userId int) (*Token, error) {
--- a/pkg/ionet/client.go
+++ b/pkg/ionet/client.go
@@ -0,0 +1,219 @@
+package ionet
+
+import (
+	"bytes"
+	"encoding/json"
+	"fmt"
+	"net/http"
+	"net/url"
+	"strconv"
+	"time"
+)
+
+const (
+	DefaultEnterpriseBaseURL = "https://api.io.solutions/enterprise/v1/io-cloud/caas"
+	DefaultBaseURL           = "https://api.io.solutions/v1/io-cloud/caas"
+	DefaultTimeout           = 30 * time.Second
+)
+
+// DefaultHTTPClient is the default HTTP client implementation
+type DefaultHTTPClient struct {
+	client *http.Client
+}
+
+// NewDefaultHTTPClient creates a new default HTTP client
+func NewDefaultHTTPClient(timeout time.Duration) *DefaultHTTPClient {
+	return &DefaultHTTPClient{
+		client: &http.Client{
+			Timeout: timeout,
+		},
+	}
+}
+
+// Do executes an HTTP request
+func (c *DefaultHTTPClient) Do(req *HTTPRequest) (*HTTPResponse, error) {
+	httpReq, err := http.NewRequest(req.Method, req.URL, bytes.NewReader(req.Body))
+	if err != nil {
+		return nil, fmt.Errorf("failed to create HTTP request: %w", err)
+	}
+
+	// Set headers
+	for key, value := range req.Headers {
+		httpReq.Header.Set(key, value)
+	}
+
+	resp, err := c.client.Do(httpReq)
+	if err != nil {
+		return nil, fmt.Errorf("HTTP request failed: %w", err)
+	}
+	defer resp.Body.Close()
+
+	// Read response body
+	var body bytes.Buffer
+	_, err = body.ReadFrom(resp.Body)
+	if err != nil {
+		return nil, fmt.Errorf("failed to read response body: %w", err)
+	}
+
+	// Convert headers
+	headers := make(map[string]string)
+	for key, values := range resp.Header {
+		if len(values) > 0 {
+			headers[key] = values[0]
+		}
+	}
+
+	return &HTTPResponse{
+		StatusCode: resp.StatusCode,
+		Headers:    headers,
+		Body:       body.Bytes(),
+	}, nil
+}
+
+// NewEnterpriseClient creates a new IO.NET API client targeting the enterprise API base URL.
+func NewEnterpriseClient(apiKey string) *Client {
+	return NewClientWithConfig(apiKey, DefaultEnterpriseBaseURL, nil)
+}
+
+// NewClient creates a new IO.NET API client targeting the public API base URL.
+func NewClient(apiKey string) *Client {
+	return NewClientWithConfig(apiKey, DefaultBaseURL, nil)
+}
+
+// NewClientWithConfig creates a new IO.NET API client with custom configuration
+func NewClientWithConfig(apiKey, baseURL string, httpClient HTTPClient) *Client {
+	if baseURL == "" {
+		baseURL = DefaultBaseURL
+	}
+	if httpClient == nil {
+		httpClient = NewDefaultHTTPClient(DefaultTimeout)
+	}
+	return &Client{
+		BaseURL:    baseURL,
+		APIKey:     apiKey,
+		HTTPClient: httpClient,
+	}
+}
+
+// makeRequest performs an HTTP request and handles common response processing
+func (c *Client) makeRequest(method, endpoint string, body interface{}) (*HTTPResponse, error) {
+	var reqBody []byte
+	var err error
+
+	if body != nil {
+		reqBody, err = json.Marshal(body)
+		if err != nil {
+			return nil, fmt.Errorf("failed to marshal request body: %w", err)
+		}
+	}
+
+	headers := map[string]string{
+		"X-API-KEY":    c.APIKey,
+		"Content-Type": "application/json",
+	}
+
+	req := &HTTPRequest{
+		Method:  method,
+		URL:     c.BaseURL + endpoint,
+		Headers: headers,
+		Body:    reqBody,
+	}
+
+	resp, err := c.HTTPClient.Do(req)
+	if err != nil {
+		return nil, fmt.Errorf("request failed: %w", err)
+	}
+
+	// Handle API errors
+	if resp.StatusCode >= 400 {
+		var apiErr APIError
+		if len(resp.Body) > 0 {
+			// Try to parse the actual error format: {"detail": "message"}
+			var errorResp struct {
+				Detail string `json:"detail"`
+			}
+			if err := json.Unmarshal(resp.Body, &errorResp); err == nil && errorResp.Detail != "" {
+				apiErr = APIError{
+					Code:    resp.StatusCode,
+					Message: errorResp.Detail,
+				}
+			} else {
+				// Fallback: use raw body as details
+				apiErr = APIError{
+					Code:    resp.StatusCode,
+					Message: fmt.Sprintf("API request failed with status %d", resp.StatusCode),
+					Details: string(resp.Body),
+				}
+			}
+		} else {
+			apiErr = APIError{
+				Code:    resp.StatusCode,
+				Message: fmt.Sprintf("API request failed with status %d", resp.StatusCode),
+			}
+		}
+		return nil, &apiErr
+	}
+
+	return resp, nil
+}
+
+// buildQueryParams builds query parameters for GET requests
+func buildQueryParams(params map[string]interface{}) string {
+	if len(params) == 0 {
+		return ""
+	}
+
+	values := url.Values{}
+	for key, value := range params {
+		if value == nil {
+			continue
+		}
+		switch v := value.(type) {
+		case string:
+			if v != "" {
+				values.Add(key, v)
+			}
+		case int:
+			if v != 0 {
+				values.Add(key, strconv.Itoa(v))
+			}
+		case int64:
+			if v != 0 {
+				values.Add(key, strconv.FormatInt(v, 10))
+			}
+		case float64:
+			if v != 0 {
+				values.Add(key, strconv.FormatFloat(v, 'f', -1, 64))
+			}
+		case bool:
+			values.Add(key, strconv.FormatBool(v))
+		case time.Time:
+			if !v.IsZero() {
+				values.Add(key, v.Format(time.RFC3339))
+			}
+		case *time.Time:
+			if v != nil && !v.IsZero() {
+				values.Add(key, v.Format(time.RFC3339))
+			}
+		case []int:
+			if len(v) > 0 {
+				if encoded, err := json.Marshal(v); err == nil {
+					values.Add(key, string(encoded))
+				}
+			}
+		case []string:
+			if len(v) > 0 {
+				if encoded, err := json.Marshal(v); err == nil {
+					values.Add(key, string(encoded))
+				}
+			}
+		default:
+			values.Add(key, fmt.Sprint(v))
+		}
+	}
+
+	if len(values) > 0 {
+		return "?" + values.Encode()
+	}
+	return ""
+}
--- a/pkg/ionet/container.go
+++ b/pkg/ionet/container.go
@@ -0,0 +1,302 @@
+package ionet
+
+import (
+	"encoding/json"
+	"fmt"
+	"strings"
+	"time"
+
+	"github.com/samber/lo"
+)
+
+// ListContainers retrieves all containers for a specific deployment
+func (c *Client) ListContainers(deploymentID string) (*ContainerList, error) {
+	if deploymentID == "" {
+		return nil, fmt.Errorf("deployment ID cannot be empty")
+	}
+
+	endpoint := fmt.Sprintf("/deployment/%s/containers", deploymentID)
+
+	resp, err := c.makeRequest("GET", endpoint, nil)
+	if err != nil {
+		return nil, fmt.Errorf("failed to list containers: %w", err)
+	}
+
+	var containerList ContainerList
+	if err := decodeDataWithFlexibleTimes(resp.Body, &containerList); err != nil {
+		return nil, fmt.Errorf("failed to parse containers list: %w", err)
+	}
+
+	return &containerList, nil
+}
+
+// GetContainerDetails retrieves detailed information about a specific container
+func (c *Client) GetContainerDetails(deploymentID, containerID string) (*Container, error) {
+	if deploymentID == "" {
+		return nil, fmt.Errorf("deployment ID cannot be empty")
+	}
+	if containerID == "" {
+		return nil, fmt.Errorf("container ID cannot be empty")
+	}
+
+	endpoint := fmt.Sprintf("/deployment/%s/container/%s", deploymentID, containerID)
+
+	resp, err := c.makeRequest("GET", endpoint, nil)
+	if err != nil {
+		return nil, fmt.Errorf("failed to get container details: %w", err)
+	}
+
+	// API response format not documented, assuming direct format
+	var container Container
+	if err := decodeWithFlexibleTimes(resp.Body, &container); err != nil {
+		return nil, fmt.Errorf("failed to parse container details: %w", err)
+	}
+
+	return &container, nil
+}
+
+// GetContainerJobs retrieves containers jobs for a specific container (similar to containers endpoint)
+func (c *Client) GetContainerJobs(deploymentID, containerID string) (*ContainerList, error) {
+	if deploymentID == "" {
+		return nil, fmt.Errorf("deployment ID cannot be empty")
+	}
+	if containerID == "" {
+		return nil, fmt.Errorf("container ID cannot be empty")
+	}
+
+	endpoint := fmt.Sprintf("/deployment/%s/containers-jobs/%s", deploymentID, containerID)
+
+	resp, err := c.makeRequest("GET", endpoint, nil)
+	if err != nil {
+		return nil, fmt.Errorf("failed to get container jobs: %w", err)
+	}
+
+	var containerList ContainerList
+	if err := decodeDataWithFlexibleTimes(resp.Body, &containerList); err != nil {
+		return nil, fmt.Errorf("failed to parse container jobs: %w", err)
+	}
+
+	return &containerList, nil
+}
+
+// buildLogEndpoint constructs the request path for fetching logs
+func buildLogEndpoint(deploymentID, containerID string, opts *GetLogsOptions) (string, error) {
+	if deploymentID == "" {
+		return "", fmt.Errorf("deployment ID cannot be empty")
+	}
+	if containerID == "" {
+		return "", fmt.Errorf("container ID cannot be empty")
+	}
+
+	params := make(map[string]interface{})
+
+	if opts != nil {
+		if opts.Level != "" {
+			params["level"] = opts.Level
+		}
+		if opts.Stream != "" {
+			params["stream"] = opts.Stream
+		}
+		if opts.Limit > 0 {
+			params["limit"] = opts.Limit
+		}
+		if opts.Cursor != "" {
+			params["cursor"] = opts.Cursor
+		}
+		if opts.Follow {
+			params["follow"] = true
+		}
+
+		if opts.StartTime != nil {
+			params["start_time"] = opts.StartTime
+		}
+		if opts.EndTime != nil {
+			params["end_time"] = opts.EndTime
+		}
+	}
+
+	endpoint := fmt.Sprintf("/deployment/%s/log/%s", deploymentID, containerID)
+	endpoint += buildQueryParams(params)
+
+	return endpoint, nil
+}
+
+// GetContainerLogs retrieves logs for containers in a deployment and normalizes them
+func (c *Client) GetContainerLogs(deploymentID, containerID string, opts *GetLogsOptions) (*ContainerLogs, error) {
+	raw, err := c.GetContainerLogsRaw(deploymentID, containerID, opts)
+	if err != nil {
+		return nil, err
+	}
+
+	logs := &ContainerLogs{
+		ContainerID: containerID,
+	}
+
+	if raw == "" {
+		return logs, nil
+	}
+
+	normalized := strings.ReplaceAll(raw, "\r\n", "\n")
+	lines := strings.Split(normalized, "\n")
+	logs.Logs = lo.FilterMap(lines, func(line string, _ int) (LogEntry, bool) {
+		if strings.TrimSpace(line) == "" {
+			return LogEntry{}, false
+		}
+		return LogEntry{Message: line}, true
+	})
+
+	return logs, nil
+}
+
+// GetContainerLogsRaw retrieves the raw text logs for a specific container
+func (c *Client) GetContainerLogsRaw(deploymentID, containerID string, opts *GetLogsOptions) (string, error) {
+	endpoint, err := buildLogEndpoint(deploymentID, containerID, opts)
+	if err != nil {
+		return "", err
+	}
+
+	resp, err := c.makeRequest("GET", endpoint, nil)
+	if err != nil {
+		return "", fmt.Errorf("failed to get container logs: %w", err)
+	}
+
+	return string(resp.Body), nil
+}
+
+// StreamContainerLogs streams real-time logs for a specific container
+// This method uses a callback function to handle incoming log entries
+func (c *Client) StreamContainerLogs(deploymentID, containerID string, opts *GetLogsOptions, callback func(*LogEntry) error) error {
+	if deploymentID == "" {
+		return fmt.Errorf("deployment ID cannot be empty")
+	}
+	if containerID == "" {
+		return fmt.Errorf("container ID cannot be empty")
+	}
+	if callback == nil {
+		return fmt.Errorf("callback function cannot be nil")
+	}
+
+	// Set follow to true for streaming
+	if opts == nil {
+		opts = &GetLogsOptions{}
+	}
+	opts.Follow = true
+
+	endpoint, err := buildLogEndpoint(deploymentID, containerID, opts)
+	if err != nil {
+		return err
+	}
+
+	// Note: This is a simplified implementation. In a real scenario, you might want to use
+	// Server-Sent Events (SSE) or WebSocket for streaming logs
+	for {
+		resp, err := c.makeRequest("GET", endpoint, nil)
+		if err != nil {
+			return fmt.Errorf("failed to stream container logs: %w", err)
+		}
+
+		var logs ContainerLogs
+		if err := decodeWithFlexibleTimes(resp.Body, &logs); err != nil {
+			return fmt.Errorf("failed to parse container logs: %w", err)
+		}
+
+		// Call the callback for each log entry
+		for _, logEntry := range logs.Logs {
+			if err := callback(&logEntry); err != nil {
+				return fmt.Errorf("callback error: %w", err)
+			}
+		}
+
+		// If there are no more logs or we have a cursor, continue polling
+		if !logs.HasMore && logs.NextCursor == "" {
+			break
+		}
+
+		// Update cursor for next request
+		if logs.NextCursor != "" {
+			opts.Cursor = logs.NextCursor
+			endpoint, err = buildLogEndpoint(deploymentID, containerID, opts)
+			if err != nil {
+				return err
+			}
+		}
+
+		// Wait a bit before next poll to avoid overwhelming the API
+		time.Sleep(2 * time.Second)
+	}
+
+	return nil
+}
+
+// RestartContainer restarts a specific container (if supported by the API)
+func (c *Client) RestartContainer(deploymentID, containerID string) error {
+	if deploymentID == "" {
+		return fmt.Errorf("deployment ID cannot be empty")
+	}
+	if containerID == "" {
+		return fmt.Errorf("container ID cannot be empty")
+	}
+
+	endpoint := fmt.Sprintf("/deployment/%s/container/%s/restart", deploymentID, containerID)
+
+	_, err := c.makeRequest("POST", endpoint, nil)
+	if err != nil {
+		return fmt.Errorf("failed to restart container: %w", err)
+	}
+
+	return nil
+}
+
+// StopContainer stops a specific container (if supported by the API)
+func (c *Client) StopContainer(deploymentID, containerID string) error {
+	if deploymentID == "" {
+		return fmt.Errorf("deployment ID cannot be empty")
+	}
+	if containerID == "" {
+		return fmt.Errorf("container ID cannot be empty")
+	}
+
+	endpoint := fmt.Sprintf("/deployment/%s/container/%s/stop", deploymentID, containerID)
+
+	_, err := c.makeRequest("POST", endpoint, nil)
+	if err != nil {
+		return fmt.Errorf("failed to stop container: %w", err)
+	}
+
+	return nil
+}
+
+// ExecuteInContainer executes a command in a specific container (if supported by the API)
+func (c *Client) ExecuteInContainer(deploymentID, containerID string, command []string) (string, error) {
+	if deploymentID == "" {
+		return "", fmt.Errorf("deployment ID cannot be empty")
+	}
+	if containerID == "" {
+		return "", fmt.Errorf("container ID cannot be empty")
+	}
+	if len(command) == 0 {
+		return "", fmt.Errorf("command cannot be empty")
+	}
+
+	reqBody := map[string]interface{}{
+		"command": command,
+	}
+
+	endpoint := fmt.Sprintf("/deployment/%s/container/%s/exec", deploymentID, containerID)
+
+	resp, err := c.makeRequest("POST", endpoint, reqBody)
+	if err != nil {
+		return "", fmt.Errorf("failed to execute command in container: %w", err)
+	}
+
+	var result map[string]interface{}
+	if err := json.Unmarshal(resp.Body, &result); err != nil {
+		return "", fmt.Errorf("failed to parse execution result: %w", err)
+	}
+
+	if output, ok := result["output"].(string); ok {
+		return output, nil
+	}
+
+	return string(resp.Body), nil
+}
--- a/pkg/ionet/deployment.go
+++ b/pkg/ionet/deployment.go
@@ -0,0 +1,377 @@
+package ionet
+
+import (
+	"encoding/json"
+	"fmt"
+	"strings"
+
+	"github.com/samber/lo"
+)
+
+// DeployContainer deploys a new container with the specified configuration
+func (c *Client) DeployContainer(req *DeploymentRequest) (*DeploymentResponse, error) {
+	if req == nil {
+		return nil, fmt.Errorf("deployment request cannot be nil")
+	}
+
+	// Validate required fields
+	if req.ResourcePrivateName == "" {
+		return nil, fmt.Errorf("resource_private_name is required")
+	}
+	if len(req.LocationIDs) == 0 {
+		return nil, fmt.Errorf("location_ids is required")
+	}
+	if req.HardwareID <= 0 {
+		return nil, fmt.Errorf("hardware_id is required")
+	}
+	if req.RegistryConfig.ImageURL == "" {
+		return nil, fmt.Errorf("registry_config.image_url is required")
+	}
+	if req.GPUsPerContainer < 1 {
+		return nil, fmt.Errorf("gpus_per_container must be at least 1")
+	}
+	if req.DurationHours < 1 {
+		return nil, fmt.Errorf("duration_hours must be at least 1")
+	}
+	if req.ContainerConfig.ReplicaCount < 1 {
+		return nil, fmt.Errorf("container_config.replica_count must be at least 1")
+	}
+
+	resp, err := c.makeRequest("POST", "/deploy", req)
+	if err != nil {
+		return nil, fmt.Errorf("failed to deploy container: %w", err)
+	}
+
+	// API returns direct format:
+	// {"status": "string", "deployment_id": "..."}
+	var deployResp DeploymentResponse
+	if err := json.Unmarshal(resp.Body, &deployResp); err != nil {
+		return nil, fmt.Errorf("failed to parse deployment response: %w", err)
+	}
+
+	return &deployResp, nil
+}
+
+// ListDeployments retrieves a list of deployments with optional filtering
+func (c *Client) ListDeployments(opts *ListDeploymentsOptions) (*DeploymentList, error) {
+	params := make(map[string]interface{})
+
+	if opts != nil {
+		params["status"] = opts.Status
+		params["location_id"] = opts.LocationID
+		params["page"] = opts.Page
+		params["page_size"] = opts.PageSize
+		params["sort_by"] = opts.SortBy
+		params["sort_order"] = opts.SortOrder
+	}
+
+	endpoint := "/deployments" + buildQueryParams(params)
+
+	resp, err := c.makeRequest("GET", endpoint, nil)
+	if err != nil {
+		return nil, fmt.Errorf("failed to list deployments: %w", err)
+	}
+
+	var deploymentList DeploymentList
+	if err := decodeData(resp.Body, &deploymentList); err != nil {
+		return nil, fmt.Errorf("failed to parse deployments list: %w", err)
+	}
+
+	deploymentList.Deployments = lo.Map(deploymentList.Deployments, func(deployment Deployment, _ int) Deployment {
+		deployment.GPUCount = deployment.HardwareQuantity
+		deployment.Replicas = deployment.HardwareQuantity // Assuming 1:1 mapping for now
+		return deployment
+	})
+
+	return &deploymentList, nil
+}
+
+// GetDeployment retrieves detailed information about a specific deployment
+func (c *Client) GetDeployment(deploymentID string) (*DeploymentDetail, error) {
+	if deploymentID == "" {
+		return nil, fmt.Errorf("deployment ID cannot be empty")
+	}
+
+	endpoint := fmt.Sprintf("/deployment/%s", deploymentID)
+
+	resp, err := c.makeRequest("GET", endpoint, nil)
+	if err != nil {
+		return nil, fmt.Errorf("failed to get deployment details: %w", err)
+	}
+
+	var deploymentDetail DeploymentDetail
+	if err := decodeDataWithFlexibleTimes(resp.Body, &deploymentDetail); err != nil {
+		return nil, fmt.Errorf("failed to parse deployment details: %w", err)
+	}
+
+	return &deploymentDetail, nil
+}
+
+// UpdateDeployment updates the configuration of an existing deployment
+func (c *Client) UpdateDeployment(deploymentID string, req *UpdateDeploymentRequest) (*UpdateDeploymentResponse, error) {
+	if deploymentID == "" {
+		return nil, fmt.Errorf("deployment ID cannot be empty")
+	}
+	if req == nil {
+		return nil, fmt.Errorf("update request cannot be nil")
+	}
+
+	endpoint := fmt.Sprintf("/deployment/%s", deploymentID)
+
+	resp, err := c.makeRequest("PATCH", endpoint, req)
+	if err != nil {
+		return nil, fmt.Errorf("failed to update deployment: %w", err)
+	}
+
+	// API returns direct format:
+	// {"status": "string", "deployment_id": "..."}
+	var updateResp UpdateDeploymentResponse
+	if err := json.Unmarshal(resp.Body, &updateResp); err != nil {
+		return nil, fmt.Errorf("failed to parse update deployment response: %w", err)
+	}
+
+	return &updateResp, nil
+}
+
+// ExtendDeployment extends the duration of an existing deployment
+func (c *Client) ExtendDeployment(deploymentID string, req *ExtendDurationRequest) (*DeploymentDetail, error) {
+	if deploymentID == "" {
+		return nil, fmt.Errorf("deployment ID cannot be empty")
+	}
+	if req == nil {
+		return nil, fmt.Errorf("extend request cannot be nil")
+	}
+	if req.DurationHours < 1 {
+		return nil, fmt.Errorf("duration_hours must be at least 1")
+	}
+
+	endpoint := fmt.Sprintf("/deployment/%s/extend", deploymentID)
+
+	resp, err := c.makeRequest("POST", endpoint, req)
+	if err != nil {
+		return nil, fmt.Errorf("failed to extend deployment: %w", err)
+	}
+
+	var deploymentDetail DeploymentDetail
+	if err := decodeDataWithFlexibleTimes(resp.Body, &deploymentDetail); err != nil {
+		return nil, fmt.Errorf("failed to parse extended deployment details: %w", err)
+	}
+
+	return &deploymentDetail, nil
+}
+
+// DeleteDeployment deletes an active deployment
+func (c *Client) DeleteDeployment(deploymentID string) (*UpdateDeploymentResponse, error) {
+	if deploymentID == "" {
+		return nil, fmt.Errorf("deployment ID cannot be empty")
+	}
+
+	endpoint := fmt.Sprintf("/deployment/%s", deploymentID)
+
+	resp, err := c.makeRequest("DELETE", endpoint, nil)
+	if err != nil {
+		return nil, fmt.Errorf("failed to delete deployment: %w", err)
+	}
+
+	// API returns direct format:
+	// {"status": "string", "deployment_id": "..."}
+	var deleteResp UpdateDeploymentResponse
+	if err := json.Unmarshal(resp.Body, &deleteResp); err != nil {
+		return nil, fmt.Errorf("failed to parse delete deployment response: %w", err)
+	}
+
+	return &deleteResp, nil
+}
+
+// GetPriceEstimation calculates the estimated cost for a deployment
+func (c *Client) GetPriceEstimation(req *PriceEstimationRequest) (*PriceEstimationResponse, error) {
+	if req == nil {
+		return nil, fmt.Errorf("price estimation request cannot be nil")
+	}
+
+	// Validate required fields
+	if len(req.LocationIDs) == 0 {
+		return nil, fmt.Errorf("location_ids is required")
+	}
+	if req.HardwareID == 0 {
+		return nil, fmt.Errorf("hardware_id is required")
+	}
+	if req.ReplicaCount < 1 {
+		return nil, fmt.Errorf("replica_count must be at least 1")
+	}
+
+	currency := strings.TrimSpace(req.Currency)
+	if currency == "" {
+		currency = "usdc"
+	}
+
+	durationType := strings.TrimSpace(req.DurationType)
+	if durationType == "" {
+		durationType = "hour"
+	}
+	durationType = strings.ToLower(durationType)
+
+	apiDurationType := ""
+
+	durationQty := req.DurationQty
+	if durationQty < 1 {
+		durationQty = req.DurationHours
+	}
+	if durationQty < 1 {
+		return nil, fmt.Errorf("duration_qty must be at least 1")
+	}
+
+	hardwareQty := req.HardwareQty
+	if hardwareQty < 1 {
+		hardwareQty = req.GPUsPerContainer
+	}
+	if hardwareQty < 1 {
+		return nil, fmt.Errorf("hardware_qty must be at least 1")
+	}
+
+	durationHoursForRate := req.DurationHours
+	if durationHoursForRate < 1 {
+		durationHoursForRate = durationQty
+	}
+	switch durationType {
+	case "hour", "hours", "hourly":
+		durationHoursForRate = durationQty
+		apiDurationType = "hourly"
+	case "day", "days", "daily":
+		durationHoursForRate = durationQty * 24
+		apiDurationType = "daily"
+	case "week", "weeks", "weekly":
+		durationHoursForRate = durationQty * 24 * 7
+		apiDurationType = "weekly"
+	case "month", "months", "monthly":
+		durationHoursForRate = durationQty * 24 * 30
+		apiDurationType = "monthly"
+	}
+	if durationHoursForRate < 1 {
+		durationHoursForRate = 1
+	}
+	if apiDurationType == "" {
+		apiDurationType = "hourly"
+	}
+
+	params := map[string]interface{}{
+		"location_ids":       req.LocationIDs,
+		"hardware_id":        req.HardwareID,
+		"hardware_qty":       hardwareQty,
+		"gpus_per_container": req.GPUsPerContainer,
+		"duration_type":      apiDurationType,
+		"duration_qty":       durationQty,
+		"duration_hours":     req.DurationHours,
+		"replica_count":      req.ReplicaCount,
+		"currency":           currency,
+	}
+
+	endpoint := "/price" + buildQueryParams(params)
+
+	resp, err := c.makeRequest("GET", endpoint, nil)
+	if err != nil {
+		return nil, fmt.Errorf("failed to get price estimation: %w", err)
+	}
+
+	// Parse according to the actual API response format from docs:
+	// {
+	//   "data": {
+	//     "replica_count": 0,
+	//     "gpus_per_container": 0,
+	//     "available_replica_count": [0],
+	//     "discount": 0,
+	//     "ionet_fee": 0,
+	//     "ionet_fee_percent": 0,
+	//     "currency_conversion_fee": 0,
+	//     "currency_conversion_fee_percent": 0,
+	//     "total_cost_usdc": 0
+	//   }
+	// }
+	var pricingData struct {
+		ReplicaCount                 int     `json:"replica_count"`
+		GPUsPerContainer             int     `json:"gpus_per_container"`
+		AvailableReplicaCount        []int   `json:"available_replica_count"`
+		Discount                     float64 `json:"discount"`
+		IonetFee                     float64 `json:"ionet_fee"`
+		IonetFeePercent              float64 `json:"ionet_fee_percent"`
+		CurrencyConversionFee        float64 `json:"currency_conversion_fee"`
+		CurrencyConversionFeePercent float64 `json:"currency_conversion_fee_percent"`
+		TotalCostUSDC                float64 `json:"total_cost_usdc"`
+	}
+
+	if err := decodeData(resp.Body, &pricingData); err != nil {
+		return nil, fmt.Errorf("failed to parse price estimation response: %w", err)
+	}
+
+	// Convert to our internal format
+	durationHoursFloat := float64(durationHoursForRate)
+	if durationHoursFloat <= 0 {
+		durationHoursFloat = 1
+	}
+
+	priceResp := &PriceEstimationResponse{
+		EstimatedCost:   pricingData.TotalCostUSDC,
+		Currency:        strings.ToUpper(currency),
+		EstimationValid: true,
+		PriceBreakdown: PriceBreakdown{
+			ComputeCost: pricingData.TotalCostUSDC - pricingData.IonetFee - pricingData.CurrencyConversionFee,
+			TotalCost:   pricingData.TotalCostUSDC,
+			HourlyRate:  pricingData.TotalCostUSDC / durationHoursFloat,
+		},
+	}
+
+	return priceResp, nil
+}
+
+// CheckClusterNameAvailability checks if a cluster name is available
+func (c *Client) CheckClusterNameAvailability(clusterName string) (bool, error) {
+	if clusterName == "" {
+		return false, fmt.Errorf("cluster name cannot be empty")
+	}
+
+	params := map[string]interface{}{
+		"cluster_name": clusterName,
+	}
+
+	endpoint := "/clusters/check_cluster_name_availability" + buildQueryParams(params)
+
+	resp, err := c.makeRequest("GET", endpoint, nil)
+	if err != nil {
+		return false, fmt.Errorf("failed to check cluster name availability: %w", err)
+	}
+
+	var availabilityResp bool
+	if err := json.Unmarshal(resp.Body, &availabilityResp); err != nil {
+		return false, fmt.Errorf("failed to parse cluster name availability response: %w", err)
+	}
+
+	return availabilityResp, nil
+}
+
+// UpdateClusterName updates the name of an existing cluster/deployment
+func (c *Client) UpdateClusterName(clusterID string, req *UpdateClusterNameRequest) (*UpdateClusterNameResponse, error) {
+	if clusterID == "" {
+		return nil, fmt.Errorf("cluster ID cannot be empty")
+	}
+	if req == nil {
+		return nil, fmt.Errorf("update cluster name request cannot be nil")
+	}
+	if req.Name == "" {
+		return nil, fmt.Errorf("cluster name cannot be empty")
+	}
+
+	endpoint := fmt.Sprintf("/clusters/%s/update-name", clusterID)
+
+	resp, err := c.makeRequest("PUT", endpoint, req)
+	if err != nil {
+		return nil, fmt.Errorf("failed to update cluster name: %w", err)
+	}
+
+	// Parse the response directly without data wrapper based on API docs
+	var updateResp UpdateClusterNameResponse
+	if err := json.Unmarshal(resp.Body, &updateResp); err != nil {
+		return nil, fmt.Errorf("failed to parse update cluster name response: %w", err)
+	}
+
+	return &updateResp, nil
+}
--- a/pkg/ionet/hardware.go
+++ b/pkg/ionet/hardware.go
@@ -0,0 +1,202 @@
+package ionet
+
+import (
+	"encoding/json"
+	"fmt"
+	"strings"
+
+	"github.com/samber/lo"
+)
+
+// GetAvailableReplicas retrieves available replicas per location for specified hardware
+func (c *Client) GetAvailableReplicas(hardwareID int, gpuCount int) (*AvailableReplicasResponse, error) {
+	if hardwareID <= 0 {
+		return nil, fmt.Errorf("hardware_id must be greater than 0")
+	}
+	if gpuCount < 1 {
+		return nil, fmt.Errorf("gpu_count must be at least 1")
+	}
+
+	params := map[string]interface{}{
+		"hardware_id":  hardwareID,
+		"hardware_qty": gpuCount,
+	}
+
+	endpoint := "/available-replicas" + buildQueryParams(params)
+
+	resp, err := c.makeRequest("GET", endpoint, nil)
+	if err != nil {
+		return nil, fmt.Errorf("failed to get available replicas: %w", err)
+	}
+
+	type availableReplicaPayload struct {
+		ID                int    `json:"id"`
+		ISO2              string `json:"iso2"`
+		Name              string `json:"name"`
+		AvailableReplicas int    `json:"available_replicas"`
+	}
+	var payload []availableReplicaPayload
+
+	if err := decodeData(resp.Body, &payload); err != nil {
+		return nil, fmt.Errorf("failed to parse available replicas response: %w", err)
+	}
+
+	replicas := lo.Map(payload, func(item availableReplicaPayload, _ int) AvailableReplica {
+		return AvailableReplica{
+			LocationID:     item.ID,
+			LocationName:   item.Name,
+			HardwareID:     hardwareID,
+			HardwareName:   "",
+			AvailableCount: item.AvailableReplicas,
+			MaxGPUs:        gpuCount,
+		}
+	})
+
+	return &AvailableReplicasResponse{Replicas: replicas}, nil
+}
+
+// GetMaxGPUsPerContainer retrieves the maximum number of GPUs available per hardware type
+func (c *Client) GetMaxGPUsPerContainer() (*MaxGPUResponse, error) {
+	resp, err := c.makeRequest("GET", "/hardware/max-gpus-per-container", nil)
+	if err != nil {
+		return nil, fmt.Errorf("failed to get max GPUs per container: %w", err)
+	}
+
+	var maxGPUResp MaxGPUResponse
+	if err := decodeData(resp.Body, &maxGPUResp); err != nil {
+		return nil, fmt.Errorf("failed to parse max GPU response: %w", err)
+	}
+
+	return &maxGPUResp, nil
+}
+
+// ListHardwareTypes retrieves available hardware types using the max GPUs endpoint
+func (c *Client) ListHardwareTypes() ([]HardwareType, int, error) {
+	maxGPUResp, err := c.GetMaxGPUsPerContainer()
+	if err != nil {
+		return nil, 0, fmt.Errorf("failed to list hardware types: %w", err)
+	}
+
+	mapped := lo.Map(maxGPUResp.Hardware, func(hw MaxGPUInfo, _ int) HardwareType {
+		name := strings.TrimSpace(hw.HardwareName)
+		if name == "" {
+			name = fmt.Sprintf("Hardware %d", hw.HardwareID)
+		}
+
+		return HardwareType{
+			ID:             hw.HardwareID,
+			Name:           name,
+			GPUType:        "",
+			GPUMemory:      0,
+			MaxGPUs:        hw.MaxGPUsPerContainer,
+			CPU:            "",
+			Memory:         0,
+			Storage:        0,
+			HourlyRate:     0,
+			Available:      hw.Available > 0,
+			BrandName:      strings.TrimSpace(hw.BrandName),
+			AvailableCount: hw.Available,
+		}
+	})
+
+	totalAvailable := maxGPUResp.Total
+	if totalAvailable == 0 {
+		totalAvailable = lo.SumBy(maxGPUResp.Hardware, func(hw MaxGPUInfo) int {
+			return hw.Available
+		})
+	}
+
+	return mapped, totalAvailable, nil
+}
+
+// ListLocations retrieves available deployment locations (if supported by the API)
+func (c *Client) ListLocations() (*LocationsResponse, error) {
+	resp, err := c.makeRequest("GET", "/locations", nil)
+	if err != nil {
+		return nil, fmt.Errorf("failed to list locations: %w", err)
+	}
+
+	var locations LocationsResponse
+	if err := decodeData(resp.Body, &locations); err != nil {
+		return nil, fmt.Errorf("failed to parse locations response: %w", err)
+	}
+
+	locations.Locations = lo.Map(locations.Locations, func(location Location, _ int) Location {
+		location.ISO2 = strings.ToUpper(strings.TrimSpace(location.ISO2))
+		return location
+	})
+
+	if locations.Total == 0 {
+		locations.Total = lo.SumBy(locations.Locations, func(location Location) int {
+			return location.Available
+		})
+	}
+
+	return &locations, nil
+}
+
+// GetHardwareType retrieves details about a specific hardware type
+func (c *Client) GetHardwareType(hardwareID int) (*HardwareType, error) {
+	if hardwareID <= 0 {
+		return nil, fmt.Errorf("hardware ID must be greater than 0")
+	}
+
+	endpoint := fmt.Sprintf("/hardware/types/%d", hardwareID)
+
+	resp, err := c.makeRequest("GET", endpoint, nil)
+	if err != nil {
+		return nil, fmt.Errorf("failed to get hardware type: %w", err)
+	}
+
+	// API response format not documented, assuming direct format
+	var hardwareType HardwareType
+	if err := json.Unmarshal(resp.Body, &hardwareType); err != nil {
+		return nil, fmt.Errorf("failed to parse hardware type: %w", err)
+	}
+
+	return &hardwareType, nil
+}
+
+// GetLocation retrieves details about a specific location
+func (c *Client) GetLocation(locationID int) (*Location, error) {
+	if locationID <= 0 {
+		return nil, fmt.Errorf("location ID must be greater than 0")
+	}
+
+	endpoint := fmt.Sprintf("/locations/%d", locationID)
+
+	resp, err := c.makeRequest("GET", endpoint, nil)
+	if err != nil {
+		return nil, fmt.Errorf("failed to get location: %w", err)
+	}
+
+	// API response format not documented, assuming direct format
+	var location Location
+	if err := json.Unmarshal(resp.Body, &location); err != nil {
+		return nil, fmt.Errorf("failed to parse location: %w", err)
+	}
+
+	return &location, nil
+}
+
+// GetLocationAvailability retrieves real-time availability for a specific location
+func (c *Client) GetLocationAvailability(locationID int) (*LocationAvailability, error) {
+	if locationID <= 0 {
+		return nil, fmt.Errorf("location ID must be greater than 0")
+	}
+
+	endpoint := fmt.Sprintf("/locations/%d/availability", locationID)
+
+	resp, err := c.makeRequest("GET", endpoint, nil)
+	if err != nil {
+		return nil, fmt.Errorf("failed to get location availability: %w", err)
+	}
+
+	// API response format not documented, assuming direct format
+	var availability LocationAvailability
+	if err := json.Unmarshal(resp.Body, &availability); err != nil {
+		return nil, fmt.Errorf("failed to parse location availability: %w", err)
+	}
+
+	return &availability, nil
+}
--- a/pkg/ionet/jsonutil.go
+++ b/pkg/ionet/jsonutil.go
@@ -0,0 +1,96 @@
+package ionet
+
+import (
+	"encoding/json"
+	"strings"
+	"time"
+
+	"github.com/samber/lo"
+)
+
+// decodeWithFlexibleTimes unmarshals API responses while tolerating timestamp strings
+// that omit timezone information by normalizing them to RFC3339Nano.
+func decodeWithFlexibleTimes(data []byte, target interface{}) error {
+	var intermediate interface{}
+	if err := json.Unmarshal(data, &intermediate); err != nil {
+		return err
+	}
+
+	normalized := normalizeTimeValues(intermediate)
+	reencoded, err := json.Marshal(normalized)
+	if err != nil {
+		return err
+	}
+
+	return json.Unmarshal(reencoded, target)
+}
+
+func decodeData[T any](data []byte, target *T) error {
+	var wrapper struct {
+		Data T `json:"data"`
+	}
+	if err := json.Unmarshal(data, &wrapper); err != nil {
+		return err
+	}
+	*target = wrapper.Data
+	return nil
+}
+
+func decodeDataWithFlexibleTimes[T any](data []byte, target *T) error {
+	var wrapper struct {
+		Data T `json:"data"`
+	}
+	if err := decodeWithFlexibleTimes(data, &wrapper); err != nil {
+		return err
+	}
+	*target = wrapper.Data
+	return nil
+}
+
+func normalizeTimeValues(value interface{}) interface{} {
+	switch v := value.(type) {
+	case map[string]interface{}:
+		return lo.MapValues(v, func(val interface{}, _ string) interface{} {
+			return normalizeTimeValues(val)
+		})
+	case []interface{}:
+		return lo.Map(v, func(item interface{}, _ int) interface{} {
+			return normalizeTimeValues(item)
+		})
+	case string:
+		if normalized, changed := normalizeTimeString(v); changed {
+			return normalized
+		}
+		return v
+	default:
+		return value
+	}
+}
+
+func normalizeTimeString(input string) (string, bool) {
+	trimmed := strings.TrimSpace(input)
+	if trimmed == "" {
+		return input, false
+	}
+
+	if _, err := time.Parse(time.RFC3339Nano, trimmed); err == nil {
+		return trimmed, trimmed != input
+	}
+	if _, err := time.Parse(time.RFC3339, trimmed); err == nil {
+		return trimmed, trimmed != input
+	}
+
+	layouts := []string{
+		"2006-01-02T15:04:05.999999999",
+		"2006-01-02T15:04:05.999999",
+		"2006-01-02T15:04:05",
+	}
+
+	for _, layout := range layouts {
+		if parsed, err := time.Parse(layout, trimmed); err == nil {
+			return parsed.UTC().Format(time.RFC3339Nano), true
+		}
+	}
+
+	return input, false
+}
--- a/pkg/ionet/types.go
+++ b/pkg/ionet/types.go
@@ -0,0 +1,353 @@
+package ionet
+
+import (
+	"time"
+)
+
+// Client represents the IO.NET API client
+type Client struct {
+	BaseURL    string
+	APIKey     string
+	HTTPClient HTTPClient
+}
+
+// HTTPClient interface for making HTTP requests
+type HTTPClient interface {
+	Do(req *HTTPRequest) (*HTTPResponse, error)
+}
+
+// HTTPRequest represents an HTTP request
+type HTTPRequest struct {
+	Method  string
+	URL     string
+	Headers map[string]string
+	Body    []byte
+}
+
+// HTTPResponse represents an HTTP response
+type HTTPResponse struct {
+	StatusCode int
+	Headers    map[string]string
+	Body       []byte
+}
+
+// DeploymentRequest represents a container deployment request
+type DeploymentRequest struct {
+	ResourcePrivateName string          `json:"resource_private_name"`
+	DurationHours       int             `json:"duration_hours"`
+	GPUsPerContainer    int             `json:"gpus_per_container"`
+	HardwareID          int             `json:"hardware_id"`
+	LocationIDs         []int           `json:"location_ids"`
+	ContainerConfig     ContainerConfig `json:"container_config"`
+	RegistryConfig      RegistryConfig  `json:"registry_config"`
+}
+
+// ContainerConfig represents container configuration
+type ContainerConfig struct {
+	ReplicaCount       int               `json:"replica_count"`
+	EnvVariables       map[string]string `json:"env_variables,omitempty"`
+	SecretEnvVariables map[string]string `json:"secret_env_variables,omitempty"`
+	Entrypoint         []string          `json:"entrypoint,omitempty"`
+	TrafficPort        int               `json:"traffic_port,omitempty"`
+	Args               []string          `json:"args,omitempty"`
+}
+
+// RegistryConfig represents registry configuration
+type RegistryConfig struct {
+	ImageURL         string `json:"image_url"`
+	RegistryUsername string `json:"registry_username,omitempty"`
+	RegistrySecret   string `json:"registry_secret,omitempty"`
+}
+
+// DeploymentResponse represents the response from deployment creation
+type DeploymentResponse struct {
+	DeploymentID string `json:"deployment_id"`
+	Status       string `json:"status"`
+}
+
+// DeploymentDetail represents detailed deployment information
+type DeploymentDetail struct {
+	ID                      string                    `json:"id"`
+	Status                  string                    `json:"status"`
+	CreatedAt               time.Time                 `json:"created_at"`
+	StartedAt               *time.Time                `json:"started_at,omitempty"`
+	FinishedAt              *time.Time                `json:"finished_at,omitempty"`
+	AmountPaid              float64                   `json:"amount_paid"`
+	CompletedPercent        float64                   `json:"completed_percent"`
+	TotalGPUs               int                       `json:"total_gpus"`
+	GPUsPerContainer        int                       `json:"gpus_per_container"`
+	TotalContainers         int                       `json:"total_containers"`
+	HardwareName            string                    `json:"hardware_name"`
+	HardwareID              int                       `json:"hardware_id"`
+	Locations               []DeploymentLocation      `json:"locations"`
+	BrandName               string                    `json:"brand_name"`
+	ComputeMinutesServed    int                       `json:"compute_minutes_served"`
+	ComputeMinutesRemaining int                       `json:"compute_minutes_remaining"`
+	ContainerConfig         DeploymentContainerConfig `json:"container_config"`
+}
+
+// DeploymentLocation represents a location in deployment details
+type DeploymentLocation struct {
+	ID   int    `json:"id"`
+	ISO2 string `json:"iso2"`
+	Name string `json:"name"`
+}
+
+// DeploymentContainerConfig represents container config in deployment details
+type DeploymentContainerConfig struct {
+	Entrypoint   []string               `json:"entrypoint"`
+	EnvVariables map[string]interface{} `json:"env_variables"`
+	TrafficPort  int                    `json:"traffic_port"`
+	ImageURL     string                 `json:"image_url"`
+}
+
+// Container represents a container within a deployment
+type Container struct {
+	DeviceID         string           `json:"device_id"`
+	ContainerID      string           `json:"container_id"`
+	Hardware         string           `json:"hardware"`
+	BrandName        string           `json:"brand_name"`
+	CreatedAt        time.Time        `json:"created_at"`
+	UptimePercent    int              `json:"uptime_percent"`
+	GPUsPerContainer int              `json:"gpus_per_container"`
+	Status           string           `json:"status"`
+	ContainerEvents  []ContainerEvent `json:"container_events"`
+	PublicURL        string           `json:"public_url"`
+}
+
+// ContainerEvent represents a container event
+type ContainerEvent struct {
+	Time    time.Time `json:"time"`
+	Message string    `json:"message"`
+}
+
+// ContainerList represents a list of containers
+type ContainerList struct {
+	Total   int         `json:"total"`
+	Workers []Container `json:"workers"`
+}
+
+// Deployment represents a deployment in the list
+type Deployment struct {
+	ID                      string    `json:"id"`
+	Status                  string    `json:"status"`
+	Name                    string    `json:"name"`
+	CompletedPercent        float64   `json:"completed_percent"`
+	HardwareQuantity        int       `json:"hardware_quantity"`
+	BrandName               string    `json:"brand_name"`
+	HardwareName            string    `json:"hardware_name"`
+	Served                  string    `json:"served"`
+	Remaining               string    `json:"remaining"`
+	ComputeMinutesServed    int       `json:"compute_minutes_served"`
+	ComputeMinutesRemaining int       `json:"compute_minutes_remaining"`
+	CreatedAt               time.Time `json:"created_at"`
+	GPUCount                int       `json:"-"` // Derived from HardwareQuantity
+	Replicas                int       `json:"-"` // Derived from HardwareQuantity
+}
+
+// DeploymentList represents a list of deployments with pagination
+type DeploymentList struct {
+	Deployments []Deployment `json:"deployments"`
+	Total       int          `json:"total"`
+	Statuses    []string     `json:"statuses"`
+}
+
+// AvailableReplica represents replica availability for a location
+type AvailableReplica struct {
+	LocationID     int    `json:"location_id"`
+	LocationName   string `json:"location_name"`
+	HardwareID     int    `json:"hardware_id"`
+	HardwareName   string `json:"hardware_name"`
+	AvailableCount int    `json:"available_count"`
+	MaxGPUs        int    `json:"max_gpus"`
+}
+
+// AvailableReplicasResponse represents the response for available replicas
+type AvailableReplicasResponse struct {
+	Replicas []AvailableReplica `json:"replicas"`
+}
+
+// MaxGPUResponse represents the response for maximum GPUs per container
+type MaxGPUResponse struct {
+	Hardware []MaxGPUInfo `json:"hardware"`
+	Total    int          `json:"total"`
+}
+
+// MaxGPUInfo represents max GPU information for a hardware type
+type MaxGPUInfo struct {
+	MaxGPUsPerContainer int    `json:"max_gpus_per_container"`
+	Available           int    `json:"available"`
+	HardwareID          int    `json:"hardware_id"`
+	HardwareName        string `json:"hardware_name"`
+	BrandName           string `json:"brand_name"`
+}
+
+// PriceEstimationRequest represents a price estimation request
+type PriceEstimationRequest struct {
+	LocationIDs      []int  `json:"location_ids"`
+	HardwareID       int    `json:"hardware_id"`
+	GPUsPerContainer int    `json:"gpus_per_container"`
+	DurationHours    int    `json:"duration_hours"`
+	ReplicaCount     int    `json:"replica_count"`
+	Currency         string `json:"currency"`
+	DurationType     string `json:"duration_type"`
+	DurationQty      int    `json:"duration_qty"`
+	HardwareQty      int    `json:"hardware_qty"`
+}
+
+// PriceEstimationResponse represents the price estimation response
+type PriceEstimationResponse struct {
+	EstimatedCost   float64        `json:"estimated_cost"`
+	Currency        string         `json:"currency"`
+	PriceBreakdown  PriceBreakdown `json:"price_breakdown"`
+	EstimationValid bool           `json:"estimation_valid"`
+}
+
+// PriceBreakdown represents detailed cost breakdown
+type PriceBreakdown struct {
+	ComputeCost float64 `json:"compute_cost"`
+	NetworkCost float64 `json:"network_cost,omitempty"`
+	StorageCost float64 `json:"storage_cost,omitempty"`
+	TotalCost   float64 `json:"total_cost"`
+	HourlyRate  float64 `json:"hourly_rate"`
+}
+
+// ContainerLogs represents container log entries
+type ContainerLogs struct {
+	ContainerID string     `json:"container_id"`
+	Logs        []LogEntry `json:"logs"`
+	HasMore     bool       `json:"has_more"`
+	NextCursor  string     `json:"next_cursor,omitempty"`
+}
+
+// LogEntry represents a single log entry
+type LogEntry struct {
+	Timestamp time.Time `json:"timestamp"`
+	Level     string    `json:"level,omitempty"`
+	Message   string    `json:"message"`
+	Source    string    `json:"source,omitempty"`
+}
+
+// UpdateDeploymentRequest represents request to update deployment configuration
+type UpdateDeploymentRequest struct {
+	EnvVariables       map[string]string `json:"env_variables,omitempty"`
+	SecretEnvVariables map[string]string `json:"secret_env_variables,omitempty"`
+	Entrypoint         []string          `json:"entrypoint,omitempty"`
+	TrafficPort        *int              `json:"traffic_port,omitempty"`
+	ImageURL           string            `json:"image_url,omitempty"`
+	RegistryUsername   string            `json:"registry_username,omitempty"`
+	RegistrySecret     string            `json:"registry_secret,omitempty"`
+	Args               []string          `json:"args,omitempty"`
+	Command            string            `json:"command,omitempty"`
+}
+
+// ExtendDurationRequest represents request to extend deployment duration
+type ExtendDurationRequest struct {
+	DurationHours int `json:"duration_hours"`
+}
+
+// UpdateDeploymentResponse represents response from deployment update
+type UpdateDeploymentResponse struct {
+	Status       string `json:"status"`
+	DeploymentID string `json:"deployment_id"`
+}
+
+// UpdateClusterNameRequest represents request to update cluster name
+type UpdateClusterNameRequest struct {
+	Name string `json:"cluster_name"`
+}
+
+// UpdateClusterNameResponse represents response from cluster name update
+type UpdateClusterNameResponse struct {
+	Status  string `json:"status"`
+	Message string `json:"message"`
+}
+
+// APIError represents an API error response
+type APIError struct {
+	Code    int    `json:"code"`
+	Message string `json:"message"`
+	Details string `json:"details,omitempty"`
+}
+
+// Error implements the error interface
+func (e *APIError) Error() string {
+	if e.Details != "" {
+		return e.Message + ": " + e.Details
+	}
+	return e.Message
+}
+
+// ListDeploymentsOptions represents options for listing deployments
+type ListDeploymentsOptions struct {
+	Status     string `json:"status,omitempty"`      // filter by status
+	LocationID int    `json:"location_id,omitempty"` // filter by location
+	Page       int    `json:"page,omitempty"`        // pagination
+	PageSize   int    `json:"page_size,omitempty"`   // pagination
+	SortBy     string `json:"sort_by,omitempty"`     // sort field
+	SortOrder  string `json:"sort_order,omitempty"`  // asc/desc
+}
+
+// GetLogsOptions represents options for retrieving container logs
+type GetLogsOptions struct {
+	StartTime *time.Time `json:"start_time,omitempty"`
+	EndTime   *time.Time `json:"end_time,omitempty"`
+	Level     string     `json:"level,omitempty"`  // filter by log level
+	Stream    string     `json:"stream,omitempty"` // filter by stdout/stderr streams
+	Limit     int        `json:"limit,omitempty"`  // max number of log entries
+	Cursor    string     `json:"cursor,omitempty"` // pagination cursor
+	Follow    bool       `json:"follow,omitempty"` // stream logs
+}
+
+// HardwareType represents a hardware type available for deployment
+type HardwareType struct {
+	ID             int     `json:"id"`
+	Name           string  `json:"name"`
+	Description    string  `json:"description,omitempty"`
+	GPUType        string  `json:"gpu_type"`
+	GPUMemory      int     `json:"gpu_memory"` // in GB
+	MaxGPUs        int     `json:"max_gpus"`
+	CPU            string  `json:"cpu,omitempty"`
+	Memory         int     `json:"memory,omitempty"`  // in GB
+	Storage        int     `json:"storage,omitempty"` // in GB
+	HourlyRate     float64 `json:"hourly_rate"`
+	Available      bool    `json:"available"`
+	BrandName      string  `json:"brand_name,omitempty"`
+	AvailableCount int     `json:"available_count,omitempty"`
+}
+
+// Location represents a deployment location
+type Location struct {
+	ID          int     `json:"id"`
+	Name        string  `json:"name"`
+	ISO2        string  `json:"iso2,omitempty"`
+	Region      string  `json:"region,omitempty"`
+	Country     string  `json:"country,omitempty"`
+	Latitude    float64 `json:"latitude,omitempty"`
+	Longitude   float64 `json:"longitude,omitempty"`
+	Available   int     `json:"available,omitempty"`
+	Description string  `json:"description,omitempty"`
+}
+
+// LocationsResponse represents the list of locations and aggregated metadata.
+type LocationsResponse struct {
+	Locations []Location `json:"locations"`
+	Total     int        `json:"total"`
+}
+
+// LocationAvailability represents real-time availability for a location
+type LocationAvailability struct {
+	LocationID           int                    `json:"location_id"`
+	LocationName         string                 `json:"location_name"`
+	Available            bool                   `json:"available"`
+	HardwareAvailability []HardwareAvailability `json:"hardware_availability"`
+	UpdatedAt            time.Time              `json:"updated_at"`
+}
+
+// HardwareAvailability represents availability for specific hardware at a location
+type HardwareAvailability struct {
+	HardwareID     int    `json:"hardware_id"`
+	HardwareName   string `json:"hardware_name"`
+	AvailableCount int    `json:"available_count"`
+	MaxGPUs        int    `json:"max_gpus"`
+}
--- a/relay/audio_handler.go
+++ b/relay/audio_handler.go
@@ -67,8 +67,11 @@ func AudioHelper(c *gin.Context, info *relaycommon.RelayInfo) (newAPIError *type
 		service.ResetStatusCode(newAPIError, statusCodeMappingStr)
 		return newAPIError
 	}
-
-	postConsumeQuota(c, info, usage.(*dto.Usage), "")
+	if usage.(*dto.Usage).CompletionTokenDetails.AudioTokens > 0 || usage.(*dto.Usage).PromptTokensDetails.AudioTokens > 0 {
+		service.PostAudioConsumeQuota(c, info, usage.(*dto.Usage), "")
+	} else {
+		postConsumeQuota(c, info, usage.(*dto.Usage))
+	}

 	return nil
 }
--- a/relay/channel/ali/adaptor.go
+++ b/relay/channel/ali/adaptor.go
@@ -19,6 +19,22 @@ import (
 )

 type Adaptor struct {
+	IsSyncImageModel bool
+}
+
+var syncModels = []string{
+	"z-image",
+	"qwen-image",
+	"wan2.6",
+}
+
+func isSyncImageModel(modelName string) bool {
+	for _, m := range syncModels {
+		if strings.Contains(modelName, m) {
+			return true
+		}
+	}
+	return false
 }

 func (a *Adaptor) ConvertGeminiRequest(*gin.Context, *relaycommon.RelayInfo, *dto.GeminiChatRequest) (any, error) {
@@ -45,10 +61,16 @@ func (a *Adaptor) GetRequestURL(info *relaycommon.RelayInfo) (string, error) {
 		case constant.RelayModeRerank:
 			fullRequestURL = fmt.Sprintf("%s/api/v1/services/rerank/text-rerank/text-rerank", info.ChannelBaseUrl)
 		case constant.RelayModeImagesGenerations:
-			fullRequestURL = fmt.Sprintf("%s/api/v1/services/aigc/text2image/image-synthesis", info.ChannelBaseUrl)
+			if isSyncImageModel(info.OriginModelName) {
+				fullRequestURL = fmt.Sprintf("%s/api/v1/services/aigc/multimodal-generation/generation", info.ChannelBaseUrl)
+			} else {
+				fullRequestURL = fmt.Sprintf("%s/api/v1/services/aigc/text2image/image-synthesis", info.ChannelBaseUrl)
+			}
 		case constant.RelayModeImagesEdits:
-			if isWanModel(info.OriginModelName) {
+			if isOldWanModel(info.OriginModelName) {
 				fullRequestURL = fmt.Sprintf("%s/api/v1/services/aigc/image2image/image-synthesis", info.ChannelBaseUrl)
+			} else if isWanModel(info.OriginModelName) {
+				fullRequestURL = fmt.Sprintf("%s/api/v1/services/aigc/image-generation/generation", info.ChannelBaseUrl)
 			} else {
 				fullRequestURL = fmt.Sprintf("%s/api/v1/services/aigc/multimodal-generation/generation", info.ChannelBaseUrl)
 			}
@@ -72,7 +94,11 @@ func (a *Adaptor) SetupRequestHeader(c *gin.Context, req *http.Header, info *rel
 		req.Set("X-DashScope-Plugin", c.GetString("plugin"))
 	}
 	if info.RelayMode == constant.RelayModeImagesGenerations {
-		req.Set("X-DashScope-Async", "enable")
+		if isSyncImageModel(info.OriginModelName) {
+
+		} else {
+			req.Set("X-DashScope-Async", "enable")
+		}
 	}
 	if info.RelayMode == constant.RelayModeImagesEdits {
 		if isWanModel(info.OriginModelName) {
@@ -108,15 +134,25 @@ func (a *Adaptor) ConvertOpenAIRequest(c *gin.Context, info *relaycommon.RelayIn

 func (a *Adaptor) ConvertImageRequest(c *gin.Context, info *relaycommon.RelayInfo, request dto.ImageRequest) (any, error) {
 	if info.RelayMode == constant.RelayModeImagesGenerations {
-		aliRequest, err := oaiImage2Ali(request)
+		if isSyncImageModel(info.OriginModelName) {
+			a.IsSyncImageModel = true
+		}
+		aliRequest, err := oaiImage2AliImageRequest(info, request, a.IsSyncImageModel)
 		if err != nil {
-			return nil, fmt.Errorf("convert image request failed: %w", err)
+			return nil, fmt.Errorf("convert image request to async ali image request failed: %w", err)
 		}
 		return aliRequest, nil
 	} else if info.RelayMode == constant.RelayModeImagesEdits {
-		if isWanModel(info.OriginModelName) {
+		if isOldWanModel(info.OriginModelName) {
 			return oaiFormEdit2WanxImageEdit(c, info, request)
 		}
+		if isSyncImageModel(info.OriginModelName) {
+			if isWanModel(info.OriginModelName) {
+				a.IsSyncImageModel = false
+			} else {
+				a.IsSyncImageModel = true
+			}
+		}
 		// ali image edit https://bailian.console.aliyun.com/?tab=api#/api/?type=model&url=2976416
 		// 如果用户使用表单，则需要解析表单数据
 		if strings.Contains(c.Request.Header.Get("Content-Type"), "multipart/form-data") {
@@ -126,9 +162,9 @@ func (a *Adaptor) ConvertImageRequest(c *gin.Context, info *relaycommon.RelayInf
 			}
 			return aliRequest, nil
 		} else {
-			aliRequest, err := oaiImage2Ali(request)
+			aliRequest, err := oaiImage2AliImageRequest(info, request, a.IsSyncImageModel)
 			if err != nil {
-				return nil, fmt.Errorf("convert image request failed: %w", err)
+				return nil, fmt.Errorf("convert image request to async ali image request failed: %w", err)
 			}
 			return aliRequest, nil
 		}
@@ -169,13 +205,9 @@ func (a *Adaptor) DoResponse(c *gin.Context, resp *http.Response, info *relaycom
 	default:
 		switch info.RelayMode {
 		case constant.RelayModeImagesGenerations:
-			err, usage = aliImageHandler(c, resp, info)
+			err, usage = aliImageHandler(a, c, resp, info)
 		case constant.RelayModeImagesEdits:
-			if isWanModel(info.OriginModelName) {
-				err, usage = aliImageHandler(c, resp, info)
-			} else {
-				err, usage = aliImageEditHandler(c, resp, info)
-			}
+			err, usage = aliImageHandler(a, c, resp, info)
 		case constant.RelayModeRerank:
 			err, usage = RerankHandler(c, resp, info)
 		default:
--- a/relay/channel/ali/dto.go
+++ b/relay/channel/ali/dto.go
@@ -1,6 +1,13 @@
 package ali

-import "github.com/QuantumNous/new-api/dto"
+import (
+	"strings"
+
+	"github.com/QuantumNous/new-api/dto"
+	"github.com/QuantumNous/new-api/logger"
+	"github.com/QuantumNous/new-api/service"
+	"github.com/gin-gonic/gin"
+)

 type AliMessage struct {
 	Content any    `json:"content"`
@@ -65,6 +72,7 @@ type AliUsage struct {
 	InputTokens  int `json:"input_tokens"`
 	OutputTokens int `json:"output_tokens"`
 	TotalTokens  int `json:"total_tokens"`
+	ImageCount   int `json:"image_count,omitempty"`
 }

 type TaskResult struct {
@@ -75,14 +83,78 @@ type TaskResult struct {
 }

 type AliOutput struct {
-	TaskId       string           `json:"task_id,omitempty"`
-	TaskStatus   string           `json:"task_status,omitempty"`
-	Text         string           `json:"text"`
-	FinishReason string           `json:"finish_reason"`
-	Message      string           `json:"message,omitempty"`
-	Code         string           `json:"code,omitempty"`
-	Results      []TaskResult     `json:"results,omitempty"`
-	Choices      []map[string]any `json:"choices,omitempty"`
+	TaskId       string       `json:"task_id,omitempty"`
+	TaskStatus   string       `json:"task_status,omitempty"`
+	Text         string       `json:"text"`
+	FinishReason string       `json:"finish_reason"`
+	Message      string       `json:"message,omitempty"`
+	Code         string       `json:"code,omitempty"`
+	Results      []TaskResult `json:"results,omitempty"`
+	Choices      []struct {
+		FinishReason string `json:"finish_reason,omitempty"`
+		Message      struct {
+			Role             string            `json:"role,omitempty"`
+			Content          []AliMediaContent `json:"content,omitempty"`
+			ReasoningContent string            `json:"reasoning_content,omitempty"`
+		} `json:"message,omitempty"`
+	} `json:"choices,omitempty"`
+}
+
+func (o *AliOutput) ChoicesToOpenAIImageDate(c *gin.Context, responseFormat string) []dto.ImageData {
+	var imageData []dto.ImageData
+	if len(o.Choices) > 0 {
+		for _, choice := range o.Choices {
+			var data dto.ImageData
+			for _, content := range choice.Message.Content {
+				if content.Image != "" {
+					if strings.HasPrefix(content.Image, "http") {
+						var b64Json string
+						if responseFormat == "b64_json" {
+							_, b64, err := service.GetImageFromUrl(content.Image)
+							if err != nil {
+								logger.LogError(c, "get_image_data_failed: "+err.Error())
+								continue
+							}
+							b64Json = b64
+						}
+						data.Url = content.Image
+						data.B64Json = b64Json
+					} else {
+						data.B64Json = content.Image
+					}
+				} else if content.Text != "" {
+					data.RevisedPrompt = content.Text
+				}
+			}
+			imageData = append(imageData, data)
+		}
+	}
+
+	return imageData
+}
+
+func (o *AliOutput) ResultToOpenAIImageDate(c *gin.Context, responseFormat string) []dto.ImageData {
+	var imageData []dto.ImageData
+	for _, data := range o.Results {
+		var b64Json string
+		if responseFormat == "b64_json" {
+			_, b64, err := service.GetImageFromUrl(data.Url)
+			if err != nil {
+				logger.LogError(c, "get_image_data_failed: "+err.Error())
+				continue
+			}
+			b64Json = b64
+		} else {
+			b64Json = data.B64Image
+		}
+
+		imageData = append(imageData, dto.ImageData{
+			Url:           data.Url,
+			B64Json:       b64Json,
+			RevisedPrompt: "",
+		})
+	}
+	return imageData
 }

 type AliResponse struct {
@@ -92,18 +164,26 @@ type AliResponse struct {
 }

 type AliImageRequest struct {
-	Model          string `json:"model"`
-	Input          any    `json:"input"`
-	Parameters     any    `json:"parameters,omitempty"`
-	ResponseFormat string `json:"response_format,omitempty"`
+	Model          string             `json:"model"`
+	Input          any                `json:"input"`
+	Parameters     AliImageParameters `json:"parameters,omitempty"`
+	ResponseFormat string             `json:"response_format,omitempty"`
 }

 type AliImageParameters struct {
-	Size      string `json:"size,omitempty"`
-	N         int    `json:"n,omitempty"`
-	Steps     string `json:"steps,omitempty"`
-	Scale     string `json:"scale,omitempty"`
-	Watermark *bool  `json:"watermark,omitempty"`
+	Size         string `json:"size,omitempty"`
+	N            int    `json:"n,omitempty"`
+	Steps        string `json:"steps,omitempty"`
+	Scale        string `json:"scale,omitempty"`
+	Watermark    *bool  `json:"watermark,omitempty"`
+	PromptExtend *bool  `json:"prompt_extend,omitempty"`
+}
+
+func (p *AliImageParameters) PromptExtendValue() bool {
+	if p != nil && p.PromptExtend != nil {
+		return *p.PromptExtend
+	}
+	return false
 }

 type AliImageInput struct {
--- a/relay/channel/ali/image.go
+++ b/relay/channel/ali/image.go
@@ -21,17 +21,25 @@ import (
 	"github.com/gin-gonic/gin"
 )

-func oaiImage2Ali(request dto.ImageRequest) (*AliImageRequest, error) {
+func oaiImage2AliImageRequest(info *relaycommon.RelayInfo, request dto.ImageRequest, isSync bool) (*AliImageRequest, error) {
 	var imageRequest AliImageRequest
 	imageRequest.Model = request.Model
 	imageRequest.ResponseFormat = request.ResponseFormat
 	logger.LogJson(context.Background(), "oaiImage2Ali request extra", request.Extra)
+	logger.LogDebug(context.Background(), "oaiImage2Ali request isSync: "+fmt.Sprintf("%v", isSync))
 	if request.Extra != nil {
 		if val, ok := request.Extra["parameters"]; ok {
 			err := common.Unmarshal(val, &imageRequest.Parameters)
 			if err != nil {
 				return nil, fmt.Errorf("invalid parameters field: %w", err)
 			}
+		} else {
+			// 兼容没有parameters字段的情况，从openai标准字段中提取参数
+			imageRequest.Parameters = AliImageParameters{
+				Size:      strings.Replace(request.Size, "x", "*", -1),
+				N:         int(request.N),
+				Watermark: request.Watermark,
+			}
 		}
 		if val, ok := request.Extra["input"]; ok {
 			err := common.Unmarshal(val, &imageRequest.Input)
@@ -41,23 +49,44 @@ func oaiImage2Ali(request dto.ImageRequest) (*AliImageRequest, error) {
 		}
 	}

-	if imageRequest.Parameters == nil {
-		imageRequest.Parameters = AliImageParameters{
-			Size:      strings.Replace(request.Size, "x", "*", -1),
-			N:         int(request.N),
-			Watermark: request.Watermark,
+	if strings.Contains(request.Model, "z-image") {
+		// z-image 开启prompt_extend后，按2倍计费
+		if imageRequest.Parameters.PromptExtendValue() {
+			info.PriceData.AddOtherRatio("prompt_extend", 2)
 		}
 	}

-	if imageRequest.Input == nil {
-		imageRequest.Input = AliImageInput{
-			Prompt: request.Prompt,
+	// 检查n参数
+	if imageRequest.Parameters.N != 0 {
+		info.PriceData.AddOtherRatio("n", float64(imageRequest.Parameters.N))
+	}
+
+	// 同步图片模型和异步图片模型请求格式不一样
+	if isSync {
+		if imageRequest.Input == nil {
+			imageRequest.Input = AliImageInput{
+				Messages: []AliMessage{
+					{
+						Role: "user",
+						Content: []AliMediaContent{
+							{
+								Text: request.Prompt,
+							},
+						},
+					},
+				},
+			}
+		}
+	} else {
+		if imageRequest.Input == nil {
+			imageRequest.Input = AliImageInput{
+				Prompt: request.Prompt,
+			}
 		}
 	}

 	return &imageRequest, nil
 }
-
 func getImageBase64sFromForm(c *gin.Context, fieldName string) ([]string, error) {
 	mf := c.Request.MultipartForm
 	if mf == nil {
@@ -199,6 +228,8 @@ func asyncTaskWait(c *gin.Context, info *relaycommon.RelayInfo, taskID string) (
 	var taskResponse AliResponse
 	var responseBody []byte

+	time.Sleep(time.Duration(5) * time.Second)
+
 	for {
 		logger.LogDebug(c, fmt.Sprintf("asyncTaskWait step %d/%d, wait %d seconds", step, maxStep, waitSeconds))
 		step++
@@ -238,32 +269,17 @@ func responseAli2OpenAIImage(c *gin.Context, response *AliResponse, originBody [
 		Created: info.StartTime.Unix(),
 	}

-	for _, data := range response.Output.Results {
-		var b64Json string
-		if responseFormat == "b64_json" {
-			_, b64, err := service.GetImageFromUrl(data.Url)
-			if err != nil {
-				logger.LogError(c, "get_image_data_failed: "+err.Error())
-				continue
-			}
-			b64Json = b64
-		} else {
-			b64Json = data.B64Image
-		}
-
-		imageResponse.Data = append(imageResponse.Data, dto.ImageData{
-			Url:           data.Url,
-			B64Json:       b64Json,
-			RevisedPrompt: "",
-		})
+	if len(response.Output.Results) > 0 {
+		imageResponse.Data = response.Output.ResultToOpenAIImageDate(c, responseFormat)
+	} else if len(response.Output.Choices) > 0 {
+		imageResponse.Data = response.Output.ChoicesToOpenAIImageDate(c, responseFormat)
 	}
-	var mapResponse map[string]any
-	_ = common.Unmarshal(originBody, &mapResponse)
-	imageResponse.Extra = mapResponse
+
+	imageResponse.Metadata = originBody
 	return &imageResponse
 }

-func aliImageHandler(c *gin.Context, resp *http.Response, info *relaycommon.RelayInfo) (*types.NewAPIError, *dto.Usage) {
+func aliImageHandler(a *Adaptor, c *gin.Context, resp *http.Response, info *relaycommon.RelayInfo) (*types.NewAPIError, *dto.Usage) {
 	responseFormat := c.GetString("response_format")

 	var aliTaskResponse AliResponse
@@ -282,66 +298,49 @@ func aliImageHandler(c *gin.Context, resp *http.Response, info *relaycommon.Rela
 		return types.NewError(errors.New(aliTaskResponse.Message), types.ErrorCodeBadResponse), nil
 	}

-	aliResponse, originRespBody, err := asyncTaskWait(c, info, aliTaskResponse.Output.TaskId)
-	if err != nil {
-		return types.NewError(err, types.ErrorCodeBadResponse), nil
-	}
+	var (
+		aliResponse    *AliResponse
+		originRespBody []byte
+	)

-	if aliResponse.Output.TaskStatus != "SUCCEEDED" {
-		return types.WithOpenAIError(types.OpenAIError{
-			Message: aliResponse.Output.Message,
-			Type:    "ali_error",
-			Param:   "",
-			Code:    aliResponse.Output.Code,
-		}, resp.StatusCode), nil
-	}
-
-	fullTextResponse := responseAli2OpenAIImage(c, aliResponse, originRespBody, info, responseFormat)
-	jsonResponse, err := common.Marshal(fullTextResponse)
-	if err != nil {
-		return types.NewError(err, types.ErrorCodeBadResponseBody), nil
-	}
-	service.IOCopyBytesGracefully(c, resp, jsonResponse)
-	return nil, &dto.Usage{}
-}
-
-func aliImageEditHandler(c *gin.Context, resp *http.Response, info *relaycommon.RelayInfo) (*types.NewAPIError, *dto.Usage) {
-	var aliResponse AliResponse
-	responseBody, err := io.ReadAll(resp.Body)
-	if err != nil {
-		return types.NewOpenAIError(err, types.ErrorCodeReadResponseBodyFailed, http.StatusInternalServerError), nil
-	}
-
-	service.CloseResponseBodyGracefully(resp)
-	err = common.Unmarshal(responseBody, &aliResponse)
-	if err != nil {
-		return types.NewOpenAIError(err, types.ErrorCodeBadResponseBody, http.StatusInternalServerError), nil
-	}
-
-	if aliResponse.Message != "" {
-		logger.LogError(c, "ali_task_failed: "+aliResponse.Message)
-		return types.NewError(errors.New(aliResponse.Message), types.ErrorCodeBadResponse), nil
-	}
-	var fullTextResponse dto.ImageResponse
-	if len(aliResponse.Output.Choices) > 0 {
-		fullTextResponse = dto.ImageResponse{
-			Created: info.StartTime.Unix(),
-			Data: []dto.ImageData{
-				{
-					Url:     aliResponse.Output.Choices[0]["message"].(map[string]any)["content"].([]any)[0].(map[string]any)["image"].(string),
-					B64Json: "",
-				},
-			},
+	if a.IsSyncImageModel {
+		aliResponse = &aliTaskResponse
+		originRespBody = responseBody
+	} else {
+		// 异步图片模型需要轮询任务结果
+		aliResponse, originRespBody, err = asyncTaskWait(c, info, aliTaskResponse.Output.TaskId)
+		if err != nil {
+			return types.NewError(err, types.ErrorCodeBadResponse), nil
+		}
+		if aliResponse.Output.TaskStatus != "SUCCEEDED" {
+			return types.WithOpenAIError(types.OpenAIError{
+				Message: aliResponse.Output.Message,
+				Type:    "ali_error",
+				Param:   "",
+				Code:    aliResponse.Output.Code,
+			}, resp.StatusCode), nil
 		}
 	}

-	var mapResponse map[string]any
-	_ = common.Unmarshal(responseBody, &mapResponse)
-	fullTextResponse.Extra = mapResponse
-	jsonResponse, err := common.Marshal(fullTextResponse)
+	//logger.LogDebug(c, "ali_async_task_result: "+string(originRespBody))
+	if a.IsSyncImageModel {
+		logger.LogDebug(c, "ali_sync_image_result: "+string(originRespBody))
+	} else {
+		logger.LogDebug(c, "ali_async_image_result: "+string(originRespBody))
+	}
+
+	imageResponses := responseAli2OpenAIImage(c, aliResponse, originRespBody, info, responseFormat)
+	// 可能生成多张图片，修正计费数量n
+	if aliResponse.Usage.ImageCount != 0 {
+		info.PriceData.AddOtherRatio("n", float64(aliResponse.Usage.ImageCount))
+	} else if len(imageResponses.Data) != 0 {
+		info.PriceData.AddOtherRatio("n", float64(len(imageResponses.Data)))
+	}
+	jsonResponse, err := common.Marshal(imageResponses)
 	if err != nil {
 		return types.NewError(err, types.ErrorCodeBadResponseBody), nil
 	}
 	service.IOCopyBytesGracefully(c, resp, jsonResponse)
+
 	return nil, &dto.Usage{}
 }
--- a/relay/channel/ali/image_wan.go
+++ b/relay/channel/ali/image_wan.go
@@ -26,14 +26,22 @@ func oaiFormEdit2WanxImageEdit(c *gin.Context, info *relaycommon.RelayInfo, requ
 	if wanInput.Images, err = getImageBase64sFromForm(c, "image"); err != nil {
 		return nil, fmt.Errorf("get image base64s from form failed: %w", err)
 	}
-	wanParams := WanImageParameters{
+	//wanParams := WanImageParameters{
+	//	N: int(request.N),
+	//}
+	imageRequest.Input = wanInput
+	imageRequest.Parameters = AliImageParameters{
 		N: int(request.N),
 	}
-	imageRequest.Input = wanInput
-	imageRequest.Parameters = wanParams
+	info.PriceData.AddOtherRatio("n", float64(imageRequest.Parameters.N))
+
 	return &imageRequest, nil
 }

+func isOldWanModel(modelName string) bool {
+	return strings.Contains(modelName, "wan") && !strings.Contains(modelName, "wan2.6")
+}
+
 func isWanModel(modelName string) bool {
 	return strings.Contains(modelName, "wan")
 }
--- a/relay/channel/aws/constants.go
+++ b/relay/channel/aws/constants.go
@@ -18,7 +18,7 @@ var awsModelIDMap = map[string]string{
 	"claude-opus-4-1-20250805":   "anthropic.claude-opus-4-1-20250805-v1:0",
 	"claude-sonnet-4-5-20250929": "anthropic.claude-sonnet-4-5-20250929-v1:0",
 	"claude-haiku-4-5-20251001":  "anthropic.claude-haiku-4-5-20251001-v1:0",
-	"claude-opus-4-5-20251101":  "anthropic.claude-opus-4-5-20251101-v1:0",
+	"claude-opus-4-5-20251101":   "anthropic.claude-opus-4-5-20251101-v1:0",
 	// Nova models
 	"nova-micro-v1:0":   "amazon.nova-micro-v1:0",
 	"nova-lite-v1:0":    "amazon.nova-lite-v1:0",
--- a/relay/channel/baidu/relay-baidu.go
+++ b/relay/channel/baidu/relay-baidu.go
@@ -150,7 +150,7 @@ func baiduHandler(c *gin.Context, info *relaycommon.RelayInfo, resp *http.Respon
 		return types.NewError(err, types.ErrorCodeBadResponseBody), nil
 	}
 	if baiduResponse.ErrorMsg != "" {
-		return types.NewError(fmt.Errorf(baiduResponse.ErrorMsg), types.ErrorCodeBadResponseBody), nil
+		return types.NewError(fmt.Errorf("%s", baiduResponse.ErrorMsg), types.ErrorCodeBadResponseBody), nil
 	}
 	fullTextResponse := responseBaidu2OpenAI(&baiduResponse)
 	jsonResponse, err := json.Marshal(fullTextResponse)
@@ -175,7 +175,7 @@ func baiduEmbeddingHandler(c *gin.Context, info *relaycommon.RelayInfo, resp *ht
 		return types.NewError(err, types.ErrorCodeBadResponseBody), nil
 	}
 	if baiduResponse.ErrorMsg != "" {
-		return types.NewError(fmt.Errorf(baiduResponse.ErrorMsg), types.ErrorCodeBadResponseBody), nil
+		return types.NewError(fmt.Errorf("%s", baiduResponse.ErrorMsg), types.ErrorCodeBadResponseBody), nil
 	}
 	fullTextResponse := embeddingResponseBaidu2OpenAI(&baiduResponse)
 	jsonResponse, err := json.Marshal(fullTextResponse)
--- a/relay/channel/claude/relay-claude.go
+++ b/relay/channel/claude/relay-claude.go
@@ -483,9 +483,11 @@ func StreamResponseClaude2OpenAI(reqMode int, claudeResponse *dto.ClaudeResponse
 				}
 			}
 		} else if claudeResponse.Type == "message_delta" {
-			finishReason := stopReasonClaude2OpenAI(*claudeResponse.Delta.StopReason)
-			if finishReason != "null" {
-				choice.FinishReason = &finishReason
+			if claudeResponse.Delta != nil && claudeResponse.Delta.StopReason != nil {
+				finishReason := stopReasonClaude2OpenAI(*claudeResponse.Delta.StopReason)
+				if finishReason != "null" {
+					choice.FinishReason = &finishReason
+				}
 			}
 			//claudeUsage = &claudeResponse.Usage
 		} else if claudeResponse.Type == "message_stop" {
--- a/relay/channel/coze/relay-coze.go
+++ b/relay/channel/coze/relay-coze.go
@@ -208,7 +208,7 @@ func handleCozeEvent(c *gin.Context, event string, data string, responseText *st
 			return
 		}

-		common.SysLog(fmt.Sprintf("stream event error: ", errorData.Code, errorData.Message))
+		common.SysLog(fmt.Sprintf("stream event error: %v %v", errorData.Code, errorData.Message))
 	}
 }

--- a/relay/channel/gemini/adaptor.go
+++ b/relay/channel/gemini/adaptor.go
@@ -13,6 +13,7 @@ import (
 	relaycommon "github.com/QuantumNous/new-api/relay/common"
 	"github.com/QuantumNous/new-api/relay/constant"
 	"github.com/QuantumNous/new-api/setting/model_setting"
+	"github.com/QuantumNous/new-api/setting/reasoning"
 	"github.com/QuantumNous/new-api/types"

 	"github.com/gin-gonic/gin"
@@ -137,7 +138,7 @@ func (a *Adaptor) GetRequestURL(info *relaycommon.RelayInfo) (string, error) {
 			info.UpstreamModelName = strings.TrimSuffix(info.UpstreamModelName, "-thinking")
 		} else if strings.HasSuffix(info.UpstreamModelName, "-nothinking") {
 			info.UpstreamModelName = strings.TrimSuffix(info.UpstreamModelName, "-nothinking")
-		} else if baseModel, level := parseThinkingLevelSuffix(info.UpstreamModelName); level != "" {
+		} else if baseModel, level, ok := reasoning.TrimEffortSuffix(info.UpstreamModelName); ok && level != "" {
 			info.UpstreamModelName = baseModel
 		}
 	}
--- a/relay/channel/gemini/relay-gemini-native.go
+++ b/relay/channel/gemini/relay-gemini-native.go
@@ -94,10 +94,10 @@ func GeminiTextGenerationStreamHandler(c *gin.Context, info *relaycommon.RelayIn
 	helper.SetEventStreamHeaders(c)

 	return geminiStreamHandler(c, info, resp, func(data string, geminiResponse *dto.GeminiChatResponse) bool {
-		// 直接发送 GeminiChatResponse 响应
 		err := helper.StringData(c, data)
 		if err != nil {
-			logger.LogError(c, err.Error())
+			logger.LogError(c, "failed to write stream data: "+err.Error())
+			return false
 		}
 		info.SendResponseCount++
 		return true
--- a/relay/channel/gemini/relay-gemini.go
+++ b/relay/channel/gemini/relay-gemini.go
@@ -98,6 +98,7 @@ func clampThinkingBudget(modelName string, budget int) int {
 // "effort": "high" - Allocates a large portion of tokens for reasoning (approximately 80% of max_tokens)
 // "effort": "medium" - Allocates a moderate portion of tokens (approximately 50% of max_tokens)
 // "effort": "low" - Allocates a smaller portion of tokens (approximately 20% of max_tokens)
+// "effort": "minimal" - Allocates a minimal portion of tokens (approximately 5% of max_tokens)
 func clampThinkingBudgetByEffort(modelName string, effort string) int {
 	isNew25Pro := isNew25ProModel(modelName)
 	is25FlashLite := is25FlashLiteModel(modelName)
@@ -118,18 +119,12 @@ func clampThinkingBudgetByEffort(modelName string, effort string) int {
 		maxBudget = maxBudget * 50 / 100
 	case "low":
 		maxBudget = maxBudget * 20 / 100
+	case "minimal":
+		maxBudget = maxBudget * 5 / 100
 	}
 	return clampThinkingBudget(modelName, maxBudget)
 }

-func parseThinkingLevelSuffix(modelName string) (string, string) {
-	base, level, ok := reasoning.TrimEffortSuffix(modelName)
-	if !ok {
-		return modelName, ""
-	}
-	return base, level
-}
-
 func ThinkingAdaptor(geminiRequest *dto.GeminiChatRequest, info *relaycommon.RelayInfo, oaiRequest ...dto.GeneralOpenAIRequest) {
 	if model_setting.GetGeminiSettings().ThinkingAdapterEnabled {
 		modelName := info.UpstreamModelName
@@ -186,7 +181,7 @@ func ThinkingAdaptor(geminiRequest *dto.GeminiChatRequest, info *relaycommon.Rel
 					ThinkingBudget: common.GetPointer(0),
 				}
 			}
-		} else if _, level := parseThinkingLevelSuffix(modelName); level != "" {
+		} else if _, level, ok := reasoning.TrimEffortSuffix(info.UpstreamModelName); ok && level != "" {
 			geminiRequest.GenerationConfig.ThinkingConfig = &dto.GeminiThinkingConfig{
 				IncludeThoughts: true,
 				ThinkingLevel:   level,
@@ -379,7 +374,7 @@ func CovertOpenAI2Gemini(c *gin.Context, textRequest dto.GeneralOpenAIRequest, i
 	var system_content []string
 	//shouldAddDummyModelMessage := false
 	for _, message := range textRequest.Messages {
-		if message.Role == "system" {
+		if message.Role == "system" || message.Role == "developer" {
 			system_content = append(system_content, message.StringContent())
 			continue
 		} else if message.Role == "tool" || message.Role == "function" {
--- a/relay/channel/ollama/dto.go
+++ b/relay/channel/ollama/dto.go
@@ -67,3 +67,40 @@ type OllamaEmbeddingResponse struct {
 	Embeddings      [][]float64 `json:"embeddings"`
 	PromptEvalCount int         `json:"prompt_eval_count,omitempty"`
 }
+
+type OllamaTagsResponse struct {
+	Models []OllamaModel `json:"models"`
+}
+
+type OllamaModel struct {
+	Name       string            `json:"name"`
+	Size       int64             `json:"size"`
+	Digest     string            `json:"digest,omitempty"`
+	ModifiedAt string            `json:"modified_at"`
+	Details    OllamaModelDetail `json:"details,omitempty"`
+}
+
+type OllamaModelDetail struct {
+	ParentModel       string   `json:"parent_model,omitempty"`
+	Format            string   `json:"format,omitempty"`
+	Family            string   `json:"family,omitempty"`
+	Families          []string `json:"families,omitempty"`
+	ParameterSize     string   `json:"parameter_size,omitempty"`
+	QuantizationLevel string   `json:"quantization_level,omitempty"`
+}
+
+type OllamaPullRequest struct {
+	Name   string `json:"name"`
+	Stream bool   `json:"stream,omitempty"`
+}
+
+type OllamaPullResponse struct {
+	Status    string `json:"status"`
+	Digest    string `json:"digest,omitempty"`
+	Total     int64  `json:"total,omitempty"`
+	Completed int64  `json:"completed,omitempty"`
+}
+
+type OllamaDeleteRequest struct {
+	Name string `json:"name"`
+}
--- a/relay/channel/ollama/relay-ollama.go
+++ b/relay/channel/ollama/relay-ollama.go
@@ -1,11 +1,13 @@
 package ollama

 import (
+	"bufio"
 	"encoding/json"
 	"fmt"
 	"io"
 	"net/http"
 	"strings"
+	"time"

 	"github.com/QuantumNous/new-api/common"
 	"github.com/QuantumNous/new-api/dto"
@@ -283,3 +285,246 @@ func ollamaEmbeddingHandler(c *gin.Context, info *relaycommon.RelayInfo, resp *h
 	service.IOCopyBytesGracefully(c, resp, out)
 	return usage, nil
 }
+
+func FetchOllamaModels(baseURL, apiKey string) ([]OllamaModel, error) {
+	url := fmt.Sprintf("%s/api/tags", baseURL)
+
+	client := &http.Client{}
+	request, err := http.NewRequest("GET", url, nil)
+	if err != nil {
+		return nil, fmt.Errorf("创建请求失败: %v", err)
+	}
+
+	// Ollama 通常不需要 Bearer token，但为了兼容性保留
+	if apiKey != "" {
+		request.Header.Set("Authorization", "Bearer "+apiKey)
+	}
+
+	response, err := client.Do(request)
+	if err != nil {
+		return nil, fmt.Errorf("请求失败: %v", err)
+	}
+	defer response.Body.Close()
+
+	if response.StatusCode != http.StatusOK {
+		body, _ := io.ReadAll(response.Body)
+		return nil, fmt.Errorf("服务器返回错误 %d: %s", response.StatusCode, string(body))
+	}
+
+	var tagsResponse OllamaTagsResponse
+	body, err := io.ReadAll(response.Body)
+	if err != nil {
+		return nil, fmt.Errorf("读取响应失败: %v", err)
+	}
+
+	err = common.Unmarshal(body, &tagsResponse)
+	if err != nil {
+		return nil, fmt.Errorf("解析响应失败: %v", err)
+	}
+
+	return tagsResponse.Models, nil
+}
+
+// 拉取 Ollama 模型 (非流式)
+func PullOllamaModel(baseURL, apiKey, modelName string) error {
+	url := fmt.Sprintf("%s/api/pull", baseURL)
+
+	pullRequest := OllamaPullRequest{
+		Name:   modelName,
+		Stream: false, // 非流式，简化处理
+	}
+
+	requestBody, err := common.Marshal(pullRequest)
+	if err != nil {
+		return fmt.Errorf("序列化请求失败: %v", err)
+	}
+
+	client := &http.Client{
+		Timeout: 30 * 60 * 1000 * time.Millisecond, // 30分钟超时，支持大模型
+	}
+	request, err := http.NewRequest("POST", url, strings.NewReader(string(requestBody)))
+	if err != nil {
+		return fmt.Errorf("创建请求失败: %v", err)
+	}
+
+	request.Header.Set("Content-Type", "application/json")
+	if apiKey != "" {
+		request.Header.Set("Authorization", "Bearer "+apiKey)
+	}
+
+	response, err := client.Do(request)
+	if err != nil {
+		return fmt.Errorf("请求失败: %v", err)
+	}
+	defer response.Body.Close()
+
+	if response.StatusCode != http.StatusOK {
+		body, _ := io.ReadAll(response.Body)
+		return fmt.Errorf("拉取模型失败 %d: %s", response.StatusCode, string(body))
+	}
+
+	return nil
+}
+
+// 流式拉取 Ollama 模型 (支持进度回调)
+func PullOllamaModelStream(baseURL, apiKey, modelName string, progressCallback func(OllamaPullResponse)) error {
+	url := fmt.Sprintf("%s/api/pull", baseURL)
+
+	pullRequest := OllamaPullRequest{
+		Name:   modelName,
+		Stream: true, // 启用流式
+	}
+
+	requestBody, err := common.Marshal(pullRequest)
+	if err != nil {
+		return fmt.Errorf("序列化请求失败: %v", err)
+	}
+
+	client := &http.Client{
+		Timeout: 60 * 60 * 1000 * time.Millisecond, // 1小时超时，支持超大模型
+	}
+	request, err := http.NewRequest("POST", url, strings.NewReader(string(requestBody)))
+	if err != nil {
+		return fmt.Errorf("创建请求失败: %v", err)
+	}
+
+	request.Header.Set("Content-Type", "application/json")
+	if apiKey != "" {
+		request.Header.Set("Authorization", "Bearer "+apiKey)
+	}
+
+	response, err := client.Do(request)
+	if err != nil {
+		return fmt.Errorf("请求失败: %v", err)
+	}
+	defer response.Body.Close()
+
+	if response.StatusCode != http.StatusOK {
+		body, _ := io.ReadAll(response.Body)
+		return fmt.Errorf("拉取模型失败 %d: %s", response.StatusCode, string(body))
+	}
+
+	// 读取流式响应
+	scanner := bufio.NewScanner(response.Body)
+	successful := false
+	for scanner.Scan() {
+		line := scanner.Text()
+		if strings.TrimSpace(line) == "" {
+			continue
+		}
+
+		var pullResponse OllamaPullResponse
+		if err := common.Unmarshal([]byte(line), &pullResponse); err != nil {
+			continue // 忽略解析失败的行
+		}
+
+		if progressCallback != nil {
+			progressCallback(pullResponse)
+		}
+
+		// 检查是否出现错误或完成
+		if strings.EqualFold(pullResponse.Status, "error") {
+			return fmt.Errorf("拉取模型失败: %s", strings.TrimSpace(line))
+		}
+		if strings.EqualFold(pullResponse.Status, "success") {
+			successful = true
+			break
+		}
+	}
+
+	if err := scanner.Err(); err != nil {
+		return fmt.Errorf("读取流式响应失败: %v", err)
+	}
+
+	if !successful {
+		return fmt.Errorf("拉取模型未完成: 未收到成功状态")
+	}
+
+	return nil
+}
+
+// 删除 Ollama 模型
+func DeleteOllamaModel(baseURL, apiKey, modelName string) error {
+	url := fmt.Sprintf("%s/api/delete", baseURL)
+
+	deleteRequest := OllamaDeleteRequest{
+		Name: modelName,
+	}
+
+	requestBody, err := common.Marshal(deleteRequest)
+	if err != nil {
+		return fmt.Errorf("序列化请求失败: %v", err)
+	}
+
+	client := &http.Client{}
+	request, err := http.NewRequest("DELETE", url, strings.NewReader(string(requestBody)))
+	if err != nil {
+		return fmt.Errorf("创建请求失败: %v", err)
+	}
+
+	request.Header.Set("Content-Type", "application/json")
+	if apiKey != "" {
+		request.Header.Set("Authorization", "Bearer "+apiKey)
+	}
+
+	response, err := client.Do(request)
+	if err != nil {
+		return fmt.Errorf("请求失败: %v", err)
+	}
+	defer response.Body.Close()
+
+	if response.StatusCode != http.StatusOK {
+		body, _ := io.ReadAll(response.Body)
+		return fmt.Errorf("删除模型失败 %d: %s", response.StatusCode, string(body))
+	}
+
+	return nil
+}
+
+func FetchOllamaVersion(baseURL, apiKey string) (string, error) {
+	trimmedBase := strings.TrimRight(baseURL, "/")
+	if trimmedBase == "" {
+		return "", fmt.Errorf("baseURL 为空")
+	}
+
+	url := fmt.Sprintf("%s/api/version", trimmedBase)
+
+	client := &http.Client{Timeout: 10 * time.Second}
+	request, err := http.NewRequest("GET", url, nil)
+	if err != nil {
+		return "", fmt.Errorf("创建请求失败: %v", err)
+	}
+
+	if apiKey != "" {
+		request.Header.Set("Authorization", "Bearer "+apiKey)
+	}
+
+	response, err := client.Do(request)
+	if err != nil {
+		return "", fmt.Errorf("请求失败: %v", err)
+	}
+	defer response.Body.Close()
+
+	body, err := io.ReadAll(response.Body)
+	if err != nil {
+		return "", fmt.Errorf("读取响应失败: %v", err)
+	}
+
+	if response.StatusCode != http.StatusOK {
+		return "", fmt.Errorf("查询版本失败 %d: %s", response.StatusCode, string(body))
+	}
+
+	var versionResp struct {
+		Version string `json:"version"`
+	}
+
+	if err := json.Unmarshal(body, &versionResp); err != nil {
+		return "", fmt.Errorf("解析响应失败: %v", err)
+	}
+
+	if versionResp.Version == "" {
+		return "", fmt.Errorf("未返回版本信息")
+	}
+
+	return versionResp.Version, nil
+}
--- a/relay/channel/openai/audio.go
+++ b/relay/channel/openai/audio.go
@@ -0,0 +1,145 @@
+package openai
+
+import (
+	"bytes"
+	"fmt"
+	"io"
+	"math"
+	"net/http"
+
+	"github.com/QuantumNous/new-api/common"
+	"github.com/QuantumNous/new-api/constant"
+	"github.com/QuantumNous/new-api/dto"
+	"github.com/QuantumNous/new-api/logger"
+	relaycommon "github.com/QuantumNous/new-api/relay/common"
+	"github.com/QuantumNous/new-api/relay/helper"
+	"github.com/QuantumNous/new-api/service"
+	"github.com/QuantumNous/new-api/types"
+	"github.com/gin-gonic/gin"
+)
+
+func OpenaiTTSHandler(c *gin.Context, resp *http.Response, info *relaycommon.RelayInfo) *dto.Usage {
+	// the status code has been judged before, if there is a body reading failure,
+	// it should be regarded as a non-recoverable error, so it should not return err for external retry.
+	// Analogous to nginx's load balancing, it will only retry if it can't be requested or
+	// if the upstream returns a specific status code, once the upstream has already written the header,
+	// the subsequent failure of the response body should be regarded as a non-recoverable error,
+	// and can be terminated directly.
+	defer service.CloseResponseBodyGracefully(resp)
+	usage := &dto.Usage{}
+	usage.PromptTokens = info.GetEstimatePromptTokens()
+	usage.TotalTokens = info.GetEstimatePromptTokens()
+	for k, v := range resp.Header {
+		c.Writer.Header().Set(k, v[0])
+	}
+	c.Writer.WriteHeader(resp.StatusCode)
+
+	if info.IsStream {
+		helper.StreamScannerHandler(c, resp, info, func(data string) bool {
+			if service.SundaySearch(data, "usage") {
+				var simpleResponse dto.SimpleResponse
+				err := common.Unmarshal([]byte(data), &simpleResponse)
+				if err != nil {
+					logger.LogError(c, err.Error())
+				}
+				if simpleResponse.Usage.TotalTokens != 0 {
+					usage.PromptTokens = simpleResponse.Usage.InputTokens
+					usage.CompletionTokens = simpleResponse.OutputTokens
+					usage.TotalTokens = simpleResponse.TotalTokens
+				}
+			}
+			_ = helper.StringData(c, data)
+			return true
+		})
+	} else {
+		common.SetContextKey(c, constant.ContextKeyLocalCountTokens, true)
+		// 读取响应体到缓冲区
+		bodyBytes, err := io.ReadAll(resp.Body)
+		if err != nil {
+			logger.LogError(c, fmt.Sprintf("failed to read TTS response body: %v", err))
+			c.Writer.WriteHeaderNow()
+			return usage
+		}
+
+		// 写入响应到客户端
+		c.Writer.WriteHeaderNow()
+		_, err = c.Writer.Write(bodyBytes)
+		if err != nil {
+			logger.LogError(c, fmt.Sprintf("failed to write TTS response: %v", err))
+		}
+
+		// 计算音频时长并更新 usage
+		audioFormat := "mp3" // 默认格式
+		if audioReq, ok := info.Request.(*dto.AudioRequest); ok && audioReq.ResponseFormat != "" {
+			audioFormat = audioReq.ResponseFormat
+		}
+
+		var duration float64
+		var durationErr error
+
+		if audioFormat == "pcm" {
+			// PCM 格式没有文件头，根据 OpenAI TTS 的 PCM 参数计算时长
+			// 采样率: 24000 Hz, 位深度: 16-bit (2 bytes), 声道数: 1
+			const sampleRate = 24000
+			const bytesPerSample = 2
+			const channels = 1
+			duration = float64(len(bodyBytes)) / float64(sampleRate*bytesPerSample*channels)
+		} else {
+			ext := "." + audioFormat
+			reader := bytes.NewReader(bodyBytes)
+			duration, durationErr = common.GetAudioDuration(c.Request.Context(), reader, ext)
+		}
+
+		usage.PromptTokensDetails.TextTokens = usage.PromptTokens
+
+		if durationErr != nil {
+			logger.LogWarn(c, fmt.Sprintf("failed to get audio duration: %v", durationErr))
+			// 如果无法获取时长，则设置保底的 CompletionTokens，根据body大小计算
+			sizeInKB := float64(len(bodyBytes)) / 1000.0
+			estimatedTokens := int(math.Ceil(sizeInKB)) // 粗略估算每KB约等于1 token
+			usage.CompletionTokens = estimatedTokens
+			usage.CompletionTokenDetails.AudioTokens = estimatedTokens
+		} else if duration > 0 {
+			// 计算 token: ceil(duration) / 60.0 * 1000，即每分钟 1000 tokens
+			completionTokens := int(math.Round(math.Ceil(duration) / 60.0 * 1000))
+			usage.CompletionTokens = completionTokens
+			usage.CompletionTokenDetails.AudioTokens = completionTokens
+		}
+		usage.TotalTokens = usage.PromptTokens + usage.CompletionTokens
+	}
+
+	return usage
+}
+
+func OpenaiSTTHandler(c *gin.Context, resp *http.Response, info *relaycommon.RelayInfo, responseFormat string) (*types.NewAPIError, *dto.Usage) {
+	defer service.CloseResponseBodyGracefully(resp)
+
+	responseBody, err := io.ReadAll(resp.Body)
+	if err != nil {
+		return types.NewOpenAIError(err, types.ErrorCodeReadResponseBodyFailed, http.StatusInternalServerError), nil
+	}
+	// 写入新的 response body
+	service.IOCopyBytesGracefully(c, resp, responseBody)
+
+	var responseData struct {
+		Usage *dto.Usage `json:"usage"`
+	}
+	if err := common.Unmarshal(responseBody, &responseData); err == nil && responseData.Usage != nil {
+		if responseData.Usage.TotalTokens > 0 {
+			usage := responseData.Usage
+			if usage.PromptTokens == 0 {
+				usage.PromptTokens = usage.InputTokens
+			}
+			if usage.CompletionTokens == 0 {
+				usage.CompletionTokens = usage.OutputTokens
+			}
+			return nil, usage
+		}
+	}
+
+	usage := &dto.Usage{}
+	usage.PromptTokens = info.GetEstimatePromptTokens()
+	usage.CompletionTokens = 0
+	usage.TotalTokens = usage.PromptTokens + usage.CompletionTokens
+	return nil, usage
+}
--- a/relay/channel/openai/helper.go
+++ b/relay/channel/openai/helper.go
@@ -208,7 +208,6 @@ func HandleFinalResponse(c *gin.Context, info *relaycommon.RelayInfo, lastStream
 		helper.Done(c)

 	case types.RelayFormatClaude:
-		info.ClaudeConvertInfo.Done = true
 		var streamResponse dto.ChatCompletionsStreamResponse
 		if err := common.Unmarshal(common.StringToByteSlice(lastStreamData), &streamResponse); err != nil {
 			common.SysLog("error unmarshalling stream response: " + err.Error())
@@ -221,6 +220,7 @@ func HandleFinalResponse(c *gin.Context, info *relaycommon.RelayInfo, lastStream
 		for _, resp := range claudeResponses {
 			_ = helper.ClaudeData(c, *resp)
 		}
+		info.ClaudeConvertInfo.Done = true

 	case types.RelayFormatGemini:
 		var streamResponse dto.ChatCompletionsStreamResponse
--- a/relay/channel/openai/relay-openai.go
+++ b/relay/channel/openai/relay-openai.go
@@ -1,7 +1,6 @@
 package openai

 import (
-	"encoding/json"
 	"fmt"
 	"io"
 	"net/http"
@@ -151,7 +150,7 @@ func OaiStreamHandler(c *gin.Context, info *relaycommon.RelayInfo, resp *http.Re
 		var streamResp struct {
 			Usage *dto.Usage `json:"usage"`
 		}
-		err := json.Unmarshal([]byte(secondLastStreamData), &streamResp)
+		err := common.Unmarshal([]byte(secondLastStreamData), &streamResp)
 		if err == nil && streamResp.Usage != nil && service.ValidUsage(streamResp.Usage) {
 			usage = streamResp.Usage
 			containStreamUsage = true
@@ -327,68 +326,6 @@ func streamTTSResponse(c *gin.Context, resp *http.Response) {
 	}
 }

-func OpenaiTTSHandler(c *gin.Context, resp *http.Response, info *relaycommon.RelayInfo) *dto.Usage {
-	// the status code has been judged before, if there is a body reading failure,
-	// it should be regarded as a non-recoverable error, so it should not return err for external retry.
-	// Analogous to nginx's load balancing, it will only retry if it can't be requested or
-	// if the upstream returns a specific status code, once the upstream has already written the header,
-	// the subsequent failure of the response body should be regarded as a non-recoverable error,
-	// and can be terminated directly.
-	defer service.CloseResponseBodyGracefully(resp)
-	usage := &dto.Usage{}
-	usage.PromptTokens = info.GetEstimatePromptTokens()
-	usage.TotalTokens = info.GetEstimatePromptTokens()
-	for k, v := range resp.Header {
-		c.Writer.Header().Set(k, v[0])
-	}
-	c.Writer.WriteHeader(resp.StatusCode)
-
-	isStreaming := resp.ContentLength == -1 || resp.Header.Get("Content-Length") == ""
-	if isStreaming {
-		streamTTSResponse(c, resp)
-	} else {
-		c.Writer.WriteHeaderNow()
-		_, err := io.Copy(c.Writer, resp.Body)
-		if err != nil {
-			logger.LogError(c, err.Error())
-		}
-	}
-	return usage
-}
-
-func OpenaiSTTHandler(c *gin.Context, resp *http.Response, info *relaycommon.RelayInfo, responseFormat string) (*types.NewAPIError, *dto.Usage) {
-	defer service.CloseResponseBodyGracefully(resp)
-
-	responseBody, err := io.ReadAll(resp.Body)
-	if err != nil {
-		return types.NewOpenAIError(err, types.ErrorCodeReadResponseBodyFailed, http.StatusInternalServerError), nil
-	}
-	// 写入新的 response body
-	service.IOCopyBytesGracefully(c, resp, responseBody)
-
-	var responseData struct {
-		Usage *dto.Usage `json:"usage"`
-	}
-	if err := json.Unmarshal(responseBody, &responseData); err == nil && responseData.Usage != nil {
-		if responseData.Usage.TotalTokens > 0 {
-			usage := responseData.Usage
-			if usage.PromptTokens == 0 {
-				usage.PromptTokens = usage.InputTokens
-			}
-			if usage.CompletionTokens == 0 {
-				usage.CompletionTokens = usage.OutputTokens
-			}
-			return nil, usage
-		}
-	}
-
-	usage := &dto.Usage{}
-	usage.PromptTokens = info.GetEstimatePromptTokens()
-	usage.CompletionTokens = 0
-	usage.TotalTokens = usage.PromptTokens + usage.CompletionTokens
-	return nil, usage
-}
-
 func OpenaiRealtimeHandler(c *gin.Context, info *relaycommon.RelayInfo) (*types.NewAPIError, *dto.RealtimeUsage) {
 	if info == nil || info.ClientWs == nil || info.TargetWs == nil {
 		return types.NewError(fmt.Errorf("invalid websocket connection"), types.ErrorCodeBadResponse), nil
@@ -659,7 +596,7 @@ func applyUsagePostProcessing(info *relaycommon.RelayInfo, usage *dto.Usage, res
 		if usage.PromptTokensDetails.CachedTokens == 0 && usage.PromptCacheHitTokens != 0 {
 			usage.PromptTokensDetails.CachedTokens = usage.PromptCacheHitTokens
 		}
-	case constant.ChannelTypeZhipu_v4:
+	case constant.ChannelTypeZhipu_v4, constant.ChannelTypeMoonshot:
 		if usage.PromptTokensDetails.CachedTokens == 0 {
 			if usage.InputTokensDetails != nil && usage.InputTokensDetails.CachedTokens > 0 {
 				usage.PromptTokensDetails.CachedTokens = usage.InputTokensDetails.CachedTokens
@@ -687,7 +624,7 @@ func extractCachedTokensFromBody(body []byte) (int, bool) {
 		} `json:"usage"`
 	}

-	if err := json.Unmarshal(body, &payload); err != nil {
+	if err := common.Unmarshal(body, &payload); err != nil {
 		return 0, false
 	}

--- a/relay/channel/task/ali/adaptor.go
+++ b/relay/channel/task/ali/adaptor.go
@@ -192,6 +192,10 @@ func sizeToResolution(size string) (string, error) {
 func ProcessAliOtherRatios(aliReq *AliVideoRequest) (map[string]float64, error) {
 	otherRatios := make(map[string]float64)
 	aliRatios := map[string]map[string]float64{
+		"wan2.6-i2v": {
+			"720P":  1,
+			"1080P": 1 / 0.6,
+		},
 		"wan2.5-t2v-preview": {
 			"480P":  1,
 			"720P":  2,
--- a/relay/channel/task/jimeng/adaptor.go
+++ b/relay/channel/task/jimeng/adaptor.go
@@ -196,7 +196,7 @@ func (a *TaskAdaptor) DoResponse(c *gin.Context, resp *http.Response, info *rela
 	}

 	if jResp.Code != 10000 {
-		taskErr = service.TaskErrorWrapper(fmt.Errorf(jResp.Message), fmt.Sprintf("%d", jResp.Code), http.StatusInternalServerError)
+		taskErr = service.TaskErrorWrapper(fmt.Errorf("%s", jResp.Message), fmt.Sprintf("%d", jResp.Code), http.StatusInternalServerError)
 		return
 	}

--- a/relay/channel/task/kling/adaptor.go
+++ b/relay/channel/task/kling/adaptor.go
@@ -186,7 +186,7 @@ func (a *TaskAdaptor) DoResponse(c *gin.Context, resp *http.Response, info *rela
 		return
 	}
 	if kResp.Code != 0 {
-		taskErr = service.TaskErrorWrapperLocal(fmt.Errorf(kResp.Message), "task_failed", http.StatusBadRequest)
+		taskErr = service.TaskErrorWrapperLocal(fmt.Errorf("%s", kResp.Message), "task_failed", http.StatusBadRequest)
 		return
 	}
 	ov := dto.NewOpenAIVideo()
--- a/relay/channel/task/suno/adaptor.go
+++ b/relay/channel/task/suno/adaptor.go
@@ -105,7 +105,7 @@ func (a *TaskAdaptor) DoResponse(c *gin.Context, resp *http.Response, info *rela
 		return
 	}
 	if !sunoResponse.IsSuccess() {
-		taskErr = service.TaskErrorWrapper(fmt.Errorf(sunoResponse.Message), sunoResponse.Code, http.StatusInternalServerError)
+		taskErr = service.TaskErrorWrapper(fmt.Errorf("%s", sunoResponse.Message), sunoResponse.Code, http.StatusInternalServerError)
 		return
 	}

--- a/relay/channel/vertex/adaptor.go
+++ b/relay/channel/vertex/adaptor.go
@@ -51,10 +51,43 @@ type Adaptor struct {
 }

 func (a *Adaptor) ConvertGeminiRequest(c *gin.Context, info *relaycommon.RelayInfo, request *dto.GeminiChatRequest) (any, error) {
+	// Vertex AI does not support functionResponse.id; keep it stripped here for consistency.
+	if model_setting.GetGeminiSettings().RemoveFunctionResponseIdEnabled {
+		removeFunctionResponseID(request)
+	}
 	geminiAdaptor := gemini.Adaptor{}
 	return geminiAdaptor.ConvertGeminiRequest(c, info, request)
 }

+func removeFunctionResponseID(request *dto.GeminiChatRequest) {
+	if request == nil {
+		return
+	}
+
+	if len(request.Contents) > 0 {
+		for i := range request.Contents {
+			if len(request.Contents[i].Parts) == 0 {
+				continue
+			}
+			for j := range request.Contents[i].Parts {
+				part := &request.Contents[i].Parts[j]
+				if part.FunctionResponse == nil {
+					continue
+				}
+				if len(part.FunctionResponse.ID) > 0 {
+					part.FunctionResponse.ID = nil
+				}
+			}
+		}
+	}
+
+	if len(request.Requests) > 0 {
+		for i := range request.Requests {
+			removeFunctionResponseID(&request.Requests[i])
+		}
+	}
+}
+
 func (a *Adaptor) ConvertClaudeRequest(c *gin.Context, info *relaycommon.RelayInfo, request *dto.ClaudeRequest) (any, error) {
 	if v, ok := claudeModelMap[info.UpstreamModelName]; ok {
 		c.Set("request_model", v)
--- a/relay/channel/zhipu_4v/dto.go
+++ b/relay/channel/zhipu_4v/dto.go
@@ -4,6 +4,7 @@ import (
 	"time"

 	"github.com/QuantumNous/new-api/dto"
+	"github.com/QuantumNous/new-api/types"
 )

 //	type ZhipuMessage struct {
@@ -37,7 +38,7 @@ type ZhipuV4Response struct {
 	Model               string                         `json:"model"`
 	TextResponseChoices []dto.OpenAITextResponseChoice `json:"choices"`
 	Usage               dto.Usage                      `json:"usage"`
-	Error               dto.OpenAIError                `json:"error"`
+	Error               types.OpenAIError              `json:"error"`
 }

 //
--- a/relay/common/relay_info.go
+++ b/relay/common/relay_info.go
@@ -11,6 +11,7 @@ import (
 	"github.com/QuantumNous/new-api/constant"
 	"github.com/QuantumNous/new-api/dto"
 	relayconstant "github.com/QuantumNous/new-api/relay/constant"
+	"github.com/QuantumNous/new-api/setting/model_setting"
 	"github.com/QuantumNous/new-api/types"

 	"github.com/gin-gonic/gin"
@@ -83,7 +84,7 @@ type RelayInfo struct {
 	TokenKey          string
 	TokenGroup        string
 	UserId            int
-	UsingGroup        string // 使用的分组
+	UsingGroup        string // 使用的分组，当auto跨分组重试时，会变动
 	UserGroup         string // 用户所在分组
 	TokenUnlimited    bool
 	StartTime         time.Time
@@ -374,6 +375,12 @@ func genBaseRelayInfo(c *gin.Context, request dto.Request) *RelayInfo {
 	//channelId := common.GetContextKeyInt(c, constant.ContextKeyChannelId)
 	//paramOverride := common.GetContextKeyStringMap(c, constant.ContextKeyChannelParamOverride)

+	tokenGroup := common.GetContextKeyString(c, constant.ContextKeyTokenGroup)
+	// 当令牌分组为空时，表示使用用户分组
+	if tokenGroup == "" {
+		tokenGroup = common.GetContextKeyString(c, constant.ContextKeyUserGroup)
+	}
+
 	startTime := common.GetContextKeyTime(c, constant.ContextKeyRequestStartTime)
 	if startTime.IsZero() {
 		startTime = time.Now()
@@ -401,7 +408,7 @@ func genBaseRelayInfo(c *gin.Context, request dto.Request) *RelayInfo {
 		TokenId:        common.GetContextKeyInt(c, constant.ContextKeyTokenId),
 		TokenKey:       common.GetContextKeyString(c, constant.ContextKeyTokenKey),
 		TokenUnlimited: common.GetContextKeyBool(c, constant.ContextKeyTokenUnlimited),
-		TokenGroup:     common.GetContextKeyString(c, constant.ContextKeyTokenGroup),
+		TokenGroup:     tokenGroup,

 		isFirstResponse: true,
 		RelayMode:       relayconstant.Path2RelayMode(c.Request.URL.Path),
@@ -628,3 +635,47 @@ func RemoveDisabledFields(jsonData []byte, channelOtherSettings dto.ChannelOther
 	}
 	return jsonDataAfter, nil
 }
+
+// RemoveGeminiDisabledFields removes disabled fields from Gemini request JSON data
+// Currently supports removing functionResponse.id field which Vertex AI does not support
+func RemoveGeminiDisabledFields(jsonData []byte) ([]byte, error) {
+	if !model_setting.GetGeminiSettings().RemoveFunctionResponseIdEnabled {
+		return jsonData, nil
+	}
+
+	var data map[string]interface{}
+	if err := common.Unmarshal(jsonData, &data); err != nil {
+		common.SysError("RemoveGeminiDisabledFields Unmarshal error: " + err.Error())
+		return jsonData, nil
+	}
+
+	// Process contents array
+	// Handle both camelCase (functionResponse) and snake_case (function_response)
+	if contents, ok := data["contents"].([]interface{}); ok {
+		for _, content := range contents {
+			if contentMap, ok := content.(map[string]interface{}); ok {
+				if parts, ok := contentMap["parts"].([]interface{}); ok {
+					for _, part := range parts {
+						if partMap, ok := part.(map[string]interface{}); ok {
+							// Check functionResponse (camelCase)
+							if funcResp, ok := partMap["functionResponse"].(map[string]interface{}); ok {
+								delete(funcResp, "id")
+							}
+							// Check function_response (snake_case)
+							if funcResp, ok := partMap["function_response"].(map[string]interface{}); ok {
+								delete(funcResp, "id")
+							}
+						}
+					}
+				}
+			}
+		}
+	}
+
+	jsonDataAfter, err := common.Marshal(data)
+	if err != nil {
+		common.SysError("RemoveGeminiDisabledFields Marshal error: " + err.Error())
+		return jsonData, nil
+	}
+	return jsonDataAfter, nil
+}
--- a/relay/compatible_handler.go
+++ b/relay/compatible_handler.go
@@ -181,22 +181,22 @@ func TextHelper(c *gin.Context, info *relaycommon.RelayInfo) (newAPIError *types
 		return newApiErr
 	}

-	if strings.HasPrefix(info.OriginModelName, "gpt-4o-audio") {
+	if usage.(*dto.Usage).CompletionTokenDetails.AudioTokens > 0 || usage.(*dto.Usage).PromptTokensDetails.AudioTokens > 0 {
 		service.PostAudioConsumeQuota(c, info, usage.(*dto.Usage), "")
 	} else {
-		postConsumeQuota(c, info, usage.(*dto.Usage), "")
+		postConsumeQuota(c, info, usage.(*dto.Usage))
 	}
 	return nil
 }

-func postConsumeQuota(ctx *gin.Context, relayInfo *relaycommon.RelayInfo, usage *dto.Usage, extraContent string) {
+func postConsumeQuota(ctx *gin.Context, relayInfo *relaycommon.RelayInfo, usage *dto.Usage, extraContent ...string) {
 	if usage == nil {
 		usage = &dto.Usage{
 			PromptTokens:     relayInfo.GetEstimatePromptTokens(),
 			CompletionTokens: 0,
 			TotalTokens:      relayInfo.GetEstimatePromptTokens(),
 		}
-		extraContent += "（可能是请求出错）"
+		extraContent = append(extraContent, "上游无计费信息")
 	}
 	useTimeSeconds := time.Now().Unix() - relayInfo.StartTime.Unix()
 	promptTokens := usage.PromptTokens
@@ -246,8 +246,8 @@ func postConsumeQuota(ctx *gin.Context, relayInfo *relaycommon.RelayInfo, usage
 			dWebSearchQuota = decimal.NewFromFloat(webSearchPrice).
 				Mul(decimal.NewFromInt(int64(webSearchTool.CallCount))).
 				Div(decimal.NewFromInt(1000)).Mul(dGroupRatio).Mul(dQuotaPerUnit)
-			extraContent += fmt.Sprintf("Web Search 调用 %d 次，上下文大小 %s，调用花费 %s",
-				webSearchTool.CallCount, webSearchTool.SearchContextSize, dWebSearchQuota.String())
+			extraContent = append(extraContent, fmt.Sprintf("Web Search 调用 %d 次，上下文大小 %s，调用花费 %s",
+				webSearchTool.CallCount, webSearchTool.SearchContextSize, dWebSearchQuota.String()))
 		}
 	} else if strings.HasSuffix(modelName, "search-preview") {
 		// search-preview 模型不支持 response api
@@ -258,8 +258,8 @@ func postConsumeQuota(ctx *gin.Context, relayInfo *relaycommon.RelayInfo, usage
 		webSearchPrice = operation_setting.GetWebSearchPricePerThousand(modelName, searchContextSize)
 		dWebSearchQuota = decimal.NewFromFloat(webSearchPrice).
 			Div(decimal.NewFromInt(1000)).Mul(dGroupRatio).Mul(dQuotaPerUnit)
-		extraContent += fmt.Sprintf("Web Search 调用 1 次，上下文大小 %s，调用花费 %s",
-			searchContextSize, dWebSearchQuota.String())
+		extraContent = append(extraContent, fmt.Sprintf("Web Search 调用 1 次，上下文大小 %s，调用花费 %s",
+			searchContextSize, dWebSearchQuota.String()))
 	}
 	// claude web search tool 计费
 	var dClaudeWebSearchQuota decimal.Decimal
@@ -269,8 +269,8 @@ func postConsumeQuota(ctx *gin.Context, relayInfo *relaycommon.RelayInfo, usage
 		claudeWebSearchPrice = operation_setting.GetClaudeWebSearchPricePerThousand()
 		dClaudeWebSearchQuota = decimal.NewFromFloat(claudeWebSearchPrice).
 			Div(decimal.NewFromInt(1000)).Mul(dGroupRatio).Mul(dQuotaPerUnit).Mul(decimal.NewFromInt(int64(claudeWebSearchCallCount)))
-		extraContent += fmt.Sprintf("Claude Web Search 调用 %d 次，调用花费 %s",
-			claudeWebSearchCallCount, dClaudeWebSearchQuota.String())
+		extraContent = append(extraContent, fmt.Sprintf("Claude Web Search 调用 %d 次，调用花费 %s",
+			claudeWebSearchCallCount, dClaudeWebSearchQuota.String()))
 	}
 	// file search tool 计费
 	var dFileSearchQuota decimal.Decimal
@@ -281,8 +281,8 @@ func postConsumeQuota(ctx *gin.Context, relayInfo *relaycommon.RelayInfo, usage
 			dFileSearchQuota = decimal.NewFromFloat(fileSearchPrice).
 				Mul(decimal.NewFromInt(int64(fileSearchTool.CallCount))).
 				Div(decimal.NewFromInt(1000)).Mul(dGroupRatio).Mul(dQuotaPerUnit)
-			extraContent += fmt.Sprintf("File Search 调用 %d 次，调用花费 %s",
-				fileSearchTool.CallCount, dFileSearchQuota.String())
+			extraContent = append(extraContent, fmt.Sprintf("File Search 调用 %d 次，调用花费 %s",
+				fileSearchTool.CallCount, dFileSearchQuota.String()))
 		}
 	}
 	var dImageGenerationCallQuota decimal.Decimal
@@ -290,7 +290,7 @@ func postConsumeQuota(ctx *gin.Context, relayInfo *relaycommon.RelayInfo, usage
 	if ctx.GetBool("image_generation_call") {
 		imageGenerationCallPrice = operation_setting.GetGPTImage1PriceOnceCall(ctx.GetString("image_generation_call_quality"), ctx.GetString("image_generation_call_size"))
 		dImageGenerationCallQuota = decimal.NewFromFloat(imageGenerationCallPrice).Mul(dGroupRatio).Mul(dQuotaPerUnit)
-		extraContent += fmt.Sprintf("Image Generation Call 花费 %s", dImageGenerationCallQuota.String())
+		extraContent = append(extraContent, fmt.Sprintf("Image Generation Call 花费 %s", dImageGenerationCallQuota.String()))
 	}

 	var quotaCalculateDecimal decimal.Decimal
@@ -300,14 +300,20 @@ func postConsumeQuota(ctx *gin.Context, relayInfo *relaycommon.RelayInfo, usage
 	if !relayInfo.PriceData.UsePrice {
 		baseTokens := dPromptTokens
 		// 减去 cached tokens
+		// Anthropic API 的 input_tokens 已经不包含缓存 tokens，不需要减去
+		// OpenAI/OpenRouter 等 API 的 prompt_tokens 包含缓存 tokens，需要减去
 		var cachedTokensWithRatio decimal.Decimal
 		if !dCacheTokens.IsZero() {
-			baseTokens = baseTokens.Sub(dCacheTokens)
+			if relayInfo.ChannelType != constant.ChannelTypeAnthropic {
+				baseTokens = baseTokens.Sub(dCacheTokens)
+			}
 			cachedTokensWithRatio = dCacheTokens.Mul(dCacheRatio)
 		}
 		var dCachedCreationTokensWithRatio decimal.Decimal
 		if !dCachedCreationTokens.IsZero() {
-			baseTokens = baseTokens.Sub(dCachedCreationTokens)
+			if relayInfo.ChannelType != constant.ChannelTypeAnthropic {
+				baseTokens = baseTokens.Sub(dCachedCreationTokens)
+			}
 			dCachedCreationTokensWithRatio = dCachedCreationTokens.Mul(dCachedCreationRatio)
 		}

@@ -325,7 +331,7 @@ func postConsumeQuota(ctx *gin.Context, relayInfo *relaycommon.RelayInfo, usage
 				// 重新计算 base tokens
 				baseTokens = baseTokens.Sub(dAudioTokens)
 				audioInputQuota = decimal.NewFromFloat(audioInputPrice).Div(decimal.NewFromInt(1000000)).Mul(dAudioTokens).Mul(dGroupRatio).Mul(dQuotaPerUnit)
-				extraContent += fmt.Sprintf("Audio Input 花费 %s", audioInputQuota.String())
+				extraContent = append(extraContent, fmt.Sprintf("Audio Input 花费 %s", audioInputQuota.String()))
 			}
 		}
 		promptQuota := baseTokens.Add(cachedTokensWithRatio).
@@ -350,17 +356,25 @@ func postConsumeQuota(ctx *gin.Context, relayInfo *relaycommon.RelayInfo, usage
 	// 添加 image generation call 计费
 	quotaCalculateDecimal = quotaCalculateDecimal.Add(dImageGenerationCallQuota)

+	if len(relayInfo.PriceData.OtherRatios) > 0 {
+		for key, otherRatio := range relayInfo.PriceData.OtherRatios {
+			dOtherRatio := decimal.NewFromFloat(otherRatio)
+			quotaCalculateDecimal = quotaCalculateDecimal.Mul(dOtherRatio)
+			extraContent = append(extraContent, fmt.Sprintf("其他倍率 %s: %f", key, otherRatio))
+		}
+	}
+
 	quota := int(quotaCalculateDecimal.Round(0).IntPart())
 	totalTokens := promptTokens + completionTokens

-	var logContent string
+	//var logContent string

 	// record all the consume log even if quota is 0
 	if totalTokens == 0 {
 		// in this case, must be some error happened
 		// we cannot just return, because we may have to return the pre-consumed quota
 		quota = 0
-		logContent += fmt.Sprintf("（可能是上游超时）")
+		extraContent = append(extraContent, "上游没有返回计费信息，无法扣费（可能是上游超时）")
 		logger.LogError(ctx, fmt.Sprintf("total tokens is 0, cannot consume quota, userId %d, channelId %d, "+
 			"tokenId %d, model %s， pre-consumed quota %d", relayInfo.UserId, relayInfo.ChannelId, relayInfo.TokenId, modelName, relayInfo.FinalPreConsumedQuota))
 	} else {
@@ -399,15 +413,13 @@ func postConsumeQuota(ctx *gin.Context, relayInfo *relaycommon.RelayInfo, usage
 	logModel := modelName
 	if strings.HasPrefix(logModel, "gpt-4-gizmo") {
 		logModel = "gpt-4-gizmo-*"
-		logContent += fmt.Sprintf("，模型 %s", modelName)
+		extraContent = append(extraContent, fmt.Sprintf("模型 %s", modelName))
 	}
 	if strings.HasPrefix(logModel, "gpt-4o-gizmo") {
 		logModel = "gpt-4o-gizmo-*"
-		logContent += fmt.Sprintf("，模型 %s", modelName)
-	}
-	if extraContent != "" {
-		logContent += ", " + extraContent
+		extraContent = append(extraContent, fmt.Sprintf("模型 %s", modelName))
 	}
+	logContent := strings.Join(extraContent, ", ")
 	other := service.GenerateTextOtherInfo(ctx, relayInfo, modelRatio, groupRatio, completionRatio, cacheTokens, cacheRatio, modelPrice, relayInfo.PriceData.GroupRatioInfo.GroupSpecialRatio)
 	if imageTokens != 0 {
 		other["image"] = true
--- a/relay/embedding_handler.go
+++ b/relay/embedding_handler.go
@@ -82,6 +82,6 @@ func EmbeddingHelper(c *gin.Context, info *relaycommon.RelayInfo) (newAPIError *
 		service.ResetStatusCode(newAPIError, statusCodeMappingStr)
 		return newAPIError
 	}
-	postConsumeQuota(c, info, usage.(*dto.Usage), "")
+	postConsumeQuota(c, info, usage.(*dto.Usage))
 	return nil
 }
--- a/relay/gemini_handler.go
+++ b/relay/gemini_handler.go
@@ -193,7 +193,7 @@ func GeminiHelper(c *gin.Context, info *relaycommon.RelayInfo) (newAPIError *typ
 		return openaiErr
 	}

-	postConsumeQuota(c, info, usage.(*dto.Usage), "")
+	postConsumeQuota(c, info, usage.(*dto.Usage))
 	return nil
 }

@@ -292,6 +292,6 @@ func GeminiEmbeddingHandler(c *gin.Context, info *relaycommon.RelayInfo) (newAPI
 		return openaiErr
 	}

-	postConsumeQuota(c, info, usage.(*dto.Usage), "")
+	postConsumeQuota(c, info, usage.(*dto.Usage))
 	return nil
 }
--- a/relay/helper/common.go
+++ b/relay/helper/common.go
@@ -14,15 +14,28 @@ import (
 	"github.com/gorilla/websocket"
 )

-func FlushWriter(c *gin.Context) error {
-	if c.Writer == nil {
+func FlushWriter(c *gin.Context) (err error) {
+	defer func() {
+		if r := recover(); r != nil {
+			err = fmt.Errorf("flush panic recovered: %v", r)
+		}
+	}()
+
+	if c == nil || c.Writer == nil {
 		return nil
 	}
-	if flusher, ok := c.Writer.(http.Flusher); ok {
-		flusher.Flush()
-		return nil
+
+	if c.Request != nil && c.Request.Context().Err() != nil {
+		return fmt.Errorf("request context done: %w", c.Request.Context().Err())
 	}
-	return errors.New("streaming error: flusher not found")
+
+	flusher, ok := c.Writer.(http.Flusher)
+	if !ok {
+		return errors.New("streaming error: flusher not found")
+	}
+
+	flusher.Flush()
+	return nil
 }

 func SetEventStreamHeaders(c *gin.Context) {
@@ -66,17 +79,31 @@ func ResponseChunkData(c *gin.Context, resp dto.ResponsesStreamResponse, data st
 }

 func StringData(c *gin.Context, str string) error {
-	//str = strings.TrimPrefix(str, "data: ")
-	//str = strings.TrimSuffix(str, "\r")
+	if c == nil || c.Writer == nil {
+		return errors.New("context or writer is nil")
+	}
+
+	if c.Request != nil && c.Request.Context().Err() != nil {
+		return fmt.Errorf("request context done: %w", c.Request.Context().Err())
+	}
+
 	c.Render(-1, common.CustomEvent{Data: "data: " + str})
-	_ = FlushWriter(c)
-	return nil
+	return FlushWriter(c)
 }

 func PingData(c *gin.Context) error {
-	c.Writer.Write([]byte(": PING\n\n"))
-	_ = FlushWriter(c)
-	return nil
+	if c == nil || c.Writer == nil {
+		return errors.New("context or writer is nil")
+	}
+
+	if c.Request != nil && c.Request.Context().Err() != nil {
+		return fmt.Errorf("request context done: %w", c.Request.Context().Err())
+	}
+
+	if _, err := c.Writer.Write([]byte(": PING\n\n")); err != nil {
+		return fmt.Errorf("write ping data failed: %w", err)
+	}
+	return FlushWriter(c)
 }

 func ObjectData(c *gin.Context, object interface{}) error {
--- a/relay/image_handler.go
+++ b/relay/image_handler.go
@@ -124,12 +124,18 @@ func ImageHelper(c *gin.Context, info *relaycommon.RelayInfo) (newAPIError *type
 		quality = "hd"
 	}

-	var logContent string
+	var logContent []string

 	if len(request.Size) > 0 {
-		logContent = fmt.Sprintf("大小 %s, 品质 %s, 张数 %d", request.Size, quality, request.N)
+		logContent = append(logContent, fmt.Sprintf("大小 %s", request.Size))
+	}
+	if len(quality) > 0 {
+		logContent = append(logContent, fmt.Sprintf("品质 %s", quality))
+	}
+	if request.N > 0 {
+		logContent = append(logContent, fmt.Sprintf("生成数量 %d", request.N))
 	}

-	postConsumeQuota(c, info, usage.(*dto.Usage), logContent)
+	postConsumeQuota(c, info, usage.(*dto.Usage), logContent...)
 	return nil
 }
--- a/relay/relay_task.go
+++ b/relay/relay_task.go
@@ -196,7 +196,7 @@ func RelayTaskSubmit(c *gin.Context, info *relaycommon.RelayInfo) (taskErr *dto.
 	// handle response
 	if resp != nil && resp.StatusCode != http.StatusOK {
 		responseBody, _ := io.ReadAll(resp.Body)
-		taskErr = service.TaskErrorWrapper(fmt.Errorf(string(responseBody)), "fail_to_fetch_task", resp.StatusCode)
+		taskErr = service.TaskErrorWrapper(fmt.Errorf("%s", string(responseBody)), "fail_to_fetch_task", resp.StatusCode)
 		return
 	}

--- a/relay/rerank_handler.go
+++ b/relay/rerank_handler.go
@@ -95,6 +95,6 @@ func RerankHelper(c *gin.Context, info *relaycommon.RelayInfo) (newAPIError *typ
 		service.ResetStatusCode(newAPIError, statusCodeMappingStr)
 		return newAPIError
 	}
-	postConsumeQuota(c, info, usage.(*dto.Usage), "")
+	postConsumeQuota(c, info, usage.(*dto.Usage))
 	return nil
 }
--- a/relay/responses_handler.go
+++ b/relay/responses_handler.go
@@ -107,7 +107,7 @@ func ResponsesHelper(c *gin.Context, info *relaycommon.RelayInfo) (newAPIError *
 	if strings.HasPrefix(info.OriginModelName, "gpt-4o-audio") {
 		service.PostAudioConsumeQuota(c, info, usage.(*dto.Usage), "")
 	} else {
-		postConsumeQuota(c, info, usage.(*dto.Usage), "")
+		postConsumeQuota(c, info, usage.(*dto.Usage))
 	}
 	return nil
 }
--- a/router/api-router.go
+++ b/router/api-router.go
@@ -152,6 +152,10 @@ func SetApiRouter(router *gin.Engine) {
 			channelRoute.POST("/fix", controller.FixChannelsAbilities)
 			channelRoute.GET("/fetch_models/:id", controller.FetchUpstreamModels)
 			channelRoute.POST("/fetch_models", controller.FetchModels)
+			channelRoute.POST("/ollama/pull", controller.OllamaPullModel)
+			channelRoute.POST("/ollama/pull/stream", controller.OllamaPullModelStream)
+			channelRoute.DELETE("/ollama/delete", controller.OllamaDeleteModel)
+			channelRoute.GET("/ollama/version/:id", controller.OllamaVersion)
 			channelRoute.POST("/batch/tag", controller.BatchSetChannelTag)
 			channelRoute.GET("/tag/models", controller.GetTagModels)
 			channelRoute.POST("/copy/:id", controller.CopyChannel)
@@ -256,5 +260,45 @@ func SetApiRouter(router *gin.Engine) {
 			modelsRoute.PUT("/", controller.UpdateModelMeta)
 			modelsRoute.DELETE("/:id", controller.DeleteModelMeta)
 		}
+
+		// Deployments (model deployment management)
+		deploymentsRoute := apiRouter.Group("/deployments")
+		deploymentsRoute.Use(middleware.AdminAuth())
+		{
+			// List and search deployments
+			deploymentsRoute.GET("/", controller.GetAllDeployments)
+			deploymentsRoute.GET("/search", controller.SearchDeployments)
+
+			// Connection utilities
+			deploymentsRoute.POST("/test-connection", controller.TestIoNetConnection)
+
+			// Resource and configuration endpoints
+			deploymentsRoute.GET("/hardware-types", controller.GetHardwareTypes)
+			deploymentsRoute.GET("/locations", controller.GetLocations)
+			deploymentsRoute.GET("/available-replicas", controller.GetAvailableReplicas)
+			deploymentsRoute.POST("/price-estimation", controller.GetPriceEstimation)
+			deploymentsRoute.GET("/check-name", controller.CheckClusterNameAvailability)
+
+			// Create new deployment
+			deploymentsRoute.POST("/", controller.CreateDeployment)
+
+			// Individual deployment operations
+			deploymentsRoute.GET("/:id", controller.GetDeployment)
+			deploymentsRoute.GET("/:id/logs", controller.GetDeploymentLogs)
+			deploymentsRoute.GET("/:id/containers", controller.ListDeploymentContainers)
+			deploymentsRoute.GET("/:id/containers/:container_id", controller.GetContainerDetails)
+			deploymentsRoute.PUT("/:id", controller.UpdateDeployment)
+			deploymentsRoute.PUT("/:id/name", controller.UpdateDeploymentName)
+			deploymentsRoute.POST("/:id/extend", controller.ExtendDeployment)
+			deploymentsRoute.DELETE("/:id", controller.DeleteDeployment)
+
+			// Future batch operations:
+			// deploymentsRoute.POST("/:id/start", controller.StartDeployment)
+			// deploymentsRoute.POST("/:id/stop", controller.StopDeployment)
+			// deploymentsRoute.POST("/:id/restart", controller.RestartDeployment)
+			// deploymentsRoute.POST("/batch_delete", controller.BatchDeleteDeployments)
+			// deploymentsRoute.POST("/batch_start", controller.BatchStartDeployments)
+			// deploymentsRoute.POST("/batch_stop", controller.BatchStopDeployments)
+		}
 	}
 }
--- a/service/channel_select.go
+++ b/service/channel_select.go
@@ -11,50 +11,151 @@ import (
 	"github.com/gin-gonic/gin"
 )

+type RetryParam struct {
+	Ctx          *gin.Context
+	TokenGroup   string
+	ModelName    string
+	Retry        *int
+	resetNextTry bool
+}
+
+func (p *RetryParam) GetRetry() int {
+	if p.Retry == nil {
+		return 0
+	}
+	return *p.Retry
+}
+
+func (p *RetryParam) SetRetry(retry int) {
+	p.Retry = &retry
+}
+
+func (p *RetryParam) IncreaseRetry() {
+	if p.resetNextTry {
+		p.resetNextTry = false
+		return
+	}
+	if p.Retry == nil {
+		p.Retry = new(int)
+	}
+	*p.Retry++
+}
+
+func (p *RetryParam) ResetRetryNextTry() {
+	p.resetNextTry = true
+}
+
 // CacheGetRandomSatisfiedChannel tries to get a random channel that satisfies the requirements.
-func CacheGetRandomSatisfiedChannel(c *gin.Context, tokenGroup string, modelName string, retry int) (*model.Channel, string, error) {
+// 尝试获取一个满足要求的随机渠道。
+//
+// For "auto" tokenGroup with cross-group Retry enabled:
+// 对于启用了跨分组重试的 "auto" tokenGroup：
+//
+//   - Each group will exhaust all its priorities before moving to the next group.
+//     每个分组会用完所有优先级后才会切换到下一个分组。
+//
+//   - Uses ContextKeyAutoGroupIndex to track current group index.
+//     使用 ContextKeyAutoGroupIndex 跟踪当前分组索引。
+//
+//   - Uses ContextKeyAutoGroupRetryIndex to track the global Retry count when current group started.
+//     使用 ContextKeyAutoGroupRetryIndex 跟踪当前分组开始时的全局重试次数。
+//
+//   - priorityRetry = Retry - startRetryIndex, represents the priority level within current group.
+//     priorityRetry = Retry - startRetryIndex，表示当前分组内的优先级级别。
+//
+//   - When GetRandomSatisfiedChannel returns nil (priorities exhausted), moves to next group.
+//     当 GetRandomSatisfiedChannel 返回 nil（优先级用完）时，切换到下一个分组。
+//
+// Example flow (2 groups, each with 2 priorities, RetryTimes=3):
+// 示例流程（2个分组，每个有2个优先级，RetryTimes=3）：
+//
+//	Retry=0: GroupA, priority0 (startRetryIndex=0, priorityRetry=0)
+//	         分组A, 优先级0
+//
+//	Retry=1: GroupA, priority1 (startRetryIndex=0, priorityRetry=1)
+//	         分组A, 优先级1
+//
+//	Retry=2: GroupA exhausted → GroupB, priority0 (startRetryIndex=2, priorityRetry=0)
+//	         分组A用完 → 分组B, 优先级0
+//
+//	Retry=3: GroupB, priority1 (startRetryIndex=2, priorityRetry=1)
+//	         分组B, 优先级1
+func CacheGetRandomSatisfiedChannel(param *RetryParam) (*model.Channel, string, error) {
 	var channel *model.Channel
 	var err error
-	selectGroup := tokenGroup
-	userGroup := common.GetContextKeyString(c, constant.ContextKeyUserGroup)
-	if tokenGroup == "auto" {
+	selectGroup := param.TokenGroup
+	userGroup := common.GetContextKeyString(param.Ctx, constant.ContextKeyUserGroup)
+
+	if param.TokenGroup == "auto" {
 		if len(setting.GetAutoGroups()) == 0 {
 			return nil, selectGroup, errors.New("auto groups is not enabled")
 		}
 		autoGroups := GetUserAutoGroup(userGroup)
-		startIndex := 0
-		priorityRetry := retry
-		crossGroupRetry := common.GetContextKeyBool(c, constant.ContextKeyTokenCrossGroupRetry)
-		if crossGroupRetry && retry > 0 {
-			logger.LogDebug(c, "Auto group retry cross group, retry: %d", retry)
-			if lastIndex, exists := common.GetContextKey(c, constant.ContextKeyAutoGroupIndex); exists {
-				if idx, ok := lastIndex.(int); ok {
-					startIndex = idx + 1
-					priorityRetry = 0
-				}
+
+		// startGroupIndex: the group index to start searching from
+		// startGroupIndex: 开始搜索的分组索引
+		startGroupIndex := 0
+		crossGroupRetry := common.GetContextKeyBool(param.Ctx, constant.ContextKeyTokenCrossGroupRetry)
+
+		if lastGroupIndex, exists := common.GetContextKey(param.Ctx, constant.ContextKeyAutoGroupIndex); exists {
+			if idx, ok := lastGroupIndex.(int); ok {
+				startGroupIndex = idx
 			}
-			logger.LogDebug(c, "Auto group retry cross group, start index: %d", startIndex)
 		}

-		for i := startIndex; i < len(autoGroups); i++ {
+		for i := startGroupIndex; i < len(autoGroups); i++ {
 			autoGroup := autoGroups[i]
-			logger.LogDebug(c, "Auto selecting group: %s", autoGroup)
-			channel, _ = model.GetRandomSatisfiedChannel(autoGroup, modelName, priorityRetry)
-			if channel == nil {
+			// Calculate priorityRetry for current group
+			// 计算当前分组的 priorityRetry
+			priorityRetry := param.GetRetry()
+			// If moved to a new group, reset priorityRetry and update startRetryIndex
+			// 如果切换到新分组，重置 priorityRetry 并更新 startRetryIndex
+			if i > startGroupIndex {
 				priorityRetry = 0
-				continue
-			} else {
-				c.Set("auto_group", autoGroup)
-				common.SetContextKey(c, constant.ContextKeyAutoGroupIndex, i)
-				selectGroup = autoGroup
-				logger.LogDebug(c, "Auto selected group: %s", autoGroup)
-				break
 			}
+			logger.LogDebug(param.Ctx, "Auto selecting group: %s, priorityRetry: %d", autoGroup, priorityRetry)
+
+			channel, _ = model.GetRandomSatisfiedChannel(autoGroup, param.ModelName, priorityRetry)
+			if channel == nil {
+				// Current group has no available channel for this model, try next group
+				// 当前分组没有该模型的可用渠道，尝试下一个分组
+				logger.LogDebug(param.Ctx, "No available channel in group %s for model %s at priorityRetry %d, trying next group", autoGroup, param.ModelName, priorityRetry)
+				// 重置状态以尝试下一个分组
+				common.SetContextKey(param.Ctx, constant.ContextKeyAutoGroupIndex, i+1)
+				common.SetContextKey(param.Ctx, constant.ContextKeyAutoGroupRetryIndex, 0)
+				// Reset retry counter so outer loop can continue for next group
+				// 重置重试计数器，以便外层循环可以为下一个分组继续
+				param.SetRetry(0)
+				continue
+			}
+			common.SetContextKey(param.Ctx, constant.ContextKeyAutoGroup, autoGroup)
+			selectGroup = autoGroup
+			logger.LogDebug(param.Ctx, "Auto selected group: %s", autoGroup)
+
+			// Prepare state for next retry
+			// 为下一次重试准备状态
+			if crossGroupRetry && priorityRetry >= common.RetryTimes {
+				// Current group has exhausted all retries, prepare to switch to next group
+				// This request still uses current group, but next retry will use next group
+				// 当前分组已用完所有重试次数，准备切换到下一个分组
+				// 本次请求仍使用当前分组，但下次重试将使用下一个分组
+				logger.LogDebug(param.Ctx, "Current group %s retries exhausted (priorityRetry=%d >= RetryTimes=%d), preparing switch to next group for next retry", autoGroup, priorityRetry, common.RetryTimes)
+				common.SetContextKey(param.Ctx, constant.ContextKeyAutoGroupIndex, i+1)
+				// Reset retry counter so outer loop can continue for next group
+				// 重置重试计数器，以便外层循环可以为下一个分组继续
+				param.SetRetry(0)
+				param.ResetRetryNextTry()
+			} else {
+				// Stay in current group, save current state
+				// 保持在当前分组，保存当前状态
+				common.SetContextKey(param.Ctx, constant.ContextKeyAutoGroupIndex, i)
+			}
+			break
 		}
 	} else {
-		channel, err = model.GetRandomSatisfiedChannel(tokenGroup, modelName, retry)
+		channel, err = model.GetRandomSatisfiedChannel(param.TokenGroup, param.ModelName, param.GetRetry())
 		if err != nil {
-			return nil, tokenGroup, err
+			return nil, param.TokenGroup, err
 		}
 	}
 	return channel, selectGroup, nil
--- a/service/convert.go
+++ b/service/convert.go
@@ -389,25 +389,29 @@ func StreamResponseOpenAI2Claude(openAIResponse *dto.ChatCompletionsStreamRespon
 				}

 				idx := blockIndex
-				claudeResponses = append(claudeResponses, &dto.ClaudeResponse{
-					Index: &idx,
-					Type:  "content_block_start",
-					ContentBlock: &dto.ClaudeMediaMessage{
-						Id:    toolCall.ID,
-						Type:  "tool_use",
-						Name:  toolCall.Function.Name,
-						Input: map[string]interface{}{},
-					},
-				})
+				if toolCall.Function.Name != "" {
+					claudeResponses = append(claudeResponses, &dto.ClaudeResponse{
+						Index: &idx,
+						Type:  "content_block_start",
+						ContentBlock: &dto.ClaudeMediaMessage{
+							Id:    toolCall.ID,
+							Type:  "tool_use",
+							Name:  toolCall.Function.Name,
+							Input: map[string]interface{}{},
+						},
+					})
+				}

-				claudeResponses = append(claudeResponses, &dto.ClaudeResponse{
-					Index: &idx,
-					Type:  "content_block_delta",
-					Delta: &dto.ClaudeMediaMessage{
-						Type:        "input_json_delta",
-						PartialJson: &toolCall.Function.Arguments,
-					},
-				})
+				if len(toolCall.Function.Arguments) > 0 {
+					claudeResponses = append(claudeResponses, &dto.ClaudeResponse{
+						Index: &idx,
+						Type:  "content_block_delta",
+						Delta: &dto.ClaudeMediaMessage{
+							Type:        "input_json_delta",
+							PartialJson: &toolCall.Function.Arguments,
+						},
+					})
+				}

 				info.ClaudeConvertInfo.Index = blockIndex
 			}
--- a/service/error.go
+++ b/service/error.go
@@ -90,24 +90,38 @@ func RelayErrorHandler(ctx context.Context, resp *http.Response, showBodyWhenFai
 	}
 	CloseResponseBodyGracefully(resp)
 	var errResponse dto.GeneralErrorResponse
+	buildErrWithBody := func(message string) error {
+		if message == "" {
+			return fmt.Errorf("bad response status code %d, body: %s", resp.StatusCode, string(responseBody))
+		}
+		return fmt.Errorf("bad response status code %d, message: %s, body: %s", resp.StatusCode, message, string(responseBody))
+	}

 	err = common.Unmarshal(responseBody, &errResponse)
 	if err != nil {
 		if showBodyWhenFail {
-			newApiErr.Err = fmt.Errorf("bad response status code %d, body: %s", resp.StatusCode, string(responseBody))
+			newApiErr.Err = buildErrWithBody("")
 		} else {
-			if common.DebugEnabled {
-				logger.LogInfo(ctx, fmt.Sprintf("bad response status code %d, body: %s", resp.StatusCode, string(responseBody)))
-			}
+			logger.LogError(ctx, fmt.Sprintf("bad response status code %d, body: %s", resp.StatusCode, string(responseBody)))
 			newApiErr.Err = fmt.Errorf("bad response status code %d", resp.StatusCode)
 		}
 		return
 	}
-	if errResponse.Error.Message != "" {
+
+	if common.GetJsonType(errResponse.Error) == "object" {
 		// General format error (OpenAI, Anthropic, Gemini, etc.)
-		newApiErr = types.WithOpenAIError(errResponse.Error, resp.StatusCode)
-	} else {
-		newApiErr = types.NewOpenAIError(errors.New(errResponse.ToMessage()), types.ErrorCodeBadResponseStatusCode, resp.StatusCode)
+		oaiError := errResponse.TryToOpenAIError()
+		if oaiError != nil {
+			newApiErr = types.WithOpenAIError(*oaiError, resp.StatusCode)
+			if showBodyWhenFail {
+				newApiErr.Err = buildErrWithBody(newApiErr.Error())
+			}
+			return
+		}
+	}
+	newApiErr = types.NewOpenAIError(errors.New(errResponse.ToMessage()), types.ErrorCodeBadResponseStatusCode, resp.StatusCode)
+	if showBodyWhenFail {
+		newApiErr.Err = buildErrWithBody(newApiErr.Error())
 	}
 	return
 }
--- a/service/quota.go
+++ b/service/quota.go
@@ -95,7 +95,7 @@ func PreWssConsumeQuota(ctx *gin.Context, relayInfo *relaycommon.RelayInfo, usag
 		return err
 	}

-	token, err := model.GetTokenByKey(strings.TrimLeft(relayInfo.TokenKey, "sk-"), false)
+	token, err := model.GetTokenByKey(strings.TrimPrefix(relayInfo.TokenKey, "sk-"), false)
 	if err != nil {
 		return err
 	}
@@ -108,7 +108,7 @@ func PreWssConsumeQuota(ctx *gin.Context, relayInfo *relaycommon.RelayInfo, usag
 	groupRatio := ratio_setting.GetGroupRatio(relayInfo.UsingGroup)
 	modelRatio, _, _ := ratio_setting.GetModelRatio(modelName)

-	autoGroup, exists := ctx.Get("auto_group")
+	autoGroup, exists := common.GetContextKey(ctx, constant.ContextKeyAutoGroup)
 	if exists {
 		groupRatio = ratio_setting.GetGroupRatio(autoGroup.(string))
 		log.Printf("final group ratio: %f", groupRatio)
--- a/setting/model_setting/gemini.go
+++ b/setting/model_setting/gemini.go
@@ -4,7 +4,7 @@ import (
 	"github.com/QuantumNous/new-api/setting/config"
 )

-// GeminiSettings 定义Gemini模型的配置
+// GeminiSettings defines Gemini model configuration. 注意bool要以enabled结尾才可以生效编辑
 type GeminiSettings struct {
 	SafetySettings                        map[string]string `json:"safety_settings"`
 	VersionSettings                       map[string]string `json:"version_settings"`
@@ -12,6 +12,7 @@ type GeminiSettings struct {
 	ThinkingAdapterEnabled                bool              `json:"thinking_adapter_enabled"`
 	ThinkingAdapterBudgetTokensPercentage float64           `json:"thinking_adapter_budget_tokens_percentage"`
 	FunctionCallThoughtSignatureEnabled   bool              `json:"function_call_thought_signature_enabled"`
+	RemoveFunctionResponseIdEnabled       bool              `json:"remove_function_response_id_enabled"`
 }

 // 默认配置
@@ -30,6 +31,7 @@ var defaultGeminiSettings = GeminiSettings{
 	ThinkingAdapterEnabled:                false,
 	ThinkingAdapterBudgetTokensPercentage: 0.6,
 	FunctionCallThoughtSignatureEnabled:   true,
+	RemoveFunctionResponseIdEnabled:       true,
 }

 // 全局实例
--- a/setting/ratio_setting/model_ratio.go
+++ b/setting/ratio_setting/model_ratio.go
@@ -7,7 +7,6 @@ import (

 	"github.com/QuantumNous/new-api/common"
 	"github.com/QuantumNous/new-api/setting/operation_setting"
-	"github.com/QuantumNous/new-api/setting/reasoning"
 )

 // from songquanpeng/one-api
@@ -297,6 +296,7 @@ var defaultModelPrice = map[string]float64{
 	"mj_upload":                      0.05,
 	"sora-2":                         0.3,
 	"sora-2-pro":                     0.5,
+	"gpt-4o-mini-tts":                0.3,
 }

 var defaultAudioRatio = map[string]float64{
@@ -304,11 +304,13 @@ var defaultAudioRatio = map[string]float64{
 	"gpt-4o-mini-audio-preview":    66.67,
 	"gpt-4o-realtime-preview":      8,
 	"gpt-4o-mini-realtime-preview": 16.67,
+	"gpt-4o-mini-tts":              25,
 }

 var defaultAudioCompletionRatio = map[string]float64{
 	"gpt-4o-realtime":      2,
 	"gpt-4o-mini-realtime": 2,
+	"gpt-4o-mini-tts":      1,
 }

 var (
@@ -536,7 +538,10 @@ func getHardcodedCompletionModelRatio(name string) (float64, bool) {
 			if name == "gpt-4o-2024-05-13" {
 				return 3, true
 			}
-			return 4, true
+			if strings.HasPrefix(name, "gpt-4o-mini-tts") {
+				return 20, false
+			}
+			return 4, false
 		}
 		// gpt-5 匹配
 		if strings.HasPrefix(name, "gpt-5") {
@@ -823,10 +828,6 @@ func FormatMatchingModelName(name string) string {
 		name = handleThinkingBudgetModel(name, "gemini-2.5-pro", "gemini-2.5-pro-thinking-*")
 	}

-	if base, _, ok := reasoning.TrimEffortSuffix(name); ok {
-		name = base
-	}
-
 	if strings.HasPrefix(name, "gpt-4-gizmo") {
 		name = "gpt-4-gizmo-*"
 	}
--- a/setting/reasoning/suffix.go
+++ b/setting/reasoning/suffix.go
@@ -6,7 +6,7 @@ import (
 	"github.com/samber/lo"
 )

-var EffortSuffixes = []string{"-high", "-medium", "-low"}
+var EffortSuffixes = []string{"-high", "-medium", "-low", "-minimal"}

 // TrimEffortSuffix -> modelName level(low) exists
 func TrimEffortSuffix(modelName string) (string, string, bool) {
--- a/setting/system_setting/discord.go
+++ b/setting/system_setting/discord.go
@@ -3,9 +3,9 @@ package system_setting
 import "github.com/QuantumNous/new-api/setting/config"

 type DiscordSettings struct {
-	Enabled               bool   `json:"enabled"`
-	ClientId              string `json:"client_id"`
-	ClientSecret          string `json:"client_secret"`
+	Enabled      bool   `json:"enabled"`
+	ClientId     string `json:"client_id"`
+	ClientSecret string `json:"client_secret"`
 }

 // 默认配置
--- a/types/error.go
+++ b/types/error.go
@@ -1,6 +1,7 @@
 package types

 import (
+	"encoding/json"
 	"errors"
 	"fmt"
 	"net/http"
@@ -10,10 +11,11 @@ import (
 )

 type OpenAIError struct {
-	Message string `json:"message"`
-	Type    string `json:"type"`
-	Param   string `json:"param"`
-	Code    any    `json:"code"`
+	Message  string          `json:"message"`
+	Type     string          `json:"type"`
+	Param    string          `json:"param"`
+	Code     any             `json:"code"`
+	Metadata json.RawMessage `json:"metadata,omitempty"`
 }

 type ClaudeError struct {
@@ -92,6 +94,15 @@ type NewAPIError struct {
 	errorType      ErrorType
 	errorCode      ErrorCode
 	StatusCode     int
+	Metadata       json.RawMessage
+}
+
+// Unwrap enables errors.Is / errors.As to work with NewAPIError by exposing the underlying error.
+func (e *NewAPIError) Unwrap() error {
+	if e == nil {
+		return nil
+	}
+	return e.Err
 }

 func (e *NewAPIError) GetErrorCode() ErrorCode {
@@ -293,6 +304,13 @@ func WithOpenAIError(openAIError OpenAIError, statusCode int, ops ...NewAPIError
 		Err:        errors.New(openAIError.Message),
 		errorCode:  ErrorCode(code),
 	}
+	// OpenRouter
+	if len(openAIError.Metadata) > 0 {
+		openAIError.Message = fmt.Sprintf("%s (%s)", openAIError.Message, openAIError.Metadata)
+		e.Metadata = openAIError.Metadata
+		e.RelayError = openAIError
+		e.Err = errors.New(openAIError.Message)
+	}
 	for _, op := range ops {
 		op(e)
 	}
--- a/types/price_data.go
+++ b/types/price_data.go
@@ -26,12 +26,22 @@ type PriceData struct {
 	GroupRatioInfo       GroupRatioInfo
 }

+func (p *PriceData) AddOtherRatio(key string, ratio float64) {
+	if p.OtherRatios == nil {
+		p.OtherRatios = make(map[string]float64)
+	}
+	if ratio <= 0 {
+		return
+	}
+	p.OtherRatios[key] = ratio
+}
+
 type PerCallPriceData struct {
 	ModelPrice     float64
 	Quota          int
 	GroupRatioInfo GroupRatioInfo
 }

-func (p PriceData) ToSetting() string {
+func (p *PriceData) ToSetting() string {
 	return fmt.Sprintf("ModelPrice: %f, ModelRatio: %f, CompletionRatio: %f, CacheRatio: %f, GroupRatio: %f, UsePrice: %t, CacheCreationRatio: %f, CacheCreation5mRatio: %f, CacheCreation1hRatio: %f, QuotaToPreConsume: %d, ImageRatio: %f, AudioRatio: %f, AudioCompletionRatio: %f", p.ModelPrice, p.ModelRatio, p.CompletionRatio, p.CacheRatio, p.GroupRatioInfo.GroupRatio, p.UsePrice, p.CacheCreationRatio, p.CacheCreation5mRatio, p.CacheCreation1hRatio, p.QuotaToPreConsume, p.ImageRatio, p.AudioRatio, p.AudioCompletionRatio)
 }
--- a/web/bun.lock
+++ b/web/bun.lock
@@ -48,6 +48,7 @@
        "@so1ve/prettier-config": "^3.1.0",
        "@vitejs/plugin-react": "^4.2.1",
        "autoprefixer": "^10.4.21",
+        "code-inspector-plugin": "^1.3.3",
        "eslint": "8.57.0",
        "eslint-plugin-header": "^3.1.1",
        "eslint-plugin-react-hooks": "^5.2.0",
@@ -139,6 +140,18 @@

    "@chevrotain/utils": ["@chevrotain/utils@11.0.3", "", {}, "sha512-YslZMgtJUyuMbZ+aKvfF3x1f5liK4mWNxghFRv7jqRR9C3R3fAOGTTKvxXDa2Y1s9zSbcpuO0cAxDYsc9SrXoQ=="],

+    "@code-inspector/core": ["@code-inspector/core@1.3.3", "", { "dependencies": { "@vue/compiler-dom": "^3.5.13", "chalk": "^4.1.1", "dotenv": "^16.1.4", "launch-ide": "1.3.0", "portfinder": "^1.0.28" } }, "sha512-1SUCY/XiJ3LuA9TPfS9i7/cUcmdLsgB0chuDcP96ixB2tvYojzgCrglP7CHUGZa1dtWuRLuCiDzkclLetpV4ew=="],
+
+    "@code-inspector/esbuild": ["@code-inspector/esbuild@1.3.3", "", { "dependencies": { "@code-inspector/core": "1.3.3" } }, "sha512-GzX5LQbvh9DXINSUyWymG8Y7u5Tq4oJAnnrCoRiYxQvKBUuu2qVMzpZHIA2iDGxvazgZvr2OK+Sh/We4LutViA=="],
+
+    "@code-inspector/mako": ["@code-inspector/mako@1.3.3", "", { "dependencies": { "@code-inspector/core": "1.3.3" } }, "sha512-YPTHwpDtz9zn1vimMcJFCM6ELdBoivY7t2GzgY/iCTfgm6pu1H+oWZiBC35edqYAB7+xE8frspnNsmBhsrA36A=="],
+
+    "@code-inspector/turbopack": ["@code-inspector/turbopack@1.3.3", "", { "dependencies": { "@code-inspector/core": "1.3.3", "@code-inspector/webpack": "1.3.3" } }, "sha512-XhqsMtts/Int64LkpO00b4rlg1bw0otlRebX8dSVgZfsujj+Jdv2ngKmQ6RBN3vgj/zV7BfgBLeGgJn7D1kT3A=="],
+
+    "@code-inspector/vite": ["@code-inspector/vite@1.3.3", "", { "dependencies": { "@code-inspector/core": "1.3.3", "chalk": "4.1.1" } }, "sha512-phsHVYBsxAhfi6jJ+vpmxuF6jYMuVbozs5e8pkEJL2hQyGVkzP77vfCh1wzmQHcmKUKb2tlrFcvAsRb7oA1W7w=="],
+
+    "@code-inspector/webpack": ["@code-inspector/webpack@1.3.3", "", { "dependencies": { "@code-inspector/core": "1.3.3" } }, "sha512-qYih7syRXgM45KaWFNNk5Ed4WitVQHCI/2s/DZMFaF1Y2FA9qd1wPGiggNeqdcUsjf9TvVBQw/89gPQZIGwSqQ=="],
+
    "@dnd-kit/accessibility": ["@dnd-kit/accessibility@3.1.1", "", { "dependencies": { "tslib": "^2.0.0" }, "peerDependencies": { "react": ">=16.8.0" } }, "sha512-2P+YgaXF+gRsIihwwY1gCsQSYnu9Zyj2py8kY5fFvUM1qm2WA2u639R6YNVfU4GWr+ZM5mqEsfHZZLoRONbemw=="],

    "@dnd-kit/core": ["@dnd-kit/core@6.3.1", "", { "dependencies": { "@dnd-kit/accessibility": "^3.1.1", "@dnd-kit/utilities": "^3.2.2", "tslib": "^2.0.0" }, "peerDependencies": { "react": ">=16.8.0", "react-dom": ">=16.8.0" } }, "sha512-xkGBRQQab4RLwgXxoqETICr6S5JlogafbhNsidmrkVv2YRs5MLwpjoF2qpiGjQt8S9AoxtIV603s0GIUpY5eYQ=="],
@@ -713,6 +726,12 @@

    "@vitejs/plugin-react": ["@vitejs/plugin-react@4.3.4", "", { "dependencies": { "@babel/core": "^7.26.0", "@babel/plugin-transform-react-jsx-self": "^7.25.9", "@babel/plugin-transform-react-jsx-source": "^7.25.9", "@types/babel__core": "^7.20.5", "react-refresh": "^0.14.2" }, "peerDependencies": { "vite": "^4.2.0 || ^5.0.0 || ^6.0.0" } }, "sha512-SCCPBJtYLdE8PX/7ZQAs1QAZ8Jqwih+0VBLum1EGqmCCQal+MIUqLCzj3ZUy8ufbC0cAM4LRlSTm7IQJwWT4ug=="],

+    "@vue/compiler-core": ["@vue/compiler-core@3.5.26", "", { "dependencies": { "@babel/parser": "^7.28.5", "@vue/shared": "3.5.26", "entities": "^7.0.0", "estree-walker": "^2.0.2", "source-map-js": "^1.2.1" } }, "sha512-vXyI5GMfuoBCnv5ucIT7jhHKl55Y477yxP6fc4eUswjP8FG3FFVFd41eNDArR+Uk3QKn2Z85NavjaxLxOC19/w=="],
+
+    "@vue/compiler-dom": ["@vue/compiler-dom@3.5.26", "", { "dependencies": { "@vue/compiler-core": "3.5.26", "@vue/shared": "3.5.26" } }, "sha512-y1Tcd3eXs834QjswshSilCBnKGeQjQXB6PqFn/1nxcQw4pmG42G8lwz+FZPAZAby6gZeHSt/8LMPfZ4Rb+Bd/A=="],
+
+    "@vue/shared": ["@vue/shared@3.5.26", "", {}, "sha512-7Z6/y3uFI5PRoKeorTOSXKcDj0MSasfNNltcslbFrPpcw6aXRUALq4IfJlaTRspiWIUOEZbrpM+iQGmCOiWe4A=="],
+
    "abs-svg-path": ["abs-svg-path@0.1.1", "", {}, "sha512-d8XPSGjfyzlXC3Xx891DJRyZfqk5JU0BJrDQcsWomFIV1/BIzPW5HDH5iDdWpqWaav0YVIEzT1RHTwWr0FFshA=="],

    "acorn": ["acorn@8.15.0", "", { "bin": { "acorn": "bin/acorn" } }, "sha512-NZyJarBfL7nWwIq+FDL6Zp/yHEhePMNnnJ0y3qfieCrmNvYct8uvtiV41UvlSe6apAfk0fY1FbWx+NwfmpvtTg=="],
@@ -747,6 +766,8 @@

    "astring": ["astring@1.9.0", "", { "bin": { "astring": "bin/astring" } }, "sha512-LElXdjswlqjWrPpJFg1Fx4wpkOCxj1TDHlSV4PlaRxHGWko024xICaa97ZkMfs6DRKlCguiAI+rbXv5GWwXIkg=="],

+    "async": ["async@3.2.6", "", {}, "sha512-htCUDlxyyCLMgaM3xXg0C0LW2xqfuQ6p05pCEIsXuyQ+a1koYKTuBMzRNwmybfLgvJDMd0r1LTn4+E0Ti6C2AA=="],
+
    "async-validator": ["async-validator@3.5.2", "", {}, "sha512-8eLCg00W9pIRZSB781UUX/H6Oskmm8xloZfr09lz5bikRpBVDlJ3hRVuxxP1SxcwsEYfJ4IU8Q19Y8/893r3rQ=="],

    "asynckit": ["asynckit@0.4.0", "", {}, "sha512-Oei9OH4tRh0YqU3GxhX79dM/mwVgvbZJaSNaRk+bshkj0S5cfHcgYakreBjrHwatXKbz+IoIdYLxrKim2MjW0Q=="],
@@ -793,7 +814,7 @@

    "ccount": ["ccount@2.0.1", "", {}, "sha512-eyrF0jiFpY+3drT6383f1qhkbGsLSifNAjA61IUjZjmLCWjItY6LB9ft9YhoDgwfmclB2zhu51Lc7+95b8NRAg=="],

-    "chalk": ["chalk@4.1.2", "", { "dependencies": { "ansi-styles": "^4.1.0", "supports-color": "^7.1.0" } }, "sha512-oKnbhFyRIXpUuez8iBMmyEa4nbj4IOQyuhc/wy9kY7/WVPcwIO9VA668Pu8RkO7+0G76SLROeyw9CpQ061i4mA=="],
+    "chalk": ["chalk@4.1.1", "", { "dependencies": { "ansi-styles": "^4.1.0", "supports-color": "^7.1.0" } }, "sha512-diHzdDKxcU+bAsUboHLPEDQiw0qEe0qd7SYUn3HgcFlWgbDcfLGswOHYeGrHKzG9z6UYf01d9VFMfZxPM1xZSg=="],

    "character-entities": ["character-entities@2.0.2", "", {}, "sha512-shx7oQ0Awen/BRIdkjkvz54PnEEI/EjwXDSIZp86/KKdbafHh1Df/RYGBhn4hbe2+uKC9FnT5UCEdyPz3ai9hQ=="],

@@ -825,6 +846,8 @@

    "clsx": ["clsx@2.1.1", "", {}, "sha512-eYm0QWBtUrBWZWG0d386OGAw16Z995PiOVo2B7bjWSbHedGl5e0ZWaq65kOGgUSNesEIDkB9ISbTg/JK9dhCZA=="],

+    "code-inspector-plugin": ["code-inspector-plugin@1.3.3", "", { "dependencies": { "@code-inspector/core": "1.3.3", "@code-inspector/esbuild": "1.3.3", "@code-inspector/mako": "1.3.3", "@code-inspector/turbopack": "1.3.3", "@code-inspector/vite": "1.3.3", "@code-inspector/webpack": "1.3.3", "chalk": "4.1.1" } }, "sha512-yDi84v5tgXFSZLLXqHl/Mc2qy9d2CxcYhIaP192NhqTG1zA5uVtiNIzvDAXh5Vaqy8QGYkvBfbG/i55b/sXaSQ=="],
+
    "collapse-white-space": ["collapse-white-space@2.1.0", "", {}, "sha512-loKTxY1zCOuG4j9f6EPnuyyYkf58RnhhWTvRoZEokgB+WbdXehfjFviyOVYkqzEWz1Q5kRiZdBYS5SwxbQYwzw=="],

    "color-convert": ["color-convert@2.0.1", "", { "dependencies": { "color-name": "~1.1.4" } }, "sha512-RRECPsj7iu/xb5oKYcsFHSppFNnsj/52OVTRKb4zP5onXwVF3zVmmToNcOfGC+CRDpfK/U584fMg38ZHCaElKQ=="],
@@ -975,6 +998,8 @@

    "dompurify": ["dompurify@3.2.6", "", { "optionalDependencies": { "@types/trusted-types": "^2.0.7" } }, "sha512-/2GogDQlohXPZe6D6NOgQvXLPSYBqIWMnZ8zzOhn09REE4eyAzb+Hed3jhoM9OkuaJ8P6ZGTTVWQKAi8ieIzfQ=="],

+    "dotenv": ["dotenv@16.6.1", "", {}, "sha512-uBq4egWHTcTt33a72vpSG0z3HnPuIl6NqYcTrKEg2azoEyl2hpW0zqlxysq2pK9HlDIHyHyakeYaYnSAwd8bow=="],
+
    "dunder-proto": ["dunder-proto@1.0.1", "", { "dependencies": { "call-bind-apply-helpers": "^1.0.1", "es-errors": "^1.3.0", "gopd": "^1.2.0" } }, "sha512-KIN/nDJBQRcXw0MLVhZE9iQHmG68qAVIBg9CqmUYjmQIhgij9U5MFvrqkUL5FbtyyzZuOeOt0zdeRe4UY7ct+A=="],

    "eastasianwidth": ["eastasianwidth@0.2.0", "", {}, "sha512-I88TYZWc9XiYHRQ4/3c5rjjfgkjhLyW2luGIheGERbNQ6OY7yTybanSpDXZa8y7VUP9YmDcYa+eyq4ca7iLqWA=="],
@@ -985,7 +1010,7 @@

    "emoji-regex": ["emoji-regex@10.4.0", "", {}, "sha512-EC+0oUMY1Rqm4O6LLrgjtYDvcVYTy7chDnM4Q7030tP4Kwj3u/pR6gP9ygnp2CJMK5Gq+9Q2oqmrFJAz01DXjw=="],

-    "entities": ["entities@6.0.0", "", {}, "sha512-aKstq2TDOndCn4diEyp9Uq/Flu2i1GlLkc6XIDQSDMuaFE3OPW5OphLCyQ5SpSJZTb4reN+kTcYru5yIfXoRPw=="],
+    "entities": ["entities@7.0.0", "", {}, "sha512-FDWG5cmEYf2Z00IkYRhbFrwIwvdFKH07uV8dvNy0omp/Qb1xcyCWp2UDtcwJF4QZZvk0sLudP6/hAu42TaqVhQ=="],

    "error-ex": ["error-ex@1.3.2", "", { "dependencies": { "is-arrayish": "^0.2.1" } }, "sha512-7dFHNmqeFSEt2ZBsCriorKnn3Z2pj+fd9kmI6QoWw4//DL+icEBfc0U7qJCisqrTsKTjw4fNFy2pW9OqStD84g=="],

@@ -1305,6 +1330,8 @@

    "langium": ["langium@3.3.1", "", { "dependencies": { "chevrotain": "~11.0.3", "chevrotain-allstar": "~0.3.0", "vscode-languageserver": "~9.0.1", "vscode-languageserver-textdocument": "~1.0.11", "vscode-uri": "~3.0.8" } }, "sha512-QJv/h939gDpvT+9SiLVlY7tZC3xB2qK57v0J04Sh9wpMb6MP1q8gB21L3WIo8T5P1MSMg3Ep14L7KkDCFG3y4w=="],

+    "launch-ide": ["launch-ide@1.3.0", "", { "dependencies": { "chalk": "^4.1.1", "dotenv": "^16.1.4" } }, "sha512-pxiF+HVNMV0dDc6Z0q89RDmzMF9XmSGaOn4ueTegjMy3cUkezc3zrki5PCiz68zZIqAuhW7iwoWX7JO4Kn6B0A=="],
+
    "layout-base": ["layout-base@1.0.2", "", {}, "sha512-8h2oVEZNktL4BH2JCOI90iD1yXwL6iNW7KcCKT2QZgQJR2vbqDsldCTPRU9NifTCqHZci57XvQQ15YTu+sTYPg=="],

    "leva": ["leva@0.10.0", "", { "dependencies": { "@radix-ui/react-portal": "1.0.2", "@radix-ui/react-tooltip": "1.0.5", "@stitches/react": "^1.2.8", "@use-gesture/react": "^10.2.5", "colord": "^2.9.2", "dequal": "^2.0.2", "merge-value": "^1.0.0", "react-colorful": "^5.5.1", "react-dropzone": "^12.0.0", "v8n": "^1.3.3", "zustand": "^3.6.9" }, "peerDependencies": { "react": "^18.0.0 || ^19.0.0", "react-dom": "^18.0.0 || ^19.0.0" } }, "sha512-RiNJWmeqQdKIeHuVXgshmxIHu144a2AMYtLxKf8Nm1j93pisDPexuQDHKNdQlbo37wdyDQibLjY9JKGIiD7gaw=="],
@@ -1595,6 +1622,8 @@

    "polished": ["polished@4.3.1", "", { "dependencies": { "@babel/runtime": "^7.17.8" } }, "sha512-OBatVyC/N7SCW/FaDHrSd+vn0o5cS855TOmYi4OkdWUMSJCET/xip//ch8xGUvtr3i44X9LVyWwQlRMTN3pwSA=="],

+    "portfinder": ["portfinder@1.0.38", "", { "dependencies": { "async": "^3.2.6", "debug": "^4.3.6" } }, "sha512-rEwq/ZHlJIKw++XtLAO8PPuOQA/zaPJOZJ37BVuN97nLpMJeuDVLVGRwbFoBgLudgdTMP2hdRJP++H+8QOA3vg=="],
+
    "postcss": ["postcss@8.5.3", "", { "dependencies": { "nanoid": "^3.3.8", "picocolors": "^1.1.1", "source-map-js": "^1.2.1" } }, "sha512-dle9A3yYxlBSrt8Fu+IpjGT8SY8hN0mlaA6GY8t0P5PjIOZemULz/E2Bnm/2dcUOena75OTNkHI76uZBNUUq3A=="],

    "postcss-import": ["postcss-import@15.1.0", "", { "dependencies": { "postcss-value-parser": "^4.0.0", "read-cache": "^1.0.0", "resolve": "^1.1.7" }, "peerDependencies": { "postcss": "^8.0.0" } }, "sha512-hpr+J05B2FVYUAXHeK1YyI267J/dDDhMU6B6civm8hSY1jYJnBXxzKDKDswzJmtLHryrjhnDjqqp/49t8FALew=="],
@@ -2081,6 +2110,8 @@

    "@babel/traverse/globals": ["globals@11.12.0", "", {}, "sha512-WOBp/EEGUiIsJSp7wcv/y6MO+lV9UoncWqxuFfm8eBwzWNgyfBd6Gz+IeKQ9jCmyhoH99g15M3T+QaVHFjizVA=="],

+    "@code-inspector/core/chalk": ["chalk@4.1.2", "", { "dependencies": { "ansi-styles": "^4.1.0", "supports-color": "^7.1.0" } }, "sha512-oKnbhFyRIXpUuez8iBMmyEa4nbj4IOQyuhc/wy9kY7/WVPcwIO9VA668Pu8RkO7+0G76SLROeyw9CpQ061i4mA=="],
+
    "@douyinfe/semi-foundation/remark-gfm": ["remark-gfm@4.0.0", "", { "dependencies": { "@types/mdast": "^4.0.0", "mdast-util-gfm": "^3.0.0", "micromark-extension-gfm": "^3.0.0", "remark-parse": "^11.0.0", "remark-stringify": "^11.0.0", "unified": "^11.0.0" } }, "sha512-U92vJgBPkbw4Zfu/IiW2oTZLSL3Zpv+uI7My2eq8JxKgqraFdU8YUGicEJCEgSbeaG+QDFqIcwwfMTOEelPxuA=="],

    "@emotion/babel-plugin/@emotion/hash": ["@emotion/hash@0.9.2", "", {}, "sha512-MyqliTZGuOm3+5ZRSaaBGP3USLw6+EGykkwZns2EPC5g8jJ4z9OrdZY9apkl3+UP9+sdz76YYkwCKP5gh8iY3g=="],
@@ -2131,6 +2162,10 @@

    "@visactor/vrender-kits/roughjs": ["roughjs@4.5.2", "", { "dependencies": { "path-data-parser": "^0.1.0", "points-on-curve": "^0.2.0", "points-on-path": "^0.2.1" } }, "sha512-2xSlLDKdsWyFxrveYWk9YQ/Y9UfK38EAMRNkYkMqYBJvPX8abCa9PN0x3w02H8Oa6/0bcZICJU+U95VumPqseg=="],

+    "@vue/compiler-core/@babel/parser": ["@babel/parser@7.28.5", "", { "dependencies": { "@babel/types": "^7.28.5" }, "bin": "./bin/babel-parser.js" }, "sha512-KKBU1VGYR7ORr3At5HAtUQ+TV3SzRCXmA/8OdDZiLDBIZxVyzXuztPjfLd3BV1PRAQGCMWWSHYhL0F8d5uHBDQ=="],
+
+    "@vue/compiler-core/estree-walker": ["estree-walker@2.0.2", "", {}, "sha512-Rfkk/Mp/DL7JVje3u18FxFujQlTNR2q6QfMSMB7AvCBx91NGj/ba3kCfza0f6dVDbw7YlRf/nDrn7pQrCCyQ/w=="],
+
    "antd/rc-collapse": ["rc-collapse@3.9.0", "", { "dependencies": { "@babel/runtime": "^7.10.1", "classnames": "2.x", "rc-motion": "^2.3.4", "rc-util": "^5.27.0" }, "peerDependencies": { "react": ">=16.9.0", "react-dom": ">=16.9.0" } }, "sha512-swDdz4QZ4dFTo4RAUMLL50qP0EY62N2kvmk2We5xYdRwcRn8WcYtuetCJpwpaCbUfUt5+huLpVxhvmnK+PHrkA=="],

    "antd/scroll-into-view-if-needed": ["scroll-into-view-if-needed@3.1.0", "", { "dependencies": { "compute-scroll-into-view": "^3.0.2" } }, "sha512-49oNpRjWRvnU8NyGVmUaYG4jtTkNonFZI86MmGRDqBphEK2EXT9gdEUoQPZhuBM8yWHxCWbobltqYO5M4XrUvQ=="],
@@ -2155,6 +2190,8 @@

    "esast-util-from-js/acorn": ["acorn@8.14.0", "", { "bin": { "acorn": "bin/acorn" } }, "sha512-cl669nCJTZBsL97OF4kUQm5g5hC2uihk0NxY3WENAC0TYdILVkAyHymAntgxGkl7K+t0cXIrH5siy5S4XkFycA=="],

+    "eslint/chalk": ["chalk@4.1.2", "", { "dependencies": { "ansi-styles": "^4.1.0", "supports-color": "^7.1.0" } }, "sha512-oKnbhFyRIXpUuez8iBMmyEa4nbj4IOQyuhc/wy9kY7/WVPcwIO9VA668Pu8RkO7+0G76SLROeyw9CpQ061i4mA=="],
+
    "extend-shallow/is-extendable": ["is-extendable@0.1.1", "", {}, "sha512-5BMULNob1vgFX6EjQw5izWDxrecWK9AM72rugNr0TFldMOi0fj6Jk+zeKIt0xGj4cEfQIJth4w3OKWOJ4f+AFw=="],

    "fast-glob/glob-parent": ["glob-parent@5.1.2", "", { "dependencies": { "is-glob": "^4.0.1" } }, "sha512-AOIgSQCepiJYwP3ARnGx+5VnTu2HBYdzbGP45eLw1vr3zB3vZLeyed1sC9hnbcOc9/SrMyM5RPQrkGz4aS9Zow=="],
@@ -2181,6 +2218,8 @@

    "katex/commander": ["commander@8.3.0", "", {}, "sha512-OkTL9umf+He2DZkUq8f8J9of7yL6RJKI24dVITBmNfZBmri9zYZQrKkuXiKhyfPSu8tUhnVBB1iKXevvnlR4Ww=="],

+    "launch-ide/chalk": ["chalk@4.1.2", "", { "dependencies": { "ansi-styles": "^4.1.0", "supports-color": "^7.1.0" } }, "sha512-oKnbhFyRIXpUuez8iBMmyEa4nbj4IOQyuhc/wy9kY7/WVPcwIO9VA668Pu8RkO7+0G76SLROeyw9CpQ061i4mA=="],
+
    "leva/react-dropzone": ["react-dropzone@12.1.0", "", { "dependencies": { "attr-accept": "^2.2.2", "file-selector": "^0.5.0", "prop-types": "^15.8.1" }, "peerDependencies": { "react": ">= 16.8" } }, "sha512-iBYHA1rbopIvtzokEX4QubO6qk5IF/x3BtKGu74rF2JkQDXnwC4uO/lHKpaw4PJIV6iIAYOlwLv2FpiGyqHNog=="],

    "mdast-util-find-and-replace/escape-string-regexp": ["escape-string-regexp@5.0.0", "", {}, "sha512-/veY75JbMK4j1yjvuUxuVsiS/hr/4iHs9FTT6cgTexxdE0Ly/glccBAkloH/DofkjRbZU3bnoj38mOmhkZ0lHw=="],
@@ -2201,6 +2240,8 @@

    "parse-entities/@types/unist": ["@types/unist@2.0.11", "", {}, "sha512-CmBKiL6NNo/OqgmMn95Fk9Whlp2mtvIv+KNpQKN2F4SjvrEesubTRWGYSg+BnWZOnlCaSTU1sMpsBOzgbYhnsA=="],

+    "parse5/entities": ["entities@6.0.0", "", {}, "sha512-aKstq2TDOndCn4diEyp9Uq/Flu2i1GlLkc6XIDQSDMuaFE3OPW5OphLCyQ5SpSJZTb4reN+kTcYru5yIfXoRPw=="],
+
    "path-scurry/lru-cache": ["lru-cache@11.2.2", "", {}, "sha512-F9ODfyqML2coTIsQpSkRHnLSZMtkU8Q+mSfcaIyKwy58u+8k5nvAYeiNhsyMARvzNcXJ9QfWVrcPsC9e9rAxtg=="],

    "prettier-package-json/commander": ["commander@4.1.1", "", {}, "sha512-NOKm8xhkzAjzFx8B2v5OAHT+u5pRQc2UCa2Vq9jYL/31o2wi9mxBA7LIFs3sV5VSC49z6pEhfbMULvShKj26WA=="],
@@ -2269,6 +2310,8 @@

    "@radix-ui/react-primitive/@radix-ui/react-slot/@radix-ui/react-compose-refs": ["@radix-ui/react-compose-refs@1.0.0", "", { "dependencies": { "@babel/runtime": "^7.13.10" }, "peerDependencies": { "react": "^16.8 || ^17.0 || ^18.0" } }, "sha512-0KaSv6sx787/hK3eF53iOkiSLwAGlFMx5lotrqD2pTjB18KbybKoEIgkNZTKC60YECDQTKGTRcDBILwZVqVKvA=="],

+    "@vue/compiler-core/@babel/parser/@babel/types": ["@babel/types@7.28.5", "", { "dependencies": { "@babel/helper-string-parser": "^7.27.1", "@babel/helper-validator-identifier": "^7.28.5" } }, "sha512-qQ5m48eI/MFLQ5PxQj4PFaprjyCTLI37ElWMmNs0K8Lk3dVeOdNpB3ks8jc7yM5CDmVC73eMVk/trk3fgmrUpA=="],
+
    "antd/scroll-into-view-if-needed/compute-scroll-into-view": ["compute-scroll-into-view@3.1.1", "", {}, "sha512-VRhuHOLoKYOy4UbilLbUzbYg93XLjv2PncJC50EuTWPA3gaja1UjBsUP/D/9/juV3vQFr6XBEzn9KCAHdUvOHw=="],

    "cytoscape-fcose/cose-base/layout-base": ["layout-base@2.0.1", "", {}, "sha512-dp3s92+uNI1hWIpPGH3jK2kxE2lMjdXdr+DH8ynZHpd6PUlH6x6cbuXnoMmiNumznqaNO31xu9e79F0uuZ0JFg=="],
@@ -2325,6 +2368,10 @@

    "@radix-ui/react-popper/@floating-ui/react-dom/@floating-ui/dom/@floating-ui/core": ["@floating-ui/core@0.7.3", "", {}, "sha512-buc8BXHmG9l82+OQXOFU3Kr2XQx9ys01U/Q9HMIrZ300iLc8HLMgh7dcCqgYzAzf4BkoQvDcXf5Y+CuEZ5JBYg=="],

+    "@vue/compiler-core/@babel/parser/@babel/types/@babel/helper-string-parser": ["@babel/helper-string-parser@7.27.1", "", {}, "sha512-qMlSxKbpRlAridDExk92nSobyDdpPijUq2DW6oDnUqd0iOGxmQjyqhMIihI9+zv4LPyZdRje2cavWPbCbWm3eA=="],
+
+    "@vue/compiler-core/@babel/parser/@babel/types/@babel/helper-validator-identifier": ["@babel/helper-validator-identifier@7.28.5", "", {}, "sha512-qSs4ifwzKJSV39ucNjsvc6WVHs6b7S03sOh2OcHF9UHfVPqWWALUsNUVzhSBiItjRZoLHx7nIarVjqKVusUZ1Q=="],
+
    "simplify-geojson/concat-stream/readable-stream/string_decoder": ["string_decoder@0.10.31", "", {}, "sha512-ev2QzSzWPYmy9GuqfIVildA4OdcGLeFZQrq5ys6RtiuF+RQQiZWr8TZNyAcuVXyQRYfEO+MsoB/1BuQVhOJuoQ=="],

    "sucrase/glob/minimatch/brace-expansion": ["brace-expansion@2.0.1", "", { "dependencies": { "balanced-match": "^1.0.0" } }, "sha512-XnAIvQ8eM+kC6aULx6wuQiwVsnzsi9d3WxzV3FpWTGA19F621kwdbsAcFKXgKUHZWsy+mY6iL1sHTxWEFCytDA=="],
--- a/web/package.json
+++ b/web/package.json
@@ -78,15 +78,16 @@
    "@so1ve/prettier-config": "^3.1.0",
    "@vitejs/plugin-react": "^4.2.1",
    "autoprefixer": "^10.4.21",
+    "code-inspector-plugin": "^1.3.3",
    "eslint": "8.57.0",
    "eslint-plugin-header": "^3.1.1",
    "eslint-plugin-react-hooks": "^5.2.0",
+    "i18next-cli": "^1.10.3",
    "postcss": "^8.5.3",
    "prettier": "^3.0.0",
    "tailwindcss": "^3",
    "typescript": "4.4.2",
-    "vite": "^5.2.0",
-    "i18next-cli": "^1.10.3"
+    "vite": "^5.2.0"
  },
  "prettier": {
    "singleQuote": true,
--- a/web/src/App.jsx
+++ b/web/src/App.jsx
@@ -42,6 +42,7 @@ import Midjourney from './pages/Midjourney';
 import Pricing from './pages/Pricing';
 import Task from './pages/Task';
 import ModelPage from './pages/Model';
+import ModelDeploymentPage from './pages/ModelDeployment';
 import Playground from './pages/Playground';
 import OAuth2Callback from './components/auth/OAuth2Callback';
 import PersonalSetting from './components/settings/PersonalSetting';
@@ -108,6 +109,14 @@ function App() {
            </AdminRoute>
          }
        />
+        <Route
+          path='/console/deployment'
+          element={
+            <AdminRoute>
+              <ModelDeploymentPage />
+            </AdminRoute>
+          }
+        />
        <Route
          path='/console/channel'
          element={
--- a/web/src/components/layout/SiderBar.jsx
+++ b/web/src/components/layout/SiderBar.jsx
@@ -45,6 +45,7 @@ const routerMap = {
  pricing: '/pricing',
  task: '/console/task',
  models: '/console/models',
+  deployment: '/console/deployment',
  playground: '/console/playground',
  personal: '/console/personal',
 };
@@ -157,6 +158,12 @@ const SiderBar = ({ onNavigate = () => {} }) => {
        to: '/console/models',
        className: isAdmin() ? '' : 'tableHiddle',
      },
+      {
+        text: t('模型部署'),
+        itemKey: 'deployment',
+        to: '/deployment',
+        className: isAdmin() ? '' : 'tableHiddle',
+      },
      {
        text: t('兑换码管理'),
        itemKey: 'redemption',
--- a/web/src/components/layout/components/SkeletonWrapper.jsx
+++ b/web/src/components/layout/components/SkeletonWrapper.jsx
@@ -52,7 +52,6 @@ const SkeletonWrapper = ({
            active
            placeholder={
              <Skeleton.Title
-                active
                style={{ width: isMobile ? 40 : width, height }}
              />
            }
@@ -71,7 +70,7 @@ const SkeletonWrapper = ({
          loading={true}
          active
          placeholder={
-            <Skeleton.Avatar active size='extra-small' className='shadow-sm' />
+            <Skeleton.Avatar size='extra-small' className='shadow-sm' />
          }
        />
        <div className='ml-1.5 mr-1'>
@@ -80,7 +79,6 @@ const SkeletonWrapper = ({
            active
            placeholder={
              <Skeleton.Title
-                active
                style={{ width: isMobile ? 15 : width, height: 12 }}
              />
            }
@@ -98,7 +96,6 @@ const SkeletonWrapper = ({
        active
        placeholder={
          <Skeleton.Image
-            active
            className={`absolute inset-0 !rounded-full ${className}`}
            style={{ width: '100%', height: '100%' }}
          />
@@ -113,7 +110,7 @@ const SkeletonWrapper = ({
      <Skeleton
        loading={true}
        active
-        placeholder={<Skeleton.Title active style={{ width, height: 24 }} />}
+        placeholder={<Skeleton.Title style={{ width, height: 24 }} />}
      />
    );
  };
@@ -125,7 +122,7 @@ const SkeletonWrapper = ({
        <Skeleton
          loading={true}
          active
-          placeholder={<Skeleton.Title active style={{ width, height }} />}
+          placeholder={<Skeleton.Title style={{ width, height }} />}
        />
      </div>
    );
@@ -140,7 +137,6 @@ const SkeletonWrapper = ({
          active
          placeholder={
            <Skeleton.Title
-              active
              style={{ width, height, borderRadius: 9999 }}
            />
          }
@@ -164,7 +160,7 @@ const SkeletonWrapper = ({
              loading={true}
              active
              placeholder={
-                <Skeleton.Avatar active size='extra-small' shape='square' />
+                <Skeleton.Avatar size='extra-small' shape='square' />
              }
            />
          </div>
@@ -174,7 +170,6 @@ const SkeletonWrapper = ({
            active
            placeholder={
              <Skeleton.Title
-                active
                style={{ width: width || 80, height: height || 14 }}
              />
            }
@@ -191,10 +186,7 @@ const SkeletonWrapper = ({
          loading={true}
          active
          placeholder={
-            <Skeleton.Title
-              active
-              style={{ width: width || 60, height: height || 12 }}
-            />
+            <Skeleton.Title style={{ width: width || 60, height: height || 12 }} />
          }
        />
      </div>
@@ -217,7 +209,6 @@ const SkeletonWrapper = ({
        active
        placeholder={
          <Skeleton.Avatar
-            active
            shape='square'
            style={{ width: ICON_SIZE, height: ICON_SIZE }}
          />
@@ -231,7 +222,6 @@ const SkeletonWrapper = ({
        active
        placeholder={
          <Skeleton.Title
-            active
            style={{ width: labelWidth, height: TEXT_HEIGHT }}
          />
        }
@@ -269,7 +259,6 @@ const SkeletonWrapper = ({
          active
          placeholder={
            <Skeleton.Avatar
-              active
              shape='square'
              style={{ width: ICON_SIZE, height: ICON_SIZE }}
            />
@@ -329,7 +318,6 @@ const SkeletonWrapper = ({
                    active
                    placeholder={
                      <Skeleton.Title
-                        active
                        style={{ width: sec.titleWidth, height: TITLE_HEIGHT }}
                      />
                    }
@@ -350,7 +338,6 @@ const SkeletonWrapper = ({
                    active
                    placeholder={
                      <Skeleton.Title
-                        active
                        style={{ width: sec.titleWidth, height: TITLE_HEIGHT }}
                      />
                    }
--- a/web/src/components/model-deployments/DeploymentAccessGuard.jsx
+++ b/web/src/components/model-deployments/DeploymentAccessGuard.jsx
@@ -0,0 +1,377 @@
+/*
+Copyright (C) 2025 QuantumNous
+
+This program is free software: you can redistribute it and/or modify
+it under the terms of the GNU Affero General Public License as
+published by the Free Software Foundation, either version 3 of the
+License, or (at your option) any later version.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU Affero General Public License for more details.
+
+You should have received a copy of the GNU Affero General Public License
+along with this program. If not, see <https://www.gnu.org/licenses/>.
+
+For commercial licensing, please contact support@quantumnous.com
+*/
+
+import React from 'react';
+import { Card, Button, Typography } from '@douyinfe/semi-ui';
+import { useTranslation } from 'react-i18next';
+import { useNavigate } from 'react-router-dom';
+import { Settings, Server, AlertCircle, WifiOff } from 'lucide-react';
+
+const { Title, Text } = Typography;
+
+const DeploymentAccessGuard = ({
+  children,
+  loading,
+  isEnabled,
+  connectionLoading,
+  connectionOk,
+  connectionError,
+  onRetry,
+}) => {
+  const { t } = useTranslation();
+  const navigate = useNavigate();
+
+  const handleGoToSettings = () => {
+    navigate('/console/setting?tab=model-deployment');
+  };
+
+  if (loading) {
+    return (
+      <div className='mt-[60px] px-2'>
+        <Card loading={true} style={{ minHeight: '400px' }}>
+          <div style={{ textAlign: 'center', padding: '50px 0' }}>
+            <Text type="secondary">{t('加载设置中...')}</Text>
+          </div>
+        </Card>
+      </div>
+    );
+  }
+
+  if (!isEnabled) {
+    return (
+      <div 
+        className='mt-[60px] px-4' 
+        style={{
+          minHeight: 'calc(100vh - 60px)',
+          display: 'flex',
+          alignItems: 'center',
+          justifyContent: 'center'
+        }}
+      >
+        <div 
+          style={{
+            maxWidth: '600px',
+            width: '100%',
+            textAlign: 'center',
+            padding: '0 20px'
+          }}
+        >
+          <Card
+            style={{
+              padding: '60px 40px',
+              borderRadius: '16px',
+              border: '1px solid var(--semi-color-border)',
+              boxShadow: '0 4px 20px rgba(0, 0, 0, 0.08)',
+              background: 'linear-gradient(135deg, var(--semi-color-bg-0) 0%, var(--semi-color-fill-0) 100%)'
+            }}
+          >
+            {/* 图标区域 */}
+            <div style={{ marginBottom: '32px' }}>
+              <div style={{ 
+                display: 'inline-flex', 
+                alignItems: 'center',
+                justifyContent: 'center',
+                width: '120px',
+                height: '120px',
+                borderRadius: '50%',
+                background: 'linear-gradient(135deg, rgba(var(--semi-orange-4), 0.15) 0%, rgba(var(--semi-orange-5), 0.1) 100%)',
+                border: '3px solid rgba(var(--semi-orange-4), 0.3)',
+                marginBottom: '24px'
+              }}>
+                <AlertCircle size={56} color="var(--semi-color-warning)" />
+              </div>
+            </div>
+
+            {/* 标题区域 */}
+            <div style={{ marginBottom: '24px' }}>
+              <Title 
+                heading={2} 
+                style={{ 
+                  color: 'var(--semi-color-text-0)', 
+                  margin: '0 0 12px 0',
+                  fontSize: '28px',
+                  fontWeight: '700'
+                }}
+              >
+                {t('模型部署服务未启用')}
+              </Title>
+              <Text 
+                style={{ 
+                  fontSize: '18px', 
+                  lineHeight: '1.6',
+                  color: 'var(--semi-color-text-1)',
+                  display: 'block'
+                }}
+              >
+                {t('访问模型部署功能需要先启用 io.net 部署服务')}
+              </Text>
+            </div>
+
+            {/* 配置要求区域 */}
+            <div 
+              style={{ 
+                backgroundColor: 'var(--semi-color-bg-1)', 
+                padding: '24px', 
+                borderRadius: '12px',
+                border: '1px solid var(--semi-color-border)',
+                margin: '32px 0',
+                boxShadow: '0 2px 8px rgba(0, 0, 0, 0.04)'
+              }}
+            >
+              <div style={{ 
+                display: 'flex', 
+                alignItems: 'center', 
+                justifyContent: 'center',
+                gap: '12px', 
+                marginBottom: '16px' 
+              }}>
+                <div style={{
+                  display: 'flex',
+                  alignItems: 'center',
+                  justifyContent: 'center',
+                  width: '32px',
+                  height: '32px',
+                  borderRadius: '8px',
+                  backgroundColor: 'rgba(var(--semi-blue-4), 0.15)'
+                }}>
+                  <Server size={20} color="var(--semi-color-primary)" />
+                </div>
+                <Text 
+                  strong 
+                  style={{ 
+                    fontSize: '16px', 
+                    color: 'var(--semi-color-text-0)' 
+                  }}
+                >
+                  {t('需要配置的项目')}
+                </Text>
+              </div>
+              
+              <div style={{ 
+                display: 'flex', 
+                flexDirection: 'column', 
+                gap: '12px',
+                alignItems: 'flex-start',
+                textAlign: 'left',
+                maxWidth: '320px',
+                margin: '0 auto'
+              }}>
+                <div style={{ display: 'flex', alignItems: 'center', gap: '12px' }}>
+                  <div style={{
+                    width: '6px',
+                    height: '6px',
+                    borderRadius: '50%',
+                    backgroundColor: 'var(--semi-color-primary)',
+                    flexShrink: 0
+                  }}></div>
+                  <Text style={{ fontSize: '15px', color: 'var(--semi-color-text-1)' }}>
+                    {t('启用 io.net 部署开关')}
+                  </Text>
+                </div>
+                <div style={{ display: 'flex', alignItems: 'center', gap: '12px' }}>
+                  <div style={{
+                    width: '6px',
+                    height: '6px',
+                    borderRadius: '50%',
+                    backgroundColor: 'var(--semi-color-primary)',
+                    flexShrink: 0
+                  }}></div>
+                  <Text style={{ fontSize: '15px', color: 'var(--semi-color-text-1)' }}>
+                    {t('配置有效的 io.net API Key')}
+                  </Text>
+                </div>
+              </div>
+            </div>
+
+            {/* 操作链接区域 */}
+            <div style={{ marginBottom: '20px' }}>
+              <div 
+                onClick={handleGoToSettings}
+                style={{ 
+                  display: 'inline-flex',
+                  alignItems: 'center',
+                  gap: '8px',
+                  cursor: 'pointer',
+                  padding: '12px 24px',
+                  borderRadius: '8px',
+                  fontSize: '16px',
+                  fontWeight: '500',
+                  color: 'var(--semi-color-primary)',
+                  background: 'var(--semi-color-fill-0)',
+                  border: '1px solid var(--semi-color-border)',
+                  transition: 'all 0.2s ease',
+                  textDecoration: 'none'
+                }}
+                onMouseEnter={(e) => {
+                  e.target.style.background = 'var(--semi-color-fill-1)';
+                  e.target.style.transform = 'translateY(-1px)';
+                  e.target.style.boxShadow = '0 2px 8px rgba(0, 0, 0, 0.1)';
+                }}
+                onMouseLeave={(e) => {
+                  e.target.style.background = 'var(--semi-color-fill-0)';
+                  e.target.style.transform = 'translateY(0)';
+                  e.target.style.boxShadow = 'none';
+                }}
+              >
+                <Settings size={18} />
+                {t('前往设置页面')}
+              </div>
+            </div>
+
+            {/* 底部提示 */}
+            <Text 
+              type="tertiary" 
+              style={{ 
+                fontSize: '14px',
+                color: 'var(--semi-color-text-2)',
+                lineHeight: '1.5'
+              }}
+            >
+              {t('配置完成后刷新页面即可使用模型部署功能')}
+            </Text>
+          </Card>
+        </div>
+      </div>
+    );
+  }
+
+  if (connectionLoading || (connectionOk === null && !connectionError)) {
+    return (
+      <div className='mt-[60px] px-2'>
+        <Card loading={true} style={{ minHeight: '400px' }}>
+          <div style={{ textAlign: 'center', padding: '50px 0' }}>
+            <Text type="secondary">{t('Checking io.net connection...')}</Text>
+          </div>
+        </Card>
+      </div>
+    );
+  }
+
+  if (connectionOk === false) {
+    const isExpired = connectionError?.type === 'expired';
+    const title = isExpired
+      ? t('API key expired')
+      : t('io.net connection unavailable');
+    const description = isExpired
+      ? t('The current API key is expired. Please update it in settings.')
+      : t('Unable to connect to io.net with the current configuration.');
+    const detail = connectionError?.message || '';
+
+    return (
+      <div
+        className='mt-[60px] px-4'
+        style={{
+          minHeight: 'calc(100vh - 60px)',
+          display: 'flex',
+          alignItems: 'center',
+          justifyContent: 'center',
+        }}
+      >
+        <div
+          style={{
+            maxWidth: '600px',
+            width: '100%',
+            textAlign: 'center',
+            padding: '0 20px',
+          }}
+        >
+          <Card
+            style={{
+              padding: '60px 40px',
+              borderRadius: '16px',
+              border: '1px solid var(--semi-color-border)',
+              boxShadow: '0 4px 20px rgba(0, 0, 0, 0.08)',
+              background: 'linear-gradient(135deg, var(--semi-color-bg-0) 0%, var(--semi-color-fill-0) 100%)',
+            }}
+          >
+            <div style={{ marginBottom: '32px' }}>
+              <div
+                style={{
+                  display: 'inline-flex',
+                  alignItems: 'center',
+                  justifyContent: 'center',
+                  width: '120px',
+                  height: '120px',
+                  borderRadius: '50%',
+                  background: 'linear-gradient(135deg, rgba(var(--semi-red-4), 0.15) 0%, rgba(var(--semi-red-5), 0.1) 100%)',
+                  border: '3px solid rgba(var(--semi-red-4), 0.3)',
+                  marginBottom: '24px',
+                }}
+              >
+                <WifiOff size={56} color="var(--semi-color-danger)" />
+              </div>
+            </div>
+
+            <div style={{ marginBottom: '24px' }}>
+              <Title
+                heading={2}
+                style={{
+                  color: 'var(--semi-color-text-0)',
+                  margin: '0 0 12px 0',
+                  fontSize: '28px',
+                  fontWeight: '700',
+                }}
+              >
+                {title}
+              </Title>
+              <Text
+                style={{
+                  fontSize: '18px',
+                  lineHeight: '1.6',
+                  color: 'var(--semi-color-text-1)',
+                  display: 'block',
+                }}
+              >
+                {description}
+              </Text>
+              {detail ? (
+                <Text
+                  type="tertiary"
+                  style={{
+                    fontSize: '14px',
+                    lineHeight: '1.5',
+                    display: 'block',
+                    marginTop: '8px',
+                  }}
+                >
+                  {detail}
+                </Text>
+              ) : null}
+            </div>
+
+            <div style={{ display: 'flex', gap: '12px', justifyContent: 'center' }}>
+              <Button type="primary" icon={<Settings size={18} />} onClick={handleGoToSettings}>
+                {t('Go to settings')}
+              </Button>
+              {onRetry ? (
+                <Button type="tertiary" onClick={onRetry}>
+                  {t('Retry connection')}
+                </Button>
+              ) : null}
+            </div>
+          </Card>
+        </div>
+      </div>
+    );
+  }
+
+  return children;
+};
+
+export default DeploymentAccessGuard;
--- a/web/src/components/settings/ModelDeploymentSetting.jsx
+++ b/web/src/components/settings/ModelDeploymentSetting.jsx
@@ -0,0 +1,85 @@
+/*
+Copyright (C) 2025 QuantumNous
+
+This program is free software: you can redistribute it and/or modify
+it under the terms of the GNU Affero General Public License as
+published by the Free Software Foundation, either version 3 of the
+License, or (at your option) any later version.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU Affero General Public License for more details.
+
+You should have received a copy of the GNU Affero General Public License
+along with this program. If not, see <https://www.gnu.org/licenses/>.
+
+For commercial licensing, please contact support@quantumnous.com
+*/
+
+import React, { useEffect, useState } from 'react';
+import { Card, Spin } from '@douyinfe/semi-ui';
+import { API, showError, toBoolean } from '../../helpers';
+import { useTranslation } from 'react-i18next';
+import SettingModelDeployment from '../../pages/Setting/Model/SettingModelDeployment';
+
+const ModelDeploymentSetting = () => {
+  const { t } = useTranslation();
+  let [inputs, setInputs] = useState({
+    'model_deployment.ionet.api_key': '',
+    'model_deployment.ionet.enabled': false,
+  });
+
+  let [loading, setLoading] = useState(false);
+
+  const getOptions = async () => {
+    const res = await API.get('/api/option/');
+    const { success, message, data } = res.data;
+    if (success) {
+      let newInputs = {
+        'model_deployment.ionet.api_key': '',
+        'model_deployment.ionet.enabled': false,
+      };
+      
+      data.forEach((item) => {
+        if (item.key.endsWith('Enabled') || item.key.endsWith('enabled')) {
+          newInputs[item.key] = toBoolean(item.value);
+        } else {
+          newInputs[item.key] = item.value;
+        }
+      });
+
+      setInputs(newInputs);
+    } else {
+      showError(message);
+    }
+  };
+
+  async function onRefresh() {
+    try {
+      setLoading(true);
+      await getOptions();
+    } catch (error) {
+      showError('刷新失败');
+      console.error(error);
+    } finally {
+      setLoading(false);
+    }
+  }
+
+  useEffect(() => {
+    onRefresh();
+  }, []);
+
+  return (
+    <>
+      <Spin spinning={loading} size='large'>
+        <Card style={{ marginTop: '10px' }}>
+          <SettingModelDeployment options={inputs} refresh={onRefresh} />
+        </Card>
+      </Spin>
+    </>
+  );
+};
+
+export default ModelDeploymentSetting;
--- a/Show More
+++ b/Show More
Author	SHA1	Message	Date
CaIon	48d358faec	feat(adaptor): 新适配百炼多种图片生成模型 - wan2.6系列生图与编辑，适配多图生成计费 - wan2.5系列生图与编辑 - z-image-turbo生图，适配prompt_extend计费	2025-12-29 23:00:17 +08:00
Seefs	8063897998	fix: glm 4.7 finish reason (#2545 )	2025-12-29 19:41:15 +08:00
Seefs	923dfbeecb	Merge pull request #2544 from seefs001/feature/wan-2.6	2025-12-29 14:53:31 +08:00
Seefs	24d359cf40	feat: Add "wan2.6-i2v" video ratio configuration to Ali adaptor.	2025-12-29 14:13:33 +08:00
Seefs	725d61c5d3	feat: ionet integrate (#2105 ) * wip ionet integrate * wip ionet integrate * wip ionet integrate * ollama wip * wip * feat: ionet integration & ollama manage * fix merge conflict * wip * fix: test conn cors * wip * fix ionet * fix ionet * wip * fix model select * refactor: Remove `pkg/ionet` test files and update related Go source and web UI model deployment components. * feat: Enhance model deployment UI with styling improvements, updated text, and a new description component. * Revert "feat: Enhance model deployment UI with styling improvements, updated text, and a new description component." This reverts commit 8b75cb5bf0d1a534b339df8c033be9a6c7df7964.	2025-12-28 15:55:35 +08:00
Seefs	1a69a93d20	Merge pull request #2536 from RedwindA/feat/oaiDevRole2Gemini	2025-12-28 15:52:45 +08:00
RedwindA	1de78f8749	feat: map OpenAI developer role to Gemini system instructions	2025-12-27 02:52:33 +08:00
skynono	9aeef6abec	feat: support first bind update password (#2520 )	2025-12-26 13:59:56 +08:00
Seefs	58db72d459	fix: Fix Openrouter test errors and optimize error messages (#2433 ) * fix: Refine openrouter error * fix: Refine openrouter error * fix: openrouter test max_output_token * fix: optimize messages * fix: maxToken unified to 16 * fix: codex系列模型使用 responses接口 * fix: codex系列模型使用 responses接口 * fix: 状态码非200打印错误信息 * fix: 日志里没有报错的响应体	2025-12-26 13:58:44 +08:00
Calcium-Ion	654bb10b45	Merge pull request #2460 from seefs001/feature/gemini-flash-minial fix(gemini): handle minimal reasoning effort budget	2025-12-26 13:57:56 +08:00
Seefs	f51b5bb0c8	Merge pull request #2455 from comeback01/french-translation	2025-12-26 13:56:30 +08:00
Calcium-Ion	a4cd84f276	Merge pull request #2450 from seefs001/fix/gemini-system-prompt fix: 支持传入system_instruction和systemInstruction两种风格系统提示词参数名	2025-12-26 13:54:21 +08:00
Calcium-Ion	c722ddd58b	Merge pull request #2512 from seefs001/fix/warning-pass-through-body fix: add warning for pass through body	2025-12-26 13:52:51 +08:00
Calcium-Ion	88e394a976	Merge pull request #2513 from seefs001/fix/token-auth-bearer fix: 支持小写bearer和Bearer后带多个空格 && 修复 WSS预扣费错误提取key的问题	2025-12-26 13:51:32 +08:00
Seefs	31a3487139	Merge pull request #2528 from QuantumNous/fix/model-sync-overwrite-empty-missing	2025-12-26 13:49:55 +08:00
Seefs	a07406d97e	Merge pull request #2530 from RedwindA/fix/i18n-with-http	2025-12-26 13:49:30 +08:00
RedwindA	f68858121c	fix(i18n): disable namespace separator to fix URL display in translations i18next uses ':' as namespace separator by default, causing URLs like 'https://api.openai.com' to be incorrectly parsed as namespace 'https' with key '//api.openai.com', resulting in truncated display. Setting nsSeparator to false fixes this issue since the project doesn't use multiple namespaces.	2025-12-26 00:10:19 +08:00
t0ng7u	83fbaba768	🚀 fix(model-sync): avoid unnecessary upstream fetch while keeping overwrite updates working - Only short-circuit when there are no missing models AND no overwrite fields requested - Preserve overwrite behavior even when the missing-model list is empty - Always return empty arrays (not null) for list fields to keep API responses stable - Clarify SyncUpstreamModels behavior in comments (create missing models + optional overwrite updates)	2025-12-25 23:01:09 +08:00
Calcium-Ion	d3c854fbed	Merge pull request #2154 from feitianbubu/pr/fix-model-sync fix: ensure overwrite works correctly when no missing models	2025-12-25 22:34:49 +08:00
Calcium-Ion	97b02685b1	Merge pull request #2475 from seefs001/feature/pyro feat: pyroscope integrate	2025-12-25 17:54:39 +08:00
Seefs	da1b51ac31	Merge branch 'upstream-main' into feature/pyro	2025-12-25 17:08:02 +08:00
CaIon	f17b3810d6	feat(user): simplify user response structure in JSON output	2025-12-25 15:39:58 +08:00
Calcium-Ion	8206084a77	Merge pull request #2524 from seefs001/fix/revert-model-ratio fix: revert model ratio	2025-12-25 15:38:36 +08:00
Seefs	559da6362a	fix: revert model ratio	2025-12-25 15:37:54 +08:00
Calcium-Ion	0b1a562df9	Merge pull request #2477 from 1420970597/fix/anthropic-cache-billing fix: 修复 Anthropic 渠道缓存计费错误	2025-12-24 16:59:23 +08:00
Seefs	a0c3d37d66	Merge pull request #2493 from shikaiwei1/patch-1	2025-12-24 16:52:24 +08:00
Seefs	347f2326f3	Merge pull request #2511 from JerryKwan/issue2499	2025-12-24 16:51:51 +08:00
Seefs	14c58aea77	fix: 支持小写bearer和Bearer后带多个空格 && 修复 WSS预扣费错误提取key的问题	2025-12-24 15:52:56 +08:00
Seefs	09f3957362	fix: add warning for pass through body	2025-12-24 15:35:36 +08:00
Jerry	31a79620ba	Resolving event mismatch in OpenAI2Claude add stricter validation for content_block_start corresponding to tool call and fix the crash issue when Claude Code is processing tool call	2025-12-24 14:52:39 +08:00
Calcium-Ion	12555a37d3	Merge pull request #2510 from feitianbubu/pr/0e7050dc89c1b761069f5e528d8ecf786e7008ae 修复claudeResponse流式请求空指针Panic	2025-12-24 14:15:51 +08:00
feitianbubu	3652dfdbd5	fix: check claudeResponse delta StopReason nil point	2025-12-24 11:54:23 +08:00
CaIon	42109c5840	feat(token): enhance error handling in ValidateUserToken for better clarity	2025-12-22 18:01:38 +08:00
John Chen	dbaba87c39	为Moonshot添加缓存tokens读取逻辑为Moonshot添加缓存tokens读取逻辑。其与智普V4的逻辑相同，所以共用逻辑	2025-12-22 17:05:16 +08:00
Calcium-Ion	afd9c29ace	Merge pull request #2486 from QuantumNous/docs/readme-update-doc-links-new-routing 🔗 docs(readme): update documentation links to new site routing	2025-12-21 21:28:35 +08:00
t0ng7u	470e0304d8	🔗 docs(readme): revert missing docs links to legacy site Keep new-site links (/{lang}/docs/...) where matching pages exist in the current docs repo Revert links that have no equivalent in the new docs to the legacy paths on doc.newapi.pro: Google Gemini Chat Midjourney-Proxy image docs Suno music docs Apply the same rule consistently across all README translations (zh/en/ja/fr)	2025-12-21 21:18:59 +08:00
t0ng7u	d6e97ab184	🔗 docs(readme): update documentation links to new site routing - Replace legacy `docs.newapi.pro` paths with the new `/{lang}/docs/...` structure across all README translations - Point key sections (installation, env vars, API, support, features) to their new locations - Ensure language-specific links use the correct locale prefix (zh/en/ja) and keep FR aligned with English routes	2025-12-21 21:00:33 +08:00
Calcium-Ion	d8aa327f05	Merge pull request #2483 from seefs001/fix/vertex-function-response-id fix: 模型设置增加针对Vertex渠道过滤content[].part[].functionResponse.id的选项，默认启用	2025-12-21 17:24:07 +08:00
Seefs	28f7a4feef	fix: 在Vertex Adapter过滤content[].part[].functionResponse.id	2025-12-21 17:22:04 +08:00
Seefs	5a64ae2a29	fix: 模型设置增加针对Vertex渠道过滤content[].part[].functionResponse.id的选项，默认启用	2025-12-21 17:09:49 +08:00
comeback01	f04ed7584a	Merge branch 'main' into french-translation	2025-12-20 11:08:07 +01:00
长安	0a2f12c04e	fix: 修复 Anthropic 渠道缓存计费错误 ## 问题描述当使用 Anthropic 渠道通过 `/v1/chat/completions` 端点调用且启用缓存功能时，计费逻辑错误地减去了缓存 tokens，导致严重的收入损失（94.5%）。 ## 根本原因不同 API 的 `prompt_tokens` 定义不同： - Anthropic API: `input_tokens` 字段已经是纯输入 tokens（不包含缓存） - OpenAI API: `prompt_tokens` 字段包含所有 tokens（包含缓存） - OpenRouter API: `prompt_tokens` 字段包含所有 tokens（包含缓存）当前 `postConsumeQuota` 函数对所有渠道都减去缓存 tokens，这对 Anthropic 渠道是错误的，因为其 `input_tokens` 已经不包含缓存。 ## 修复方案在 `relay/compatible_handler.go` 的 `postConsumeQuota` 函数中，添加渠道类型判断： ```go if relayInfo.ChannelType != constant.ChannelTypeAnthropic { baseTokens = baseTokens.Sub(dCacheTokens) } ``` 只对非 Anthropic 渠道减去缓存 tokens。 ## 影响分析 ### ✅ 不受影响的场景 1. 无缓存调用（所有渠道） - cache_tokens = 0 - 减去 0 = 不减去 - 结果：完全一致 2. OpenAI/OpenRouter 渠道 + 缓存 - 继续减去缓存（因为 ChannelType != Anthropic） - 结果：完全一致 3. Anthropic 渠道 + /v1/messages 端点 - 使用 PostClaudeConsumeQuota（不修改） - 结果：完全不受影响 ### ✅ 修复的场景 4. Anthropic 渠道 + /v1/chat/completions + 缓存 - 修复前：错误地减去缓存，导致 94.5% 收入损失 - 修复后：不减去缓存，计费正确 ## 验证数据以实际记录 143509 为例： \| 项目 \| 修复前 \| 修复后 \| 差异 \| \|------\|--------\|--------\|------\| \| Quota \| 10,489 \| 191,330 \| +180,841 \| \| 费用 \| ¥0.020978 \| ¥0.382660 \| +¥0.361682 \| \| 收入恢复 \| - \| - \| +1724.1% \| ## 测试建议 1. 测试 Anthropic 渠道 + 缓存场景 2. 测试 OpenAI 渠道 + 缓存场景（确保不受影响） 3. 测试无缓存场景（确保不受影响） ## 相关 Issue 修复 Anthropic 渠道使用 prompt caching 时的计费错误。	2025-12-20 14:17:12 +08:00
CaIon	cc3ba39e72	feat(gin): improve request body handling and error reporting	2025-12-20 13:34:10 +08:00
CaIon	4ee595c448	feat(init): increase MaxRequestBodyMB to enhance request handling	2025-12-20 13:27:55 +08:00
CaIon	d9634ad2d3	feat(channel): add error handling for SaveWithoutKey when channel ID is 0	2025-12-20 13:26:40 +08:00
Seefs	a343ce84ee	Merge pull request #2476 from TinsFox/chore/code-inspector-plugin	2025-12-20 11:04:40 +08:00
Seefs	531dfb2555	docs: document pyroscope env var	2025-12-19 23:16:56 +08:00
TinsFox	e6ec551fbf	chore: add code-inspector-plugin integration	2025-12-19 23:04:53 +08:00
Seefs	5ef7247eac	docs: document pyroscope env var	2025-12-19 23:03:04 +08:00
Seefs	1168ddf9f9	fix: systemname	2025-12-19 22:27:35 +08:00
Seefs	a98aad2501	Merge pull request #2474 from TinsFox/main	2025-12-19 21:39:56 +08:00
TinsFox	97132de2ca	style: add card spacing	2025-12-19 21:00:31 +08:00
Seefs	da24a165d0	fix(gemini): handle minimal reasoning effort budget - Add minimal case to clampThinkingBudgetByEffort to avoid defaulting to full thinking budget	2025-12-18 08:10:46 +08:00
comeback01	f88fc26150	Refine French translations for UI conciseness Updated web/src/i18n/locales/fr.json to improve French translations for the user interface. Removed verbose prefixes like 'Gestion des...' and 'Paramètres de...' to prevent truncation in sidebars and menus. Harmonized terms for consistency (e.g., 'Tâches', 'Journaux', 'Dessins'). Renamed 'Place du marché' to 'Marché des modèles'.	2025-12-17 12:10:36 +01:00
Seefs	b35ae9f693	Merge pull request #2452 from QuantumNous/fix/oom-request-body-limit	2025-12-16 18:21:59 +08:00
t0ng7u	8cb56fc319	🧹 fix: harden request-body size handling and error unwrapping Tighten oversized request handling across relay paths and make error matching reliable. - Align `MAX_REQUEST_BODY_MB` fallback to `32` in request body reader and decompression middleware - Stop ignoring `GetRequestBody` errors in relay retry paths; return consistent 413 on oversized bodies (400 for other read errors) - Add `Unwrap()` to `types.NewAPIError` so `errors.Is/As` can match wrapped underlying errors - `go test ./...` passes	2025-12-16 18:10:00 +08:00
t0ng7u	8e3f9b1faa	🛡️ fix: prevent OOM on large/decompressed requests; skip heavy prompt meta when token count is disabled Clamp request body size (including post-decompression) to avoid memory exhaustion caused by huge payloads/zip bombs, especially with large-context Claude requests. Add a configurable `MAX_REQUEST_BODY_MB` (default `32`) and document it. - Enforce max request body size after gzip/br decompression via `http.MaxBytesReader` - Add a secondary size guard in `common.GetRequestBody` and cache-safe handling - Return 413 Request Entity Too Large on oversized bodies in relay entry - Avoid building large `TokenCountMeta.CombineText` when both token counting and sensitive check are disabled (use lightweight meta for pricing) - Update READMEs (CN/EN/FR/JA) with `MAX_REQUEST_BODY_MB` - Fix a handful of vet/formatting issues encountered during the change - `go test ./...` passes	2025-12-16 17:00:19 +08:00
Seefs	2a511c6ee4	fix: 支持传入system_instruction和systemInstruction两种风格系统提示词参数名	2025-12-16 13:08:58 +08:00
Calcium-Ion	11593bd3da	Merge pull request #2445 from QuantumNous/feat/token-ip-whitelist-cidr feat(auth): enhance IP restriction handling with CIDR support	2025-12-15 20:14:09 +08:00
CaIon	e16e7d6fb9	feat(auth): refactor IP restriction handling to use clearer variable naming	2025-12-15 20:13:09 +08:00
CaIon	39593052b6	feat(auth): enhance IP restriction handling with CIDR support	2025-12-15 17:24:09 +08:00
CaIon	4ea8cbd207	Revert "feat(audio): replace SysLog with logger for improved logging in GetAudioDuration" This reverts commit `e293be0138`.	2025-12-14 00:04:40 +08:00
CaIon	e293be0138	feat(audio): replace SysLog with logger for improved logging in GetAudioDuration	2025-12-13 23:59:58 +08:00
CaIon	9c2483ef48	fix(audio): improve WAV duration calculation with enhanced PCM size handling	2025-12-13 23:57:32 +08:00
CaIon	689c43143b	feat(model_ratio): add default ratios for gpt-4o-mini-tts	2025-12-13 19:14:27 +08:00
CaIon	a2da6a9e90	refactor(channel_select): improve retry logic with reset functionality	2025-12-13 18:09:10 +08:00
Calcium-Ion	7a307e2e99	Merge pull request #2434 from QuantumNous/feat/gpt-4o-mini-tts feat: support gpt tts series model quota calculate	2025-12-13 17:55:16 +08:00
CaIon	7cae4a640b	fix(audio): correct TotalTokens calculation for accurate usage reporting	2025-12-13 17:49:57 +08:00
CaIon	e36e2e1b69	feat(audio): enhance audio request handling with token type detection and streaming support	2025-12-13 17:24:23 +08:00
CaIon	b602843ce1	feat(token): add CrossGroupRetry field to token insertion	2025-12-13 16:45:42 +08:00
CaIon	21fca238bf	refactor(error): replace dto.OpenAIError with types.OpenAIError for consistency	2025-12-13 16:43:57 +08:00
CaIon	c51936e068	refactor(channel_select): enhance retry logic and context key usage for channel selection	2025-12-13 16:43:38 +08:00
Seefs	fcafadc6bb	feat: pyroscope integrate	2025-12-13 13:49:38 +08:00
CaIon	b58fa3debc	fix(helper): improve error handling in FlushWriter and related functions	2025-12-13 13:29:21 +08:00
CaIon	1c167c1068	refactor(auth): replace direct token group setting with context key retrieval	2025-12-13 01:38:12 +08:00
feitianbubu	35538ecb3b	fix: ensure overwrite works correctly when no missing models	2025-11-03 17:50:00 +08:00