
qwen3.5_fix


Qwen3.5 Tool Fix Proxy Server

License: MIT
Python 3.12+

Author: Le Xiaodong
A high-performance proxy middleware that fixes Qwen3.5 tool call formatting issues in real time, enabling models served on Sglang, vLLM, Ollama, or llama.cpp backends to work seamlessly with MCP clients (such as Cline/Pencil).

🚀 Project Purpose

This proxy sits between the client and the LLM backend server and fixes common tool call defects in Qwen3.5 deployments, enabling seamless collaboration with MCP (Model Context Protocol) clients.

Core Fixes:

  1. XML Tool Call Extraction: Extract <function=...> XML tags leaked in content into standard tool_calls format.
  2. Thinking Tag Cleanup: Automatically remove residual `<think>` tags from the text to prevent context pollution.
  3. Finish Reason Correction: Correct incorrect stop reasons to tool_use, ensuring Agents can trigger tool execution logic.
  4. Response Normalization: Ensure all outputs strictly follow OpenAI or Anthropic API specifications.
  5. JSON Parameter Parsing: Enhanced JSON parsing with fallback strategies for MCP tool arguments.
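Fix #1 above can be sketched with a small regex-based extractor. This is an illustrative sketch only; the project's real parser is `parse_qwen_xml_tools` in `qwen_tool_fix.tool_parser`, and the function and pattern names below are assumptions.

```python
import re

# Sketch: lift leaked <function=...> XML out of the content field into
# OpenAI-style tool_calls entries (illustrative, not the shipped parser).
FUNC_RE = re.compile(r"<function=(?P<name>[\w.-]+)>(?P<body>.*?)</function>", re.DOTALL)
PARAM_RE = re.compile(r"<parameter=(?P<key>[\w.-]+)>(?P<val>.*?)</parameter>", re.DOTALL)

def extract_xml_tool_calls(content: str):
    """Return (cleaned_content, tool_calls) for content with leaked XML tags."""
    calls = []
    for m in FUNC_RE.finditer(content):
        args = {p.group("key"): p.group("val").strip()
                for p in PARAM_RE.finditer(m.group("body"))}
        calls.append({"type": "function",
                      "function": {"name": m.group("name"), "arguments": args}})
    # Strip the XML from the visible content once it has been extracted
    cleaned = FUNC_RE.sub("", content).strip()
    return cleaned, calls
```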

✨ Key Features

🌟 Multi-Backend Support

  • Sglang/Qwen Backend: Native support, optimized for Qwen3.5.
  • Anthropic Claude Backend: Full support, acts as a unified gateway.
  • OpenAI API Backend: Compatibility mode, supports various OpenAI-compatible interfaces.

🔄 Intelligent Routing & Compatibility

  • Auto-Routing: Automatically selects the backend based on the model parameter.
  • Protocol Conversion: Provides Anthropic Messages API format support on local backends.
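The auto-routing rule can be sketched as a prefix match on the model name. The actual routing table is internal to the proxy; the prefixes and return values below are assumptions for illustration.

```python
# Hypothetical sketch of model-based auto-routing: pick a backend from
# the "model" parameter of the incoming request.
def select_backend(model: str) -> str:
    if model.startswith("claude"):
        return "anthropic"
    if model.startswith(("gpt-", "o1", "o3")):
        return "openai"
    # Local Qwen deployments are the default route
    return "sglang"
```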

🛡️ Enterprise-Grade Reliability

  • Connection Pooling: Reduces TCP handshake overhead and improves response speed.
  • Exponential Backoff Retries: Handles network jitter to enhance stability.
  • Stream Retry Support: Dedicated retry logic for streaming requests.
  • Asynchronous High Concurrency: Based on FastAPI and httpx architecture.
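The exponential-backoff behavior above can be sketched with a small stdlib-only retry wrapper. Delay values and the function name are illustrative, not the proxy's actual configuration.

```python
import time

# Sketch of exponential-backoff retries: double the delay after each
# failed attempt, re-raise once the retry budget is exhausted.
def retry_with_backoff(fn, max_retries=3, base_delay=0.5, sleep=time.sleep):
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except ConnectionError:
            if attempt == max_retries:
                raise
            sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
```

Injecting `sleep` as a parameter keeps the helper testable without real waiting.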

🔧 MCP Enhancements

  • Unique Tool Call ID Generation: Uses tool_{uuid} format to prevent ID collisions.
  • Unified Parameter Type Conversion: Handles string/object mixed cases.
  • Request/Response Validation: Structured validation for messages and parameters.
  • Type-Safe Error Handling: MCPError and ErrorType enum for structured errors.
  • JSON Fallback Parsing: Multiple strategies for parsing malformed JSON.
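Two of these enhancements are easy to sketch. The shipped helpers are `generate_tool_call_id` and `normalize_tool_arguments` in `qwen_tool_fix.mcp_enhancements`; the names below are illustrative stand-ins.

```python
import json
import uuid

def make_tool_call_id() -> str:
    # The tool_{uuid} scheme keeps IDs unique across concurrent requests
    return f"tool_{uuid.uuid4().hex}"

def normalize_args(args):
    # Backends sometimes emit tool arguments as a JSON string and
    # sometimes as an object; normalize both to a dict.
    if isinstance(args, str):
        return json.loads(args)
    return dict(args)
```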

📐 Architecture

graph LR
    Client[Cline / MCP Client] -- "OpenAI/Anthropic API" --> Proxy[Qwen Tool Fix Proxy]
    Proxy -- "Fixed/Normalized Request" --> Backend[Sglang / vLLM / Claude]
    Backend -- "Raw Response (with XML leaks)" --> Proxy
    Proxy -- "Cleaned Response (Standard Tool Calls)" --> Client

🛠️ Quick Start

1. Environment Preparation

  • Python 3.12+
  • Install dependencies:
    pip install -r qwen_tool_fix/requirements.txt

2. Configuration

Copy qwen_tool_fix/.env.example to qwen_tool_fix/.env and modify:

# Proxy Server Configuration
PORT=8123

# Backend API Configuration
BACKEND_TYPE=sglang
BACKEND_URL=http://127.0.0.1:5005
API_KEY=empty

# Other Configuration
TIMEOUT=120.0
LOG_LEVEL=INFO
ENABLE_CORS=true
ALLOWED_ORIGINS=*

3. Start Server

Option 1: Command Line (Recommended)

# Basic usage
python start_proxy.py --port 8123

# With custom backend URL
python start_proxy.py --port 8123 --backend-url http://localhost:5005

# With all options
python start_proxy.py --port 8123 --backend-url http://localhost:5005 --log-level DEBUG

# Development mode (auto-reload)
python start_proxy.py --reload

Command Line Options:

Option          | Short | Default               | Description
--port          | -p    | 8123                  | Proxy server port
--host          | -H    | 0.0.0.0               | Listen address
--backend-url   | -b    | http://127.0.0.1:5005 | Backend server URL
--backend-model | -m    | qwen3.5-27b           | Backend model name
--log-level     | -l    | INFO                  | Log level (DEBUG/INFO/WARNING/ERROR)
--reload        | -     | False                 | Enable auto-reload (development mode)

Option 2: Batch Scripts

Use the provided scripts in qwen_tool_fix/:

  • Windows: Run qwen_tool_fix/run_proxy.bat
  • Linux/Mac: Run bash qwen_tool_fix/run_proxy.sh

Option 3: Module Start

python -m qwen_tool_fix.proxy_server

4. Client Configuration (e.g., Cline)

In Cline settings, change the API Endpoint to:
http://127.0.0.1:8123/v1


📊 API Reference

Endpoint             | Method | Description     | Compatibility
/v1/chat/completions | POST   | Chat Completion | OpenAI
/v1/messages         | POST   | Messages API    | Anthropic
/health              | GET    | Health Check    | -
/models              | GET    | Model List      | OpenAI

🧑‍💻 Developer Guide

Using parsing tools directly

from qwen_tool_fix.tool_parser import parse_qwen_xml_tools, strip_think_tags, parse_json_with_fallback

# Example 1: Extract tool calls
text_with_tools = "Please execute <function=bash><parameter=command>ls</parameter></function>"
tools = parse_qwen_xml_tools(text_with_tools)

# Example 2: Remove think tags
text_with_think = "<think>Thinking...</think> The answer is 42"
cleaned = strip_think_tags(text_with_think)

# Example 3: Parse JSON with fallback
json_text = "{'key': 'value'}"  # Single quotes
parsed = parse_json_with_fallback(json_text)  # Returns {'key': 'value'}

Using MCP Enhancements Module

from qwen_tool_fix.mcp_enhancements import (
    generate_tool_call_id,
    normalize_tool_arguments,
    validate_chat_request,
    MCPError,
    ErrorType,
    ToolCall
)

# Generate unique tool call ID
tool_id = generate_tool_call_id()  # e.g. "tool_1712999999123_a1b2c3d4"

# Normalize arguments
args = normalize_tool_arguments('{"key": "value"}')

# Validate request
result = validate_chat_request({"messages": [{"role": "user", "content": "hello"}]})
if not result.is_valid:
    for error in result.errors:
        print(f"Error: {error}")

# Create structured error
error = MCPError(ErrorType.BAD_REQUEST, "Invalid request", {"field": "messages"})

🔧 Recent Improvements

JSON Parsing Enhancements

  • Multi-strategy Fallback: Tries standard JSON, single-quote fix, and ast.literal_eval
  • Parameter-specific Handling: Different parsing for command, arguments, and other parameters
  • Graceful Degradation: Returns original string if all parsing strategies fail
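The fallback chain above can be sketched in a few lines. This is a minimal illustration of the strategy, not the project's `parse_json_with_fallback` implementation, and the function name is an assumption.

```python
import ast
import json

# Sketch of multi-strategy JSON parsing: standard JSON first, then
# Python-literal parsing for single-quoted output, then graceful
# degradation back to the raw string.
def parse_json_lenient(text: str):
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    try:
        value = ast.literal_eval(text)  # tolerates {'key': 'value'}
        if isinstance(value, (dict, list)):
            return value
    except (ValueError, SyntaxError):
        pass
    return text  # all strategies failed; hand back the original string
```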

Stream Retry Logic

  • Connection Retry: Retries on connection establishment failure
  • Data Integrity: Does not retry after data transmission starts (prevents inconsistency)
  • Exponential Backoff: Configurable delay and max retry count
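The stream-retry rule above can be sketched as: retry while establishing the connection, but never after the first chunk has been forwarded, since a restart would duplicate data already sent downstream. The helper below is illustrative; `open_stream` is assumed to perform the connection and return an iterable of chunks.

```python
# Sketch: connection attempts may be retried, but once chunks start
# flowing the stream is never restarted (data-integrity rule).
def stream_with_retry(open_stream, max_retries=2):
    attempt = 0
    while True:
        try:
            chunks = open_stream()
            break  # connected; past this point we never retry
        except ConnectionError:
            attempt += 1
            if attempt > max_retries:
                raise
    # Errors during iteration propagate to the caller unretried
    yield from chunks
```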

Unified ID Generation

  • Consistent Format: All tool calls use tool_{uuid} format
  • MCP Compatibility: Matches MCP protocol expectations

⚠️ Known Issues & Server Status

Server    | XML Parsing     | Think Leak | finish_reason  | Recommendation
Sglang    | ✅ Good         | ✅ Fixed   | ✅ Stable      | Recommended
vLLM      | ⚠️ Partial      | ✅ Fixed   | ⚠️ Fluctuating | Use --tool-call-parser qwen3_coder
Ollama    | ⚠️ Intermittent | ✅ Fixed   | ⚠️ Errors      | Upgrade to latest version
llama.cpp | ❌ Poor         | ❌ Severe  | ❌ Errors      | Must use with this proxy

📄 License

Distributed under the MIT License. See LICENSE for more information.

Repository

Repository: lexiaodong/qwen3.5_fix
Author: lexiaodong
Created: April 12, 2026
Updated: April 13, 2026
Language: Python
Category: AI