
qwen3.5_fix


Qwen3.5 Tool Fix Proxy Server

License: MIT
Python 3.12+

Author: Le Xiaodong
A high-performance proxy middleware that fixes Qwen3.5 tool call formatting issues in real time, enabling models served on Sglang, vLLM, Ollama, or llama.cpp backends to work seamlessly with MCP clients (such as Cline/Pencil).

🚀 Project Purpose

This proxy sits between the client and the LLM backend server and fixes common tool call defects in Qwen3.5 deployments, enabling seamless collaboration with MCP (Model Context Protocol) clients.

Core Fixes:

  1. XML Tool Call Extraction: Extract <function=...> XML tags leaked in content into standard tool_calls format.
  2. Thinking Tag Cleanup: Automatically remove residual `<think>` tags from the text to prevent context pollution.
  3. Finish Reason Correction: Correct incorrect stop reasons to tool_use, ensuring Agents can trigger tool execution logic.
  4. Response Normalization: Ensure all outputs strictly follow OpenAI or Anthropic API specifications.
  5. JSON Parameter Parsing: Enhanced JSON parsing with fallback strategies for MCP tool arguments.
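Fix #1 above can be sketched with a small regex-based extractor. This is an illustrative sketch only; the project's real parser is `parse_qwen_xml_tools` in `qwen_tool_fix.tool_parser`, and the function and pattern names below are assumptions.

```python
import re

# Sketch: lift leaked <function=...> XML out of the content field into
# OpenAI-style tool_calls entries (illustrative, not the shipped parser).
FUNC_RE = re.compile(r"<function=(?P<name>[\w.-]+)>(?P<body>.*?)</function>", re.DOTALL)
PARAM_RE = re.compile(r"<parameter=(?P<key>[\w.-]+)>(?P<val>.*?)</parameter>", re.DOTALL)

def extract_xml_tool_calls(content: str):
    """Return (cleaned_content, tool_calls) for content with leaked XML tags."""
    calls = []
    for m in FUNC_RE.finditer(content):
        args = {p.group("key"): p.group("val").strip()
                for p in PARAM_RE.finditer(m.group("body"))}
        calls.append({"type": "function",
                      "function": {"name": m.group("name"), "arguments": args}})
    # Strip the XML from the visible content once it has been extracted
    cleaned = FUNC_RE.sub("", content).strip()
    return cleaned, calls
```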

✨ Key Features

🌟 Multi-Backend Support

  • Sglang/Qwen Backend: Native support, optimized for Qwen3.5.
  • Anthropic Claude Backend: Full support, acts as a unified gateway.
  • OpenAI API Backend: Compatibility mode, supports various OpenAI-compatible interfaces.

🔄 Intelligent Routing & Compatibility

  • Auto-Routing: Automatically selects the backend based on the model parameter.
  • Protocol Conversion: Provides Anthropic Messages API format support on local backends.
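The auto-routing rule can be sketched as a prefix match on the model name. The actual routing table is internal to the proxy; the prefixes and return values below are assumptions for illustration.

```python
# Hypothetical sketch of model-based auto-routing: pick a backend from
# the "model" parameter of the incoming request.
def select_backend(model: str) -> str:
    if model.startswith("claude"):
        return "anthropic"
    if model.startswith(("gpt-", "o1", "o3")):
        return "openai"
    # Local Qwen deployments are the default route
    return "sglang"
```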

🛡️ Enterprise-Grade Reliability

  • Connection Pooling: Reduces TCP handshake overhead and improves response speed.
  • Exponential Backoff Retries: Handles network jitter to enhance stability.
  • Stream Retry Support: Dedicated retry logic for streaming requests.
  • Asynchronous High Concurrency: Based on FastAPI and httpx architecture.
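The exponential-backoff behavior above can be sketched with a small stdlib-only retry wrapper. Delay values and the function name are illustrative, not the proxy's actual configuration.

```python
import time

# Sketch of exponential-backoff retries: double the delay after each
# failed attempt, re-raise once the retry budget is exhausted.
def retry_with_backoff(fn, max_retries=3, base_delay=0.5, sleep=time.sleep):
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except ConnectionError:
            if attempt == max_retries:
                raise
            sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
```

Injecting `sleep` as a parameter keeps the helper testable without real waiting.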

🔧 MCP Enhancements

  • Unique Tool Call ID Generation: Uses tool_{uuid} format to prevent ID collisions.
  • Unified Parameter Type Conversion: Handles string/object mixed cases.
  • Request/Response Validation: Structured validation for messages and parameters.
  • Type-Safe Error Handling: MCPError and ErrorType enum for structured errors.
  • JSON Fallback Parsing: Multiple strategies for parsing malformed JSON.
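Two of these enhancements are easy to sketch. The shipped helpers are `generate_tool_call_id` and `normalize_tool_arguments` in `qwen_tool_fix.mcp_enhancements`; the names below are illustrative stand-ins.

```python
import json
import uuid

def make_tool_call_id() -> str:
    # The tool_{uuid} scheme keeps IDs unique across concurrent requests
    return f"tool_{uuid.uuid4().hex}"

def normalize_args(args):
    # Backends sometimes emit tool arguments as a JSON string and
    # sometimes as an object; normalize both to a dict.
    if isinstance(args, str):
        return json.loads(args)
    return dict(args)
```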

📐 Architecture

graph LR
    Client[Cline / MCP Client] -- "OpenAI/Anthropic API" --> Proxy[Qwen Tool Fix Proxy]
    Proxy -- "Fixed/Normalized Request" --> Backend[Sglang / vLLM / Claude]
    Backend -- "Raw Response (with XML leaks)" --> Proxy
    Proxy -- "Cleaned Response (Standard Tool Calls)" --> Client

🛠️ Quick Start

1. Environment Preparation

  • Python 3.12+
  • Install dependencies:
    pip install -r qwen_tool_fix/requirements.txt

2. Configuration

Copy qwen_tool_fix/.env.example to qwen_tool_fix/.env and modify:

# Proxy Server Configuration
PORT=8123

# Backend API Configuration
BACKEND_TYPE=sglang
BACKEND_URL=http://127.0.0.1:5005
API_KEY=empty

# Other Configuration
TIMEOUT=120.0
LOG_LEVEL=INFO
ENABLE_CORS=true
ALLOWED_ORIGINS=*

3. Start Server

Option 1: Command Line (Recommended)

# Basic usage
python start_proxy.py --port 8123

# With custom backend URL
python start_proxy.py --port 8123 --backend-url http://localhost:5005

# With all options
python start_proxy.py --port 8123 --backend-url http://localhost:5005 --log-level DEBUG

# Development mode (auto-reload)
python start_proxy.py --reload

Command Line Options:

Option          | Short | Default               | Description
--port          | -p    | 8123                  | Proxy server port
--host          | -H    | 0.0.0.0               | Listen address
--backend-url   | -b    | http://127.0.0.1:5005 | Backend server URL
--backend-model | -m    | qwen3.5-27b           | Backend model name
--log-level     | -l    | INFO                  | Log level (DEBUG/INFO/WARNING/ERROR)
--reload        | -     | False                 | Enable auto-reload (development mode)

Option 2: Batch Scripts

Use the provided scripts in qwen_tool_fix/:

  • Windows: Run qwen_tool_fix/run_proxy.bat
  • Linux/Mac: Run bash qwen_tool_fix/run_proxy.sh

Option 3: Module Start

python -m qwen_tool_fix.proxy_server

4. Client Configuration (e.g., Cline)

In Cline settings, change the API Endpoint to:
http://127.0.0.1:8123/v1


📊 API Reference

Endpoint             | Method | Description     | Compatibility
/v1/chat/completions | POST   | Chat Completion | OpenAI
/v1/messages         | POST   | Messages API    | Anthropic
/health              | GET    | Health Check    | -
/models              | GET    | Model List      | OpenAI

🧑‍💻 Developer Guide

Using parsing tools directly

from qwen_tool_fix.tool_parser import parse_qwen_xml_tools, strip_think_tags, parse_json_with_fallback

# Example 1: Extract tool calls
text_with_tools = "Please execute <function=bash><parameter=command>ls</parameter></function>"
tools = parse_qwen_xml_tools(text_with_tools)

# Example 2: Remove think tags
text_with_think = "<think>Thinking...</think> The answer is 42"
cleaned = strip_think_tags(text_with_think)

# Example 3: Parse JSON with fallback
json_text = "{'key': 'value'}"  # Single quotes
parsed = parse_json_with_fallback(json_text)  # Returns {'key': 'value'}

Using MCP Enhancements Module

from qwen_tool_fix.mcp_enhancements import (
    generate_tool_call_id,
    normalize_tool_arguments,
    validate_chat_request,
    MCPError,
    ErrorType,
    ToolCall
)

# Generate unique tool call ID
tool_id = generate_tool_call_id()  # e.g. "tool_1712999999123_a1b2c3d4"

# Normalize arguments
args = normalize_tool_arguments('{"key": "value"}')

# Validate request
result = validate_chat_request({"messages": [{"role": "user", "content": "hello"}]})
if not result.is_valid:
    for error in result.errors:
        print(f"Error: {error}")

# Create structured error
error = MCPError(ErrorType.BAD_REQUEST, "Invalid request", {"field": "messages"})

🔧 Recent Improvements

JSON Parsing Enhancements

  • Multi-strategy Fallback: Tries standard JSON, single-quote fix, and ast.literal_eval
  • Parameter-specific Handling: Different parsing for command, arguments, and other parameters
  • Graceful Degradation: Returns original string if all parsing strategies fail
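The fallback chain above can be sketched in a few lines. This is a minimal illustration of the strategy, not the project's `parse_json_with_fallback` implementation, and the function name is an assumption.

```python
import ast
import json

# Sketch of multi-strategy JSON parsing: standard JSON first, then
# Python-literal parsing for single-quoted output, then graceful
# degradation back to the raw string.
def parse_json_lenient(text: str):
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    try:
        value = ast.literal_eval(text)  # tolerates {'key': 'value'}
        if isinstance(value, (dict, list)):
            return value
    except (ValueError, SyntaxError):
        pass
    return text  # all strategies failed; hand back the original string
```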

Stream Retry Logic

  • Connection Retry: Retries on connection establishment failure
  • Data Integrity: Does not retry after data transmission starts (prevents inconsistency)
  • Exponential Backoff: Configurable delay and max retry count
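The stream-retry rule above can be sketched as: retry while establishing the connection, but never after the first chunk has been forwarded, since a restart would duplicate data already sent downstream. The helper below is illustrative; `open_stream` is assumed to perform the connection and return an iterable of chunks.

```python
# Sketch: connection attempts may be retried, but once chunks start
# flowing the stream is never restarted (data-integrity rule).
def stream_with_retry(open_stream, max_retries=2):
    attempt = 0
    while True:
        try:
            chunks = open_stream()
            break  # connected; past this point we never retry
        except ConnectionError:
            attempt += 1
            if attempt > max_retries:
                raise
    # Errors during iteration propagate to the caller unretried
    yield from chunks
```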

Unified ID Generation

  • Consistent Format: All tool calls use tool_{uuid} format
  • MCP Compatibility: Matches MCP protocol expectations

⚠️ Known Issues & Server Status

Server    | XML Parsing     | Think Leak | finish_reason  | Recommendation
Sglang    | ✅ Good         | ✅ Fixed   | ✅ Stable      | Recommended
vLLM      | ⚠️ Partial      | ✅ Fixed   | ⚠️ Fluctuating | Use --tool-call-parser qwen3_coder
Ollama    | ⚠️ Intermittent | ✅ Fixed   | ⚠️ Errors      | Upgrade to latest version
llama.cpp | ❌ Poor         | ❌ Severe  | ❌ Errors      | Must use with this proxy

📄 License

Distributed under the MIT License. See LICENSE for more information.

Repository

Repository: lexiaodong/qwen3.5_fix
Author: lexiaodong
Created: April 12, 2026
Updated: April 13, 2026
Language: Python
Category: AI