SwitchBoard-MCP

GitHub

Switchboard --> MCP server that transparently routes AI coding agent requests across multiple models and providers. Never hit a rate limit dead end again.

Switchboard 🎛️

Never let your AI coding agent hit a dead end again.

A multi-provider AI routing layer that gives any MCP-compatible coding agent (Claude Code, Cursor, Windsurf) unlimited flow by transparently switching between AI providers when rate limits hit or when a better model exists for the task.

The Problem

It's 2am. You're deep in a coding session with Claude Code. You've got 80 messages of context, the agent is about to write the perfect solution — and then:

"You've reached your message limit. Please upgrade your plan or wait."

The agent stops dead. Your flow is gone. You're staring at an error message instead of working code.

This shouldn't happen. Your coding agent should never hit a wall because one provider said no.

Switchboard fixes this. It sits between your coding agent and the AI providers, watching for rate limits, picking the best model for each task, and silently falling back to alternatives — all without the agent (or you) ever knowing a switch happened.

How It Works

┌──────────────────────────────────────────────────────────────────┐
│                    YOUR WORKSTATION                              │
│                                                                  │
│  ┌─────────────┐                                                 │
│  │ Your Agent  │   Claude Code / Cursor / Windsurf               │
│  │ (Claude etc)│                                                 │
│  └──────┬──────┘                                                 │
│         │  MCP tool calls only                                   │
│         ▼                                                        │
│  ┌──────────────────────────────────────────────────────┐        │
│  │           SWITCHBOARD (The Whisperer)                │        │
│  │                                                      │        │
│  │  ┌──────────┐   ┌───────────┐   ┌──────────────┐     │        │
│  │  │ Layer 1  │──▶│  Layer 2  │──▶│   Fallback   │     │        │
│  │  │ Rules    │   │ Classifier│   │   Chain      │     │        │
│  │  │ (fast)   │   │ (DistilBERT)│  │ + Health     │    │        │
│  │  └──────────┘   └───────────┘   └──────┬───────┘     │        │
│  │                                        │             │        │
│  │  ┌──────────────────────────────────┐  │             │        │
│  │  │ Context Serializer (Handoff)     │──┘             │        │
│  │  │ Structured state ≠ raw history   │                │        │
│  │  └──────────────────────────────────┘                │        │
│  └───────────────────────┬──────────────────────────────┘        │
│                          │                                       │
│         ┌────────────────┼────────────────┐                      │
│         ▼                ▼                ▼                      │
│  ┌──────────┐    ┌──────────┐    ┌──────────┐                    │ 
│  │ Anthropic│    │OpenRouter│    │ Direct   │                    │
│  │ (Claude) │    │(14 models│    │ APIs     │                    │
│  └──────────┘    └──────────┘    └──────────┘                    │
└──────────────────────────────────────────────────────────────────┘

Routing Decision Flow

Request arrives at Switchboard
       │
       ▼
┌──────────────────┐
│  Layer 1: Rules  │  ← Checks in order:
│  (instant)       │     1. User preferences (cheap/fast)?
└────────┬─────────┘     2. Context > 60K tokens?
         │                3. Simple task (explain, autocomplete)?
    conclusive?           4. Latency-sensitive?
         │                5. Budget cap?
    ┌────┴────┐
    │         │
   YES        NO
    │         │
    ▼         ▼
 Route to   ┌──────────────────┐
 chosen     │ Layer 2:         │  ← Hybrid classifier:
 model      │ Classifier       │     60% semantic (sentence-transformers)
            │ (all-MiniLM-L6)  │     40% keyword overlap
            └────────┬─────────┘
                     │
                     ▼
              Best model for task category
                     │
                     ▼
            ┌──────────────────┐
            │  Fallback Chain  │  ← If model fails:
            │  (health-aware)  │     1. Try next in priority list
            └────────┬─────────┘     2. Skip degraded/rate-limited
                     │               3. Fall to free-tier models
                     ▼               4. Never hard-stop
              Response back to agent
              (switch is invisible)

Quick Start

Install

pip install switchboard-mcp

Set your API keys

export ANTHROPIC_API_KEY="sk-ant-..."
export OPENROUTER_API_KEY="sk-or-..."

Add to your MCP client

Every MCP client speaks the same stdio transport protocol. Your server runs as a subprocess, the client connects, discovers the 4 tools, and starts calling them. Switchboard is completely client-agnostic.

The core config block is always the same — just the root key changes per client:

{
  "command": "switchboard",
  "env": {
    "ANTHROPIC_API_KEY": "sk-ant-...",
    "OPENROUTER_API_KEY": "sk-or-..."
  }
}

Here's exactly where it goes for each client. Find yours, copy, paste.

Client Configuration Reference

Client	Config File / Location	Root Key	Notes
Cursor	`~/.cursor/mcp.json`	`"mcpServers"`	Or via GUI: Settings → Features → MCP
VS Code (Copilot)	`.vscode/mcp.json`	`"servers"`	Workspace-level
Claude Code	`~/.claude/settings.json`	`"mcpServers"`	Global
Windsurf	Settings → MCP Servers	`"mcpServers"`	Same JSON structure as Cursor
JetBrains	Settings → AI Assistant → MCP	`"mcpServers"`	GUI paste or auto-configure
Antigravity / Google ADK	CLI config	stdio	One config, multiple agents
GitKraken	MCP settings	`"mcpServers"`	Same as Cursor

Cursor

File: ~/.cursor/mcp.json (or Settings → Features → MCP → Add Server)

{
  "mcpServers": {
    "switchboard": {
      "command": "switchboard",
      "env": {
        "ANTHROPIC_API_KEY": "sk-ant-...",
        "OPENROUTER_API_KEY": "sk-or-..."
      }
    }
  }
}

Restart Cursor. Your agent now has unlimited flow.

VS Code (Copilot)

File: .vscode/mcp.json (workspace-level)

Note: VS Code uses "servers" as the root key, not "mcpServers".

{
  "servers": {
    "switchboard": {
      "command": "switchboard",
      "env": {
        "ANTHROPIC_API_KEY": "sk-ant-...",
        "OPENROUTER_API_KEY": "sk-or-..."
      }
    }
  }
}

Reload the window (Ctrl+Shift+P → "Developer: Reload Window").

Claude Code

File: ~/.claude/settings.json

{
  "mcpServers": {
    "switchboard": {
      "command": "switchboard",
      "env": {
        "ANTHROPIC_API_KEY": "sk-ant-...",
        "OPENROUTER_API_KEY": "sk-or-..."
      }
    }
  }
}

Next claude invocation picks it up automatically.

Windsurf

Location: Settings → MCP Servers → Add Server

Use the same JSON structure as Cursor:

{
  "mcpServers": {
    "switchboard": {
      "command": "switchboard",
      "env": {
        "ANTHROPIC_API_KEY": "sk-ant-...",
        "OPENROUTER_API_KEY": "sk-or-..."
      }
    }
  }
}

JetBrains

Location: Settings → Tools → AI Assistant → Model Context Protocol (MCP) → Add → "As JSON ChatForest"

Paste this config:

{
  "mcpServers": {
    "switchboard": {
      "command": "switchboard",
      "env": {
        "ANTHROPIC_API_KEY": "sk-ant-...",
        "OPENROUTER_API_KEY": "sk-or-..."
      }
    }
  }
}

Tip: JetBrains can also auto-configure external clients from the IDE side. If you're already using other MCP servers, just add Switchboard alongside them.

Antigravity / Google ADK

Uses the same stdio config. Configure once, use across multiple agents:

{
  "mcpServers": {
    "switchboard": {
      "command": "switchboard",
      "env": {
        "ANTHROPIC_API_KEY": "sk-ant-...",
        "OPENROUTER_API_KEY": "sk-or-..."
      }
    }
  }
}

GitKraken

Location: MCP Settings → Add Server

Same JSON structure as Cursor:

{
  "mcpServers": {
    "switchboard": {
      "command": "switchboard",
      "env": {
        "ANTHROPIC_API_KEY": "sk-ant-...",
        "OPENROUTER_API_KEY": "sk-or-..."
      }
    }
  }
}

Using a Virtual Environment

If switchboard isn't on your PATH (you installed from source or in a venv), use the full path:

{
  "mcpServers": {
    "switchboard": {
      "command": "/path/to/your/venv/bin/switchboard",
      "env": {
        "ANTHROPIC_API_KEY": "sk-ant-...",
        "OPENROUTER_API_KEY": "sk-or-..."
      }
    }
  }
}

Or with python -m:

{
  "mcpServers": {
    "switchboard": {
      "command": "python",
      "args": ["-m", "server"],
      "cwd": "/path/to/switchboard/package",
      "env": {
        "ANTHROPIC_API_KEY": "sk-ant-...",
        "OPENROUTER_API_KEY": "sk-or-..."
      }
    }
  }
}

That's it. Your agent now has unlimited flow. When Claude hits a rate limit, Switchboard silently routes to DeepSeek, GPT-4o, or Gemini — your agent never knows.

Model Registry

14 models across 3 tiers, from premium to free fallback:

Model	Provider	Cost/1K	Context	Best For	Tier
claude-sonnet-4	Anthropic	$0.003	200K	code gen, review, arch	Paid
claude-opus-4	Anthropic	$0.015	200K	review, security, arch	Paid
claude-haiku-3.5	Anthropic	$0.0008	200K	explanation, autocomplete	Paid
claude-sonnet-4	OpenRouter	$0.003	200K	code gen, review	Paid
deepseek-v3	OpenRouter	$0.00014	128K	code gen, debugging	Paid
deepseek-r1	OpenRouter	$0.0005	128K	debugging, review	Paid
gpt-4o	OpenRouter	$0.0025	128K	code gen, review	Paid
gpt-4o-mini	OpenRouter	$0.00015	128K	autocomplete, explanation	Paid
o3-mini	OpenRouter	$0.0011	200K	debugging, code gen	Paid
gemini-1.5-pro	OpenRouter	$0.00125	2M	code gen, architecture	Paid
gemini-2.0-flash	OpenRouter	$0.0001	1M	explanation, autocomplete	Paid
qwen-2.5-coder	OpenRouter	$0.00018	128K	code gen, autocomplete	Paid
llama-3.3-70b:free	OpenRouter	$0	128K	explanation, autocomplete	Free
qwen-2.5-72b:free	OpenRouter	$0	32K	explanation, autocomplete	Free

Full definitions in config/models.yaml. Add your own models by editing this file.

Routing Logic

Hybrid Approach: Rules First, Classifier Second

Layer 1 (Rule Engine) runs in < 1ms. It checks:

Signal	Action
`prefer_cheap` is set	Route to cheapest available model
`prefer_fast` is set	Route to fastest available model
Context > 60K tokens	Route to large-window model (Gemini 1.5 Pro: 2M)
Simple task keywords (explain, autocomplete, rename)	Route to cheapest model
Latency-sensitive (autocomplete, suggest, stream)	Route to fast model
`max_cost_per_request` budget	Filter to affordable models

Layer 2 (Classifier) runs when rules are inconclusive. A sentence-transformer (all-MiniLM-L6-v2, 22MB) classifies the task into 7 categories and picks the best model for that task type:

Category	Preferred Models
`code_generation`	Sonnet, DeepSeek-V3, GPT-4o
`code_review`	Opus, GPT-5
`debugging`	DeepSeek-R1, o3-mini
`explanation`	Haiku, Gemini Flash
`architecture`	Opus, GPT-5
`autocomplete`	Haiku, GPT-4o-mini, Qwen-Coder
`security_audit`	Opus, GPT-5

Fallback Priority

When a model fails, Switchboard walks down this chain:

claude-sonnet-4 (Anthropic direct)
anthropic/claude-sonnet-4 (OpenRouter)
deepseek/deepseek-v3
openai/gpt-4o
google/gemini-1.5-pro
qwen/qwen-2.5-coder-32b-instruct
meta-llama/llama-3.3-70b-instruct:free ← free tier
qwen/qwen-2.5-72b-instruct:free ← free tier (final safety net)

The agent NEVER hard-stops. Steps 7-8 are always free.

Context Serialization: Why This Isn't LiteLLM

When you switch models mid-conversation, most routers just forward the entire message history. That's wasteful and often breaks things (different models have different context windows and system prompt expectations).

Switchboard maintains a structured ConversationState:

GOAL: Build a REST API for a todo app
ACTIVE FILES:
  - app/main.py
  - app/models.py
CHANGES MADE SO FAR:
  + Added FastAPI setup
  + Created Todo model
KEY DECISIONS:
  - Used SQLite for persistence
CURRENT SUBTASK: Add authentication
Recent conversation:
[user]: Add JWT authentication
[assistant]: I'll add middleware...

This handoff prompt is 60-90% smaller than forwarding raw history, and the new model gets everything it actually needs to continue.

Configuration

Environment Variables

Variable	Required	Description
`ANTHROPIC_API_KEY`	Recommended	Direct access to Claude models
`OPENROUTER_API_KEY`	Recommended	Access to 12+ models via OpenRouter
`OPENAI_API_KEY`	Optional	Future direct OpenAI support

Runtime Preferences

Set at runtime via set_routing_preferences:

{
  "prefer_cheap": true,
  "prefer_fast": false,
  "max_cost_per_request": 0.50,
  "blacklist_providers": ["openrouter"]
}

Customizing Models

Edit config/models.yaml to add models or adjust costs:

models:
  - id: your/custom-model
    provider: openrouter
    cost_per_1k_tokens: 0.001
    context_window: 128000
    strengths: [code_generation, debugging]
    speed: fast
    tier: paid

Logs

All routing decisions are logged to ~/.switchboard/routing.log in JSONL format:

tail -f ~/.switchboard/routing.log | jq .

Dry Run

See what Switchboard would do without making any API calls:

switchboard --dry-run

Or via CLI tester:

python cli_test.py dry_run

CLI Tester

The CLI tester demonstrates every routing scenario with beautiful Rich-formatted output. No API keys needed — all HTTP is mocked.

pip install switchboard-mcp[dev]
python cli_test.py dry_run

Available Scenarios

Command	What It Shows
`python cli_test.py dry_run`	All 7 task categories routed — zero API keys
`python cli_test.py task_routing`	Full routing table with classifications
`python cli_test.py rate_limit`	429 handling and fallback
`python cli_test.py context_switch`	Structured handoff vs raw history comparison
`python cli_test.py provider_health`	Error injection, degradation, recovery
`python cli_test.py stress`	20 concurrent requests, mixed tasks
`python cli_test.py all`	Run all scenarios

This is the zero-setup way to experience Switchboard. Someone finds this repo, runs python cli_test.py dry_run, sees the routing decisions in a pretty table — that's the "oh this is real" moment.

Contributing

Quick Start

git clone https://github.com/yourname/switchboard-mcp.git
cd switchboard-mcp
python3 -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
pytest tests/ -v

Adding a New Provider

Create providers/your_provider.py extending providers/base.py
Implement chat_complete() and health_check()
Register in config/models.yaml with the correct provider name
Add tests to tests/test_integration.py

Adding a New Model

Just edit config/models.yaml. No code changes needed:

- id: provider/model-name
  provider: openrouter     # or 'anthropic'
  cost_per_1k_tokens: 0.001
  context_window: 128000
  strengths: [code_generation, debugging]
  speed: fast
  tier: paid

Improving the Classifier

The classifier uses hybrid semantic + keyword scoring. To improve it:

Add keywords to CATEGORY_KEYWORDS in router/classifier.py
The semantic embeddings will automatically adapt to new text
Run pytest tests/test_router.py::TestClassifierStability to verify

PR Checklist

All tests pass (pytest tests/ -v)
New models added to config/models.yaml
CHANGELOG.md updated
No TODOs or stubs left in new code

See CONTRIBUTING.md for full details.

Roadmap

Version	Feature
v0.1.0	Initial release — hybrid router, 14 models, context handoff, health tracking
v0.2.0	Web dashboard for routing analytics and cost tracking
v0.3.0	Fine-tuned classifier trained on real routing data
v0.4.0	Cost analytics — per-session, per-project cost tracking and budgets
v0.5.0	Community model marketplace — share routing rules and model configs

What Makes This Different

Feature	Switchboard	LiteLLM / OpenRouter Router
MCP-native	✅ Zero-code-change for Claude Code, Cursor	❌ Requires SDK integration
Semantic context	✅ Structured handoff, not raw history	❌ Raw message forwarding only
Task-aware routing	✅ Understands debug vs review vs gen	❌ Cost/availability only
Feedback loop	✅ `report_outcome` tool	❌ None
Free-tier fallback	✅ Always 2+ free models	⚠️ Depends on plan
License	✅ MIT, for individual devs	⚠️ Hosted service / Enterprise

License

MIT — built for individual developers, not enterprise.

Made with ☕ at 2am because the rate limit wall needed to go.

Repository

tejasprasad2008-afk

tejasprasad2008-afk/SwitchBoard-MCP

Created

April 13, 2026

Updated

April 13, 2026

Language

Python

SwitchBoard-MCP

Switchboard 🎛️

The Problem

How It Works

Routing Decision Flow

Quick Start

Install

Set your API keys

Add to your MCP client

Client Configuration Reference

Cursor

VS Code (Copilot)

Claude Code

Windsurf

JetBrains

Antigravity / Google ADK

GitKraken

Using a Virtual Environment

Model Registry

Routing Logic

Hybrid Approach: Rules First, Classifier Second

Fallback Priority

Context Serialization: Why This Isn't LiteLLM

Configuration

Environment Variables

Runtime Preferences

Customizing Models

Logs

Dry Run

CLI Tester

Available Scenarios

Contributing

Quick Start

Adding a New Provider

Adding a New Model

Improving the Classifier

PR Checklist

Roadmap

What Makes This Different

License

Repository

Created

Updated

Language

Category