This repository offers a voice-controlled AI interface using Google Gemini and Anthropic MCP, enabling natural speech and multimodal inputs for interacting with AI systems. It supports both custom and Systemprompt MCP servers.
This open-source project is a voice-controlled AI interface powered by Google Gemini and Anthropic MCP (Model Context Protocol). It lets users interact with AI through speech and multimodal inputs. Built with Vite, React, and TypeScript, it supports both custom and Systemprompt MCP servers. Systemprompt servers can be installed with a free API key, while custom servers require a mcp.config.custom.json file.
Key features include natural voice control, multimodal understanding (text, voice, visual), real-time voice synthesis, and AI workflow orchestration via MCP. It's ideal for developers building voice-controlled AI applications.
To get started, clone the repository, install dependencies, configure API keys in .env (Google AI Studio, Systemprompt), and run npm run dev. The project is licensed under the MIT License.
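Because the setup depends on two API keys being present in .env, a small pre-flight check can catch a missing key before the dev server starts. The following is a minimal sketch, not code from the repository: the variable names VITE_GEMINI_API_KEY and VITE_SYSTEMPROMPT_API_KEY and the helpers parseEnv and missingKeys are assumptions made for illustration, so check the repository's own .env example for the real names.

```typescript
// Hypothetical key names: the real ones are defined by the repository's .env example.
const REQUIRED_KEYS = ["VITE_GEMINI_API_KEY", "VITE_SYSTEMPROMPT_API_KEY"] as const;

/** Parse KEY=VALUE lines, skipping blank lines and # comments. */
function parseEnv(text: string): Record<string, string> {
  const env: Record<string, string> = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // skip lines with no "=" or an empty key
    const key = line.slice(0, eq).trim();
    // Strip one leading and one trailing quote, if present.
    const value = line.slice(eq + 1).trim().replace(/^["']|["']$/g, "");
    env[key] = value;
  }
  return env;
}

/** Return the required keys that are absent or empty. */
function missingKeys(env: Record<string, string>, required: readonly string[]): string[] {
  return required.filter((key) => !env[key]);
}

// Usage: the second key is present but empty, so it is reported as missing.
const sample = "VITE_GEMINI_API_KEY=abc123\n# comment\nVITE_SYSTEMPROMPT_API_KEY=\n";
console.log(missingKeys(parseEnv(sample), REQUIRED_KEYS));
```

In a real project the text would come from reading the .env file on disk; the sketch takes a string so that it stays self-contained.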
Ejb503/multimodal-mcp-client
January 8, 2025
March 27, 2025
TypeScript