multimodal-mcp-client

This repository offers a voice-controlled AI interface built on Google Gemini and Anthropic's Model Context Protocol (MCP), enabling interaction with AI systems through natural speech and multimodal inputs. It supports both custom and Systemprompt MCP servers.

Systemprompt Multimodal MCP Client

Website • Documentation • Blog • Get API Key

This open-source project is a voice-controlled AI interface powered by Google Gemini and Anthropic's Model Context Protocol (MCP). It transforms AI interaction through speech and multimodal inputs. Built with Vite, React, and TypeScript, it supports both custom and Systemprompt MCP servers: Systemprompt servers can be installed with a free API key, while custom servers require an mcp.config.custom.json file.
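As a rough illustration, a custom server entry in mcp.config.custom.json might look like the sketch below. The exact schema is defined by the project; the mcpServers key, the server name, and the command/args/env fields follow the common MCP client convention and are assumptions here, not the project's confirmed format.

```json
{
  "mcpServers": {
    "my-custom-server": {
      "command": "node",
      "args": ["./servers/my-custom-server/index.js"],
      "env": {
        "MY_SERVER_API_KEY": "<your-key>"
      }
    }
  }
}
```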

Key features include natural voice control, multimodal understanding (text, voice, visual), real-time voice synthesis, and AI workflow orchestration via MCP. It's ideal for developers building voice-controlled AI applications.

To get started, clone the repository, install dependencies, configure API keys in .env (Google AI Studio, Systemprompt), and run npm run dev. The project is licensed under the MIT License.
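A minimal sketch of those setup steps is shown below. The repository URL comes from the listing further down; npm is assumed as the package manager, and the exact .env variable names should be taken from the project's documentation.

```sh
# Clone the repository and install dependencies
git clone https://github.com/Ejb503/multimodal-mcp-client.git
cd multimodal-mcp-client
npm install

# Add your Google AI Studio and Systemprompt API keys to .env
# (see the project's documentation for the exact variable names),
# then start the development server:
npm run dev
```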

Repository: Ejb503/multimodal-mcp-client

Created: January 8, 2025

Updated: March 27, 2025

Language: TypeScript

Category: AI