Control your Mac with Gemini voice commands to manage apps, files, web, Gmail, Calendar, and Google Workspace tools
Voice-controlled macOS agent powered by Gemini. Control your entire Mac just by talking — 24 tools across native apps, browser, and Google Workspace.
Speak naturally and Mac Pilot executes actions on your Mac:

                  USER'S MAC
+--------------------------------------------+
|  Floating UI Overlay (PyWebView)           |
|  Mic waveform + status                     |
|  Action steps with timing                  |
|  Markdown result + stats                   |
+---------------------+----------------------+
                      | WebSocket
            +---------+---------+
            |  Python Backend   |
            |                   |
            |  Gemini Live  <-- Voice I/O (bidirectional audio)
            |       |           |
            |  execute_task     |
            |       v           |
            |  Gemini Flash --  Brain (native function calling)
            |       |           |
            |  24 Tools         |
            |  - Accessibility  |  macOS AX API (any native app)
            |  - Keyboard       |  type_text, press_keys
            |  - Browser (CDP)  |  Chrome via DevTools Protocol
            |  - Shell          |  system commands
            |  - Workspace      |  Gmail, Calendar, Drive, Docs
            +-------------------+

Voice layer: Gemini Live API (native audio) handles bidirectional speech. When the user asks to do something, it calls execute_task.
Brain layer: Gemini 3 Flash Preview with native function calling. Reads the macOS accessibility tree, decides what tools to call, and executes multi-step workflows autonomously. Supports parallel function calls.
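To illustrate how a batch of parallel function calls might be dispatched to local tools, here is a minimal sketch. It is not taken from brain.py: the TOOLS registry, the tool decorator, run_call, and run_calls are hypothetical names, the call format (a dict with "name" and "args") is an assumption, and the two tools are stand-ins that only return strings.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable

# Hypothetical registry mapping tool names to plain Python callables.
TOOLS: dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function as a tool under its own name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def type_text(text: str) -> str:
    # Stand-in: a real tool would send keystrokes to the focused app.
    return f"typed {len(text)} characters"


@tool
def press_keys(keys: str) -> str:
    # Stand-in: a real tool would trigger the key combination.
    return f"pressed {keys}"


def run_call(call: dict) -> dict:
    """Execute one function call and wrap the outcome for the model."""
    name = call["name"]
    fn = TOOLS.get(name)
    if fn is None:
        return {"name": name, "error": f"unknown tool: {name}"}
    try:
        return {"name": name, "result": fn(**call.get("args", {}))}
    except Exception as exc:  # report failures instead of crashing the loop
        return {"name": name, "error": str(exc)}


def run_calls(calls: list[dict]) -> list[dict]:
    """Run a batch of calls concurrently, preserving the input order."""
    if not calls:
        return []
    with ThreadPoolExecutor(max_workers=len(calls)) as pool:
        return list(pool.map(run_call, calls))
```

Returning errors as data lets the model see a failed step and decide what to try next. Tools that act on the same window or keyboard focus would probably need to run one at a time rather than concurrently.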
Tools (24 total):
git clone https://github.com/Shootp5044/gemini-mac-pilot.git
cd gemini-mac-pilot
chmod +x setup.sh && ./setup.sh
playwright install chromium

# Install gcloud CLI
brew install google-cloud-sdk
# Authenticate
gcloud auth application-default login
# Configure project
cp .env.example .env
# Edit .env and set GCP_PROJECT to your project ID

Your GCP project needs the Vertex AI API enabled. New accounts get $300 free credits for 90 days.
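As a sketch of how the GCP_PROJECT setting could be read at startup, the snippet below parses a .env file using only the standard library. It is an assumption about the mechanism, not the contents of config.py: load_env and get_gcp_project are hypothetical helpers, and the real project may use a library such as python-dotenv instead.

```python
import os
from pathlib import Path


def load_env(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    file = Path(path)
    if not file.exists():
        return values
    for raw in file.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_gcp_project(path: str = ".env") -> str:
    """Prefer the process environment, then fall back to the .env file."""
    project = os.environ.get("GCP_PROJECT") or load_env(path).get("GCP_PROJECT", "")
    if not project:
        raise RuntimeError("GCP_PROJECT is not set; edit .env or export it")
    return project
```

Failing early with a clear message is usually easier to debug than a Vertex AI error later in the session.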
brew install googleworkspace-cli
gws auth login

Open Chrome and go to chrome://inspect/#remote-debugging → enable the toggle. This lets Mac Pilot use your real Chrome with all your sessions/cookies instead of a standalone Chromium.
Go to System Settings > Privacy & Security > Accessibility and enable your terminal app.
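To confirm that the permission took effect, a check along these lines may help. It is a sketch that assumes PyObjC (the pyobjc-framework-ApplicationServices package) is installed; whether Mac Pilot itself uses PyObjC is not stated here. accessibility_trusted and describe are hypothetical helper names, and the check returns None when PyObjC is unavailable.

```python
def accessibility_trusted():
    """Return True or False on macOS with PyObjC, or None if the check is unavailable."""
    try:
        # Provided by PyObjC on macOS; missing on other platforms.
        from ApplicationServices import AXIsProcessTrusted
    except ImportError:
        return None
    return bool(AXIsProcessTrusted())


def describe(status) -> str:
    """Turn the check result into a short message."""
    if status is None:
        return "Cannot check: PyObjC is not available in this environment."
    if status:
        return "Accessibility access is granted."
    return ("Accessibility access is missing: enable your terminal in "
            "System Settings > Privacy & Security > Accessibility.")


if __name__ == "__main__":
    print(describe(accessibility_trusted()))
```

The permission belongs to the app that launches Python, so run the check from the same terminal you use to start Mac Pilot.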
# Voice + UI (full experience)
python main.py
# CLI mode (text input only, no voice)
python main.py cli

Deploy the brain to Google Cloud Run:
chmod +x deploy.sh && ./deploy.sh

gcloud CLI installed and authenticated
PortAudio (brew install portaudio)

gemini-mac-pilot/
├── mac_pilot/
│   ├── brain.py              # Gemini Flash brain loop
│   ├── voice.py              # Gemini Live API voice I/O
│   ├── prompts.py            # System prompts
│   ├── events.py             # Event bus (brain/voice → UI)
│   ├── config.py             # GCP project, model names
│   ├── tools/
│   │   ├── accessibility.py  # macOS AX API
│   │   ├── keyboard.py       # type_text, press_keys
│   │   ├── apps.py           # open_app, find_app
│   │   ├── browser.py        # Chrome CDP + Playwright
│   │   ├── shell.py          # shell commands
│   │   ├── workspace.py      # Gmail, Calendar, Drive, Docs
│   │   └── schema.py         # 24 tool declarations
│   └── ui/
│       ├── app.py            # PyWebView overlay
│       ├── server.py         # WebSocket server
│       └── static/           # HTML/CSS/JS (Google-style bar)
├── main.py                   # Entry point
├── cloud_api.py              # Cloud Run REST API
├── Dockerfile                # Cloud deployment
├── deploy.sh                 # One-command deploy
├── requirements.txt
└── setup.sh

"Not authorized" or accessibility errors: Enable your terminal in System Settings > Privacy & Security > Accessibility.
PortAudio errors: brew install portaudio, then re-run pip install pyaudio.
Chrome CDP not connecting: Go to chrome://inspect/#remote-debugging and enable the toggle. Click "Allow" on the popup.
Workspace tools fail: Make sure gws is installed (brew install googleworkspace-cli) and authenticated (gws auth login).
GCP errors: Run gcloud auth application-default login and ensure Vertex AI API is enabled on your project.
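Several of the problems above can be caught before launch. The following is a sketch of a preflight check, not a script shipped with the project: preflight and REQUIRED_COMMANDS are hypothetical names, and it assumes GCP_PROJECT has been exported into the environment (a value that exists only in .env would be reported as missing).

```python
import os
import shutil

# Commands this README asks you to install, with the matching install hint.
REQUIRED_COMMANDS = {
    "gcloud": "brew install google-cloud-sdk",
    "gws": "brew install googleworkspace-cli",
}


def preflight(which=shutil.which, env=os.environ) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for command, hint in REQUIRED_COMMANDS.items():
        if which(command) is None:
            problems.append(f"{command} not found on PATH (try: {hint})")
    if not env.get("GCP_PROJECT"):
        problems.append("GCP_PROJECT is not set (edit .env)")
    return problems


if __name__ == "__main__":
    issues = preflight()
    for issue in issues:
        print("WARNING:", issue)
    if not issues:
        print("Preflight checks passed.")
```

The which and env parameters exist only so the check can be exercised without touching the real system. The check confirms that the commands are installed, not that gcloud or gws are authenticated.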
MIT
Shootp5044/gemini-mac-pilot
April 1, 2026
April 13, 2026
Python