Talk it out.
Take command.
Voice-Command.
Talk to your AI as you cook. Command your AI as you're doing the dishes. Get updates on what your AI is doing with just your earpiece. Hands-free for hands-busy moments.
Windows · Apache-2.0 licensed · Works with Claude, Codex, Gemini, LM Studio
What it is
You talk, your AI does the thing. Voice adds no new privileges; it only invokes tools the user has already installed and enabled. Sensitive actions should require confirmation. If your AI can do it typed, you can ask for it spoken.
The voice layer doesn't add new capabilities. It just changes how you reach the ones your AI already has. Same connectors. Same MCPs. Same tools. Same permission boundaries. You're just using your mouth instead of your keyboard.
What stays local
Microphone capture, speech-to-text, silence detection, noise filtering, emotion detection, and audio playback run on your machine. The listening server binds to localhost:5123, not your LAN. Text-to-speech currently uses edge-tts, which calls Microsoft Edge's online TTS service. Your AI still reaches out for its own model calls (Claude pings Anthropic, ChatGPT pings OpenAI), and tools it calls might too (web search, email connectors). For a fully offline loop, pair this with a local model and replace edge-tts with a local TTS backend.
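If you want to confirm the listening server is up before you start talking, a loopback probe is enough. This is a minimal sketch, not part of Voice-Command: the function name `is_listening` is made up for illustration, and it assumes "localhost" means the IPv4 loopback address 127.0.0.1.

```python
import socket


def is_listening(host: str = "127.0.0.1", port: int = 5123, timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port.

    The defaults mirror the localhost:5123 binding described above; treating
    localhost as 127.0.0.1 is an assumption of this sketch.
    """
    try:
        # create_connection raises OSError (refused, timed out) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `is_listening()` with no arguments probes the loopback address only, which matches the claim that the server is not exposed on your LAN.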
Safe Use / Permission Model
AIWander tools are local, user-authorized MCP capability surfaces. They do not grant an AI new permissions by themselves. They expose tools the user explicitly installs and enables. Sensitive actions should be confirmed by the user, credentials should stay in the OS keyring or local vault, and demos should use mock data.
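The rule above boils down to two checks: is the tool enabled, and does it need a yes first. Here is a rough sketch of that decision. It is an illustration of the policy only, not how the AIWander tools implement it; `may_run`, the tool names, and the `confirm` callback are all hypothetical.

```python
from typing import Callable


def may_run(tool: str, enabled: set[str], sensitive: set[str],
            confirm: Callable[[str], bool]) -> bool:
    """Decide whether a spoken request may invoke a tool (illustrative policy)."""
    if tool not in enabled:
        return False          # voice never unlocks a tool that is not enabled
    if tool in sensitive:
        return confirm(tool)  # sensitive actions wait for an explicit yes
    return True
```

The point of the shape: the input channel (voice or keyboard) never appears as a parameter, so speaking a request cannot widen what is allowed.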
How a turn works
- You hear a series of beeps. That's the AI's "I'm listening, your turn" cue.
- You talk. Anything your AI has the tools and authorization to handle counts.
- The AI works — and tells you out loud what it's doing as it goes. ("Checking your calendar… found three events tomorrow… drafting the reply…")
- You hear the beeps again. The AI's done with that turn. Your move.
One thing to know: the audio flow is one-way at a time. You can't cut the AI off mid-sentence with your voice — once it's talking or working, the only way to interrupt is to click or tap in the AI's UI. The beeps are the only handoff signal.
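The turn-taking rule can be pictured as a tiny two-state machine. This is a sketch of the behavior described above, not the project's code: `Turn`, `Session`, `beep`, and `hear` are names invented for the example.

```python
from enum import Enum, auto


class Turn(Enum):
    USER = auto()  # beeps have played, the microphone is open
    AI = auto()    # the AI is working or talking, voice input is ignored


class Session:
    def __init__(self) -> None:
        self.turn = Turn.AI
        self.log: list[str] = []

    def beep(self) -> None:
        """The AI hands the floor to the user."""
        self.log.append("beep")
        self.turn = Turn.USER

    def hear(self, utterance: str) -> bool:
        """Accept speech only during the user's turn."""
        if self.turn is not Turn.USER:
            return False  # no voice barge-in; interrupt from the client UI instead
        self.log.append(f"user: {utterance}")
        self.turn = Turn.AI  # the floor goes back to the AI until the next beep
        return True
```

Speech offered while the state is `Turn.AI` is simply dropped, which is why the beeps are the only handoff signal.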
To end a session, just tell the AI you're done. "I'm done talking," "let's talk later," "bye for now" — anything in that family. Or hit stop in your AI's UI. Both work.
Install
You shouldn't need a CS degree to get this running. If you've got Claude Desktop, Cowork, Claude Code, Codex (the Windows app), Gemini CLI, or LM Studio open right now, just copy this and paste it to your AI:
https://github.com/AIWander/Voice-Command - Can you install this MCP for us to use here, set up the voice listening server, and make me a .bat to launch it? Walk me through any restart or step I need. Tell me when everything's installed and we're ready to talk.
Your AI will grab the right voice-mcp.exe for your machine (ARM64 or x64), drop it in %LOCALAPPDATA%\CPC\servers\, wire it into your client's MCP config (backed up first), install the Python pieces, and write you a START_VOICE_SERVER.bat you can double-click whenever you want to talk.
If your AI can run scripts but isn't great at multi-step shell flows, point it at install.ps1 instead — it does the whole install loop deterministically.
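For the curious, the "wire it into your client's MCP config (backed up first)" step looks roughly like this. Treat it as a sketch under assumptions, not what install.ps1 actually does: it assumes a JSON config with a top-level "mcpServers" object (the layout Claude Desktop uses; other clients may use a different file format), and the function names, the `.bak` suffix, and the build labels are made up for illustration.

```python
import json
import platform
import shutil
from pathlib import Path


def pick_build() -> str:
    """Map the machine architecture to a build label (labels are illustrative)."""
    return "arm64" if platform.machine().lower() in ("arm64", "aarch64") else "x64"


def add_server(config_path: Path, name: str, command: str) -> Path:
    """Back up the config, then add or update one entry under "mcpServers".

    Returns the backup path. If no config existed yet, nothing is copied and
    the returned path will not exist.
    """
    backup = config_path.with_name(config_path.name + ".bak")
    if config_path.exists():
        shutil.copy2(config_path, backup)  # keep the original before touching it
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}
    # Existing entries are left alone; only the named server is added or replaced
    config.setdefault("mcpServers", {})[name] = {"command": command, "args": []}
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return backup
```

Backing up first means a bad edit is one file copy away from being undone, which matters because a malformed config can stop a client from loading any of its MCP servers.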
Verify by saying hi
Skip the formal health check. Just use the tools themselves. Ask your AI:
"Say hi out loud and then listen for me."
If you hear the AI greet you and then hear the listening beep, you're wired up end-to-end. Talk back to confirm it transcribed you. If something's broken, you'll know exactly which half:
- No voice → TTS is failing. ffmpeg is missing, your audio output device is wrong, or edge-tts can't reach its online service.
- Voice but no beep → MCP wiring worked for speak, not for listen_for_speech. Config has the entry but voice_server.py isn't running.
- Beep but no transcription → microphone permission off, or mic device not picked up. Most installs have it on by default; if not, Windows Settings → Privacy & security → Microphone → "Let desktop apps access your microphone".
- Full round-trip works → done. Talk away.
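The checklist above is a short decision tree: check the voice first, then the beep, then the transcription. Written out as code (the function name `diagnose` and the wording of the hints are just for illustration):

```python
def diagnose(heard_voice: bool, heard_beep: bool, got_transcript: bool) -> str:
    """Map the three observations from the hello test to the likely fault."""
    if not heard_voice:
        return "TTS: check ffmpeg and the audio output device"
    if not heard_beep:
        return "listening: the config entry exists but voice_server.py is not running"
    if not got_transcript:
        return "microphone: check Windows microphone permission and the input device"
    return "ok"
```

The order matters: each check only means something once the one before it has passed, so stop at the first failure.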
Pairs nicely with
Voice-Command is most useful when your AI also has hands. Three companion MCPs — all local, all callable by voice once you're wired up — give your AI the rest of the body it needs.
OPS
File and shell operations
Read/write files, run commands, manage processes. The recommended operator MCP — install this first if you don't have one yet.
HANDS
Browser, Windows UI, vision
Drive Chrome through every channel — DOM, accessibility tree, network capture, JS eval. Plus Windows UIA and vision/OCR.
WORKFLOW
API discovery and replay
Watch the agent work once, capture the API patterns, then replay direct HTTP forever after. OS-keyring credential vault, scheduled flows.
Install any combination. Voice-Command is the mouth and ears; these are the rest of the body. These tools run locally, but the actions they perform can still touch files, browsers, APIs, email, or shell commands depending on what you have enabled.
Works with
Voice-Command is a STDIO MCP server, so it plugs into any AI client that speaks MCP. That includes:
- Claude (chat) — Claude Desktop (the web app connects to remote MCP servers, not local STDIO ones)
- Cowork — Claude's desktop agent
- Claude Code — the CLI coding agent
- Codex — OpenAI's Windows app
- Gemini CLI — Google
- LM Studio — for running local models (Llama, Qwen, Mistral, whatever you've loaded up)
- Anything else that can call a STDIO MCP server — the protocol is the only requirement
It doesn't care which model is on the other end. If your AI of choice can call MCP tools, you can talk to it.
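"Speaks MCP over STDIO" means the client launches the server as a child process and exchanges newline-delimited JSON-RPC 2.0 messages over its stdin and stdout. Here is a sketch of what one tool call looks like on the wire. The `tools/call` method and its `name`/`arguments` parameters come from the MCP specification and `speak` is the tool named earlier, but the `text` argument is a guess at the tool's schema, and `tool_call` is a helper invented for this example.

```python
import json


def tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Build one newline-terminated JSON-RPC 2.0 message for MCP's tools/call."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # json.dumps without indent yields a single line, as the stdio transport requires
    return json.dumps(message) + "\n"


# Hypothetical argument name "text"; check the server's tool listing for the real schema.
line = tool_call(1, "speak", {"text": "Hi, I'm listening."})
```

A real client sends an initialize handshake before any tool call; your AI client does all of this for you, which is why the model on the other end doesn't matter.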