Talk it out.
Take command.
Voice-Command.

Talk to your AI as you cook. Command your AI as you're doing the dishes. Get updates on what your AI is doing with just your earpiece. Hands-free for hands-busy moments.

Windows · MIT-licensed · Works with Claude, Codex, Gemini, LM Studio


What it is

You talk, your AI does the thing. Anything your AI can do when you type (search the web, check your calendar, send an email through your connectors, write code, edit files on your computer, kick off automations), you can now ask for out loud. It also narrates what it's doing as it goes, so you don't have to look at the screen.

The voice layer doesn't add new capabilities. It just changes how you reach the ones your AI already has. Same connectors. Same MCPs. Same tools. You're just using your mouth instead of your keyboard.

Stays on your computer

Your voice never leaves the machine. Speech-to-text runs locally through faster-whisper, and the audio capture and playback all happen on your hardware. Text-to-speech goes through edge-tts, which sends the AI's reply text (not your voice) to Microsoft's online Edge speech service to turn it into audio; that is the one outbound call the voice layer makes. Your AI still reaches out for its own model calls (Claude pings Anthropic, Codex pings OpenAI), and the tools it calls might too (web search, email connectors). Pair it with a local model in LM Studio and everything except speech synthesis stays offline.

How a turn works

You hear a series of beeps. That's the AI's "I'm listening, your turn" cue.
You talk. Anything your AI has the tools to handle counts.
The AI works — and tells you out loud what it's doing as it goes. ("Checking your calendar… found three events tomorrow… drafting the reply…")
You hear the beeps again. The AI's done with that turn. Your move.

One thing to know: audio flows one way at a time. You can't cut the AI off mid-sentence with your voice; once it's talking or working, the only way to interrupt is to click or tap in the AI's UI. The beeps are the only handoff signal.

To end a session, just tell the AI you're done. "I'm done talking," "let's talk later," "bye for now" — anything in that family. Or hit stop in your AI's UI. Both work.
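The turn flow above behaves like a small half-duplex state machine: the mic is only open between the beeps, and your voice can't barge in while the AI works. Below is a minimal Python sketch of that cycle. It is an illustration, not Voice-Command's actual implementation; the class, method names, and the wants_to_end flag (a stand-in for the AI itself recognizing "bye for now" and similar phrases) are all hypothetical.

```python
from enum import Enum, auto


class State(Enum):
    LISTENING = auto()  # beeps played, mic open, waiting for you
    WORKING = auto()    # AI is acting and narrating; your voice can't interrupt
    ENDED = auto()      # session over


class TurnLoop:
    """Hypothetical model of a half-duplex (one-way-at-a-time) voice session."""

    def __init__(self) -> None:
        self.state = State.LISTENING
        self.log: list[str] = ["beeps"]  # session opens with the "your turn" cue

    def user_spoke(self, text: str, wants_to_end: bool = False) -> None:
        # Speech only counts while the mic is open; there is no barge-in.
        if self.state is not State.LISTENING:
            return
        if wants_to_end:
            # Stand-in for the AI recognizing "I'm done talking", "bye for now", etc.
            self.state = State.ENDED
            self.log.append("session ended")
            return
        self.state = State.WORKING
        self.log.append(f"heard: {text}")

    def narrate(self, update: str) -> None:
        # Spoken progress updates while the AI works.
        if self.state is State.WORKING:
            self.log.append(f"says: {update}")

    def ai_finished(self) -> None:
        # Turn over: play the beeps and hand the mic back to you.
        if self.state is State.WORKING:
            self.state = State.LISTENING
            self.log.append("beeps")

    def ui_stop(self) -> None:
        # Hitting stop in the AI's UI ends the session from any state, even mid-turn.
        if self.state is not State.ENDED:
            self.state = State.ENDED
            self.log.append("stopped from UI")


loop = TurnLoop()
loop.user_spoke("what's on my calendar tomorrow?")
loop.user_spoke("hello?")  # ignored: the AI is still working
loop.narrate("found three events tomorrow")
loop.ai_finished()
loop.user_spoke("bye for now", wants_to_end=True)
print(loop.log)
```

The point of modeling it this way is that "no voice interrupt" falls out of the structure: user_spoke simply does nothing unless the state is LISTENING, and only ui_stop can act from any state.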

Install

You shouldn't need a CS degree to get this running. If you've got Claude Desktop, Cowork, Claude Code, Codex (the Windows app), Gemini CLI, or LM Studio open right now, just copy this and paste it to your AI:

https://github.com/AIWander/Voice-Command
Can you install this MCP for us to use here, set up the voice listening server, and make me a .bat to launch it. Walk me through any restart or step I need. Tell me when everything's installed and we're ready to talk.

Your AI will grab the right voice-mcp.exe for your machine (ARM64 or x64), drop it in C:\CPC\servers\, wire it into your client's MCP config (backed up first), install the Python pieces, and write you a START_VOICE_SERVER.bat you can double-click whenever you want to talk.
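The config part of that install boils down to three steps: pick the build for your CPU, back up the client's config file, then add a STDIO server entry pointing at the exe. Here is a minimal Python sketch of those steps, assuming the Claude Desktop-style JSON layout (mcpServers, then a server name, then command and args). The helper names and the server key voice-command are hypothetical, and this is not what install.ps1 actually runs.

```python
from __future__ import annotations

import json
import platform
import shutil
from pathlib import Path

# From the install notes: where the binary gets dropped.
SERVER_EXE = r"C:\CPC\servers\voice-mcp.exe"


def pick_arch(machine: str | None = None) -> str:
    """Map platform.machine() to one of the two builds (ARM64 or x64)."""
    machine = (machine or platform.machine()).lower()
    if machine in ("arm64", "aarch64"):
        return "ARM64"
    if machine in ("amd64", "x86_64"):
        return "x64"
    raise ValueError(f"unsupported architecture: {machine}")


def wire_config(config_path: Path, name: str = "voice-command") -> Path | None:
    """Back up an MCP client config, then add a STDIO server entry.

    Assumes the layout {"mcpServers": {"<name>": {"command": ..., "args": [...]}}}.
    Returns the backup path, or None if no config file existed yet.
    """
    backup = None
    config: dict = {}
    if config_path.exists():
        # Back up first, so a bad edit can be rolled back by hand.
        backup = config_path.with_name(config_path.name + ".bak")
        shutil.copy2(config_path, backup)
        text = config_path.read_text(encoding="utf-8")
        config = json.loads(text) if text.strip() else {}
    # Keep any servers already configured; only add or overwrite our entry.
    servers = config.setdefault("mcpServers", {})
    servers[name] = {"command": SERVER_EXE, "args": []}
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return backup
```

On Windows, Claude Desktop reads %APPDATA%\Claude\claude_desktop_config.json, so a call there might look like wire_config(Path(os.environ["APPDATA"]) / "Claude" / "claude_desktop_config.json"). Other clients keep their MCP config elsewhere and may not use this JSON shape. Also, an x64 Python running under emulation on an ARM64 PC may report AMD64, in which case pick_arch would choose the x64 build.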

If your AI can run scripts but isn't great at multi-step shell flows, point it at install.ps1 instead — it does the whole install loop deterministically.

Verify by saying hi

Skip the formal health check. Just use the tools themselves. Ask your AI:

"Say hi out loud and then listen for me."

If you hear the AI greet you and then hear the listening beeps, you're wired up end-to-end. Talk back to confirm it transcribed you. If something's broken, the symptom tells you which piece to check:

No voice → TTS isn't working. ffmpeg is missing, or your audio output device is wrong.
Voice but no beep → the MCP wiring works for speak but not for listen_for_speech. The config has the entry, but voice_server.py isn't running.
Beep but no transcription → the microphone permission is off, or the mic wasn't picked up as an input device. The permission is on by default on most installs; if it isn't, go to Windows Settings → Privacy & security → Microphone → "Let desktop apps access your microphone".
Full round-trip works → done. Talk away.
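The checklist above is an ordered diagnosis: the first thing that failed in the round trip points at the piece to fix. A tiny Python sketch of that order follows; the function and its messages are illustrative only and not part of Voice-Command.

```python
def diagnose(heard_voice: bool, heard_beep: bool, transcribed: bool) -> str:
    """Map what you observed during the "say hi" test to where to look.

    Checks run in round-trip order (speak, then listen, then transcribe),
    so the earliest failure wins.
    """
    if not heard_voice:
        return "TTS: check ffmpeg and the audio output device"
    if not heard_beep:
        return "listener: start voice_server.py (e.g. via START_VOICE_SERVER.bat)"
    if not transcribed:
        return "microphone: check the Windows mic permission and input device"
    return "all good: talk away"


print(diagnose(heard_voice=True, heard_beep=False, transcribed=False))
```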

Pairs nicely with

Voice-Command is most useful when your AI also has hands. Three companion MCPs — all local, all callable by voice once you're wired up — give your AI the rest of the body it needs.

OPS

File and shell operations

Read/write files, run commands, manage processes. The recommended operator MCP — install this first if you don't have one yet.

HANDS

Browser, Windows UI, vision

Drive Chrome through every channel — DOM, accessibility tree, network capture, JS eval. Plus Windows UIA and vision/OCR.

WORKFLOW

API discovery and replay

Watch the agent work once, capture the API patterns, then replay direct HTTP forever after. OS-keyring credential vault, scheduled flows.

Install any combination. Voice-Command is the mouth and ears; these are the rest of the body. None of them reach out unless the AI explicitly asks them to.

Works with

Voice-Command is a STDIO MCP server, so it plugs into any AI client that speaks MCP. That includes:

Claude (chat) — Claude Desktop and the web app
Cowork — Claude's desktop agent
Claude Code — the CLI coding agent
Codex — OpenAI's Windows app
Gemini CLI — Google
LM Studio — for running local models (Llama, Qwen, Mistral, whatever you've loaded up)
Anything else that can call a STDIO MCP server — the protocol is the only requirement

It doesn't care which model is on the other end. If your AI of choice can call MCP tools, you can talk to it.
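"STDIO MCP server" means the client launches voice-mcp.exe as a child process and exchanges JSON-RPC 2.0 messages with it, one JSON object per line on stdin and stdout. A minimal Python sketch of the messages a client might send, assuming the 2024-11-05 MCP protocol version: the initialize handshake, the initialized notification, a tools/list, then a tools/call to speak. The argument name text is a guess (the real schema comes back from tools/list), and example-client is a made-up client name.

```python
from __future__ import annotations

import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids: 1, 2, 3, ...


def request(method: str, params: dict | None = None) -> dict:
    """A JSON-RPC request: has an id and expects a response."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return msg


def notification(method: str) -> dict:
    """A JSON-RPC notification: no id, no response."""
    return {"jsonrpc": "2.0", "method": method}


def frame(msg: dict) -> bytes:
    """STDIO framing: one compact JSON object per line.

    json.dumps without indent never emits raw newlines (they are escaped
    inside strings), so each message stays on a single line.
    """
    return (json.dumps(msg, separators=(",", ":")) + "\n").encode("utf-8")


handshake = [
    request("initialize", {
        "protocolVersion": "2024-11-05",
        "capabilities": {},
        "clientInfo": {"name": "example-client", "version": "0.1"},
    }),
    notification("notifications/initialized"),
    request("tools/list"),
    # "text" is an assumed argument name; check the schema tools/list returns.
    request("tools/call", {"name": "speak", "arguments": {"text": "Hi there"}}),
]

stream = b"".join(frame(m) for m in handshake)
print(stream.decode("utf-8"))
```

A real client would write these lines to the server's stdin and read one JSON response per line from its stdout; your AI client already does this for you, which is why the model on the other end doesn't matter.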
