OmniVoice Studio — How to Use It
What Is OmniVoice Studio?
OmniVoice Studio is an open-source desktop application for voice cloning, video dubbing, real-time dictation, and speaker diarization. Everything runs locally on your machine. No API keys, no cloud account, no subscription required.
- 646 languages supported for TTS via the default OmniVoice engine
- 99 languages for transcription via WhisperX
- Available on macOS, Windows, and Linux
- GPU is optional — full pipeline runs on CPU
- Free for personal, educational, and research use (FSL-1.1-ALv2)
System Requirements
A GPU is optional. Without one, TTS runs approximately 3× slower on CPU. With ≤8 GB VRAM, TTS automatically offloads to CPU during transcription — no config needed.
| Component | Minimum | Recommended |
|---|---|---|
| OS | Win 10 / macOS 12+ / Ubuntu 20.04+ | Any modern 64-bit OS |
| RAM | 8 GB | 16 GB+ |
| VRAM | 4 GB (auto-offloads) | 8 GB+ (RTX 3060+) |
| Disk | 10 GB free | 20 GB+ SSD |
| Python | 3.10+ | 3.11–3.12 |
| GPU | Optional | CUDA / MPS / ROCm |
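The Python and disk minimums from the table can be checked before installing. The sketch below is only an illustration using the standard library; the function name `check_minimums` is hypothetical and not part of OmniVoice Studio, and RAM and VRAM are left out because the standard library has no portable way to read them.

```python
import shutil
import sys

# Minimums copied from the requirements table above.
MIN_PYTHON = (3, 10)
MIN_DISK_GB = 10


def check_minimums(python_version, free_disk_gb):
    """Return a list of problems; an empty list means both minimums are met."""
    problems = []
    major, minor = python_version[0], python_version[1]
    if (major, minor) < MIN_PYTHON:
        problems.append(f"Python {major}.{minor} is older than 3.10")
    if free_disk_gb < MIN_DISK_GB:
        problems.append(
            f"only {free_disk_gb:.1f} GB free disk, need {MIN_DISK_GB} GB"
        )
    return problems


if __name__ == "__main__":
    # Free space on the drive that holds the current directory, in GB.
    free_gb = shutil.disk_usage(".").free / 1e9
    for line in check_minimums(sys.version_info, free_gb) or ["minimums met"]:
        print(line)
```

Run it from the folder where you plan to clone the repository so the disk check looks at the right drive.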
Installation
The project recommends running from source. Install three prerequisites first: ffmpeg, Bun (JS runtime), and uv (Python package manager).
git clone https://github.com/debpalash/OmniVoice-Studio.git
cd OmniVoice-Studio
uv sync
bun install
bun dev
The frontend loads at http://localhost:5173, and the API runs on port 8000.
Model weights download automatically on first generation.
Pre-built installers are also available (macOS DMG, Windows MSI, Linux AppImage and .deb); see the Releases page on GitHub.
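Before running from source, it can help to confirm that the three prerequisites are on your PATH. This is a minimal sketch using `shutil.which` from the standard library; the helper `missing_tools` is a hypothetical name, not something the project ships.

```python
import shutil

# The three prerequisites named in the installation steps.
PREREQUISITES = ("ffmpeg", "bun", "uv")


def missing_tools(tools=PREREQUISITES, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Install these first:", ", ".join(missing))
    else:
        print("All prerequisites found.")
```

The `which` parameter exists only so the lookup can be swapped out when testing; in normal use, call `missing_tools()` with no arguments.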
Voice Cloning
Voice cloning uses zero-shot learning — it clones a voice from a clip as short as 3 seconds, without prior training on that voice. The default OmniVoice engine conditions a diffusion-based TTS model on the reference audio.
- Go to the Voice Clone tab in the UI
- Upload or record a reference clip of the target voice (3 seconds is enough)
- Enter your text and select a target language (646 available)
- Click Generate — output is saved to your project library
Voice Gallery: Search YouTube, browse categories, and download reference clips directly inside the app to build your voice library.
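If you prepare reference clips outside the app, you may want to confirm that each one reaches the 3-second mark before uploading. The sketch below assumes an uncompressed WAV file and treats 3 seconds as a floor, which is an interpretation of "as short as 3 seconds" rather than a documented limit. The function names are hypothetical.

```python
import wave

# Assumed floor, based on the "as short as 3 seconds" claim above.
MIN_REFERENCE_SECONDS = 3.0


def wav_duration_seconds(path_or_file):
    """Return the duration of a WAV file (path or binary file object)."""
    with wave.open(path_or_file, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_long_enough(path_or_file, minimum=MIN_REFERENCE_SECONDS):
    """True when the clip lasts at least `minimum` seconds."""
    return wav_duration_seconds(path_or_file) >= minimum
```

For example, `is_long_enough("my_voice.wav")` checks a local file; other formats such as MP3 would need converting to WAV first (ffmpeg can do this).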
Video Dubbing
The full dubbing pipeline runs locally: transcribe → translate → synthesize → mux. Demucs isolates vocals so the original background audio is preserved in the final export.
- Go to the Dub tab — paste a YouTube URL or upload a local file
- WhisperX transcribes speech with word-level alignment
- Select a target language; translation runs automatically
- TTS engine re-voices the transcript; Demucs preserves background audio
- Export the final MP4 with dubbed audio mixed in
Batch Queue: Drop up to 50 videos and walk away. Each job has its own progress bar tracking through the full pipeline.
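To make the stage order and the batch limit concrete, here is a small sketch of how a dubbing job could move through transcribe → translate → synthesize → mux. It is not the app's actual implementation: `DubJob`, `run_job`, `enqueue`, and the handler callables are all hypothetical, and the real stages (WhisperX, translation, TTS, Demucs, muxing) are replaced by whatever functions you pass in.

```python
from dataclasses import dataclass, field

# Stage order as described above.
STAGES = ("transcribe", "translate", "synthesize", "mux")
MAX_BATCH = 50  # batch queue limit stated above


@dataclass
class DubJob:
    source: str            # YouTube URL or local file path
    target_language: str
    completed: list = field(default_factory=list)

    @property
    def progress(self):
        """Fraction of stages finished, from 0.0 to 1.0."""
        return len(self.completed) / len(STAGES)


def run_job(job, handlers):
    """Run each stage in order; `handlers` maps stage name to callable(job)."""
    for stage in STAGES:
        handlers[stage](job)
        job.completed.append(stage)
    return job


def enqueue(sources, target_language):
    """Create one job per source, refusing batches above the limit."""
    if len(sources) > MAX_BATCH:
        raise ValueError(f"batch of {len(sources)} exceeds {MAX_BATCH} videos")
    return [DubJob(source, target_language) for source in sources]
```

Each job carries its own `progress` value, which mirrors the per-job progress bar in the batch queue.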
Dictation & Speaker Diarization
Dictation works system-wide from any application. Diarization identifies individual speakers in a multi-speaker audio file using Pyannote + WhisperX.
- Press ⌘+⇧+Space (macOS) to open the floating dictation widget
- Speech streams via WebSocket and auto-pastes into the active input field
- Upload a multi-speaker file to the Diarization tab
- Pyannote identifies who said what; each speaker gets an auto-extracted voice profile
- Assign a TTS voice per speaker for per-speaker dubbing
A Hugging Face token is required for Pyannote diarization. See docs/setup/huggingface-token.md in the repo.
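Per-speaker dubbing comes down to mapping each diarized segment to a voice. The sketch below shows that mapping under an assumed segment shape (a dict with `speaker`, `start`, `end`, and `text` keys); the real output format of the Diarization tab may differ, and both function names are hypothetical.

```python
from collections import defaultdict


def group_by_speaker(segments):
    """Collect segments under their speaker label, preserving order."""
    grouped = defaultdict(list)
    for segment in segments:
        grouped[segment["speaker"]].append(segment)
    return dict(grouped)


def assign_voices(segments, voice_map, default_voice):
    """Return copies of the segments with a 'voice' key added.

    Speakers missing from `voice_map` fall back to `default_voice`.
    """
    return [
        {**segment, "voice": voice_map.get(segment["speaker"], default_voice)}
        for segment in segments
    ]


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    segments = [
        {"speaker": "SPEAKER_00", "start": 0.0, "end": 2.1, "text": "Hello."},
        {"speaker": "SPEAKER_01", "start": 2.3, "end": 4.0, "text": "Hi there."},
    ]
    for row in assign_voices(segments, {"SPEAKER_00": "narrator"}, "default"):
        print(row["speaker"], "->", row["voice"])
```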
TTS Engines
Six TTS engines are built in. Switch engines via Settings → TTS Engine, or set the environment variable OMNIVOICE_TTS_BACKEND (for example, OMNIVOICE_TTS_BACKEND=cosyvoice).
| Engine | Languages | Clone | Platform |
|---|---|---|---|
| OmniVoice (default) | 600+ | ✓ | CUDA / MPS / CPU |
| CosyVoice 3 | 9 + 18 dialects | ✓ | CUDA / MPS / CPU |
| MLX-Audio | Multi | Varies | Apple Silicon only |
| VoxCPM2 | 30 | ✓ | CUDA / MPS / CPU |
| MOSS-TTS-Nano | 20 | ✓ | CUDA / CPU |
| KittenTTS | English | ✗ | CPU only |
Custom engine: subclass TTSBackend in backend/services/tts_backend.py and add it to _REGISTRY. A new engine takes roughly 50 lines of Python.
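The registry pattern behind a custom engine can be sketched as follows. Only the names `TTSBackend`, `_REGISTRY`, and `OMNIVOICE_TTS_BACKEND` come from the project; the base class here is a stand-in, and the `synthesize` method, its signature, the `name` attribute, `SilenceBackend`, and `get_backend` are assumptions made for illustration. Check backend/services/tts_backend.py for the real interface before writing an engine.

```python
import os
from abc import ABC, abstractmethod


class TTSBackend(ABC):
    """Stand-in for the real base class; its actual methods may differ."""

    name = "base"

    @abstractmethod
    def synthesize(self, text: str, language: str) -> bytes:
        """Return raw audio bytes for `text` (assumed signature)."""


class SilenceBackend(TTSBackend):
    """Toy engine: ignores the text and returns one second of silence."""

    name = "silence"

    def synthesize(self, text, language):
        # 16,000 samples of 16-bit mono silence (one second at 16 kHz).
        return b"\x00\x00" * 16000


# Engines are looked up by name, mirroring the _REGISTRY idea above.
_REGISTRY = {SilenceBackend.name: SilenceBackend}


def get_backend(default="silence"):
    """Pick the engine named in OMNIVOICE_TTS_BACKEND, else `default`."""
    name = os.environ.get("OMNIVOICE_TTS_BACKEND", default)
    try:
        return _REGISTRY[name]()
    except KeyError:
        known = ", ".join(sorted(_REGISTRY))
        raise ValueError(f"unknown TTS backend {name!r}; known: {known}") from None
```

Keeping the lookup in one dictionary means adding an engine is a two-step change: write the subclass, then register it under the name users will put in the environment variable.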
MCP Server & Resources
OmniVoice Studio ships a built-in MCP Server, exposing voice and dubbing capabilities to any MCP-compatible client — Claude, Cursor, or your own tooling — without opening the desktop UI.
- The MCP server starts alongside the FastAPI backend when you run bun dev
- Point your MCP client at the local server to access all endpoints
- AudioSeal (Meta) embeds an invisible neural watermark in all generated audio for AI provenance
- GitHub: github.com/debpalash/OmniVoice-Studio
- Install docs: docs/install/ (macos / windows / linux / docker)
- Troubleshooting: docs/install/troubleshooting.md
- Discord: discord.gg/bzQavDfVV9