InWorld AI TTS Integration
Why InWorld AI
- Already working in Nexus voice pipeline
- Better voice quality than Kokoro/Whisper TTS
- Supports expression markup [happy], [sad], etc.
- Voice cloning available (Lena voice)
Current Setup
Voice MCP server at /opt/mcp-servers/voice/
- Uses InWorld as the primary provider
- ElevenLabs as fallback
- WebSocket delivery to the browser
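A minimal sketch of that provider order, assuming plain HTTP endpoints for each provider. The URLs and function name below are illustrative placeholders, not the actual server code:

import requests

# Provider order mirrors the setup above; URLs are hypothetical placeholders.
PROVIDERS = [
    ("inworld", "http://localhost:8001/tts"),
    ("elevenlabs", "http://localhost:8002/tts"),
]

def synthesize_with_fallback(text: str) -> bytes:
    """Try InWorld first; fall back to ElevenLabs if it fails."""
    last_error = None
    for name, url in PROVIDERS:
        try:
            resp = requests.post(url, json={"text": text}, timeout=30)
            resp.raise_for_status()
            return resp.content  # raw audio bytes from whichever provider answered
        except requests.RequestException as err:
            last_error = err  # remember the failure and try the next provider
    raise RuntimeError(f"all TTS providers failed: {last_error}")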
DO NOT USE
- Kokoro TTS (poor quality)
- Whisper TTS (poor quality)
- Any other TTS in the LiveKit stack
Integration Approach
When the LiveKit pipeline produces a text response:
1. Intercept the text output
2. Route it to our voice MCP server
3. voice.voice({paragraphs: ["response text"]})
4. WebSocket delivers the audio to the client
Code Pattern
# Instead of LiveKit's TTS:
#   response_audio = tts.synthesize(text)
# route the text to our gateway instead.
import requests

def speak_via_nexus(text: str) -> dict:
    """Call our voice MCP through the gateway (InWorld AI under the hood)."""
    response = requests.post(
        "http://nexus-gateway/voice",
        json={"paragraphs": [text]},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()
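Usage sketch, assuming the LiveKit pipeline exposes its final text through a callback. The on_agent_text hook below is hypothetical, not a LiveKit API; the point is that the wiring reduces to handing that text to speak_via_nexus:

def on_agent_text(text: str) -> None:
    # Hand the pipeline's text to Nexus instead of the stack's own TTS.
    # Nothing is returned to LiveKit: the gateway synthesizes via InWorld
    # and pushes audio to the browser over the existing WebSocket.
    speak_via_nexus(text)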
Voice Characteristics
- Natural pauses at commas and periods
- Expression tags: [happy], [laughing], [sigh]
- Pronunciation: Use hyphens for unusual words (Marga-heat-a)
- Max 500 characters per paragraph (see the chunking sketch below)
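Long responses need to be split before they hit the 500-character cap. A minimal sketch, assuming splitting at sentence boundaries is acceptable; the helper name and regex are illustrative:

import re

MAX_PARAGRAPH_CHARS = 500  # per-paragraph cap noted above

def chunk_paragraphs(text: str) -> list[str]:
    """Split text into <=500-char paragraphs at sentence boundaries.

    Assumes no single sentence exceeds the cap.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > MAX_PARAGRAPH_CHARS:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks

# Expression tags ride along inside the text itself:
# {"paragraphs": chunk_paragraphs("[happy] Great to see you! ...")}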