Phase 2: Voice Pipeline
Goal
Connect wake word to full voice conversation.
Architecture
[Wake Word Detected]
|
v
[Start Recording]
|
v
[Whisper STT] --> Text
|
v
[LARS/Ollama] --> Response
|
v
[InWorld TTS] --> Audio
|
v
[Play Response]
Tasks
1. STT Setup
- [ ] Install Whisper locally
- [ ] Test transcription accuracy
- [ ] Optimize for speed vs accuracy
2. LARS Integration
- [ ] Verify lars-trained model in Ollama
- [ ] Create API wrapper for conversation
- [ ] Handle conversation context/memory
3. TTS Integration
- [ ] Route responses to Nexus voice MCP
- [ ] Test InWorld AI output
- [ ] Handle long responses (paragraph splitting)
4. End-to-End Test
- [ ] "Hey LARS" → Question → Response → Speech
- [ ] Measure total latency
- [ ] Test interruption handling
Success Criteria
- Full conversation loop working
- Total latency < 5 seconds
- Natural sounding responses
- Handles multi-turn conversations