page

Phase 2 - Voice Pipeline

Phase 2: Voice Pipeline

Goal

Connect wake word to full voice conversation.

Architecture

[Wake Word Detected]
       |
       v
[Start Recording]
       |
       v
[Whisper STT] --> Text
       |
       v
[LARS/Ollama] --> Response
       |
       v
[InWorld TTS] --> Audio
       |
       v
[Play Response]

Tasks

1. STT Setup

[ ] Install Whisper locally
[ ] Test transcription accuracy
[ ] Optimize for speed vs accuracy

2. LARS Integration

[ ] Verify lars-trained model in Ollama
[ ] Create API wrapper for conversation
[ ] Handle conversation context/memory

3. TTS Integration

[ ] Route responses to Nexus voice MCP
[ ] Test InWorld AI output
[ ] Handle long responses (paragraph splitting)

4. End-to-End Test

[ ] "Hey LARS" → Question → Response → Speech
[ ] Measure total latency
[ ] Test interruption handling

Success Criteria

Full conversation loop working
Total latency < 5 seconds
Natural sounding responses
Handles multi-turn conversations

🌳 View Tree