Inworld AI TTS - ElevenLabs Alternative
Research Date: 2025-12-28
Track Project: 390b2ed1
Status: Evaluation in progress
Executive Summary
Inworld AI TTS is dramatically cheaper than ElevenLabs:
- Inworld: $5 per 1 million characters (metered)
- ElevenLabs Pro: $100/month (flat subscription)
- Savings: about 95% at 1M characters/month ($5 vs. $100); the actual figure depends on monthly volume
Current Promotion: FREE until December 31, 2025!
Pricing Comparison
| Service | Cost | Notes |
|---|---|---|
| Inworld AI | $5 / 1M chars | ~$0.25 per audio hour |
| ElevenLabs Pro | $100/month | Fixed monthly |
| Inworld Promo | FREE | Until Dec 31, 2025 |
Inworld also includes:
- 2 million free characters for new users
- Free zero-shot voice cloning
- No per-character charge during the promotion
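The headline numbers follow from simple arithmetic on the two list prices. The sketch below reproduces the 95% figure at 1M characters/month, the 20M characters/month point where the metered price reaches the flat fee, and the ~$0.25 per audio hour estimate. The speaking rate (150 words/minute) and characters per word (5.5, including spaces) are our own assumptions, not Inworld figures, and any usage quota on the ElevenLabs Pro plan is ignored.

```python
INWORLD_USD_PER_MILLION_CHARS = 5.00   # Inworld list price (this note)
ELEVENLABS_PRO_USD_PER_MONTH = 100.00  # ElevenLabs Pro flat fee (this note)
WORDS_PER_MINUTE = 150                 # assumed speaking rate
CHARS_PER_WORD = 5.5                   # assumed average, including the trailing space

def inworld_cost(chars: float) -> float:
    """Metered Inworld cost in USD, assuming linear per-character billing."""
    return chars / 1_000_000 * INWORLD_USD_PER_MILLION_CHARS

def savings_vs_elevenlabs_pro(chars_per_month: float) -> float:
    """Fraction saved versus the flat Pro fee (negative means Inworld costs more)."""
    return 1 - inworld_cost(chars_per_month) / ELEVENLABS_PRO_USD_PER_MONTH

def chars_per_audio_hour() -> float:
    """Characters of text needed for one hour of speech at the assumed rate."""
    return WORDS_PER_MINUTE * 60 * CHARS_PER_WORD

# Monthly volume where the metered price equals the flat fee: 100 / 5 * 1M = 20M chars
BREAK_EVEN_CHARS = round(ELEVENLABS_PRO_USD_PER_MONTH / INWORLD_USD_PER_MILLION_CHARS * 1_000_000)

print(f"1M chars/month: {savings_vs_elevenlabs_pro(1_000_000):.0%} saved")
print(f"audio hour: {chars_per_audio_hour():,.0f} chars, ${inworld_cost(chars_per_audio_hour()):.2f}")
# prints: 1M chars/month: 95% saved
# prints: audio hour: 49,500 chars, $0.25
```

Below 1M characters/month the percentage saved is higher than 95%; above it the gap narrows.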
API Structure
Endpoints
```
POST https://api.inworld.ai/tts/v1/voice         # Standard
POST https://api.inworld.ai/tts/v1/voice:stream  # Streaming
```
Authentication
Basic auth with Base64-encoded API key:
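```python
import os

# Base64-encoded API key for Basic auth; the env var name is our own choice
base64_api_key = os.environ["INWORLD_API_KEY"]
headers = {
    "Authorization": f"Basic {base64_api_key}",
    "Content-Type": "application/json",
}
```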
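If you hold a raw key and secret rather than a pre-encoded credential, standard HTTP Basic auth (RFC 7617) Base64-encodes the string `key:secret`. That the Inworld Portal builds its Base64 credential the same way is an assumption; compare the output against the value the Portal shows before relying on it. The helper name is hypothetical.

```python
import base64

def basic_credential(key: str, secret: str) -> str:
    """Base64-encode "key:secret" as in HTTP Basic auth (RFC 7617).

    Assumption: the Inworld Portal's Base64 credential is built this way.
    """
    return base64.b64encode(f"{key}:{secret}".encode("utf-8")).decode("ascii")

print(basic_credential("a", "b"))  # prints: YTpi
```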
Request Format
```json
{
  "text": "Hello world",
  "voiceId": "Ashley",
  "modelId": "inworld-tts-1"
}
```
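A minimal sketch of the standard (non-streaming) call using only the Python standard library, assuming the endpoint, headers, and payload fields listed in this section. `build_tts_request` and its defaults are our own names. The request is only built here, not sent; pass it to `urllib.request.urlopen` to actually call the API.

```python
import json
import urllib.request

INWORLD_TTS_URL = "https://api.inworld.ai/tts/v1/voice"

def build_tts_request(text: str, base64_api_key: str,
                      voice_id: str = "Ashley",
                      model_id: str = "inworld-tts-1") -> urllib.request.Request:
    """Build (but do not send) a standard Inworld TTS request."""
    payload = {"text": text, "voiceId": voice_id, "modelId": model_id}
    return urllib.request.Request(
        INWORLD_TTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Basic {base64_api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_tts_request("Hello world", base64_api_key="YOUR_KEY")
print(req.get_method(), req.full_url)
```

Building the request separately from sending it keeps the payload logic testable without network access.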
Response Format
```json
{
  "result": {
    "audioContent": "<base64-encoded-audio>"
  }
}
```
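Decoding a reply is a JSON lookup plus a Base64 decode. The sketch follows the `result.audioContent` shape recorded in this note and, as an unverified safety net, also accepts `audioContent` at the top level. For the `:stream` endpoint it assumes newline-delimited JSON with one object per chunk in the same shape; that framing is an assumption to confirm against the TTS docs. Function names are hypothetical.

```python
import base64
import json
from typing import Iterable, Iterator, Union

def decode_audio(response_body: Union[str, bytes]) -> bytes:
    """Return the raw audio bytes from one Inworld TTS JSON response.

    Reads result.audioContent; the top-level audioContent fallback is unverified.
    """
    body = json.loads(response_body)
    container = body.get("result", body)
    return base64.b64decode(container["audioContent"])

def iter_stream_audio(lines: Iterable[Union[str, bytes]]) -> Iterator[bytes]:
    """Yield decoded audio chunks from a streamed response.

    Assumes newline-delimited JSON (one object per line); blank lines are skipped.
    """
    for line in lines:
        if line.strip():
            yield decode_audio(line)
```

Yielding chunks one at a time fits a queue-based player: each chunk can be enqueued as soon as it arrives. A response object from `urllib.request.urlopen` can be passed directly to `iter_stream_audio`, since iterating it yields lines.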
Available Models
| Model | ID | Features |
|---|---|---|
| Inworld TTS | inworld-tts-1 | Rich, expressive, low-latency |
| Inworld TTS Max | inworld-tts-1-max | More expressive, better multilingual |
Supported Languages: English, German, Spanish, French, Italian, Japanese, Korean, Dutch, Polish, Portuguese, Russian, Chinese (12 total)
Voice Cloning
Instant Cloning (All Users)
- Only 5-15 seconds of audio needed
- Up to 3 samples
- Formats: wav, mp3, webm
- Max 16MB total
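Before uploading, a client-side pre-check against the instant-cloning limits listed here can catch obvious problems early. This is a minimal sketch with hypothetical function names. It reads "16MB" conservatively as 16,000,000 bytes, and it does not check clip duration (5-15 seconds) because that requires an audio decoder.

```python
from pathlib import Path

ALLOWED_SUFFIXES = {".wav", ".mp3", ".webm"}
MAX_SAMPLES = 3
MAX_TOTAL_BYTES = 16_000_000  # conservative reading of "16MB"

def check_clone_samples(samples):
    """Check (filename, size_in_bytes) pairs against the instant-cloning limits.

    Returns a list of problems; an empty list means the batch looks acceptable.
    """
    problems = []
    if not 1 <= len(samples) <= MAX_SAMPLES:
        problems.append(f"expected 1-{MAX_SAMPLES} samples, got {len(samples)}")
    for name, _size in samples:
        suffix = Path(name).suffix.lower()
        if suffix not in ALLOWED_SUFFIXES:
            problems.append(f"{name}: unsupported format {suffix or '(none)'}")
    total = sum(size for _name, size in samples)
    if total > MAX_TOTAL_BYTES:
        problems.append(f"total size {total:,} bytes exceeds the 16MB limit")
    return problems

def samples_from_paths(paths):
    """Build (filename, size) pairs from files on disk."""
    return [(Path(p).name, Path(p).stat().st_size) for p in paths]
```

Keeping the check on (name, size) pairs makes it testable without real audio files; `samples_from_paths` adapts it to files on disk.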
Professional Cloning (Contact Sales)
- 30+ minutes of audio
- Higher quality output
- For unique voices/accents
Expression & Markup Features
Emphasis (use asterisks)
We *need* a beach vacation
The word "need" will be emphasized.
Non-Verbal Tags
[breathe] [clear_throat] [cough] [laugh] [sigh] [yawn]
Example:
[clear_throat] Did you hear what I said? [sigh] You never listen!
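The emphasis and tag syntax can be checked before text is sent. The sketch flags bracketed tags outside the six listed above and adds asterisk emphasis to a whole word. The helper names are ours. This note does not say how Inworld handles an unlisted tag, so flagging it as an error is a conservative assumption.

```python
import re

# The six non-verbal tags listed in this note
NON_VERBAL_TAGS = {"breathe", "clear_throat", "cough", "laugh", "sigh", "yawn"}
BRACKETED = re.compile(r"\[([^\]]+)\]")

def unknown_tags(text: str) -> list:
    """Return bracketed tags in `text` that are not in NON_VERBAL_TAGS."""
    return [tag for tag in BRACKETED.findall(text) if tag not in NON_VERBAL_TAGS]

def emphasize(text: str, word: str) -> str:
    """Wrap the first whole-word occurrence of `word` in asterisks."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.sub(pattern, lambda m: f"*{m.group(0)}*", text, count=1)

line = "[clear_throat] Did you hear what I said? [sigh] You never listen!"
print(unknown_tags(line))                             # prints: []
print(emphasize("We need a beach vacation", "need"))  # prints: We *need* a beach vacation
```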
Text Normalization (Speak Out)
| Type | Written | Spoken |
|---|---|---|
| Phone | (123)456-7891 | one two three, four five six... |
| Date | 5/6/2025 | may sixth twenty twenty five |
| Time | 12:55 PM | twelve fifty-five PM |
| Email | test@example.com | test at example dot com |
| Money | $5,342.29 | five thousand three hundred... |
| Math | 2+2=4 | two plus two equals four |
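This note does not settle whether Inworld applies these conversions itself or expects pre-normalized text (the voice-protocol note later in this document suggests normalizing first). As a sketch of client-side normalization, here is one row of the table, email addresses. The regex is a simplification, not a full RFC 5322 parser, and the function names are hypothetical.

```python
import re

# Simplified email matcher (not a full RFC 5322 parser)
EMAIL = re.compile(r"\b([\w.+-]+)@([\w-]+(?:\.[\w-]+)+)\b")

def _speak_email(match: re.Match) -> str:
    """Spell an address the way the table shows: '@' -> 'at', '.' -> 'dot'."""
    local, domain = match.group(1), match.group(2)
    return f"{local.replace('.', ' dot ')} at {domain.replace('.', ' dot ')}"

def normalize_emails(text: str) -> str:
    """Rewrite every email address in `text` into its spoken form."""
    return EMAIL.sub(_speak_email, text)

print(normalize_emails("Write to test@example.com"))  # prints: Write to test at example dot com
```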
Natural Speech
Add filler words for realism: "uh", "um", "well", "like"
Migration Path from ElevenLabs
What Changes
- API endpoint URL
- Authentication method (ElevenLabs xi-api-key header → Inworld Basic auth)
- Request payload structure
- Voice IDs (need to map or clone)
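For the payload change, ElevenLabs takes the voice ID in the URL path (POST /v1/text-to-speech/{voice_id}) with a JSON body carrying `text` and `model_id`. Inworld instead expects `voiceId` and `modelId` in the body. A minimal translation sketch: `VOICE_MAP` and its entries are placeholders to fill in once the voices are cloned, and ElevenLabs-only fields such as `voice_settings` are simply dropped here.

```python
# Placeholder IDs: replace with our real ElevenLabs voice IDs and their Inworld clones
VOICE_MAP = {
    "elevenlabs-nexus-voice-id": "inworld-nexus-voice-id",
    "elevenlabs-lars-voice-id": "inworld-lars-voice-id",
}

def to_inworld_payload(elevenlabs_body: dict, elevenlabs_voice_id: str,
                       model_id: str = "inworld-tts-1") -> dict:
    """Translate an ElevenLabs TTS request into the Inworld request body.

    The voice moves from the ElevenLabs URL path into the body via VOICE_MAP;
    ElevenLabs-only fields (model_id, voice_settings, ...) are not carried over.
    """
    if elevenlabs_voice_id not in VOICE_MAP:
        raise KeyError(f"no Inworld voice mapped for {elevenlabs_voice_id!r}; clone it first")
    return {
        "text": elevenlabs_body["text"],
        "voiceId": VOICE_MAP[elevenlabs_voice_id],
        "modelId": model_id,
    }
```

Failing loudly on an unmapped voice avoids silently falling back to a default voice during migration.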
What Stays Similar
- Send text, get base64 audio back
- Streaming support available
- Voice cloning available
- Queue-based playback works the same
Migration Steps
- Create Inworld account
- Generate API key in Portal
- Clone Nexus/LARS voices (instant cloning)
- Update voice server to use Inworld API
- Test with both services in parallel
- Switch over when satisfied
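The last two steps can hang off a single switch. A sketch, assuming each backend is wrapped as a function from text to audio bytes; `TTS_BACKEND` and the helper names are hypothetical. "Parallel" here means side by side for a listening comparison, not concurrent requests.

```python
import os
from typing import Callable, Dict

# A backend turns text into audio bytes (hypothetical wrapper type)
Synthesizer = Callable[[str], bytes]

def select_backend(backends: Dict[str, Synthesizer],
                   env_var: str = "TTS_BACKEND",
                   default: str = "elevenlabs") -> Synthesizer:
    """Return the active backend named by an environment variable (the switch-over flag)."""
    name = os.environ.get(env_var, default)
    if name not in backends:
        raise ValueError(f"unknown TTS backend {name!r}; expected one of {sorted(backends)}")
    return backends[name]

def synthesize_both(backends: Dict[str, Synthesizer], text: str) -> Dict[str, bytes]:
    """Render the same text with every backend, one after another, for A/B listening."""
    return {name: synth(text) for name, synth in backends.items()}
```

Switching over then means setting `TTS_BACKEND=inworld`, and rolling back means unsetting it, with no code change.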
Potential Voice Protocol Updates
If we switch to Inworld, add to workflow:
**inworld_formatting**: Use *asterisks* for emphasis. Normalize numbers/dates to spoken form. Add [sigh], [laugh] etc. for expression.
Resources
- Pricing: https://inworld.ai/pricing
- TTS Docs: https://docs.inworld.ai/docs/tts/tts
- Quickstart: https://docs.inworld.ai/docs/quickstart-tts
- Best Practices: https://docs.inworld.ai/docs/tts/best-practices/generating-speech
- Voice Cloning: https://docs.inworld.ai/docs/tts/voice-cloning
Recommendation
Strong candidate for migration.
Pros:
- About 95% cost savings at 1M characters/month ($5 vs. $100/mo)
- Currently FREE through Dec 31, 2025
- Voice cloning included free
- Expression markup (asterisks, non-verbal tags)
- Streaming support
- 12 languages
Cons:
- Need to clone our voices (quick process)
- Different API structure (minor code changes)
- Less mature ecosystem than ElevenLabs
Next Step: Create Inworld account, clone Nexus voice, test quality.