Text to Speech
The text_to_speech model type converts text into natural-sounding spoken audio, enabling voice generation for accessibility, content creation, virtual assistants, and more.
Overview
Text-to-speech (TTS) models transform written text into spoken audio. They enable:
- Voice Synthesis: Convert any text to natural-sounding speech
- Voice Selection: Choose from different voice profiles
- Speed Control: Adjust speaking rate
- Format Options: Generate audio in various formats
- Streaming: Real-time audio generation
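For instance, Speed Control accepts speaking rates in the 0.25-4.0 range noted under Error Handling below. A small standalone validator illustrating that constraint (a hypothetical helper, not a msgflux API):

```python
def validate_speed(speed: float) -> float:
    """Check a TTS speaking rate against the documented 0.25-4.0 range."""
    if not 0.25 <= speed <= 4.0:
        raise ValueError(f"speed must be between 0.25 and 4.0, got {speed}")
    return speed

validate_speed(1.0)   # normal rate, passes
validate_speed(2.0)   # double speed, passes
```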
Common Use Cases
- Accessibility: Convert text to speech for visually impaired users
- Content Creation: Generate voiceovers for videos and podcasts
- Virtual Assistants: Add voice to chatbots and AI assistants
- Audio Books: Convert written content to audio format
- Language Learning: Pronunciation examples
- Notifications: Voice alerts and announcements
1. Supported Providers
Dependencies
See Dependency Management for the complete provider matrix.
Example
2. Quick Start
Example
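A minimal end-to-end sketch, assuming the OpenAI provider dependencies are installed and an API key is configured in the environment; the model ID and voice name mirror the examples later on this page:

```python
import msgflux as mf

# Create a TTS model; the provider is selected by the "openai/" prefix.
model = mf.Model.text_to_speech("openai/gpt-4o-mini-tts", voice="nova")

# Generate speech; consume() returns the path to a temporary audio file.
response = model("Hello from text to speech!")
audio_path = response.consume()
print(f"Audio written to {audio_path}")
```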
3. Audio Formats
Format Comparison
| Format | Quality | Size | Use Case |
|---|---|---|---|
| opus | High | Small | Streaming, real-time |
| mp3 | Good | Medium | Universal playback |
| aac | High | Small | Mobile, web |
| flac | Lossless | Large | Archival, editing |
| wav | Lossless | Large | Professional audio |
| pcm | Raw | Largest | Audio processing |
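The table can be read as a simple decision rule. A hypothetical helper (not part of msgflux) that encodes it:

```python
def pick_format(use_case: str) -> str:
    """Map a use case to an audio format, following the comparison table."""
    table = {
        "streaming": "opus",     # high quality, small, real-time friendly
        "playback": "mp3",       # universal playback
        "mobile": "aac",         # mobile and web
        "archival": "flac",      # lossless, large files
        "editing": "wav",        # professional audio work
        "processing": "pcm",     # raw samples for DSP pipelines
    }
    return table[use_case]

print(pick_format("streaming"))  # opus
```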
4. Voice Instructions
Control voice characteristics with prompts:
Example
```python
import msgflux as mf

model = mf.Model.text_to_speech("openai/gpt-4o-mini-tts")

# Add emotional tone
response = model(
    "I'm so excited about this!",
    prompt="Speak with enthusiasm and energy"
)

# Control pacing
response = model(
    "This is an important announcement.",
    prompt="Speak slowly and clearly, emphasizing each word"
)

# Set context
response = model(
    "Welcome to the show!",
    prompt="Speak as a radio host, upbeat and friendly"
)
```
Note: gpt-4o-mini-tts is natively steerable: you can instruct it not just on what to say, but on how to say it.
5. Streaming Audio
Generate and play audio in real-time:
Example
```python
import msgflux as mf
import asyncio

model = mf.Model.text_to_speech("openai/gpt-4o-mini-tts")

response = model(
    "This is a long text that will be streamed as audio chunks...",
    stream=True
)

# consume() returns an async generator
async def handle():
    async for chunk in response.consume():
        if chunk is None:  # End of stream
            break
        # chunk is bytes - play or save incrementally
        process_audio_chunk(chunk)

asyncio.run(handle())
```
Stream directly to a file:

```python
import msgflux as mf
import asyncio

model = mf.Model.text_to_speech("openai/gpt-4o-mini-tts")

async def save():
    response = model(
        "This will be streamed to a file.",
        stream=True,
        response_format="mp3"
    )
    with open("output.mp3", "wb") as f:
        async for chunk in response.consume():
            if chunk is None:
                break
            f.write(chunk)
    print("Audio saved to output.mp3")

asyncio.run(save())
```
Streaming also works with other providers, such as Together's Orpheus model:

```python
import msgflux as mf
import asyncio

model = mf.Model.text_to_speech(
    "together/canopylabs/orpheus-3b-0.1-ft",
    voice="tara",
    response_format="mp3"
)

async def save():
    response = model(
        "Today is a wonderful day to build something people love!",
        stream=True
    )
    with open("output.mp3", "wb") as f:
        async for chunk in response.consume():
            if chunk is None:
                break
            f.write(chunk)

asyncio.run(save())
```
For real-time playback, request raw PCM and write chunks to an audio output stream:

```python
import msgflux as mf
import asyncio
import pyaudio  # pip install pyaudio

model = mf.Model.text_to_speech("openai/gpt-4o-mini-tts")

async def play():
    p = pyaudio.PyAudio()
    stream = p.open(
        format=pyaudio.paInt16,
        channels=1,
        rate=24000,  # 24kHz for TTS
        output=True
    )
    response = model(
        "This will be played in real-time.",
        stream=True,
        response_format="pcm"
    )
    async for chunk in response.consume():
        if chunk is None:
            break
        stream.write(chunk)
    stream.stop_stream()
    stream.close()
    p.terminate()

asyncio.run(play())
```
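With the PCM settings above (24 kHz, mono, 16-bit samples), one second of audio is 24000 × 1 × 2 = 48000 bytes, which is handy for sizing playback buffers. A small standalone helper illustrating the arithmetic (not a msgflux API):

```python
def pcm_duration_seconds(num_bytes: int, rate: int = 24000,
                         channels: int = 1, sample_width: int = 2) -> float:
    """Duration of a raw PCM buffer: bytes / (rate * channels * bytes per sample)."""
    return num_bytes / (rate * channels * sample_width)

print(pcm_duration_seconds(48000))  # 1.0
print(pcm_duration_seconds(4096))   # roughly 0.085 seconds per 4 KiB chunk
```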
6. Async Support
Generate audio asynchronously:
Example
```python
import msgflux as mf
import asyncio

model = mf.Model.text_to_speech("openai/gpt-4o-mini-tts")

async def stream_speech(text):
    response = await model.acall(text, stream=True)
    async for chunk in response.consume():
        if chunk is None:
            break
        # Process chunk asynchronously
        await process_chunk(chunk)

asyncio.run(stream_speech("Hello world"))
```
7. Batch Processing
Generate multiple audio files:
Example
```python
import msgflux as mf
import msgflux.nn.functional as F
import shutil

model = mf.Model.text_to_speech("openai/gpt-4o-mini-tts", voice="nova")

texts = [
    "Welcome to chapter one.",
    "Welcome to chapter two.",
    "Welcome to chapter three."
]

# Generate in parallel
results = F.map_gather(
    model,
    args_list=[(text,) for text in texts]
)

# Save all files
for i, result in enumerate(results):
    audio_path = result.consume()
    shutil.copy(audio_path, f"chapter_{i+1}.opus")
```
8. Working with Audio Files
Example
Copy the generated temporary file to a permanent location:

```python
import msgflux as mf
import shutil

model = mf.Model.text_to_speech("openai/gpt-4o-mini-tts")

response = model("Save this audio", response_format="mp3")

# Get temporary file path and copy to desired location
temp_path = response.consume()
shutil.copy(temp_path, "output.mp3")
print("Saved to output.mp3")
```
Inspect the generated audio with pydub:

```python
import msgflux as mf
from pydub import AudioSegment  # pip install pydub

model = mf.Model.text_to_speech("openai/gpt-4o-mini-tts")

response = model("Get information about this audio", response_format="mp3")
audio_path = response.consume()

# Load with pydub
audio = AudioSegment.from_mp3(audio_path)
print(f"Duration: {len(audio) / 1000:.2f} seconds")
print(f"Channels: {audio.channels}")
print(f"Frame rate: {audio.frame_rate} Hz")
print(f"Sample width: {audio.sample_width} bytes")
```
9. Error Handling
Example
```python
import msgflux as mf

model = mf.Model.text_to_speech("openai/gpt-4o-mini-tts")

try:
    response = model("Hello world")
    audio_path = response.consume()
except ImportError:
    print("Provider not installed")
except ValueError as e:
    print(f"Invalid parameters: {e}")
    # Common issues:
    # - Invalid voice name
    # - Speed out of range (0.25-4.0)
    # - Invalid response_format
except Exception as e:
    print(f"Generation failed: {e}")
    # Common errors:
    # - Rate limits
    # - Network issues
    # - Text too long
```
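Transient failures such as rate limits or network issues can be retried. A generic retry wrapper with exponential backoff that can be layered on top of any call; this is an illustrative sketch, not a msgflux feature:

```python
import time

def with_retries(fn, attempts=3, base_delay=1.0):
    """Call fn(), retrying on any exception with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries, surface the error
            time.sleep(base_delay * (2 ** attempt))

# Usage sketch: with_retries(lambda: model("Hello world").consume())
```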