nn.Transcriber
Overview
The nn.Transcriber module wraps speech-to-text models to transcribe audio into text or structured data.
1. Quick Start
Initialization styles
```python
import msgflux as mf
import msgflux.nn as nn

class Speech2Text(nn.Transcriber):
    """Transcribes user voice notes."""
    model = mf.Model.speech_to_text("openai/whisper-1")
    response_mode = "content"
    message_fields = {"task_multimodal": {"audio": "user_audio"}}

transcriber = Speech2Text()
result = transcriber("/path/to/audio.mp3")
```
2. Input Types
Supported audio inputs
Extract audio from a structured message via message_fields:
```python
import msgflux as mf
import msgflux.nn as nn

class Speech2Text(nn.Transcriber):
    model = mf.Model.speech_to_text("openai/whisper-1")
    message_fields = {"task_multimodal": {"audio": "user_audio"}}
    response_mode = "transcription"

transcriber = Speech2Text()
msg = mf.dotdict(user_audio="/path/to/audio.mp3")
transcriber(msg)
print(msg.transcription)
```
3. Parameters
| Parameter | Description |
|---|---|
| `model` | Speech-to-text model client instance |
| `message_fields` | Maps inputs (audio) from `Message` fields |
| `response_mode` | Where to write the output in the `Message` |
| `response_template` | Jinja template to format the output string |
| `response_format` | `"text"` (default), `"json"`, `"verbose_json"`, `"srt"`, `"vtt"` |
| `prompt` | Optional text prompt to guide style or vocabulary |
| `config` | Runtime parameters passed to the model: `language`, `stream`, `timestamp_granularities` |
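The parameters above can be combined freely. As a minimal sketch, a transcriber that emits SubRip subtitles and biases the model toward domain vocabulary might look like this (the class name and prompt text are illustrative; that the `"srt"` format is returned as a plain string is an assumption):

```python
import msgflux as mf
import msgflux.nn as nn

# Sketch combining documented parameters: response_format and prompt.
# Assumes the "srt" format comes back as subtitle text in a string.
class SubtitleTranscriber(nn.Transcriber):
    model = mf.Model.speech_to_text("openai/whisper-1")
    response_format = "srt"  # emit SubRip subtitles with timing cues
    prompt = "Technical podcast about machine learning."  # vocabulary hint

subtitler = SubtitleTranscriber()
srt_text = subtitler("/path/to/episode.mp3")
```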
4. Configuration
Controlling transcription behavior
Specify the spoken language to improve accuracy and speed via config:
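For example (a minimal sketch; `language` takes an ISO-639-1 code, as with the underlying Whisper API):

```python
import msgflux as mf
import msgflux.nn as nn

class PortugueseTranscriber(nn.Transcriber):
    model = mf.Model.speech_to_text("openai/whisper-1")
    # Hint the spoken language (ISO-639-1 code) so the model
    # can skip language detection and transcribe more accurately.
    config = {"language": "pt"}
```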
Request word- or segment-level timestamps. This requires `response_format="verbose_json"` and the whisper-1 model:
```python
class TimestampTranscriber(nn.Transcriber):
    model = mf.Model.speech_to_text("openai/whisper-1")
    response_format = "verbose_json"
    config = {"timestamp_granularities": ["word"]}

transcriber = TimestampTranscriber()
result = transcriber("audio.mp3")
# result["text"] — full transcript
# result["words"] — [{"word": "Hello", "start": 0.0, "end": 0.5}, ...]
```
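Segment-level timestamps work the same way. A sketch, assuming the provider's standard `verbose_json` payload, where segments carry start/end times and text:

```python
import msgflux as mf
import msgflux.nn as nn

class SegmentTranscriber(nn.Transcriber):
    model = mf.Model.speech_to_text("openai/whisper-1")
    response_format = "verbose_json"
    # "segment" granularity groups the transcript into timed chunks
    config = {"timestamp_granularities": ["segment"]}

transcriber = SegmentTranscriber()
result = transcriber("audio.mp3")
# result["segments"] — e.g. [{"start": 0.0, "end": 4.2, "text": "..."}, ...]
```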
5. Integration with Agents
Transcribers are often the first step in a voice processing pipeline.
Transcriber → Agent pipeline
```python
import msgflux as mf
import msgflux.nn as nn

class Speech2Text(nn.Transcriber):
    model = mf.Model.speech_to_text("openai/whisper-1")
    message_fields = {"task_multimodal": {"audio": "user_audio"}}
    response_mode = "content"

class Analyzer(nn.Agent):
    """Analyzes the transcribed text."""
    model = mf.Model.chat_completion("openai/gpt-4.1-mini")
    message_fields = {"task": "content"}
    response_mode = "analysis"

transcriber = Speech2Text()
analyzer = Analyzer()

pipeline = mf.Inline(
    "{user_audio is not None? transcriber} -> analyzer",
    {"transcriber": transcriber, "analyzer": analyzer},
)

msg = mf.dotdict(user_audio="/path/to/voice_note.mp3")
pipeline(msg)
print(f"Transcript: {msg.content}")
print(f"Analysis: {msg.analysis}")
```