ZONOS
Frontier audio generation & speech models
ZONOS2
Expressive, low-latency voice generation with faithful cloning and multilingual precision.
PREVIOUS MODELS
ZONOS0.1
Modalities
Text-to-Speech (TTS)
Architecture
Two 1.6B-parameter variants: a transformer and a hybrid model
Features
Zero-Shot Voice Cloning
Clones voices from short voice samples.
Highly Expressive
Generates realistic, emotionally expressive speech.
Multilingual Support
Supports English and Japanese.
Controllable
Fine-grained control over speaker and audio characteristics.