Audio

ZONOS

Frontier audio generation & speech models

ZONOS2

Expressive, low-latency voice generation with faithful cloning and multilingual precision.

PREVIOUS MODELS

ZONOS0.1

Modalities

Text-to-Speech (TTS)

Architecture

Two 1.6B-parameter variants: a transformer and a hybrid model

Features

Zero-Shot Voice Cloning

Clones a voice from a short audio sample.

Highly Expressive

Generates realistic, emotionally expressive speech.

Multilingual Support

Supports English and Japanese.

Controllable

Fine-grained control over speaker and audio characteristics.