ZAYA
Suite of language models - ...
ZAYA1 REASONING
ZAYA1 Reasoning excels at applications requiring long-context and rapid response times.
MODALITIES
Large Language Model (LLM)
ARCHITECTURE
8.3B-parameter Mixture-of-Experts (MoE) with 760M active parameters
FEATURES
State-of-the-Art Benchmarks
ZAYA matches the quality of Qwen3-4B and Gemma3-12B, and exceeds models such as IBM-Granite-4-H-Tiny.
Fast Time-to-first-token
ZAYA’s MoE architecture delivers the response times of an 800M dense model with the quality of a 12B dense model, so quality is never sacrificed for performance.
Trained Full-Stack on AMD
First AI model trained entirely end-to-end on AMD’s hardware, software, and networking stack.
MODEL COMPONENTS
Compressed Convolutional Attention (CCA)
CCA compresses inputs before applying attention, and is carefully tuned to match the quality of full attention at a fraction of the compute cost.
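As a rough illustration of the idea (not ZAYA's actual implementation), the sketch below compresses the key/value sequence with a strided 1D convolution so attention runs over a much shorter sequence; the module name, compression ratio, and layer choices are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class CompressedConvAttention(nn.Module):
    """Illustrative sketch: compress the sequence with a strided 1D
    convolution, attend over the shorter compressed sequence, then
    project out. Names and details are hypothetical, not ZAYA's design."""

    def __init__(self, d_model: int, n_heads: int, compress_ratio: int = 4):
        super().__init__()
        self.compress = nn.Conv1d(
            d_model, d_model,
            kernel_size=compress_ratio, stride=compress_ratio,
        )
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        # Compress keys/values along the sequence dimension so attention
        # runs over seq_len / compress_ratio positions instead of seq_len.
        kv = self.compress(x.transpose(1, 2)).transpose(1, 2)
        # Queries stay at full resolution; attention cost drops roughly
        # in proportion to the compression ratio.
        y, _ = self.attn(x, kv, kv, need_weights=False)
        return self.out(y)


if __name__ == "__main__":
    layer = CompressedConvAttention(d_model=256, n_heads=8, compress_ratio=4)
    x = torch.randn(2, 128, 256)
    print(layer(x).shape)  # torch.Size([2, 128, 256])
```

Because attention cost scales with the product of query and key lengths, shrinking the key/value sequence by a factor of 4 in this sketch cuts the attention compute by roughly that factor.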
ZAYA Router
The ZAYA router ensures that experts genuinely specialize while preserving load balance. This increases model capacity, resolves the fine-tuning problem inherent to MoEs, and prevents load imbalance on hardware.
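To show how routing and balancing can coexist, the sketch below implements a generic top-k MoE router with a Switch-Transformer-style auxiliary load-balancing loss. It is a stand-in under those assumptions, not the actual ZAYA router; names such as TopKRouter and the top_k parameter are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKRouter(nn.Module):
    """Generic MoE router sketch: select the top-k experts per token and
    add an auxiliary loss that penalizes load imbalance across experts."""

    def __init__(self, d_model: int, n_experts: int, top_k: int = 2):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        self.n_experts = n_experts
        self.top_k = top_k

    def forward(self, x: torch.Tensor):
        # x: (tokens, d_model)
        logits = self.gate(x)                       # (tokens, n_experts)
        probs = F.softmax(logits, dim=-1)
        weights, expert_idx = probs.topk(self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)

        # Load-balancing auxiliary loss: fraction of tokens whose top choice
        # is each expert, times the mean router probability for that expert,
        # summed over experts (Switch-Transformer style).
        token_frac = F.one_hot(expert_idx[:, 0], self.n_experts).float().mean(0)
        prob_frac = probs.mean(0)
        aux_loss = self.n_experts * (token_frac * prob_frac).sum()

        return expert_idx, weights, aux_loss


if __name__ == "__main__":
    router = TopKRouter(d_model=256, n_experts=8, top_k=2)
    tokens = torch.randn(64, 256)
    idx, w, aux = router(tokens)
    print(idx.shape, w.shape, aux.item())  # (64, 2), (64, 2), scalar loss
```

Adding the auxiliary loss to the training objective nudges the gate toward spreading tokens evenly across experts, which is one common way to keep expert specialization from degenerating into load imbalance on hardware.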
PREVIOUS MODELS
ZAYA1 Base