ZAYA
Frontier reasoning language models, built for production speed.
ZAYA1 Reasoning
Built for long context and low latency.
Modalities
Large Language Model (LLM)
Architecture
8.3B total / 760M active parameter Mixture-of-Experts (MoE)
Features
State-of-the-Art Benchmarks
Competitive with Qwen3-4B and Gemma3-12B. Surpasses Granite-4-H-Tiny.
Fast Time-to-First-Token
800M-class latency with 12B-class quality.
Trained on the Full AMD Stack
First model trained end-to-end on AMD’s hardware, software, and networking stack.
Model Components
Compressed Convolutional Attention (CCA)
Compresses inputs before attention without sacrificing quality.
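The idea can be sketched as: downsample the key/value sequence before attention so queries attend over far fewer positions. This is illustrative only; the function name and the strided mean-pool stand-in for the learned convolution are assumptions, not the actual CCA implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def compressed_attention(q, k, v, stride=4):
    """Attend full-length queries over compressed keys/values.

    q, k, v: (T, d). Compression here is a strided mean over windows
    of `stride` tokens -- a stand-in for the learned convolution, so
    attention cost drops from O(T*T) to O(T*(T/stride)).
    """
    T, d = k.shape
    n = T // stride
    k_c = k[: n * stride].reshape(n, stride, d).mean(axis=1)  # (n, d)
    v_c = v[: n * stride].reshape(n, stride, d).mean(axis=1)  # (n, d)
    scores = q @ k_c.T / np.sqrt(d)        # (T, n), with n << T
    return softmax(scores, axis=-1) @ v_c  # (T, d)
```

With stride 4, a 16-token sequence attends over only 4 compressed positions while every query still produces a full-width output.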
ZAYA Router
Ensures true expert specialization while maintaining balance across hardware.
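A generic MoE router of this kind picks the top-k experts per token and adds an auxiliary loss that penalizes uneven expert load. The sketch below shows that generic pattern only; function names are assumptions and the actual ZAYA router design is not described here.

```python
import numpy as np

def route_topk(logits, k=2):
    """Pick top-k experts per token; renormalize their gate weights.

    logits: (T, E) router scores for T tokens over E experts.
    Returns expert ids (T, k) and gate weights (T, k) summing to 1.
    """
    idx = np.argsort(logits, axis=-1)[:, -k:]            # top-k expert ids
    top = np.take_along_axis(logits, idx, axis=-1)
    gates = np.exp(top - top.max(axis=-1, keepdims=True))
    return idx, gates / gates.sum(axis=-1, keepdims=True)

def load_balance_loss(logits, idx, n_experts):
    """Auxiliary loss: fraction of tokens routed to each expert times
    that expert's mean gate probability; minimized when load is even."""
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    frac = np.bincount(idx[:, -1], minlength=n_experts) / len(idx)
    return n_experts * float(frac @ probs.mean(axis=0))
```

Balancing matters for hardware because experts are sharded across devices: if the router collapses onto a few experts, their devices bottleneck while the rest idle.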
Previous Models
ZAYA1 Base