Online Vector Quantized Attention
Press Release
February 5, 2026
San Francisco, California

In this blog post, we describe a novel sequence-mixing layer developed here at Zyphra that aims to strike a better trade-off between memory and compute costs and long-context capabilities than standard sequence-mixing layers. We call this layer Online Vector-Quantized (OVQ) attention.

Authors
Nick Alonso, Tomás Figliolia, Beren Millidge
Collaborators
Daniel A. Roberts (Sequoia Capital & MIT), Andrey Gromov (Meta FAIR), Kushal Tirumala (Meta FAIR), and Hassan Shapourian (Cisco)