Can Long-Context Tasks be Solved with RAG?
October 17, 2024
PALO ALTO, CALIFORNIA

Building on our prior work solving complex multi-hop reasoning with retrieval-augmented generation (RAG), we demonstrate a retrieval system that can solve the Hash-Hop task on contexts of up to 100M tokens in only a few seconds on a standard CPU, and that can be paired with any off-the-shelf LLM. Our approach uses sparse embeddings to save CPU memory when handling long contexts and enables extremely rapid retrieval on CPU by exploiting sparse matrix multiplication routines. We use a sparse variant of our previous personalized PageRank graph retriever to perform searches over 100M-token contexts in under 100 ms. Since our retriever runs entirely on CPU, it can operate alongside the GPU-based LLM without competing for GPU resources.
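To make the two ingredients concrete, here is a minimal sketch, not our released code, of how sparse-embedding retrieval and a sparse personalized-PageRank walk can both run as sparse matrix operations on CPU. All function names and parameters are illustrative assumptions; the sketch uses SciPy's sparse routines as a stand-in for the optimized kernels described above.

```python
# Illustrative sketch: CPU retrieval with sparse embeddings and a sparse
# personalized-PageRank (PPR) walk. Names and parameters are hypothetical.
import numpy as np
import scipy.sparse as sp


def sparse_retrieve(doc_embs: sp.csr_matrix, query: np.ndarray, k: int = 5):
    """Score every chunk with one sparse mat-vec; return top-k indices and scores."""
    scores = doc_embs @ query            # sparse (n_docs x d) @ dense (d,) -> dense (n_docs,)
    top_k = np.argsort(scores)[::-1][:k]
    return top_k, scores


def personalized_pagerank(adj: sp.csr_matrix, seed: np.ndarray,
                          alpha: float = 0.85, iters: int = 20) -> np.ndarray:
    """Power iteration for PPR over a sparse chunk graph, seeded by retrieval scores."""
    # Column-normalize the adjacency so each column sums to 1 (column-stochastic).
    col_sums = np.asarray(adj.sum(axis=0)).ravel()
    col_sums[col_sums == 0] = 1.0
    P = adj @ sp.diags(1.0 / col_sums)
    p = seed / seed.sum()                # restart distribution from the seed scores
    r = p.copy()
    for _ in range(iters):
        # Each step is a sparse mat-vec, so the whole walk stays on CPU.
        r = alpha * (P @ r) + (1.0 - alpha) * p
    return r
```

In this sketch the retrieval scores seed the restart distribution of the PageRank walk, so hops through the chunk graph can surface documents that the embedding similarity alone would miss, which is what makes the approach suitable for multi-hop tasks like Hash-Hop.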

Authors
Nick Alonso, Beren Millidge
Collaborators
Daniel A. Roberts (Sequoia Capital & MIT), Andrey Gromov (Meta FAIR), Kushal Tirumala (Meta FAIR), and Hassan Shapourian (Cisco)