Volume 3, Issue 3
Memory$^3$: Language Modeling with Explicit Memory

Hongkang Yang, Zehao Lin, Wenjin Wang, Hao Wu, Zhiyu Li, Bo Tang, Wenqiang Wei, Jinbo Wang, Zeyun Tang, Shichao Song, Chenyang Xi, Yu Yu, Kai Chen, Feiyu Xiong, Linpeng Tang & Weinan E

J. Mach. Learn., 3 (2024), pp. 300-346.

Published online: 2024-09

[An open-access article; the PDF is free to any online user.]

  • Abstract

The training and inference of large language models (LLMs) are together a costly process that transports knowledge from raw data to meaningful computation. Inspired by the memory hierarchy of the human brain, we reduce this cost by equipping LLMs with explicit memory, a memory format cheaper than model parameters and text retrieval-augmented generation (RAG). Conceptually, with most of its knowledge externalized to explicit memories, the LLM can enjoy a smaller parameter size, training cost, and inference cost, all proportional to the amount of remaining “abstract knowledge”. As a preliminary proof of concept, we train from scratch a 2.4 B LLM, which achieves better performance than much larger LLMs as well as RAG models, and maintains higher decoding speed than RAG. The model is named ${\rm Memory}^3$, since explicit memory is the third form of memory in LLMs after implicit memory (model parameters) and working memory (context key-values). We introduce a memory circuitry theory to support the externalization of knowledge, and present novel techniques including a memory sparsification mechanism that makes storage tractable and a two-stage pretraining scheme that facilitates memory formation.
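The central idea described in the abstract — retrieving explicit memories and letting the model attend over them alongside its ordinary context — can be illustrated with a minimal sketch. This is a simplified, single-head illustration under our own assumptions (function and variable names are ours, and there is no masking, sparsification, or multi-head structure), not the paper's actual mechanism:

```python
import numpy as np

def attention_with_explicit_memory(q, k_ctx, v_ctx, k_mem, v_mem):
    """Attend over retrieved memory key-values prepended to the context.

    q:              queries,            shape (t, d)
    k_ctx, v_ctx:   working-memory KVs, shape (n, d)
    k_mem, v_mem:   explicit-memory KVs retrieved from an external store,
                    shape (m, d)
    """
    # Prepend the retrieved memories so the context tokens can attend to them.
    k = np.concatenate([k_mem, k_ctx], axis=0)   # (m + n, d)
    v = np.concatenate([v_mem, v_ctx], axis=0)   # (m + n, d)

    # Standard scaled dot-product attention over the combined sequence.
    scores = q @ k.T / np.sqrt(q.shape[-1])       # (t, m + n)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                            # (t, d)
```

Because the memory KVs are precomputed and fetched rather than stored in the weights, knowledge can live outside the parameters, which is the cost argument the abstract makes.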

  • Copyright

COPYRIGHT: © Global Science Press

  • BibTeX
  • RIS
  • TXT
@Article{JML-3-300,
  author  = {Yang, Hongkang and Lin, Zehao and Wang, Wenjin and Wu, Hao and Li, Zhiyu and Tang, Bo and Wei, Wenqiang and Wang, Jinbo and Tang, Zeyun and Song, Shichao and Xi, Chenyang and Yu, Yu and Chen, Kai and Xiong, Feiyu and Tang, Linpeng and E, Weinan},
  title   = {Memory$^3$: Language Modeling with Explicit Memory},
  journal = {Journal of Machine Learning},
  year    = {2024},
  volume  = {3},
  number  = {3},
  pages   = {300--346},
  issn    = {2790-2048},
  doi     = {10.4208/jml.240708},
  url     = {http://global-sci.org/intro/article_detail/jml/23419.html}
}
TY - JOUR
T1 - Memory$^3$: Language Modeling with Explicit Memory
AU - Yang, Hongkang
AU - Lin, Zehao
AU - Wang, Wenjin
AU - Wu, Hao
AU - Li, Zhiyu
AU - Tang, Bo
AU - Wei, Wenqiang
AU - Wang, Jinbo
AU - Tang, Zeyun
AU - Song, Shichao
AU - Xi, Chenyang
AU - Yu, Yu
AU - Chen, Kai
AU - Xiong, Feiyu
AU - Tang, Linpeng
AU - E, Weinan
JO - Journal of Machine Learning
VL - 3
IS - 3
SP - 300
EP - 346
PY - 2024
DA - 2024/09
SN - 2790-2048
DO - 10.4208/jml.240708
UR - https://global-sci.org/intro/article_detail/jml/23419.html
KW - Large language model
KW - Explicit memory
KW - Large-scale pretraining
KW - Efficient inference
KW - AI database
ER -

Hongkang Yang, Zehao Lin, Wenjin Wang, Hao Wu, Zhiyu Li, Bo Tang, Wenqiang Wei, Jinbo Wang, Zeyun Tang, Shichao Song, Chenyang Xi, Yu Yu, Kai Chen, Feiyu Xiong, Linpeng Tang & Weinan E. (2024). Memory$^3$: Language Modeling with Explicit Memory. Journal of Machine Learning. 3 (3). 300-346. doi:10.4208/jml.240708