InMemQK: A Product Quantization Based MatMul Module for Compute-in-Memory Attention Macro

Large Language Models (LLMs), based on the transformer architecture, have demonstrated remarkable capabilities in natural language processing tasks, enabling machines to generate human-like text and engage in meaningful dialogue. However, the exponential increase in model parameters has led to limitations…
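For readers unfamiliar with the technique named in the title, the sketch below illustrates the generic product-quantization (PQ) lookup-table approach to approximating matrix multiplication, the kind of MatMul used for attention scores (q · Kᵀ). This is an illustrative reconstruction of the general PQ idea only, not the paper's InMemQK macro; the function names and parameters (pq_encode, n_subspaces, n_codewords) are assumptions introduced here.

```python
# Illustrative sketch of PQ-based MatMul for attention scores (q . K^T).
# Generic PQ lookup-table scheme; NOT the paper's InMemQK design.
import numpy as np

def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns k learned centroids and assignments."""
    rng = np.random.default_rng(seed)
    cent = x[rng.choice(len(x), size=k, replace=False)].astype(float)
    for _ in range(iters):
        dist = ((x[:, None, :] - cent[None, :, :]) ** 2).sum(-1)  # (n, k)
        assign = dist.argmin(1)
        for j in range(k):
            members = x[assign == j]
            if len(members):
                cent[j] = members.mean(0)
    return cent, assign

def pq_encode(K, n_subspaces=4, n_codewords=16):
    """Split each key vector into subspaces; quantize each to a codeword id."""
    n, d = K.shape
    sub = d // n_subspaces
    books, codes = [], []
    for s in range(n_subspaces):
        c, a = kmeans(K[:, s * sub:(s + 1) * sub], n_codewords, seed=s)
        books.append(c)
        codes.append(a)
    return np.stack(books), np.stack(codes, axis=1)  # (m,k,sub), (n,m)

def pq_scores(q, books, codes):
    """Approximate q @ K.T: one small LUT of sub-dot-products, then gathers."""
    m, k, sub = books.shape
    lut = np.einsum('md,mkd->mk', q.reshape(m, sub), books)  # (m, k)
    return lut[np.arange(m)[None, :], codes].sum(axis=1)     # (n,)

# Usage: the PQ estimate tracks the exact scores up to quantization error.
rng = np.random.default_rng(1)
K, q = rng.normal(size=(64, 32)), rng.normal(size=32)
books, codes = pq_encode(K)
print(np.corrcoef(pq_scores(q, books, codes), K @ q)[0, 1])  # close to 1
```

The appeal of this scheme for compute-in-memory hardware is that the expensive n × d multiply-accumulate work is replaced by a small lookup table (m × k dot products) plus table gathers, which map naturally onto memory arrays.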

Bibliographic Details
Main Authors: Pengcheng Feng, Yihao Chen, Jinke Yu, Hao Yue, Zhelong Jiang, Yi Xiao, Wan’ang Xiao, Huaxiang Lu, Gang Chen
Format: Article
Language: English
Published: MDPI AG 2024-12-01
Series: Applied Sciences
Online Access: https://www.mdpi.com/2076-3417/14/23/11198