Entropy-Guided KV Caching for Efficient LLM Inference

Large language models (LLMs), built upon Transformer architectures, have demonstrated remarkable performance across a wide range of natural language processing tasks. However, their practical deployment, especially in long-context scenarios, is often hindered by the computational and memory costs associat...


Bibliographic Details
Main Authors: Heekyum Kim, Yuchul Jung
Format: Article
Language: English
Published: MDPI AG, 2025-07-01
Series: Mathematics
Online Access: https://www.mdpi.com/2227-7390/13/15/2366