Entropy-Guided KV Caching for Efficient LLM Inference

Large language models (LLMs), built upon Transformer architectures, have demonstrated remarkable performance across a wide range of natural language processing tasks. However, their practical deployment, especially in long-context scenarios, is often hindered by the computational and memory costs associat...


Bibliographic Details
Main Authors: Heekyum Kim, Yuchul Jung
Format: Article
Language: English
Published: MDPI AG, 2025-07-01
Series: Mathematics
Online Access: https://www.mdpi.com/2227-7390/13/15/2366