Mini-InternVL: a flexible-transfer pocket multi-modal model with 5% parameters and 90% performance

Abstract: Multi-modal large language models (MLLMs) have demonstrated impressive performance in vision-language tasks across a wide range of domains. However, the large model scale and associated high computational cost pose significant challenges for training and deploying MLLMs on consumer-grade GP...

Bibliographic Details
Main Authors:	Zhangwei Gao, Zhe Chen, Erfei Cui, Yiming Ren, Weiyun Wang, Jinguo Zhu, Hao Tian, Shenglong Ye, Junjun He, Xizhou Zhu, Lewei Lu, Tong Lu, Yu Qiao, Jifeng Dai, Wenhai Wang
Format:	Article
Language:	English
Published:	Springer 2024-12-01
Series:	Visual Intelligence
Subjects:	Lightweight multi-modal large language model; Vision-language model; Knowledge distillation; Visual instruction tuning
Online Access:	https://doi.org/10.1007/s44267-024-00067-6

Similar Items