Mini-InternVL: a flexible-transfer pocket multi-modal model with 5% parameters and 90% performance

Abstract: Multi-modal large language models (MLLMs) have demonstrated impressive performance in vision-language tasks across a wide range of domains. However, the large model scale and associated high computational cost pose significant challenges for training and deploying MLLMs on consumer-grade GP...

Bibliographic Details
Main Authors: Zhangwei Gao, Zhe Chen, Erfei Cui, Yiming Ren, Weiyun Wang, Jinguo Zhu, Hao Tian, Shenglong Ye, Junjun He, Xizhou Zhu, Lewei Lu, Tong Lu, Yu Qiao, Jifeng Dai, Wenhai Wang
Format: Article
Language: English
Published: Springer 2024-12-01
Series: Visual Intelligence
Online Access: https://doi.org/10.1007/s44267-024-00067-6