ELO-Mask: Effective and Layerwise Optimization of Mask for Sparse LLMs

Bibliographic Details
Main Authors: Bingjie Xiang, Jiarui Wu, Xiaoying Han, Qian Gu, Fei Chao, Xiao Yang, Fan Wu, Xin Fu
Format: Article
Language:English
Published: IEEE 2024-01-01
Series:IEEE Access
Subjects: Model sparsification; large language model; mask rearrangement; accuracy recovery; small samples
Online Access:https://ieeexplore.ieee.org/document/10753603/
author Bingjie Xiang
Jiarui Wu
Xiaoying Han
Qian Gu
Fei Chao
Xiao Yang
Fan Wu
Xin Fu
collection DOAJ
description Model sparsification is an effective way to reduce the substantial computational cost that large language models incur at inference time due to their vast parameter counts. However, current sparsification methods for large models are themselves costly. We propose ELO-Mask, a comprehensive two-stage approach for rapidly sparsifying large language models using only a small calibration dataset. It consists of two steps: 1) Mask Reordering: initialize the mask with predefined parameter-importance metrics, then reorder the model's masks block by block using the Straight-Through Estimator method on a small sample dataset. 2) Mask Fine-Tuning: further fine-tune the masks from the first step, block by block, on the same small sample dataset. Our experiments demonstrate the approach's effectiveness: when sparsifying the Llama-7B model, ELO-Mask shows a clear advantage over the standard sparsification-plus-LoRA-fine-tuning pipeline, matching its final sparse-model quality while consuming less compute, using a smaller dataset, occupying less GPU memory, and leaving the sparse model's inference speed unaffected.
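The Straight-Through Estimator (STE) named in the Mask Reordering step is what makes a binary mask trainable: the forward pass binarizes continuous importance scores, while the backward pass treats the binarization as the identity so gradients reach the scores. Below is a minimal PyTorch sketch of that idea for a single weight matrix; the names (STEMask, masked_linear), the top-k binarization rule, and the toy loss are illustrative assumptions, not the paper's implementation.

```python
# Hedged sketch: STE-based mask learning for one weight matrix (PyTorch).
# All names and the top-k rule are illustrative assumptions, not the authors' code.
import torch


class STEMask(torch.autograd.Function):
    """Binarize mask scores in the forward pass; pass gradients straight
    through the binarization in the backward pass (the STE trick)."""

    @staticmethod
    def forward(ctx, scores, sparsity):
        # Keep the top (1 - sparsity) fraction of scores, zero the rest.
        k = max(1, int(scores.numel() * (1.0 - sparsity)))
        threshold = torch.topk(scores.flatten(), k).values.min()
        return (scores >= threshold).float()

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through: identity gradient for the scores, none for sparsity.
        return grad_output, None


def masked_linear(x, weight, scores, sparsity=0.5):
    """Forward pass through a frozen weight matrix under a learned binary mask."""
    mask = STEMask.apply(scores, sparsity)
    return x @ (weight * mask).t()


# Toy usage: the weights stay frozen; only the mask scores receive gradients.
weight = torch.randn(16, 32)                           # frozen dense weights
scores = weight.abs().clone().requires_grad_(True)     # magnitude-based init
x = torch.randn(4, 32)

loss = masked_linear(x, weight, scores).pow(2).mean()  # stand-in objective
loss.backward()                                        # gradients reach `scores` via STE
print(scores.grad.shape)                               # torch.Size([16, 32])
```

Initializing the scores from weight magnitude mirrors the abstract's "predefined parameter importance metrics"; the reordering step would then update these scores block by block on the small calibration set.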
format Article
id doaj-art-99ae6ac2d9c441e19b2e27f1092ddf6d
institution Kabale University
issn 2169-3536
language English
publishDate 2024-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-99ae6ac2d9c441e19b2e27f1092ddf6d (indexed 2024-11-26T00:00:50Z)
  Publication: IEEE Access (IEEE), ISSN 2169-3536, 2024-01-01, vol. 12, pp. 170541-170552
  DOI: 10.1109/ACCESS.2024.3498904 (IEEE Xplore document 10753603)
  Title: ELO-Mask: Effective and Layerwise Optimization of Mask for Sparse LLMs
  Authors and affiliations:
    Bingjie Xiang, Information Center, China Tobacco Fujian Industrial Company Ltd., Xiamen, Fujian, China
    Jiarui Wu, Institute of Artificial Intelligence, Xiamen University, Xiamen, China
    Xiaoying Han, Information Center, China Tobacco Fujian Industrial Company Ltd., Xiamen, Fujian, China
    Qian Gu, Information Center, China Tobacco Fujian Industrial Company Ltd., Xiamen, Fujian, China
    Fei Chao (https://orcid.org/0000-0002-6928-2638), Institute of Artificial Intelligence, Xiamen University, Xiamen, China
    Xiao Yang, School of Informatics, Xiamen University, Xiamen, China
    Fan Wu, School of Informatics, Xiamen University, Xiamen, China
    Xin Fu (https://orcid.org/0000-0001-7958-8684), Management School, Xiamen University, Xiamen, China
  Online access: https://ieeexplore.ieee.org/document/10753603/
  Keywords: Model sparsification; large language model; mask rearrangement; accuracy recovery; small samples
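For the Mask Fine-Tuning step, the blockwise ("layerwise") optimization the title refers to, a natural reading of the abstract is that each block's mask scores are tuned so the sparse block reproduces the frozen dense block's output on the small calibration set. The sketch below illustrates such a loop under that assumption; the function name, the MSE objective, and all hyperparameters are illustrative, not the authors' implementation.

```python
# Hedged sketch: blockwise mask fine-tuning against a frozen dense reference.
import torch


def finetune_block_mask(dense_block, sparse_block, mask_scores, calib_batches,
                        lr=1e-3, steps=50):
    """Minimize ||dense(x) - sparse(x)||^2 with respect to mask scores only."""
    opt = torch.optim.Adam(mask_scores, lr=lr)
    for _ in range(steps):
        for x in calib_batches:                    # small calibration set
            with torch.no_grad():
                target = dense_block(x)            # frozen dense reference output
            loss = torch.nn.functional.mse_loss(sparse_block(x), target)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return mask_scores


# Toy usage with one linear block; a soft sigmoid gate stands in for the
# STE binarization of the previous sketch to keep the example short.
dense = torch.nn.Linear(32, 32)
for p in dense.parameters():
    p.requires_grad_(False)                        # block weights stay frozen
scores = torch.zeros(32, 32, requires_grad=True)   # trainable mask scores

def sparse(x):
    mask = torch.sigmoid(scores)                   # soft stand-in for a binary mask
    return torch.nn.functional.linear(x, dense.weight * mask, dense.bias)

calib = [torch.randn(8, 32) for _ in range(4)]
finetune_block_mask(dense, sparse, [scores], calib, steps=10)
```

Because each block is calibrated independently, peak memory stays near the footprint of one block plus its activations, which is consistent with the abstract's claim of lower GPU-memory use than whole-model LoRA fine-tuning.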
title ELO-Mask: Effective and Layerwise Optimization of Mask for Sparse LLMs
topic Model sparsification
large language model
mask rearrangement
accuracy recovery
small samples
url https://ieeexplore.ieee.org/document/10753603/