Convolution Smooth: A Post-Training Quantization Method for Convolutional Neural Networks
Convolutional neural network (CNN) quantization is an efficient model compression technique primarily used for accelerating inference and optimizing resources. However, existing methods often apply different quantization strategies to activations and weights, without considering their interplay. To address this issue, this paper proposes a new method called Convolution Smooth, which aims to balance the quantization difficulty of activations and weights. This method effectively mitigates the significant accuracy drop typically observed in activation quantization with traditional methods. By appropriately scaling the tensors, the method shifts quantization complexity from activations to weights, ensuring a reasonable distribution of quantization difficulty between activations and weights. Experimental results show that the proposed method is applicable to a wide range of network models and can be seamlessly integrated into multiple post-training quantization (PTQ) methods. Through tensor scaling and the fusion of factors, the network achieves significant accuracy improvements in most cases, particularly when there is a significant discrepancy between the activation values and weight quantization bit-widths. This study provides a new theoretical foundation and technical support for CNN quantization compression.
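The abstract describes the core step only at a high level. The sketch below illustrates what "shifting quantization complexity from activations to weights" can look like for a convolution layer, assuming a SmoothQuant-style per-input-channel rule; the function name `smooth_conv`, the `alpha` balance parameter, and the max-based channel statistics are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch (assumed formulation): migrate quantization difficulty from
# activations to weights for one conv layer by per-input-channel rescaling.
import numpy as np

def smooth_conv(weight, calib_act, alpha=0.5, eps=1e-8):
    """Rebalance quantization difficulty between activations and weights.

    weight:    (C_out, C_in, kH, kW) convolution kernel
    calib_act: (N, C_in, H, W) calibration activations feeding this layer
    Returns the rescaled weight, rescaled activations, and the per-channel
    factors, chosen so that conv(act / s, weight * s) == conv(act, weight)
    in full precision.
    """
    # Per-input-channel magnitudes of activations and weights.
    act_max = np.abs(calib_act).max(axis=(0, 2, 3))   # shape (C_in,)
    w_max = np.abs(weight).max(axis=(0, 2, 3))        # shape (C_in,)

    # Heuristic balance: larger alpha pushes more difficulty onto the weights.
    scale = (act_max ** alpha) / (np.maximum(w_max, eps) ** (1.0 - alpha))
    scale = np.maximum(scale, eps)                    # guard against zero channels

    smoothed_act = calib_act / scale[None, :, None, None]  # flatter, easier to quantize
    smoothed_w = weight * scale[None, :, None, None]       # absorbs the factor
    return smoothed_w, smoothed_act, scale
```

In full precision the rescaling is lossless because the convolution is linear in each input channel; the per-channel factors can then be folded into the preceding layer's weights or normalization parameters, presumably what the abstract's "fusion of factors" refers to, so inference needs no extra multiplications.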
| Main Authors: | Yongyuan Chen, Zhendao Wang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | Convolutional neural network; post-training quantization; tensor scaling; factor fusion |
| Online Access: | https://ieeexplore.ieee.org/document/10955493/ |
| _version_ | 1850148378630946816 |
|---|---|
| author | Yongyuan Chen, Zhendao Wang |
| collection | DOAJ |
| description | Convolutional neural network (CNN) quantization is an efficient model compression technique primarily used for accelerating inference and optimizing resources. However, existing methods often apply different quantization strategies to activations and weights, without considering their interplay. To address this issue, this paper proposes a new method called Convolution Smooth, which aims to balance the quantization difficulty of activations and weights. This method effectively mitigates the significant accuracy drop typically observed in activation quantization with traditional methods. By appropriately scaling the tensors, the method shifts quantization complexity from activations to weights, ensuring a reasonable distribution of quantization difficulty between activations and weights. Experimental results show that the proposed method is applicable to a wide range of network models and can be seamlessly integrated into multiple post-training quantization (PTQ) methods. Through tensor scaling and the fusion of factors, the network achieves significant accuracy improvements in most cases, particularly when there is a significant discrepancy between the activation values and weight quantization bit-widths. This study provides a new theoretical foundation and technical support for CNN quantization compression. |
| format | Article |
| id | doaj-art-19b51b8f59ff4ef8828e963fe95e71f1 |
| institution | OA Journals |
| issn | 2169-3536 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | Yongyuan Chen (https://orcid.org/0009-0008-9813-5603) and Zhendao Wang (https://orcid.org/0009-0002-8504-9649), Department of Electronic Science and Technology, School of Physics and Electronics, Hunan University, Changsha, China. "Convolution Smooth: A Post-Training Quantization Method for Convolutional Neural Networks," IEEE Access, vol. 13, pp. 64727-64736, 2025, doi: 10.1109/ACCESS.2025.3558956, article 10955493. Record doaj-art-19b51b8f59ff4ef8828e963fe95e71f1, indexed 2025-08-20T02:27:16Z. https://ieeexplore.ieee.org/document/10955493/ |
| title | Convolution Smooth: A Post-Training Quantization Method for Convolutional Neural Networks |
| topic | Convolutional neural network; post-training quantization; tensor scaling; factor fusion |
| url | https://ieeexplore.ieee.org/document/10955493/ |