Reducing Memory and Computational Cost for Deep Neural Network Training with Quantized Parameter Updates
| Main Authors: | , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Graz University of Technology, 2025-08-01 |
| Series: | Journal of Universal Computer Science |
| Subjects: | |
| Online Access: | https://lib.jucs.org/article/164737/download/pdf/ |
| Summary: | For embedded devices, both memory and computational efficiency are essential due to their constrained resources. However, neural network training remains both computation and memory intensive. Although many existing studies apply quantization schemes to mitigate memory overhead, they often employ stochastic rounding for both inference and gradient computation. Notably, no prior work has explored its advantages exclusively in parameter updates. Here, we introduce Quantized Parameter Updates (QPU), which uses stochastic rounding (SQPU) to achieve improved and more stable training outcomes. Our fixed-point quantization scheme quantizes parameters (weights and biases) upon model initialization, conducts high-precision gradient computations during training, and applies stochastically quantized updates thereafter. This approach substantially lowers memory usage and enables mostly quantized inference, thereby accelerating calculations. Furthermore, storing quantized inputs for gradient computation reduces memory demands even more. When tested on the FASHION-MNIST dataset, our method matches the Straight-Through Estimator (STE) in performance, delivering 92% validation accuracy while consuming just 57% of the memory during training. Accepting a slight 1.5% drop in accuracy yields a 50% memory reduction. Additional techniques include stochastic rounding in inference, the use of higher precision for parameters than for layer outputs to limit overflow, L2 regularization via weight decay, and adaptive learning-rate scheduling for improved optimization across a range of batch sizes. |
| ISSN: | 0948-6968 |
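
The summary above outlines the core mechanics: parameters are quantized to a fixed-point format at initialization, gradients are computed in higher precision, and each update is written back with stochastic rounding so the stored parameters stay quantized between steps. The sketch below is only a minimal illustration of that idea, not the authors' implementation; the function names (`stochastic_round`, `quantize_fixed_point`, `sqpu_update`), the NumPy formulation, and the choice of 8 fractional bits are assumptions for the example.

```python
import numpy as np

def stochastic_round(x: np.ndarray) -> np.ndarray:
    # Round down, then round up with probability equal to the fractional part,
    # so the rounding is unbiased in expectation.
    floor = np.floor(x)
    frac = x - floor
    return floor + (np.random.random_sample(x.shape) < frac)

def quantize_fixed_point(x: np.ndarray, frac_bits: int = 8) -> np.ndarray:
    # Map values onto a fixed-point grid with `frac_bits` fractional bits,
    # using stochastic rounding instead of round-to-nearest.
    scale = 2.0 ** frac_bits
    return stochastic_round(x * scale) / scale

def sqpu_update(weights, grad, lr, weight_decay=0.0, frac_bits=8):
    # High-precision gradient step (including L2 regularization via weight decay),
    # then the result is stochastically rounded back onto the fixed-point grid,
    # so the stored parameters remain quantized between training steps.
    step = weights - lr * (grad + weight_decay * weights)
    return quantize_fixed_point(step, frac_bits)

# Toy usage: quantize parameters at initialization, then apply one quantized update.
w = quantize_fixed_point(np.random.randn(4, 4), frac_bits=8)
g = np.random.randn(4, 4)   # stand-in for a real back-propagated gradient
w = sqpu_update(w, g, lr=0.01, weight_decay=1e-4, frac_bits=8)
```

Because stochastic rounding is unbiased, the expected value of each quantized step equals the full-precision step, which is why small updates can still accumulate correctly over many iterations even when individual steps fall below the fixed-point resolution.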