Reducing Memory and Computational Cost for Deep Neural Network Training with Quantized Parameter Updates
| Main Authors: | , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Graz University of Technology, 2025-08-01 |
| Series: | Journal of Universal Computer Science |
| Subjects: | |
| Online Access: | https://lib.jucs.org/article/164737/download/pdf/ |
| Summary: | For embedded devices, both memory and computational efficiency are essential due to their constrained resources. However, neural network training remains both computation and memory intensive. Although many existing studies apply quantization schemes to mitigate memory overhead, they often employ stochastic rounding for both inference and gradient computation. Notably, no prior work has explored its advantages exclusively in parameter updates. Here, we introduce Quantized Parameter Updates (QPU), which uses stochastic rounding (SQPU) to achieve improved and more stable training outcomes. Our fixed-point quantization scheme quantizes parameters (weights and biases) upon model initialization, conducts high-precision gradient computations during training, and applies stochastically quantized updates thereafter. This approach substantially lowers memory usage and enables mostly quantized inference, thereby accelerating calculations. Furthermore, storing quantized inputs for gradient computation reduces memory demands even more. When tested on the FASHION-MNIST dataset, our method matches the Straight-Through Estimator (STE) in performance, delivering 92% validation accuracy while consuming just 57% of the memory during training. Accepting a slight 1.5% drop in accuracy yields a 50% memory reduction. Additional techniques include stochastic rounding in inference, the use of higher precision for parameters than for layer outputs to limit overflow, L2 regularization via weight decay, and adaptive learning-rate scheduling for improved optimization across a range of batch sizes. |
| ISSN: | 0948-6968 |
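
The summary above outlines the core mechanics: parameters are quantized to a fixed-point format at initialization, gradients are computed in higher precision, and each update is written back with stochastic rounding so the stored parameters stay quantized between steps. The sketch below is only a minimal illustration of that idea, not the authors' implementation; the function names (`stochastic_round`, `quantize_fixed_point`, `sqpu_update`), the NumPy formulation, and the choice of 8 fractional bits are assumptions for the example.

```python
import numpy as np

def stochastic_round(x: np.ndarray) -> np.ndarray:
    # Round down, then round up with probability equal to the fractional part,
    # so the rounding is unbiased in expectation.
    floor = np.floor(x)
    frac = x - floor
    return floor + (np.random.random_sample(x.shape) < frac)

def quantize_fixed_point(x: np.ndarray, frac_bits: int = 8) -> np.ndarray:
    # Map values onto a fixed-point grid with `frac_bits` fractional bits,
    # using stochastic rounding instead of round-to-nearest.
    scale = 2.0 ** frac_bits
    return stochastic_round(x * scale) / scale

def sqpu_update(weights, grad, lr, weight_decay=0.0, frac_bits=8):
    # High-precision gradient step (including L2 regularization via weight decay),
    # then the result is stochastically rounded back onto the fixed-point grid,
    # so the stored parameters remain quantized between training steps.
    step = weights - lr * (grad + weight_decay * weights)
    return quantize_fixed_point(step, frac_bits)

# Toy usage: quantize parameters at initialization, then apply one quantized update.
w = quantize_fixed_point(np.random.randn(4, 4), frac_bits=8)
g = np.random.randn(4, 4)   # stand-in for a real back-propagated gradient
w = sqpu_update(w, g, lr=0.01, weight_decay=1e-4, frac_bits=8)
```

Because stochastic rounding is unbiased, the expected value of each quantized step equals the full-precision step, which is why small updates can still accumulate correctly over many iterations even when individual steps fall below the fixed-point resolution.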