Alpha-Backpropagation Algorithm: Probability Estimation for One-Hot and Quasi-One-Hot Teachers

This study proposes theories and applications of probabilistic divergences for neural network training. The theory generalizes the cross-entropy loss used in backpropagation to an alpha-divergence loss, and the new method includes the cross-entropy method as a limiting case. A principal advantage is a speedup of the training phase. The target setting comprises one-hot and quasi-one-hot teachers; quasi-one-hotness arises when a teacher intentionally and moderately weakens its one-hotness. In addition, a reverse process for quasi-one-hotness, called quenching, is devised; for the non-one-hot elements, this process is equivalent to annealing. The new methods automatically weight the backpropagation updates to reflect the nature of one-hot and quasi-one-hot teacher data. These strategies are applicable to versatile generative artificial intelligence (AI) tools. Using generic neural networks, the degree of learning speedup was investigated; the new methods deliver speedups so large that illustrating them requires logarithmic scales. This property also permits more elaborate conditional probability estimation by partitioning a neural network. The alpha-divergence method is thus well suited to accelerating versatile edge AI development, particularly low-cost and low-power fine-tuning.
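
The record reproduces only the abstract, not the paper's formulas. For background, the alpha-divergence is commonly written in the Amari form below; treating this as the paper's exact parameterization is an assumption, though the limiting-case relation to cross-entropy stated in the abstract holds for this form.

% Amari form of the alpha-divergence between a teacher distribution p
% and a network output q (background only; the paper may parameterize differently):
D_\alpha(p \,\|\, q)
  = \frac{4}{1-\alpha^{2}}
    \Bigl( 1 - \sum_i p_i^{\frac{1-\alpha}{2}} \, q_i^{\frac{1+\alpha}{2}} \Bigr),
  \qquad \alpha \neq \pm 1 .
% Its limits recover the Kullback-Leibler divergences:
%   \alpha \to -1 : D_\alpha(p \| q) \to \mathrm{KL}(p \| q),
%   \alpha \to +1 : D_\alpha(p \| q) \to \mathrm{KL}(q \| p).
% For a one-hot teacher p with hot class c, KL(p || q) = -\log q_c,
% i.e., the usual cross-entropy loss minimized by backpropagation.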

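A minimal numerical sketch of these notions follows, in Python. The names alpha_divergence, quasi_one_hot, and softness are hypothetical illustrations rather than the paper's API, and the paper's automatic weighting and quenching/annealing schedules are not reproduced; the sketch only demonstrates the divergence itself, a quasi-one-hot teacher, and the cross-entropy limiting case.

# Illustrative sketch only; not the paper's implementation.
import numpy as np

def alpha_divergence(p, q, alpha, eps=1e-12):
    """Amari-style alpha-divergence D_alpha(p || q) for alpha != +-1.

    As alpha -> -1 this approaches KL(p || q); for a one-hot teacher p
    that KL equals the cross-entropy -log q[c] of the hot class c.
    """
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    s = np.sum(p ** ((1.0 - alpha) / 2.0) * q ** ((1.0 + alpha) / 2.0))
    return 4.0 / (1.0 - alpha ** 2) * (1.0 - s)

def quasi_one_hot(num_classes, hot_index, softness):
    """Moderately weaken a one-hot teacher: keep 1 - softness on the hot
    class and spread the remaining mass uniformly over the other classes."""
    t = np.full(num_classes, softness / (num_classes - 1))
    t[hot_index] = 1.0 - softness
    return t

# A 4-class example with the teacher hot on class 2.
q = np.array([0.1, 0.2, 0.6, 0.1])          # softmax output of some network
p_hot = quasi_one_hot(4, 2, softness=0.0)   # exact one-hot teacher
p_soft = quasi_one_hot(4, 2, softness=0.1)  # quasi-one-hot teacher

# Near alpha = -1 the divergence matches cross-entropy, -log q[2] ~= 0.511.
print(alpha_divergence(p_hot, q, alpha=-0.999))  # ~0.511
print(-np.log(q[2]))                             # 0.5108...
print(alpha_divergence(p_soft, q, alpha=0.5))    # a genuinely different loss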

Bibliographic Details
Main Author: Yasuo Matsuyama (ORCID: https://orcid.org/0000-0002-0819-9240), Faculty of Science and Engineering, Waseda University, Tokyo, Japan
Format: Article
Language: English
Published: IEEE, 2025-01-01
Series: IEEE Access, vol. 13 (2025), pp. 87712-87729; DOI: 10.1109/ACCESS.2025.3571404; ISSN: 2169-3536
Subjects: Alpha-divergence; annealing; backpropagation; cross-entropy; learning speed; neural network training
Online Access: https://ieeexplore.ieee.org/document/11007017/
Record source: DOAJ (institution: OA Journals); record id: doaj-art-d39b3b4c5eec43b689e4262e772c2074