Alpha-Backpropagation Algorithm: Probability Estimation for One-Hot and Quasi-One-Hot Teachers
This study proposes theories and applications of probabilistic divergences for neural network training. The theory generalizes the cross-entropy method for backpropagation to the alpha-divergence method, which includes the cross-entropy method as a limiting case. An advantage is a speedup of the training phase.
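To make the loss mentioned in the abstract concrete, here is a minimal sketch of an alpha-divergence between a teacher distribution and a network's softmax output, assuming the standard Amari parametrization; the paper's exact parametrization, limit conventions, and gradient weighting may differ, and the names `alpha_divergence`, `teacher`, and `output` are illustrative only. Since the teacher is fixed during training, minimizing KL(p || q) over the output is equivalent to minimizing the cross-entropy, which is how the cross-entropy method appears as a limiting case.

```python
import numpy as np

def alpha_divergence(p, q, alpha, eps=1e-12):
    """Amari-style alpha-divergence D_alpha(p || q) for discrete distributions.

    With this convention it tends to KL(p || q) as alpha -> -1 and to
    KL(q || p) as alpha -> +1; the paper may use a different parametrization.
    """
    p = np.clip(p, eps, None)
    q = np.clip(q, eps, None)
    if np.isclose(alpha, -1.0):              # forward KL limit
        return float(np.sum(p * np.log(p / q)))
    if np.isclose(alpha, 1.0):               # reverse KL limit
        return float(np.sum(q * np.log(q / p)))
    coeff = 4.0 / (1.0 - alpha ** 2)
    mix = (1.0 - alpha) / 2.0 * p + (1.0 + alpha) / 2.0 * q
    geo = p ** ((1.0 - alpha) / 2.0) * q ** ((1.0 + alpha) / 2.0)
    return float(coeff * np.sum(mix - geo))

# Toy comparison: a quasi-one-hot teacher against a softmax output.
teacher = np.array([0.9, 0.05, 0.05])
output = np.array([0.7, 0.2, 0.1])
for a in (-1.0, 0.0, 0.5):
    print(f"alpha = {a:+.1f}, divergence = {alpha_divergence(teacher, output, a):.4f}")
```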
| Main Author: | Yasuo Matsuyama |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | Alpha-divergence; annealing; backpropagation; cross-entropy; learning speed; neural network training |
| Online Access: | https://ieeexplore.ieee.org/document/11007017/ |
| _version_ | 1850142007319592960 |
|---|---|
| author | Yasuo Matsuyama |
| author_facet | Yasuo Matsuyama |
| author_sort | Yasuo Matsuyama |
| collection | DOAJ |
| description | This study proposes theories and applications of probabilistic divergences for neural network training. The theory generalizes the cross-entropy method for backpropagation to the alpha-divergence method, which includes the cross-entropy method as a limiting case. An advantage is a speedup of the training phase. The target settings are the one-hot and quasi-one-hot teacher cases. Quasi-one-hotness arises when a teacher intentionally weakens its one-hotness moderately. In addition, we devise a reverse process for quasi-one-hotness, called quenching; for non-one-hot elements, this process is equivalent to annealing. The new methods automatically weight backpropagation, reflecting the nature of one-hot and quasi-one-hot teacher data. These strategies are applicable to versatile generative artificial intelligence (AI) tools. Using generic neural networks, we investigated the degree of learning speedup. The new methods yield remarkable speedups, whose illustration requires logarithmic scales. This property also allows more elaborate conditional probability estimation by partitioning a neural network. Thus, the alpha-divergence method is valuable for accelerating versatile edge AI development, particularly low-cost and low-power fine-tuning. |
| format | Article |
| id | doaj-art-d39b3b4c5eec43b689e4262e772c2074 |
| institution | OA Journals |
| issn | 2169-3536 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | doaj-art-d39b3b4c5eec43b689e4262e772c2074; 2025-08-20T02:29:15Z; English; IEEE; IEEE Access; ISSN 2169-3536; 2025-01-01; Vol. 13, pp. 87712-87729; DOI 10.1109/ACCESS.2025.3571404; document 11007017; Alpha-Backpropagation Algorithm: Probability Estimation for One-Hot and Quasi-One-Hot Teachers; Yasuo Matsuyama (https://orcid.org/0000-0002-0819-9240), Faculty of Science and Engineering, Waseda University, Tokyo, Japan; https://ieeexplore.ieee.org/document/11007017/; Alpha-divergence; annealing; backpropagation; cross-entropy; learning speed; neural network training |
| spellingShingle | Yasuo Matsuyama; Alpha-Backpropagation Algorithm: Probability Estimation for One-Hot and Quasi-One-Hot Teachers; IEEE Access; Alpha-divergence; annealing; backpropagation; cross-entropy; learning speed; neural network training |
| title | Alpha-Backpropagation Algorithm: Probability Estimation for One-Hot and Quasi-One-Hot Teachers |
| title_full | Alpha-Backpropagation Algorithm: Probability Estimation for One-Hot and Quasi-One-Hot Teachers |
| title_fullStr | Alpha-Backpropagation Algorithm: Probability Estimation for One-Hot and Quasi-One-Hot Teachers |
| title_full_unstemmed | Alpha-Backpropagation Algorithm: Probability Estimation for One-Hot and Quasi-One-Hot Teachers |
| title_short | Alpha-Backpropagation Algorithm: Probability Estimation for One-Hot and Quasi-One-Hot Teachers |
| title_sort | alpha backpropagation algorithm probability estimation for one hot and quasi one hot teachers |
| topic | Alpha-divergence; annealing; backpropagation; cross-entropy; learning speed; neural network training |
| url | https://ieeexplore.ieee.org/document/11007017/ |
| work_keys_str_mv | AT yasuomatsuyama alphabackpropagationalgorithmprobabilityestimationforonehotandquasionehotteachers |
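The abstract also describes quasi-one-hot teachers (one-hot targets deliberately weakened) and a reverse "quenching" step that restores one-hotness. The snippet below is a hedged sketch of those two ideas under a simple uniform-smoothing assumption; `quasi_one_hot`, `quench`, `epsilon`, and `rate` are hypothetical names and parameters for illustration, not the paper's actual formulation.

```python
import numpy as np

def quasi_one_hot(label, num_classes, epsilon=0.1):
    """Soften a one-hot teacher: the hot class keeps 1 - epsilon and the
    remaining classes share epsilon uniformly (a label-smoothing-style
    construction; the paper's exact weakening scheme may differ)."""
    t = np.full(num_classes, epsilon / (num_classes - 1))
    t[label] = 1.0 - epsilon
    return t

def quench(teacher, rate=0.5):
    """Hypothetical quenching step: sharpen a quasi-one-hot teacher back
    toward one-hotness by shrinking the non-hot mass by `rate` and moving
    it to the hot class; the result still sums to one."""
    hot = int(np.argmax(teacher))
    t = teacher * (1.0 - rate)
    t[hot] = teacher[hot] + rate * (1.0 - teacher[hot])
    return t / t.sum()

t = quasi_one_hot(label=2, num_classes=4, epsilon=0.2)
print(t)            # quasi-one-hot teacher, e.g. [0.067 0.067 0.8 0.067]
print(quench(t))    # sharpened back toward one-hot
```

Applied repeatedly, `quench` drives the teacher toward a strict one-hot vector, which matches the abstract's description of quenching as the reverse of weakening (and, for the non-one-hot elements, an annealing-like schedule).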