Using low-discrepancy points for data compression in machine learning: an experimental comparison
Abstract: Low-discrepancy points (also called Quasi-Monte Carlo points) are deterministically and cleverly chosen point sets in the unit cube which provide an approximation of the uniform distribution. We explore two methods based on such low-discrepancy points to reduce large data sets in order to train neural networks. The first is the method of Dick and Feischl (J Complex 67:101587, 2021), which relies on digital nets and an averaging procedure. Motivated by our experimental findings, we construct a second method, which again uses digital nets but Voronoi clustering instead of averaging. Both methods are compared to the supercompress approach (Stat Anal Data Min ASA Data Sci J 14:217–229, 2021), a variant of the K-means clustering algorithm. The comparison is done in terms of the compression error for different objective functions and the accuracy of the training of a neural network.
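The abstract describes the digital-net/Voronoi variant only at a high level. As a rough illustration of the general idea, the sketch below is not the authors' implementation: it assumes SciPy's Sobol generator (Sobol points form digital nets in base 2) as the low-discrepancy point set, uses the scaled net points as Voronoi centers, and averages the responses in each non-empty cell. The function name `compress`, the parameter `m`, and the choice of cell representatives are illustrative assumptions; the paper's actual construction may differ.

```python
# Minimal sketch: data compression with a low-discrepancy point set and Voronoi clustering.
# Not the authors' method; it only illustrates the general idea under the assumptions above.
import numpy as np
from scipy.stats import qmc

def compress(X, y, m):
    """Reduce (X, y) to at most 2**m representative pairs."""
    d = X.shape[1]
    # Sobol points in [0, 1]^d (a digital net in base 2), scaled to the data's bounding box.
    sob = qmc.Sobol(d, scramble=False).random_base2(m)
    lo, hi = X.min(axis=0), X.max(axis=0)
    centers = qmc.scale(sob, lo, hi)
    # Nearest-center (Voronoi) assignment for every data point.
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # Keep only non-empty cells; the compressed response is the mean of y within the cell.
    X_red, y_red = [], []
    for k in np.unique(labels):
        mask = labels == k
        X_red.append(centers[k])
        y_red.append(y[mask].mean())
    return np.array(X_red), np.array(y_red)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((10_000, 2))
    y = np.sin(4 * X[:, 0]) + X[:, 1] ** 2
    Xc, yc = compress(X, y, m=6)  # at most 64 representative pairs
    print(Xc.shape, yc.shape)
```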
| Main Authors: | S. Göttlich, J. Heieck, A. Neuenkirch |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | SpringerOpen, 2025-01-01 |
| Series: | Journal of Mathematics in Industry |
| Subjects: | Data reduction; Low-discrepancy points; Quasi-Monte Carlo; Digital nets; K-means algorithm; Neural networks |
| Online Access: | https://doi.org/10.1186/s13362-024-00166-5 |
collection | DOAJ |
id | doaj-art-ea1bc32183fd494cbc5a58ce9db58abd |
institution | Kabale University |
issn | 2190-5983 |
affiliation | Department of Mathematics, University of Mannheim (S. Göttlich, J. Heieck, A. Neuenkirch) |