Using low-discrepancy points for data compression in machine learning: an experimental comparison

Abstract: Low-discrepancy points (also called Quasi-Monte Carlo points) are deterministically and cleverly chosen point sets in the unit cube that approximate the uniform distribution. We explore two methods based on such low-discrepancy points to reduce large data sets for training neural networks. The first is the method of Dick and Feischl (J Complex 67:101587, 2021), which relies on digital nets and an averaging procedure. Motivated by our experimental findings, we construct a second method, which again uses digital nets but applies Voronoi clustering instead of averaging. Both methods are compared with the supercompress approach (Stat Anal Data Min ASA Data Sci J 14:217–229, 2021), a variant of the K-means clustering algorithm. The comparison is carried out in terms of the compression error for different objective functions and the accuracy of the neural network training.
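
The abstract describes the compression schemes only at a high level. As a rough illustration of the general idea (not the authors' algorithm), the sketch below makes plain assumptions: SciPy's scrambled Sobol' points stand in for a digital net, each data point is assigned to its nearest net point (its Voronoi cell), and the responses are averaged per cell to obtain a small weighted data set. The function name ldp_voronoi_compress and its parameters are hypothetical.

```python
# Rough sketch only: low-discrepancy ("digital net") points as compression anchors,
# nearest-neighbour (Voronoi) assignment of the data, per-cell averaging of responses.
# This illustrates the general idea, not the exact construction from the paper.
import numpy as np
from scipy.spatial import cKDTree
from scipy.stats import qmc


def ldp_voronoi_compress(X, y, m=6, seed=0):
    """Compress (X, y), with X in [0,1]^d, onto at most 2**m weighted points."""
    d = X.shape[1]
    # 2**m scrambled Sobol' points; Sobol' sequences are a standard digital-net construction
    net = qmc.Sobol(d=d, scramble=True, seed=seed).random_base2(m=m)
    # index of the nearest net point = Voronoi cell each sample falls into
    cell = cKDTree(net).query(X)[1]
    counts = np.bincount(cell, minlength=len(net))
    y_sums = np.bincount(cell, weights=y, minlength=len(net))
    keep = counts > 0                      # drop net points whose cell is empty
    y_bar = y_sums[keep] / counts[keep]    # mean response per occupied cell
    return net[keep], y_bar, counts[keep]  # reduced inputs, targets, and cell weights


if __name__ == "__main__":
    # Toy example: 100k noisy samples of f(x) = sin(2*pi*x1) * x2, reduced to <= 64 points
    rng = np.random.default_rng(1)
    X = rng.random((100_000, 2))
    y = np.sin(2 * np.pi * X[:, 0]) * X[:, 1] + 0.05 * rng.standard_normal(100_000)
    Xc, yc, w = ldp_voronoi_compress(X, y, m=6)
    print(Xc.shape, yc.shape, w.sum())     # e.g. (64, 2) (64,) 100000.0
```

The returned cell counts can serve as sample weights when the reduced set is used to fit a regression model or neural network.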

Bibliographic Details
Main Authors: S. Göttlich, J. Heieck, A. Neuenkirch
Format: Article
Language: English
Published: SpringerOpen 2025-01-01
Series: Journal of Mathematics in Industry
Subjects: Data reduction, Low-discrepancy points, Quasi-Monte Carlo, Digital nets, K-means algorithm, Neural networks
Online Access: https://doi.org/10.1186/s13362-024-00166-5
Collection: DOAJ
Record ID: doaj-art-ea1bc32183fd494cbc5a58ce9db58abd
Institution: Kabale University
ISSN: 2190-5983
Affiliation: Department of Mathematics, University of Mannheim (S. Göttlich, J. Heieck, A. Neuenkirch)