Using low-discrepancy points for data compression in machine learning: an experimental comparison

Abstract: Low-discrepancy points (also called Quasi-Monte Carlo points) are deterministically and cleverly chosen point sets in the unit cube that approximate the uniform distribution. We explore two methods based on such low-discrepancy points to reduce large data sets for training neural networks. The first is the method of Dick and Feischl (J Complex 67:101587, 2021), which relies on digital nets and an averaging procedure. Motivated by our experimental findings, we construct a second method, which again uses digital nets but applies Voronoi clustering instead of averaging. Both methods are compared with the supercompress approach (Stat Anal Data Min ASA Data Sci J 14:217–229, 2021), a variant of the K-means clustering algorithm. The comparison is carried out in terms of the compression error for different objective functions and the accuracy of the neural network training.
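
The abstract describes the compression schemes only at a high level. As a rough illustration of the general idea (not the authors' algorithm), the sketch below makes plain assumptions: SciPy's scrambled Sobol' points stand in for a digital net, each data point is assigned to its nearest net point (its Voronoi cell), and the responses are averaged per cell to obtain a small weighted data set. The function name ldp_voronoi_compress and its parameters are hypothetical.

```python
# Rough sketch only: low-discrepancy ("digital net") points as compression anchors,
# nearest-neighbour (Voronoi) assignment of the data, per-cell averaging of responses.
# This illustrates the general idea, not the exact construction from the paper.
import numpy as np
from scipy.spatial import cKDTree
from scipy.stats import qmc


def ldp_voronoi_compress(X, y, m=6, seed=0):
    """Compress (X, y), with X in [0,1]^d, onto at most 2**m weighted points."""
    d = X.shape[1]
    # 2**m scrambled Sobol' points; Sobol' sequences are a standard digital-net construction
    net = qmc.Sobol(d=d, scramble=True, seed=seed).random_base2(m=m)
    # index of the nearest net point = Voronoi cell each sample falls into
    cell = cKDTree(net).query(X)[1]
    counts = np.bincount(cell, minlength=len(net))
    y_sums = np.bincount(cell, weights=y, minlength=len(net))
    keep = counts > 0                      # drop net points whose cell is empty
    y_bar = y_sums[keep] / counts[keep]    # mean response per occupied cell
    return net[keep], y_bar, counts[keep]  # reduced inputs, targets, and cell weights


if __name__ == "__main__":
    # Toy example: 100k noisy samples of f(x) = sin(2*pi*x1) * x2, reduced to <= 64 points
    rng = np.random.default_rng(1)
    X = rng.random((100_000, 2))
    y = np.sin(2 * np.pi * X[:, 0]) * X[:, 1] + 0.05 * rng.standard_normal(100_000)
    Xc, yc, w = ldp_voronoi_compress(X, y, m=6)
    print(Xc.shape, yc.shape, w.sum())     # e.g. (64, 2) (64,) 100000.0
```

The returned cell counts can serve as sample weights when the reduced set is used to fit a regression model or neural network.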

Bibliographic Details
Main Authors: S. Göttlich, J. Heieck, A. Neuenkirch
Format: Article
Language: English
Published: SpringerOpen 2025-01-01
Series: Journal of Mathematics in Industry
Subjects: Data reduction, Low-discrepancy points, Quasi-Monte Carlo, Digital nets, K-means algorithm, Neural networks
Online Access: https://doi.org/10.1186/s13362-024-00166-5
Collection: DOAJ
Record ID: doaj-art-ea1bc32183fd494cbc5a58ce9db58abd
Institution: Kabale University
ISSN: 2190-5983
Affiliation: Department of Mathematics, University of Mannheim (S. Göttlich, J. Heieck, A. Neuenkirch)