Using low-discrepancy points for data compression in machine learning: an experimental comparison
Main Authors: | , , |
---|---|
Format: | Article |
Language: | English |
Published: | SpringerOpen, 2025-01-01 |
Series: | Journal of Mathematics in Industry |
Subjects: | |
Online Access: | https://doi.org/10.1186/s13362-024-00166-5 |
Summary: | Low-discrepancy points (also called Quasi-Monte Carlo points) are deterministically and cleverly chosen point sets in the unit cube, which provide an approximation of the uniform distribution. We explore two methods based on such low-discrepancy points to reduce large data sets in order to train neural networks. The first one is the method of Dick and Feischl (J Complex 67:101587, 2021), which relies on digital nets and an averaging procedure. Motivated by our experimental findings, we construct a second method, which again uses digital nets, but Voronoi clustering instead of averaging. Both methods are compared to the supercompress approach (Stat Anal Data Min ASA Data Sci J 14:217–229, 2021), which is a variant of the K-means clustering algorithm. The comparison is done in terms of the compression error for different objective functions and the accuracy of the training of a neural network. |
ISSN: | 2190-5983 |
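
As a rough, hedged illustration of the digital-net-plus-Voronoi-clustering idea sketched in the summary above: one can generate a low-discrepancy point set (here a scrambled Sobol' set from SciPy, used as a stand-in for a digital net), assign each data point to the Voronoi cell of its nearest low-discrepancy point, and keep one averaged representative per non-empty cell. The function name `qmc_voronoi_compress`, the parameter choices, and the test function are illustrative assumptions, not the paper's algorithm or code.

```python
# Sketch only: compress a labeled data set (X in [0,1]^d, y) by assigning each
# sample to the Voronoi cell of its nearest low-discrepancy point (a scrambled
# Sobol' set) and keeping one averaged representative per non-empty cell.
# This illustrates the general idea from the summary, not the authors' method.
import numpy as np
from scipy.stats import qmc

def qmc_voronoi_compress(X, y, m=6, seed=0):
    """Reduce (X, y) to at most 2**m representative points.

    X : (n, d) array with features scaled to the unit cube [0, 1]^d
    y : (n,) array of responses
    m : base-2 exponent; the Sobol' set has 2**m points
    """
    n, d = X.shape
    sobol = qmc.Sobol(d=d, scramble=True, seed=seed)
    centers = sobol.random_base2(m=m)            # (2**m, d) low-discrepancy points

    # Nearest-center (Voronoi) assignment for every data point.
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    cell = np.argmin(dists, axis=1)              # (n,) index of the owning cell

    X_red, y_red = [], []
    for c in np.unique(cell):
        mask = cell == c
        X_red.append(X[mask].mean(axis=0))       # average the features in the cell
        y_red.append(y[mask].mean())             # average the responses in the cell
    return np.array(X_red), np.array(y_red)

# Example: compress 10,000 noisy samples of a toy function to at most 64 points.
rng = np.random.default_rng(0)
X = rng.random((10_000, 2))
y = np.sin(2 * np.pi * X[:, 0]) * X[:, 1] + 0.05 * rng.standard_normal(10_000)
X_red, y_red = qmc_voronoi_compress(X, y, m=6)
print(X_red.shape, y_red.shape)
```

The reduced pair `(X_red, y_red)` would then stand in for the full data set when training a neural network; the compression error mentioned in the summary would be assessed by comparing objective values or trained models on the full versus the reduced data.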