CUR matrix approximation through convex optimization for feature selection

Bibliographic Details
Main Authors: Kathryn Linehan, Radu Balan
Format: Article
Language: English
Published: Frontiers Media S.A. 2025-08-01
Series: Frontiers in Applied Mathematics and Statistics
Online Access: https://www.frontiersin.org/articles/10.3389/fams.2025.1632218/full
Collection: DOAJ
Abstract: The singular value decomposition (SVD) is commonly used in applications that require a low-rank matrix approximation. However, the singular vectors cannot be interpreted in terms of the original data. For applications requiring this type of interpretation, e.g., selection of important data matrix columns or rows, the approximate CUR matrix factorization can be used. Work on the CUR matrix approximation has generally focused on algorithm development, theoretical guarantees, and applications. In this study, we present a novel deterministic CUR formulation and algorithm with theoretical convergence guarantees. The algorithm utilizes convex optimization, finds important columns and rows separately, and allows the user to control the number of important columns and rows selected from the original data matrix. We present numerical results and demonstrate the effectiveness of our CUR algorithm as a feature selection method on gene expression data. These results are compared to those using the SVD and other CUR algorithms as the feature selection method. Finally, we present a novel application of CUR as a feature selection method to determine discriminant proteins when clustering protein expression data in a self-organizing map (SOM), and compare the performance of multiple CUR algorithms in this application.
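The abstract contrasts CUR with the SVD: in A ≈ C U R, the factor C holds actual columns of A and R holds actual rows, so the selected indices can be read directly as features. The sketch below illustrates only that general idea, under stated assumptions. It is not the convex-optimization algorithm of this article; it swaps in deterministic top-k leverage-score selection (a standard CUR heuristic) and sets U = C⁺ A R⁺, which minimizes the Frobenius error for the chosen C and R. The function names `leverage_scores`, `cur_by_leverage`, and `rank_k_svd_error` are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def leverage_scores(basis: np.ndarray) -> np.ndarray:
    """Rank-k leverage scores: squared row norms of an orthonormal basis
    with k columns, divided by k so that the scores sum to 1."""
    k = basis.shape[1]
    return np.sum(basis**2, axis=1) / k


def cur_by_leverage(A: np.ndarray, k: int, c: int, r: int):
    """Hypothetical deterministic CUR sketch (not the article's method).

    Keeps the c columns and r rows of A with the largest rank-k leverage
    scores, then uses U = pinv(C) @ A @ pinv(R), which minimizes
    ||A - C U R||_F for the chosen C and R.
    """
    left, _, right_t = np.linalg.svd(A, full_matrices=False)
    col_scores = leverage_scores(right_t[:k].T)  # one score per column of A
    row_scores = leverage_scores(left[:, :k])    # one score per row of A
    cols = np.sort(np.argsort(col_scores)[::-1][:c])
    rows = np.sort(np.argsort(row_scores)[::-1][:r])
    C = A[:, cols]  # actual columns of A, so they stay interpretable
    R = A[rows, :]  # actual rows of A
    U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)
    return C, U, R, cols, rows


def rank_k_svd_error(A: np.ndarray, k: int) -> float:
    """Frobenius error of the best rank-k approximation (Eckart-Young)."""
    s = np.linalg.svd(A, compute_uv=False)
    return float(np.sqrt(np.sum(s[k:] ** 2)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic rank-5 matrix plus small noise (illustrative data only).
    A = rng.standard_normal((50, 5)) @ rng.standard_normal((5, 40))
    A += 1e-3 * rng.standard_normal(A.shape)
    C, U, R, cols, rows = cur_by_leverage(A, k=5, c=10, r=10)
    print("selected columns:", cols)
    print("selected rows:", rows)
    print("CUR error:", np.linalg.norm(A - C @ U @ R))
    print("rank-5 SVD error:", rank_k_svd_error(A, 5))
```

Because `cols` and `rows` index actual columns and rows of `A`, they play the role of selected features, whereas the SVD factors mix all columns together. The article's own method instead selects columns and rows separately through convex optimization, with user control over how many of each are kept.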
ISSN: 2297-4687
Volume: 11, article 1632218 (DOI: 10.3389/fams.2025.1632218)
Affiliations: Kathryn Linehan, Department of Mathematics, University of Maryland, College Park, MD, United States, and Research Computing, University of Virginia, Charlottesville, VA, United States; Radu Balan, Department of Mathematics, University of Maryland, College Park, MD, United States
Subjects: CUR matrix approximation; convex optimization; low-rank matrix approximation; feature selection; interpretation