MLGL: An R Package Implementing Correlated Variable Selection by Hierarchical Clustering and Group-Lasso

Bibliographic Details
Main Authors: Quentin Grimonprez, Samuel Blanck, Alain Celisse, Guillemette Marot
Format: Article
Language: English
Published: Foundation for Open Access Statistics 2023-03-01
Series: Journal of Statistical Software
Subjects: penalized regression; correlated variables; hierarchical clustering; group selection; R
Online Access: https://www.jstatsoft.org/index.php/jss/article/view/3539
collection DOAJ
description The R package MLGL (multi-layer group-Lasso) implements a new variable selection procedure for settings where explanatory variables are redundant, as is typically the case with high-dimensional data. A sparsity assumption postulates that only a few variables are relevant for predicting the response variable. In this context, the performance of classical Lasso-based approaches deteriorates strongly as redundancy increases. The proposed approach combines variable aggregation and selection to improve interpretability and performance. First, a hierarchical clustering procedure provides, at each level, a partition of the variables into groups. Then, the set of groups of variables from the different levels of the hierarchy is given as input to group-Lasso, with weights adapted to the structure of the hierarchy. At this step, group-Lasso outputs sets of candidate groups of variables for each value of the regularization parameter. The versatility offered by MLGL in choosing groups at different levels of the hierarchy induces, a priori, a high computational complexity. MLGL, however, exploits the structure of the hierarchy and the weights used in group-Lasso to greatly reduce the final time cost. The final choice of the regularization parameter, and therefore the final choice of groups, is made by a multiple hierarchical testing procedure.
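The first two steps described above (build a hierarchy over the variables, then pool the groups from every level as input to a group-penalized fit) can be illustrated with a minimal sketch. It is written in Python with SciPy rather than with the MLGL package itself, and it is not the authors' implementation: the function names `candidate_groups` and `expand_design`, the Ward linkage, the `sqrt(group size)` weights, and the column-duplication treatment of overlapping groups are all assumptions made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage


def candidate_groups(X):
    """Return (group, weight) pairs for every node of a variable hierarchy.

    Each node of the dendrogram is a group that appears in the partition at
    some level, so the nodes are exactly the groups pooled across levels.
    The weight sqrt(|group|) is a placeholder, not the weight used by MLGL.
    """
    p = X.shape[1]
    # Cluster the columns (variables), hence the transpose.
    tree = linkage(X.T, method="ward")
    # Leaves are singleton groups, indexed 0..p-1.
    members = {j: (j,) for j in range(p)}
    # Row `step` of the linkage matrix merges two earlier nodes into node p + step.
    for step, (a, b, _height, _size) in enumerate(tree):
        members[p + step] = tuple(sorted(members[int(a)] + members[int(b)]))
    return [(g, float(np.sqrt(len(g)))) for g in members.values()]


def expand_design(X, groups):
    """Duplicate columns so that overlapping groups become disjoint blocks.

    Returns the expanded matrix and, for each expanded column, the index of
    the group it belongs to (the usual input of a group-Lasso solver).
    """
    cols = [j for g, _ in groups for j in g]
    group_index = [gi for gi, (g, _) in enumerate(groups) for _ in g]
    return X[:, cols], np.array(group_index)


# Hypothetical toy data: 50 observations of 6 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))
groups = candidate_groups(X)
X_expanded, group_index = expand_design(X, groups)
```

A hierarchy over p variables has p leaves and p - 1 merges, so the pooled set holds 2p - 1 candidate groups. The expanded matrix and group index would then be passed to a group-Lasso solver of the reader's choice; the selection step and the multiple hierarchical testing step are not sketched here.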
id doaj-art-69f0ec05ef5e4bd9a63b3db029020104
institution Kabale University
issn 1548-7660
spelling doaj-art-69f0ec05ef5e4bd9a63b3db029020104; 2024-12-29T00:12:52Z; eng; Foundation for Open Access Statistics; Journal of Statistical Software; 1548-7660; 2023-03-01; 106; 1; 10.18637/jss.v106.i03; 3386; MLGL: An R Package Implementing Correlated Variable Selection by Hierarchical Clustering and Group-Lasso; Quentin Grimonprez (Inria Lille-Nord Europe); Samuel Blanck (Université de Lille); Alain Celisse (Université Paris 1); Guillemette Marot (Université de Lille); https://www.jstatsoft.org/index.php/jss/article/view/3539; penalized regression; correlated variables; hierarchical clustering; group selection; R
title MLGL: An R Package Implementing Correlated Variable Selection by Hierarchical Clustering and Group-Lasso
topic penalized regression
correlated variables
hierarchical clustering
group selection
R