FAIRness Along the Machine Learning Lifecycle Using Dataverse in Combination with MLflow

Bibliographic Details
Main Authors: Lincoln Sherpa, Valentin Khaydarov, Ralph Müller-Pfefferkorn
Format: Article
Language: English
Published: Ubiquity Press, 2024-12-01
Series: Data Science Journal
Online Access:https://account.datascience.codata.org/index.php/up-j-dsj/article/view/1731
Collection: DOAJ
Description: Typical Machine Learning (ML) approaches are characterized by their iterative and exploratory nature: continuously refining and adapting not only code but also ML models to optimize the results and the performance on new data. This poses novel challenges related to keeping the trained model Findable, Accessible, Interoperable and Reusable (FAIR), especially for the automation of the entire machine learning lifecycle within the concept of Machine Learning Operations (MLOps). The article introduces a comprehensive integration of a data repository (based on the software Dataverse) and an ML platform (based on the MLflow framework) that enables seamless sharing and publishing of data, experiments and models, ensuring FAIRness. The presented solution is evaluated using an ML use case scenario with model training, hyper-parameter optimization, and model sharing via the data platform.
Record ID: doaj-art-df3d02a48d394612a417cdef5b5caede
Institution: Kabale University
ISSN: 1683-1470
DOI: 10.5334/dsj-2024-055
Citation: Data Science Journal 23: 55
Author affiliations:
Lincoln Sherpa (ORCID: https://orcid.org/0009-0000-3287-0295), Technische Universität Dresden, Center for Interdisciplinary Digital Sciences, Department Information Services and High Performance Computing, Dresden
Valentin Khaydarov, Technische Universität Dresden, P20, Dresden
Ralph Müller-Pfefferkorn (ORCID: https://orcid.org/0000-0001-8719-5741), Technische Universität Dresden, Center for Interdisciplinary Digital Sciences, Department Information Services and High Performance Computing, Dresden
Subjects: machine learning (ml); research data management; fair data principles; fair data; database management system; competing interests