FAIRness Along the Machine Learning Lifecycle Using Dataverse in Combination with MLflow
Typical Machine Learning (ML) approaches are characterized by their iterative and exploratory nature: continuously refining and adapting not only code but also ML models to optimize the results and the performance on new data. This poses novel challenges related to keeping the trained model Findable...
Main Authors: | Lincoln Sherpa, Valentin Khaydarov, Ralph Müller-Pfefferkorn |
---|---|
Format: | Article |
Language: | English |
Published: | Ubiquity Press, 2024-12-01 |
Series: | Data Science Journal |
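Subjects: | machine learning (ml), research data management, fair data principles, fair data, database management system, competing interests |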
Online Access: | https://account.datascience.codata.org/index.php/up-j-dsj/article/view/1731 |
_version_ | 1841554972967174144 |
---|---|
author | Lincoln Sherpa Valentin Khaydarov Ralph Müller-Pfefferkorn |
author_facet | Lincoln Sherpa Valentin Khaydarov Ralph Müller-Pfefferkorn |
author_sort | Lincoln Sherpa |
collection | DOAJ |
description | Typical Machine Learning (ML) approaches are characterized by their iterative and exploratory nature: continuously refining and adapting not only code but also ML models to optimize the results and the performance on new data. This poses novel challenges related to keeping the trained model Findable, Accessible, Interoperable and Reusable (FAIR), especially for the automation of the entire machine learning lifecycle within the concept of Machine Learning Operations (MLOps). The article introduces a comprehensive integration of a data repository (based on the software Dataverse) and an ML platform (based on the MLflow framework) that enables seamless sharing and publishing of data, experiments and models, ensuring FAIRness. The presented solution is evaluated using an ML use case scenario with model training, hyper-parameter optimization, and model sharing via the data platform. |
format | Article |
id | doaj-art-df3d02a48d394612a417cdef5b5caede |
institution | Kabale University |
issn | 1683-1470 |
language | English |
publishDate | 2024-12-01 |
publisher | Ubiquity Press |
record_format | Article |
series | Data Science Journal |
spelling | doaj-art-df3d02a48d394612a417cdef5b5caede; 2025-01-08T07:55:16Z; eng; Ubiquity Press; Data Science Journal; 1683-1470; 2024-12-01; 23; 55; 55; 10.5334/dsj-2024-055; 1731; FAIRness Along the Machine Learning Lifecycle Using Dataverse in Combination with MLflow; Lincoln Sherpa (0), https://orcid.org/0009-0000-3287-0295; Valentin Khaydarov (1); Ralph Müller-Pfefferkorn (2), https://orcid.org/0000-0001-8719-5741; (0) Technische Universität Dresden, Center for Interdisciplinary Digital Sciences, Department Information Services and High Performance Computing, Dresden; (1) Technische Universität Dresden, P20, Dresden; (2) Technische Universität Dresden, Center for Interdisciplinary Digital Sciences, Department Information Services and High Performance Computing, Dresden; Typical Machine Learning (ML) approaches are characterized by their iterative and exploratory nature: continuously refining and adapting not only code but also ML models to optimize the results and the performance on new data. This poses novel challenges related to keeping the trained model Findable, Accessible, Interoperable and Reusable (FAIR), especially for the automation of the entire machine learning lifecycle within the concept of Machine Learning Operations (MLOps). The article introduces a comprehensive integration of a data repository (based on the software Dataverse) and an ML platform (based on the MLflow framework) that enables seamless sharing and publishing of data, experiments and models, ensuring FAIRness. The presented solution is evaluated using an ML use case scenario with model training, hyper-parameter optimization, and model sharing via the data platform.; https://account.datascience.codata.org/index.php/up-j-dsj/article/view/1731; machine learning (ml); research data management; fair data principles; fair data; database management system; competing interests |
spellingShingle | Lincoln Sherpa Valentin Khaydarov Ralph Müller-Pfefferkorn FAIRness Along the Machine Learning Lifecycle Using Dataverse in Combination with MLflow Data Science Journal machine learning (ml) research data management fair data principles fair data database management system competing interests |
title | FAIRness Along the Machine Learning Lifecycle Using Dataverse in Combination with MLflow |
title_full | FAIRness Along the Machine Learning Lifecycle Using Dataverse in Combination with MLflow |
title_fullStr | FAIRness Along the Machine Learning Lifecycle Using Dataverse in Combination with MLflow |
title_full_unstemmed | FAIRness Along the Machine Learning Lifecycle Using Dataverse in Combination with MLflow |
title_short | FAIRness Along the Machine Learning Lifecycle Using Dataverse in Combination with MLflow |
title_sort | fairness along the machine learning lifecycle using dataverse in combination with mlflow |
topic | machine learning (ml) research data management fair data principles fair data database management system competing interests |
url | https://account.datascience.codata.org/index.php/up-j-dsj/article/view/1731 |
work_keys_str_mv | AT lincolnsherpa fairnessalongthemachinelearninglifecycleusingdataverseincombinationwithmlflow AT valentinkhaydarov fairnessalongthemachinelearninglifecycleusingdataverseincombinationwithmlflow AT ralphmullerpfefferkorn fairnessalongthemachinelearninglifecycleusingdataverseincombinationwithmlflow |