Constructing a metadata knowledge graph as an atlas for demystifying AI pipeline optimization

The emergence of advanced artificial intelligence (AI) models has driven the development of frameworks and approaches that focus on automating model training and hyperparameter tuning of end-to-end AI pipelines. However, other crucial stages of these pipelines such as dataset selection, feature engi...

Bibliographic Details
Main Authors:	Revathy Venkataramanan, Aalap Tripathy, Tarun Kumar, Sergey Serebryakov, Annmary Justine, Arpit Shah, Suparna Bhattacharya, Martin Foltin, Paolo Faraboschi, Kaushik Roy, Amit Sheth
Format:	Article
Language:	English
Published:	Frontiers Media S.A. 2025-01-01
Series:	Frontiers in Big Data
Subjects:	AI pipeline metadata, graph learning, graph recommendation, AIMKG, metadata knowledge graphs, AI pipeline optimization
Online Access:	https://www.frontiersin.org/articles/10.3389/fdata.2024.1476506/full

_version_	1841556030315560960
author	Revathy Venkataramanan, Aalap Tripathy, Tarun Kumar, Sergey Serebryakov, Annmary Justine, Arpit Shah, Suparna Bhattacharya, Martin Foltin, Paolo Faraboschi, Kaushik Roy, Amit Sheth
author_facet	Revathy Venkataramanan, Aalap Tripathy, Tarun Kumar, Sergey Serebryakov, Annmary Justine, Arpit Shah, Suparna Bhattacharya, Martin Foltin, Paolo Faraboschi, Kaushik Roy, Amit Sheth
author_sort	Revathy Venkataramanan
collection	DOAJ
description	The emergence of advanced artificial intelligence (AI) models has driven the development of frameworks and approaches that focus on automating model training and hyperparameter tuning of end-to-end AI pipelines. However, other crucial stages of these pipelines, such as dataset selection, feature engineering, and model optimization for deployment, have received less attention. Improving the efficiency of end-to-end AI pipelines requires metadata on past executions of AI pipelines and all their stages. Regenerating this metadata history by re-executing existing AI pipelines is computationally challenging and impractical. To address this issue, we propose to source AI pipeline metadata from open-source platforms such as Papers-with-Code, OpenML, and Hugging Face. However, integrating and unifying the varying terminologies and data formats of these diverse sources is a challenge. In this study, we present a solution by introducing the Common Metadata Ontology (CMO), which is used to construct an extensive AI Pipeline Metadata Knowledge Graph (AIMKG) consisting of 1.6 million pipelines. Through semantic enhancements, the pipeline metadata in AIMKG is also enriched for downstream tasks such as search and recommendation of AI pipelines. We perform quantitative and qualitative evaluations of AIMKG by searching for and recommending pipelines relevant to a user query. For the quantitative evaluation, we propose a custom aggregation model that outperforms other baselines, achieving a retrieval accuracy (R@1) of 76.3%. Our qualitative analysis shows that the AIMKG-based recommender retrieved relevant pipelines in 78% of test cases, compared with the state-of-the-art MLSchema-based recommender, which retrieved relevant responses in 51% of the cases. AIMKG serves as an atlas for navigating the evolving AI landscape, providing practitioners with a comprehensive factsheet for their applications. It guides AI pipeline optimization, offers insights and recommendations for improving AI pipelines, and serves as a foundation for data mining and analysis of evolving AI workflows.
format	Article
id	doaj-art-3ef3d419efaa4b87a847f8c48a633303
institution	Kabale University
issn	2624-909X
language	English
publishDate	2025-01-01
publisher	Frontiers Media S.A.
record_format	Article
series	Frontiers in Big Data
spelling	doaj-art-3ef3d419efaa4b87a847f8c48a633303 | 2025-01-07T14:48:28Z | eng | Frontiers Media S.A. | Frontiers in Big Data | 2624-909X | 2025-01-01 | 7 | 10.3389/fdata.2024.1476506 | 1476506 | Constructing a metadata knowledge graph as an atlas for demystifying AI pipeline optimization | Revathy Venkataramanan (AI Institute, University of South Carolina, Columbia, SC, United States; Hewlett Packard Enterprise Labs, Houston, TX, United States); Aalap Tripathy, Tarun Kumar, Sergey Serebryakov, Annmary Justine, Arpit Shah, Suparna Bhattacharya, Martin Foltin, Paolo Faraboschi (Hewlett Packard Enterprise Labs, Houston, TX, United States); Kaushik Roy, Amit Sheth (AI Institute, University of South Carolina, Columbia, SC, United States)
title	Constructing a metadata knowledge graph as an atlas for demystifying AI pipeline optimization
topic	AI pipeline metadata, graph learning, graph recommendation, AIMKG, metadata knowledge graphs, AI pipeline optimization
url	https://www.frontiersin.org/articles/10.3389/fdata.2024.1476506/full

Similar Items