CWL-PLAS: Task Workflows Assisted by Data Science Cloud Platforms

Bibliographic Details
Main Authors: Andrea Detti, Ludovico Funari, Luca Petrucci, Michele Dorazio, Arianna Mencattini, Eugenio Martinelli
Format: Article
Language: English
Published: IEEE 2023-01-01
Series: IEEE Access
Online Access:https://ieeexplore.ieee.org/document/10114381/
Collection: DOAJ
Abstract: The Common Workflow Language (CWL) is a platform-independent description language for the representation of data science workflows consisting of a set of tasks that interact with each other to perform scientific analysis. The tasks can be packaged as Linux containers. On the one hand, using containers ensures the reproducibility and portability of workflows. On the other hand, it limits each task to exploiting, at most, the resources of the host where its container runs. In this paper, we propose CWL-PLAS, an extension of CWL that allows a task to instantiate and temporarily use a supporting cloud platform for parallel computing, which is specialized for the task’s activity. In this way, tasks can leverage the resources of multiple hosts in parallel, reducing the duration of the workflow. We implemented an open-source workflow manager that supports CWL-PLAS workflows and exploits a Kubernetes back-end. We used this workflow manager to evaluate the performance of CWL-PLAS in a couple of machine learning workflows.
Holding institution: Kabale University
ISSN: 2169-3536
Volume: 11
Pages: 44092-44106
DOI: 10.1109/ACCESS.2023.3272619
Affiliation (all authors): Department of Electronic Engineering, University of Rome Tor Vergata, Rome, Italy
ORCID iDs: Andrea Detti https://orcid.org/0000-0002-0803-1392; Ludovico Funari https://orcid.org/0000-0002-2225-2124; Michele Dorazio https://orcid.org/0000-0003-3719-9976; Arianna Mencattini https://orcid.org/0000-0002-3753-0457; Eugenio Martinelli https://orcid.org/0000-0002-6673-2066
Keywords: Common workflow language; workflow management software; distributed computing; cloud