CWL-PLAS: Task Workflows Assisted by Data Science Cloud Platforms
The Common Workflow Language (CWL) is a platform-independent description language for the representation of data science workflows consisting of a set of tasks that interact with each other to perform scientific analysis. The tasks can be packaged as Linux containers. On the one hand, using containe...
| Main Authors: | Andrea Detti, Ludovico Funari, Luca Petrucci, Michele Dorazio, Arianna Mencattini, Eugenio Martinelli |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2023-01-01 |
| Series: | IEEE Access |
| Subjects: | Common workflow language; workflow management software; distributed computing; cloud |
| Online Access: | https://ieeexplore.ieee.org/document/10114381/ |
| _version_ | 1846128654099152896 |
|---|---|
| author | Andrea Detti; Ludovico Funari; Luca Petrucci; Michele Dorazio; Arianna Mencattini; Eugenio Martinelli |
| author_sort | Andrea Detti |
| collection | DOAJ |
| description | The Common Workflow Language (CWL) is a platform-independent description language for representing data science workflows, each consisting of a set of tasks that interact with each other to perform scientific analysis. The tasks can be packaged as Linux containers. On the one hand, using containers ensures the reproducibility and portability of workflows; on the other hand, it limits each task to exploiting, at most, the resources of the host where its container runs. In this paper, we propose CWL-PLAS, an extension of CWL that allows a task to instantiate and temporarily use a supporting cloud platform for parallel computing that is specialized for the task’s activity. In this way, tasks can leverage the resources of multiple hosts in parallel, reducing the duration of the workflow. We implemented an open-source workflow manager that supports CWL-PLAS workflows and uses a Kubernetes back-end. We used this workflow manager to evaluate the performance of CWL-PLAS in a couple of machine learning workflows. |
| format | Article |
| id | doaj-art-fd0409e08c70445b9723a234c22e0d27 |
| institution | Kabale University |
| issn | 2169-3536 |
| language | English |
| publishDate | 2023-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | doaj-art-fd0409e08c70445b9723a234c22e0d27; 2024-12-11T00:01:24Z; eng; IEEE; IEEE Access; 2169-3536; 2023-01-01; 11; 44092; 44106; 10.1109/ACCESS.2023.3272619; 10114381; CWL-PLAS: Task Workflows Assisted by Data Science Cloud Platforms; Andrea Detti (https://orcid.org/0000-0002-0803-1392); Ludovico Funari (https://orcid.org/0000-0002-2225-2124); Luca Petrucci; Michele Dorazio (https://orcid.org/0000-0003-3719-9976); Arianna Mencattini (https://orcid.org/0000-0002-3753-0457); Eugenio Martinelli (https://orcid.org/0000-0002-6673-2066); Department of Electronic Engineering, University of Rome Tor Vergata, Rome, Italy (all six authors); https://ieeexplore.ieee.org/document/10114381/; Common workflow language; workflow management software; distributed computing; cloud |
| title | CWL-PLAS: Task Workflows Assisted by Data Science Cloud Platforms |
| topic | Common workflow language; workflow management software; distributed computing; cloud |
| url | https://ieeexplore.ieee.org/document/10114381/ |