Towards Efficient Serverless MapReduce Computing on Cloud-Native Platforms

Bibliographic Details
Main Authors: Xu Huang, Rong Gu, Yihua Huang
Format: Article
Language: English
Published: Tsinghua University Press 2025-05-01
Series: Big Data Mining and Analytics
Online Access: https://www.sciopen.com/article/10.26599/BDMA.2024.9020084
Description
Summary: MapReduce is one of the most classic and powerful parallel computing models in big data. It remains active in the big data ecosystem and is evolving toward cloud-native environments. Among cloud-native technologies, Serverless computing is one of the most promising directions because of its elasticity and ease of use. Supporting MapReduce big data computing in a Serverless environment can take full advantage of these strengths. However, because the underlying system architecture differs, running MapReduce jobs in a Serverless environment raises three issues. First, the scheduling strategy has difficulty fully utilizing the available resources. Second, reading Shuffle index data from cloud storage is inefficient and expensive. Third, the latency of cloud storage Input/Output (I/O) requests has a long tail. To address these problems, this paper proposes three strategies together with a MapReduce parallel processing framework for the Serverless environment. Experimental results show that, compared with cutting-edge systems, our approach shortens job execution time by 25.6% on average and reduces job execution costs by 17.3%.
ISSN: 2096-0654, 2097-406X