Towards Efficient Serverless MapReduce Computing on Cloud-Native Platforms
| Main Authors: | , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Tsinghua University Press, 2025-05-01 |
| Series: | Big Data Mining and Analytics |
| Subjects: | |
| Online Access: | https://www.sciopen.com/article/10.26599/BDMA.2024.9020084 |
| Summary: | MapReduce is one of the most classic and powerful parallel computing models for big data. It remains active in the big data ecosystem and is now evolving toward cloud-native environments. Among cloud-native technologies, Serverless computing is one of the most promising directions because of its elasticity and ease of use, so supporting MapReduce big data computing in a Serverless environment can take full advantage of these strengths. However, because the underlying system architecture differs, running MapReduce jobs in a Serverless environment raises three issues. First, the scheduling strategy struggles to fully utilize the available resources. Second, reading Shuffle index data from cloud storage is inefficient and expensive. Third, cloud storage Input/Output (I/O) request latency exhibits a long-tail effect. To solve these problems, this paper proposes three strategies, implemented in a MapReduce parallel processing framework for the Serverless environment. Experimental results show that, compared with cutting-edge systems, our approach shortens job execution time by 25.6% on average and reduces job execution costs by 17.3%. |
| ISSN: | 2096-0654, 2097-406X |
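The summary names long-tail cloud storage I/O latency as the third issue but does not describe the paper's remedy. For orientation only, a common generic mitigation is the hedged request: if a read has not returned after a short delay, send one duplicate and use whichever response arrives first. This Python sketch is not the authors' strategy. `hedged_get`, the mock storage client `make_simulated_get`, and the `shuffle/index-*` keys are hypothetical, and the storage call is simulated with `asyncio.sleep`.

```python
import asyncio
from typing import Awaitable, Callable

# Assumed interface: any async function that fetches one object by key,
# e.g. a wrapper around a cloud storage GET (not a specific SDK).
Getter = Callable[[str], Awaitable[bytes]]


async def hedged_get(get: Getter, key: str, hedge_delay_s: float) -> bytes:
    """Read `key`; if the first request is still pending after `hedge_delay_s`,
    send one duplicate request and return whichever finishes first."""
    primary = asyncio.create_task(get(key))
    # Give the primary request a head start of hedge_delay_s seconds.
    done, _ = await asyncio.wait({primary}, timeout=hedge_delay_s)
    if primary in done:
        return primary.result()

    # The primary is probably stuck in the latency tail: issue one backup.
    backup = asyncio.create_task(get(key))
    done, pending = await asyncio.wait(
        {primary, backup}, return_when=asyncio.FIRST_COMPLETED
    )
    # Cancel the slower request and let the cancellation settle.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    # Either finished request carries the object; an error re-raises here.
    return next(iter(done)).result()


def make_simulated_get(latencies_s: list[float]) -> Getter:
    """Hypothetical mock storage client: the n-th call takes latencies_s[n]
    seconds and tags its payload with the attempt number."""
    attempts = 0

    async def get(key: str) -> bytes:
        nonlocal attempts
        n = attempts
        attempts += 1
        await asyncio.sleep(latencies_s[n])
        return f"{key}#attempt{n}".encode()

    return get


async def main() -> None:
    # Primary lands in the tail (0.5 s); the backup sent at 0.05 s wins.
    slow_first = make_simulated_get([0.5, 0.01])
    print(await hedged_get(slow_first, "shuffle/index-0", hedge_delay_s=0.05))
    # Primary is fast, so no backup request is ever sent.
    fast = make_simulated_get([0.01])
    print(await hedged_get(fast, "shuffle/index-1", hedge_delay_s=0.05))


if __name__ == "__main__":
    asyncio.run(main())
```

The sketch sends at most one extra request per read, which bounds the added load. Because pay-per-request cloud storage bills each duplicate GET, the hedge delay trades cost against tail latency; a frequently cited heuristic is to wait until the primary has been outstanding longer than roughly the 95th-percentile expected latency before hedging.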