Adaptive memory reservation strategy for heavy workloads in the Spark environment

The rise of the Internet of Things (IoT) and Industry 2.0 has spurred a growing need for extensive data computing, and Spark has emerged as a promising Big Data platform owing to its distributed in-memory computing capabilities. However, practical heavy workloads often lead to memory bottleneck is...

Bibliographic Details
Main Authors: Bohan Li, Xin He, Junyang Yu, Guanghui Wang, Yixin Song, Shunjie Pan, Hangyu Gu
Format: Article
Language: English
Published: PeerJ Inc. 2024-11-01
Series: PeerJ Computer Science
Subjects: Adaptive memory reservation; Task parallelism; Storage location selection; Spark environment
Online Access: https://peerj.com/articles/cs-2460.pdf
_version_ 1846166529962409984
author Bohan Li
Xin He
Junyang Yu
Guanghui Wang
Yixin Song
Shunjie Pan
Hangyu Gu
collection DOAJ
description The rise of the Internet of Things (IoT) and Industry 2.0 has spurred a growing need for extensive data computing, and Spark has emerged as a promising Big Data platform owing to its distributed in-memory computing capabilities. However, practical heavy workloads often lead to memory bottlenecks in the Spark platform. These result in resilient distributed dataset (RDD) eviction and, in extreme cases, severe memory contention, causing a significant degradation in Spark computational efficiency. To tackle this issue, we propose an adaptive memory reservation (AMR) strategy designed for heavy workloads in the Spark environment. First, we model optimal task parallelism by minimizing the disparity between the number of tasks completed without blocking and the number completed in regular rounds. The optimal memory for this task parallelism is then determined to establish an efficient execution memory space for computational parallelism. Subsequently, through adaptive execution memory reservation and dynamic adjustments, such as compression or expansion based on task progress, the strategy maintains dynamic task parallelism throughout the Spark parallel computing process. Considering the cost of each RDD cache location and real-time memory space usage, we select suitable storage locations for different RDD types to alleviate execution memory pressure. Finally, we conduct extensive laboratory experiments to validate the effectiveness of AMR. Results indicate that, compared to existing memory management solutions, AMR reduces the execution time by approximately 46.8%.
format Article
id doaj-art-1bcf5af656ba4952bf0098b309ea688d
institution Kabale University
issn 2376-5992
language English
publishDate 2024-11-01
publisher PeerJ Inc.
record_format Article
series PeerJ Computer Science
spelling doaj-art-1bcf5af656ba4952bf0098b309ea688d
2024-11-15T15:05:23Z
eng
PeerJ Inc.
PeerJ Computer Science
2376-5992
2024-11-01
10
e2460
10.7717/peerj-cs.2460
Adaptive memory reservation strategy for heavy workloads in the Spark environment
Bohan Li (0): School of Software, Henan University, Kaifeng, Henan Province, China
Xin He (1): School of Software, Henan University, Kaifeng, Henan Province, China
Junyang Yu (2): School of Software, Henan University, Kaifeng, Henan Province, China
Guanghui Wang (3): School of Software, Henan University, Kaifeng, Henan Province, China
Yixin Song (4): School of Computer Science and Engineering, Southeast University, Nanjing, Jiangsu Province, China
Shunjie Pan (5): School of Software, Henan University, Kaifeng, Henan Province, China
Hangyu Gu (6): School of Software, Henan University, Kaifeng, Henan Province, China
https://peerj.com/articles/cs-2460.pdf
Adaptive memory reservation
Task parallelism
Storage location selection
Spark environment
title Adaptive memory reservation strategy for heavy workloads in the Spark environment
topic Adaptive memory reservation
Task parallelism
Storage location selection
Spark environment
url https://peerj.com/articles/cs-2460.pdf
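
The description above refers to execution memory pressure and task parallelism in Spark. As background only, the following minimal sketch shows the arithmetic of stock Spark's unified memory manager, assuming the documented defaults (300 MiB reserved heap, spark.memory.fraction = 0.6, spark.memory.storageFraction = 0.5) and the rule that each of N active tasks may hold between 1/(2N) and 1/N of the execution pool. It is not the AMR strategy from the article; the function names are hypothetical and chosen for illustration.

```python
# Background sketch: sizing of Spark's unified memory region under default
# settings. This is NOT the article's AMR strategy; function names are
# hypothetical helpers, not part of any Spark API.

RESERVED_MB = 300  # heap that Spark sets aside before sizing the unified region


def unified_memory_mb(heap_mb, memory_fraction=0.6):
    """Size of the region shared by execution and storage
    (memory_fraction corresponds to spark.memory.fraction)."""
    return (heap_mb - RESERVED_MB) * memory_fraction


def storage_floor_mb(heap_mb, memory_fraction=0.6, storage_fraction=0.5):
    """Cached-block memory that is immune to eviction by execution
    (storage_fraction corresponds to spark.memory.storageFraction)."""
    return unified_memory_mb(heap_mb, memory_fraction) * storage_fraction


def per_task_execution_bounds_mb(execution_pool_mb, active_tasks):
    """Each of N active tasks can hold between 1/(2N) and 1/N of the pool."""
    if active_tasks < 1:
        raise ValueError("active_tasks must be at least 1")
    low = execution_pool_mb / (2 * active_tasks)
    high = execution_pool_mb / active_tasks
    return low, high


if __name__ == "__main__":
    heap = 4096  # executor heap in MiB, an arbitrary example value
    pool = unified_memory_mb(heap)
    print("unified region (MiB):", pool)
    print("storage floor (MiB):", storage_floor_mb(heap))
    for n in (2, 4, 8):
        print(n, "tasks ->", per_task_execution_bounds_mb(pool, n))
```

Running the sketch with more active tasks shows the per-task share shrinking, which is the kind of pressure under heavy workloads that the article's reservation strategy is described as addressing.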