Adaptive memory reservation strategy for heavy workloads in the Spark environment

Bibliographic Details
Main Authors:	Bohan Li, Xin He, Junyang Yu, Guanghui Wang, Yixin Song, Shunjie Pan, Hangyu Gu
Format:	Article
Language:	English
Published:	PeerJ Inc. 2024-11-01
Series:	PeerJ Computer Science
Subjects:	Adaptive memory reservation; Task parallelism; Storage location selection; Spark environment
Online Access:	https://peerj.com/articles/cs-2460.pdf

collection	DOAJ
description	The rise of the Internet of Things (IoT) and Industry 2.0 has spurred a growing need for extensive data computing, and Spark has emerged as a promising Big Data platform owing to its distributed in-memory computing capabilities. However, heavy workloads in practice often cause memory bottlenecks on the Spark platform. These bottlenecks lead to resilient distributed dataset (RDD) eviction and, in extreme cases, severe memory contention, significantly degrading Spark's computational efficiency. To tackle this issue, this article proposes an adaptive memory reservation (AMR) strategy designed specifically for heavy workloads in the Spark environment. First, we model optimal task parallelism by minimizing the disparity between the number of tasks completed without blocking and the number completed in regular rounds. The memory required for this optimal parallelism is then determined to establish an efficient execution memory space for parallel computation. Next, through adaptive execution memory reservation and dynamic adjustments (compression or expansion based on task progress), the strategy maintains dynamic task parallelism throughout Spark's parallel computing process. Taking into account both the cost of each RDD cache location and real-time memory usage, we select suitable storage locations for different RDD types to alleviate execution memory pressure. Finally, we conduct extensive laboratory experiments to validate the effectiveness of AMR. Results indicate that, compared with existing memory management solutions, AMR reduces execution time by approximately 46.8%.
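The abstract names three mechanisms: choosing task parallelism from the available execution memory, reserving execution memory that is compressed or expanded as tasks progress, and selecting a storage location for each cached RDD. The Python sketch below illustrates one possible reading of each under a deliberately simplified memory model. It is not the authors' AMR algorithm; the function names (`choose_parallelism`, `adjust_reservation`, `choose_storage_location`, `CachedRdd`), the equal-share and linear-progress assumptions, and the disk-versus-recompute cost rule are all hypothetical. The article's PDF gives the actual formulation.

```python
# Toy sketch of three mechanisms named in the AMR abstract.
# NOT the authors' algorithm: the memory model, cost rules, and all
# function/parameter names below are hypothetical illustrations.
from dataclasses import dataclass
from typing import Sequence


def choose_parallelism(exec_memory_mb: float,
                       task_demands_mb: Sequence[float],
                       max_cores: int) -> int:
    """Pick the number of concurrent tasks p in [1, max_cores].

    Assumed model: each of the p tasks in a round gets an equal share
    exec_memory_mb / p, and a task whose demand exceeds that share blocks.
    The gap between tasks scheduled and tasks finishing without blocking is
    the blocked count; we minimise it and, on ties, keep the largest p.
    """
    if max_cores < 1:
        raise ValueError("max_cores must be at least 1")
    best_p, best_gap = 1, None
    for p in range(1, max_cores + 1):
        share = exec_memory_mb / p
        gap = sum(1 for d in task_demands_mb if d > share)
        if best_gap is None or gap <= best_gap:
            best_p, best_gap = p, gap
    return best_p


def adjust_reservation(task_demands_mb: Sequence[float],
                       task_progress: Sequence[float],
                       pool_mb: float) -> float:
    """Reserve only what unfinished tasks still need, capped at the pool.

    Assumed model: a task's remaining memory need shrinks linearly with its
    progress in [0, 1], so the reservation compresses as tasks advance and
    expands when new tasks (progress 0) join.
    """
    need = sum(d * (1.0 - p) for d, p in zip(task_demands_mb, task_progress))
    return min(pool_mb, need)


@dataclass
class CachedRdd:
    size_mb: float           # estimated cached size
    recompute_cost_s: float  # estimated time to rebuild from lineage


def choose_storage_location(rdd: CachedRdd,
                            free_memory_mb: float,
                            reserved_exec_mb: float,
                            disk_read_s_per_mb: float) -> str:
    """Return "memory", "disk", or "none" (recompute rather than cache).

    Assumed rule: keep the RDD in memory only if it fits in the free memory
    left after honouring the execution reservation; otherwise write it to
    disk when reading it back is estimated to be cheaper than recomputing.
    """
    usable_mb = max(0.0, free_memory_mb - reserved_exec_mb)
    if rdd.size_mb <= usable_mb:
        return "memory"
    if rdd.size_mb * disk_read_s_per_mb < rdd.recompute_cost_s:
        return "disk"
    return "none"


if __name__ == "__main__":
    print(choose_parallelism(1000, [100, 200, 150, 250], max_cores=8))  # 4
    print(adjust_reservation([100, 200], [0.5, 0.0], pool_mb=1000))     # 250.0
    print(choose_storage_location(CachedRdd(500, 30), 1000, 600, 0.01)) # disk
```

With a 1000 MB execution pool and the four demands shown, parallelism stops at 4: a fifth concurrent task would cut each share to 200 MB and block the 250 MB task. Tying the storage decision to the execution reservation mirrors, in a very simplified way, the abstract's goal of keeping cached RDDs from squeezing execution memory.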
id	doaj-art-1bcf5af656ba4952bf0098b309ea688d
institution	Kabale University
issn	2376-5992
volume	10
article	e2460
doi	10.7717/peerj-cs.2460
affiliations	Bohan Li, Xin He, Junyang Yu, Guanghui Wang, Shunjie Pan, Hangyu Gu: School of Software, Henan University, Kaifeng, Henan Province, China; Yixin Song: School of Computer Science and Engineering, Southeast University, Nanjing, Jiangsu Province, China

Similar Items