Adaptive memory reservation strategy for heavy workloads in the Spark environment
The rise of the Internet of Things (IoT) and Industry 2.0 has spurred a growing need for extensive data computing, and Spark emerged as a promising Big Data platform, attributed to its distributed in-memory computing capabilities. However, practical heavy workloads often lead to memory bottleneck is...
| Main Authors: | Bohan Li, Xin He, Junyang Yu, Guanghui Wang, Yixin Song, Shunjie Pan, Hangyu Gu |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | PeerJ Inc., 2024-11-01 |
| Series: | PeerJ Computer Science |
| Subjects: | Adaptive memory reservation; Task parallelism; Storage location selection; Spark environment |
| Online Access: | https://peerj.com/articles/cs-2460.pdf |
| _version_ | 1846166529962409984 |
|---|---|
| author | Bohan Li, Xin He, Junyang Yu, Guanghui Wang, Yixin Song, Shunjie Pan, Hangyu Gu |
| author_facet | Bohan Li, Xin He, Junyang Yu, Guanghui Wang, Yixin Song, Shunjie Pan, Hangyu Gu |
| author_sort | Bohan Li |
| collection | DOAJ |
| description | The rise of the Internet of Things (IoT) and Industry 2.0 has spurred a growing need for extensive data computing, and Spark emerged as a promising Big Data platform, attributed to its distributed in-memory computing capabilities. However, practical heavy workloads often lead to memory bottleneck issues in the Spark platform. This results in resilient distributed datasets (RDD) eviction and, in extreme cases, violent memory contentions, causing a significant degradation in Spark computational efficiency. To tackle this issue, we propose an adaptive memory reservation (AMR) strategy in this article, specifically designed for heavy workloads in the Spark environment. Specifically, we model optimal task parallelism by minimizing the disparity between the number of tasks completed without blocking and the number completed in regular rounds. Optimal memory for task parallelism is determined to establish an efficient execution memory space for computational parallelism. Subsequently, through adaptive execution memory reservation and dynamic adjustments, such as compression or expansion based on task progress, the strategy ensures dynamic task parallelism in the Spark parallel computing process. Considering the cost of RDD cache location and real-time memory space usage, we select suitable storage locations for different RDD types to alleviate execution memory pressure. Finally, we conduct extensive laboratory experiments to validate the effectiveness of AMR. Results indicate that, compared to existing memory management solutions, AMR reduces the execution time by approximately 46.8%. |
| format | Article |
| id | doaj-art-1bcf5af656ba4952bf0098b309ea688d |
| institution | Kabale University |
| issn | 2376-5992 |
| language | English |
| publishDate | 2024-11-01 |
| publisher | PeerJ Inc. |
| record_format | Article |
| series | PeerJ Computer Science |
| spelling | doaj-art-1bcf5af656ba4952bf0098b309ea688d; 2024-11-15T15:05:23Z; eng; PeerJ Inc.; PeerJ Computer Science; 2376-5992; 2024-11-01; 10; e2460; 10.7717/peerj-cs.2460; Adaptive memory reservation strategy for heavy workloads in the Spark environment; Bohan Li (School of Software, Henan University, Kaifeng, Henan Province, China); Xin He (School of Software, Henan University, Kaifeng, Henan Province, China); Junyang Yu (School of Software, Henan University, Kaifeng, Henan Province, China); Guanghui Wang (School of Software, Henan University, Kaifeng, Henan Province, China); Yixin Song (School of Computer Science and Engineering, Southeast University, Nanjing, Jiangsu Province, China); Shunjie Pan (School of Software, Henan University, Kaifeng, Henan Province, China); Hangyu Gu (School of Software, Henan University, Kaifeng, Henan Province, China); https://peerj.com/articles/cs-2460.pdf; Adaptive memory reservation; Task parallelism; Storage location selection; Spark environment |
| spellingShingle | Bohan Li; Xin He; Junyang Yu; Guanghui Wang; Yixin Song; Shunjie Pan; Hangyu Gu; Adaptive memory reservation strategy for heavy workloads in the Spark environment; PeerJ Computer Science; Adaptive memory reservation; Task parallelism; Storage location selection; Spark environment |
| title | Adaptive memory reservation strategy for heavy workloads in the Spark environment |
| title_full | Adaptive memory reservation strategy for heavy workloads in the Spark environment |
| title_fullStr | Adaptive memory reservation strategy for heavy workloads in the Spark environment |
| title_full_unstemmed | Adaptive memory reservation strategy for heavy workloads in the Spark environment |
| title_short | Adaptive memory reservation strategy for heavy workloads in the Spark environment |
| title_sort | adaptive memory reservation strategy for heavy workloads in the spark environment |
| topic | Adaptive memory reservation; Task parallelism; Storage location selection; Spark environment |
| url | https://peerj.com/articles/cs-2460.pdf |
| work_keys_str_mv | AT bohanli adaptivememoryreservationstrategyforheavyworkloadsinthesparkenvironment AT xinhe adaptivememoryreservationstrategyforheavyworkloadsinthesparkenvironment AT junyangyu adaptivememoryreservationstrategyforheavyworkloadsinthesparkenvironment AT guanghuiwang adaptivememoryreservationstrategyforheavyworkloadsinthesparkenvironment AT yixinsong adaptivememoryreservationstrategyforheavyworkloadsinthesparkenvironment AT shunjiepan adaptivememoryreservationstrategyforheavyworkloadsinthesparkenvironment AT hangyugu adaptivememoryreservationstrategyforheavyworkloadsinthesparkenvironment |