Parallel division clustering algorithm based on Spark framework and ASPSO

Bibliographic Details
Main Authors:	Yimin MAO, Dejin GAN, Liefa LIAO, Zhigang CHEN
Format:	Article
Language:	zho
Published:	Editorial Department of Journal on Communications 2022-03-01
Series:	Tongxin xuebao
Subjects:	Spark framework; parallel division clustering; grid division; ASPSO; parallel merge
Online Access:	http://www.joconline.com.cn/zh/article/doi/10.11959/j.issn.1000-436x.2022054/

collection	DOAJ
description	To address the problems that partition clustering algorithms encounter when processing massive data, namely a large data dispersion coefficient and poor anti-interference ability, difficulty in determining the number of local clusters, randomness of local cluster centroids, and low efficiency of local cluster parallelization and merging, a parallel partition clustering algorithm based on the Spark framework and ASPSO (PDC-SFASPSO) was proposed. Firstly, a meshing strategy was introduced to reduce the data dispersion coefficient of the data division and improve anti-interference ability. Secondly, to determine the number of clusters, a meshing strategy based on a potential function and a Gaussian function was proposed, which formed areas with different sample points as the core clusters and yielded the number of local clusters. Then, to avoid the randomness of local cluster centroids, ASPSO was proposed. Finally, a local cluster merging strategy based on cluster radius and neighbor nodes was introduced to merge highly similar clusters on the Spark parallel computing framework, which improved the efficiency of parallel merging of local clusters. Experimental results showed that the PDC-SFASPSO algorithm performed well in data partitioning and clustering in a big data environment and was suitable for parallel clustering of large-scale data sets.
id	doaj-art-8869df6d37e14bfca93b64b4bc1c883e
institution	Kabale University
issn	1000-436X


Similar Items