Density-Based clustering in mapReduce with guarantees on parallel time, space, and solution quality

A well-known clustering problem called Density-Based Spatial Clustering of Applications with Noise~(DBSCAN) involves computing the solutions of at least one disk range query per input point, computing the connected components of a graph, and bichromatic fixed-radius nearest neighbor. MapReduce class...

Full description

Saved in:
Bibliographic Details
Main Authors: Sepideh Aghamolaei, Mohammad Ghodsi
Format: Article
Language:English
Published: University of Isfahan 2024-04-01
Series:Transactions on Combinatorics
Subjects:
Online Access:https://toc.ui.ac.ir/article_28264_25c4b7936d8b67c3489a676b9a960418.pdf
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1846162392016224256
author Sepideh Aghamolaei
Mohammad Ghodsi
author_facet Sepideh Aghamolaei
Mohammad Ghodsi
author_sort Sepideh Aghamolaei
collection DOAJ
description A well-known clustering problem called Density-Based Spatial Clustering of Applications with Noise~(DBSCAN) involves computing the solutions of at least one disk range query per input point, computing the connected components of a graph, and bichromatic fixed-radius nearest neighbor. MapReduce class is a model where a sublinear number of machines, each with sublinear memory, run for a polylogarithmic number of parallel rounds. Most of these problems either require quadratic time in the sequential model or are hard to compute in a constant number of rounds in MapReduce. In the Euclidean plane, DBSCAN algorithms with near-linear time and a randomized parallel algorithm with a polylogarithmic number of rounds exist. We solve DBSCAN in the Euclidean plane in a constant number of rounds in MapReduce, assuming the minimum number of points in range queries is constant and each connected component fits inside the memory of a single machine and has a constant diameter.
format Article
id doaj-art-3dd6ea04d1d64e14b54a0f5b42327c3b
institution Kabale University
issn 2251-8657
2251-8665
language English
publishDate 2024-04-01
publisher University of Isfahan
record_format Article
series Transactions on Combinatorics
spelling doaj-art-3dd6ea04d1d64e14b54a0f5b42327c3b2024-11-20T11:09:44ZengUniversity of IsfahanTransactions on Combinatorics2251-86572251-86652024-04-0114313515610.22108/toc.2024.138377.209128264Density-Based clustering in mapReduce with guarantees on parallel time, space, and solution qualitySepideh Aghamolaei0Mohammad Ghodsi1Department of Computer Engineering, Sharif University of Technology, Tehran, IranDepartment of Computer Engineering, Sharif University of Technology, Tehran, Iran. School of Computer Science, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran.A well-known clustering problem called Density-Based Spatial Clustering of Applications with Noise~(DBSCAN) involves computing the solutions of at least one disk range query per input point, computing the connected components of a graph, and bichromatic fixed-radius nearest neighbor. MapReduce class is a model where a sublinear number of machines, each with sublinear memory, run for a polylogarithmic number of parallel rounds. Most of these problems either require quadratic time in the sequential model or are hard to compute in a constant number of rounds in MapReduce. In the Euclidean plane, DBSCAN algorithms with near-linear time and a randomized parallel algorithm with a polylogarithmic number of rounds exist. We solve DBSCAN in the Euclidean plane in a constant number of rounds in MapReduce, assuming the minimum number of points in range queries is constant and each connected component fits inside the memory of a single machine and has a constant diameter.https://toc.ui.ac.ir/article_28264_25c4b7936d8b67c3489a676b9a960418.pdfmassively parallel algorithmsrange searchingunit disk graphnear neighborsclustering
spellingShingle Sepideh Aghamolaei
Mohammad Ghodsi
Density-Based clustering in mapReduce with guarantees on parallel time, space, and solution quality
Transactions on Combinatorics
massively parallel algorithms
range searching
unit disk graph
near neighbors
clustering
title Density-Based clustering in mapReduce with guarantees on parallel time, space, and solution quality
title_full Density-Based clustering in mapReduce with guarantees on parallel time, space, and solution quality
title_fullStr Density-Based clustering in mapReduce with guarantees on parallel time, space, and solution quality
title_full_unstemmed Density-Based clustering in mapReduce with guarantees on parallel time, space, and solution quality
title_short Density-Based clustering in mapReduce with guarantees on parallel time, space, and solution quality
title_sort density based clustering in mapreduce with guarantees on parallel time space and solution quality
topic massively parallel algorithms
range searching
unit disk graph
near neighbors
clustering
url https://toc.ui.ac.ir/article_28264_25c4b7936d8b67c3489a676b9a960418.pdf
work_keys_str_mv AT sepidehaghamolaei densitybasedclusteringinmapreducewithguaranteesonparalleltimespaceandsolutionquality
AT mohammadghodsi densitybasedclusteringinmapreducewithguaranteesonparalleltimespaceandsolutionquality