Density-Based clustering in MapReduce with guarantees on parallel time, space, and solution quality

Bibliographic Details
Main Authors: Sepideh Aghamolaei, Mohammad Ghodsi
Format: Article
Language: English
Published: University of Isfahan, 2024-04-01
Series: Transactions on Combinatorics
Online Access: https://toc.ui.ac.ir/article_28264_25c4b7936d8b67c3489a676b9a960418.pdf
Description
Summary: A well-known clustering problem, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), involves answering at least one disk range query per input point, computing the connected components of a graph, and solving the bichromatic fixed-radius nearest neighbor problem. The MapReduce class is a model in which a sublinear number of machines, each with sublinear memory, run for a polylogarithmic number of parallel rounds. Most of these subproblems either require quadratic time in the sequential model or are hard to compute in a constant number of rounds in MapReduce. In the Euclidean plane, DBSCAN algorithms with near-linear time and a randomized parallel algorithm with a polylogarithmic number of rounds exist. We solve DBSCAN in the Euclidean plane in a constant number of rounds in MapReduce, assuming that the minimum number of points in range queries is constant and that each connected component fits inside the memory of a single machine and has constant diameter.
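To make the decomposition in the summary concrete, here is a minimal, sequential, brute-force sketch of planar DBSCAN built from the three subproblems it names. It is not the authors' MapReduce algorithm, and it runs in quadratic time. The names `dbscan`, `eps`, and `min_pts` are illustrative. Two conventions are assumptions rather than details taken from the paper: a point counts itself in its own eps-disk, and a non-core (border) point joins the cluster of its nearest core point within eps. Other DBSCAN variants break that tie differently.

```python
import math


def dbscan(points, eps, min_pts):
    """Brute-force planar DBSCAN (illustrative sketch, quadratic time).

    Returns one label per point: a cluster id 0, 1, ... or -1 for noise.
    """
    n = len(points)

    # Step 1: one disk range query per point (assumption: the point itself counts).
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    core = [len(neighbors[i]) >= min_pts for i in range(n)]

    # Step 2: connected components of the graph whose vertices are core points
    # and whose edges join core points at distance <= eps (union-find).
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        if core[i]:
            for j in neighbors[i]:
                if core[j]:
                    ri, rj = find(i), find(j)
                    if ri != rj:
                        parent[rj] = ri

    labels = [-1] * n
    cluster_ids = {}
    for i in range(n):
        if core[i]:
            labels[i] = cluster_ids.setdefault(find(i), len(cluster_ids))

    # Step 3: bichromatic fixed-radius nearest neighbor. Each non-core point
    # takes the label of its nearest core point within eps, if any; otherwise
    # it stays noise (assumed tie-breaking convention).
    for i in range(n):
        if not core[i]:
            best, best_d = None, math.inf
            for j in neighbors[i]:
                if core[j]:
                    d = math.dist(points[i], points[j])
                    if d < best_d:
                        best, best_d = j, d
            if best is not None:
                labels[i] = labels[best]
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (5, 5), (0, 2.2)]
    print(dbscan(pts, eps=1.5, min_pts=3))
```

In the example, the two corner triangles become clusters 0 and 1, and (5, 5) is noise. The point (0, 2.2) has too few neighbors to be core, but it lies within eps of the core point (0, 1), so it is attached to cluster 0 as a border point. Each step is a separate pass, which mirrors how the summary treats range queries, connected components, and nearest-neighbor assignment as distinct subproblems.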
ISSN: 2251-8657, 2251-8665