Density-Based clustering in mapReduce with guarantees on parallel time, space, and solution quality
A well-known clustering problem called Density-Based Spatial Clustering of Applications with Noise~(DBSCAN) involves computing the solutions of at least one disk range query per input point, computing the connected components of a graph, and bichromatic fixed-radius nearest neighbor. MapReduce class...
Saved in:
| Main Authors: | , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: |
University of Isfahan
2024-04-01
|
| Series: | Transactions on Combinatorics |
| Subjects: | |
| Online Access: | https://toc.ui.ac.ir/article_28264_25c4b7936d8b67c3489a676b9a960418.pdf |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| _version_ | 1846162392016224256 |
|---|---|
| author | Sepideh Aghamolaei Mohammad Ghodsi |
| author_facet | Sepideh Aghamolaei Mohammad Ghodsi |
| author_sort | Sepideh Aghamolaei |
| collection | DOAJ |
| description | A well-known clustering problem called Density-Based Spatial Clustering of Applications with Noise~(DBSCAN) involves computing the solutions of at least one disk range query per input point, computing the connected components of a graph, and bichromatic fixed-radius nearest neighbor. MapReduce class is a model where a sublinear number of machines, each with sublinear memory, run for a polylogarithmic number of parallel rounds. Most of these problems either require quadratic time in the sequential model or are hard to compute in a constant number of rounds in MapReduce. In the Euclidean plane, DBSCAN algorithms with near-linear time and a randomized parallel algorithm with a polylogarithmic number of rounds exist. We solve DBSCAN in the Euclidean plane in a constant number of rounds in MapReduce, assuming the minimum number of points in range queries is constant and each connected component fits inside the memory of a single machine and has a constant diameter. |
| format | Article |
| id | doaj-art-3dd6ea04d1d64e14b54a0f5b42327c3b |
| institution | Kabale University |
| issn | 2251-8657 2251-8665 |
| language | English |
| publishDate | 2024-04-01 |
| publisher | University of Isfahan |
| record_format | Article |
| series | Transactions on Combinatorics |
| spelling | doaj-art-3dd6ea04d1d64e14b54a0f5b42327c3b2024-11-20T11:09:44ZengUniversity of IsfahanTransactions on Combinatorics2251-86572251-86652024-04-0114313515610.22108/toc.2024.138377.209128264Density-Based clustering in mapReduce with guarantees on parallel time, space, and solution qualitySepideh Aghamolaei0Mohammad Ghodsi1Department of Computer Engineering, Sharif University of Technology, Tehran, IranDepartment of Computer Engineering, Sharif University of Technology, Tehran, Iran. School of Computer Science, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran.A well-known clustering problem called Density-Based Spatial Clustering of Applications with Noise~(DBSCAN) involves computing the solutions of at least one disk range query per input point, computing the connected components of a graph, and bichromatic fixed-radius nearest neighbor. MapReduce class is a model where a sublinear number of machines, each with sublinear memory, run for a polylogarithmic number of parallel rounds. Most of these problems either require quadratic time in the sequential model or are hard to compute in a constant number of rounds in MapReduce. In the Euclidean plane, DBSCAN algorithms with near-linear time and a randomized parallel algorithm with a polylogarithmic number of rounds exist. We solve DBSCAN in the Euclidean plane in a constant number of rounds in MapReduce, assuming the minimum number of points in range queries is constant and each connected component fits inside the memory of a single machine and has a constant diameter.https://toc.ui.ac.ir/article_28264_25c4b7936d8b67c3489a676b9a960418.pdfmassively parallel algorithmsrange searchingunit disk graphnear neighborsclustering |
| spellingShingle | Sepideh Aghamolaei Mohammad Ghodsi Density-Based clustering in mapReduce with guarantees on parallel time, space, and solution quality Transactions on Combinatorics massively parallel algorithms range searching unit disk graph near neighbors clustering |
| title | Density-Based clustering in mapReduce with guarantees on parallel time, space, and solution quality |
| title_full | Density-Based clustering in mapReduce with guarantees on parallel time, space, and solution quality |
| title_fullStr | Density-Based clustering in mapReduce with guarantees on parallel time, space, and solution quality |
| title_full_unstemmed | Density-Based clustering in mapReduce with guarantees on parallel time, space, and solution quality |
| title_short | Density-Based clustering in mapReduce with guarantees on parallel time, space, and solution quality |
| title_sort | density based clustering in mapreduce with guarantees on parallel time space and solution quality |
| topic | massively parallel algorithms range searching unit disk graph near neighbors clustering |
| url | https://toc.ui.ac.ir/article_28264_25c4b7936d8b67c3489a676b9a960418.pdf |
| work_keys_str_mv | AT sepidehaghamolaei densitybasedclusteringinmapreducewithguaranteesonparalleltimespaceandsolutionquality AT mohammadghodsi densitybasedclusteringinmapreducewithguaranteesonparalleltimespaceandsolutionquality |