Sinh-Cosh Optimization-Based Efficient Clustering for Big Data Applications

Bibliographic Details
Main Authors: Lahbib Khrissi, Mohammed Es-Sabry, Nabil El Akkad, Hassan Satori, Saad Aldosary, Walid El-Shafai
Format: Article
Language: English
Published: IEEE, 2024-01-01
Series: IEEE Access
Online Access:https://ieeexplore.ieee.org/document/10804793/
Description
Summary: Data clustering is a pivotal task in data mining, and cluster analysis through optimization algorithms has attracted significant attention. Clustering is an unsupervised data analysis technique that identifies homogeneous groups of objects by evaluating their attribute values. However, applying optimization-based techniques to clustering is challenging because the objective function is non-linear and the search space is complex. To overcome these drawbacks, this article proposes a new metaheuristic based on a mathematical model, the Sinh Cosh Optimizer (SCHO). The optimizer is built on the hyperbolic functions sinh and cosh and has four key components: the exploration phase, the exploitation phase, a limited search strategy, and a switching mechanism. To evaluate the effectiveness of the proposed approach, we compare it against four widely recognized metaheuristic methods from the literature on ten standard datasets from the UCI Machine Learning Repository. These comparison algorithms are known for their promising capabilities on machine learning problems, particularly data clustering. Comprehensive computational experiments and result analysis show that the proposed algorithm consistently outperforms the comparison methods while remaining highly stable: it achieves a better F-score on 9 of the 10 datasets and higher accuracy on 8 of the 10.
ISSN: 2169-3536
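The summary frames clustering as minimizing a non-linear objective over candidate centroid positions. The Python sketch below illustrates that framing under stated assumptions. The fitness is taken to be the sum of distances from each point to its nearest centroid (a common choice in metaheuristic clustering; the summary does not state the paper's objective). `toy_hyperbolic_search` is a hypothetical population search that uses sinh- and cosh-shaped step sizes with a simple exploration/exploitation switch. It is not the published SCHO update rules, and all function names and parameters here are illustrative only.

```python
import numpy as np


def clustering_objective(flat_centroids, data, k):
    """Sum of Euclidean distances from each point to its nearest centroid.

    Assumption: a common metaheuristic-clustering fitness; the paper's exact
    objective is not given in the summary.
    """
    centroids = flat_centroids.reshape(k, data.shape[1])
    # dists[p, c] = distance from point p to centroid c
    dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
    return float(dists.min(axis=1).sum())


def toy_hyperbolic_search(data, k, pop_size=20, iters=100, seed=0):
    """Hypothetical population search with sinh/cosh-shaped step sizes.

    NOT the published SCHO equations; an illustration of the general idea.
    Each candidate encodes k centroids flattened into one vector.
    """
    rng = np.random.default_rng(seed)
    dim = k * data.shape[1]
    lo = np.tile(data.min(axis=0), k)  # per-coordinate lower bounds
    hi = np.tile(data.max(axis=0), k)  # per-coordinate upper bounds
    pop = rng.uniform(lo, hi, size=(pop_size, dim))
    fit = np.array([clustering_objective(x, data, k) for x in pop])
    best = pop[fit.argmin()].copy()
    best_fit = float(fit.min())
    for t in range(iters):
        frac = t / iters
        # Hypothetical switch: explore in the first half, exploit afterwards.
        exploring = frac < 0.5
        for i in range(pop_size):
            r = rng.random(dim)
            if exploring:
                # Random-sign steps scaled by sinh(r) in [0, sinh(1)], shrinking over time.
                sign = rng.choice([-1.0, 1.0], size=dim)
                cand = pop[i] + 0.1 * np.sinh(r) * sign * (hi - lo) * (1 - frac)
            else:
                # Pull toward the best: factor (cosh(r) - 1) lies in [0, cosh(1) - 1].
                cand = best + (np.cosh(r) - 1.0) * (pop[i] - best) * (1 - frac)
            cand = np.clip(cand, lo, hi)  # keep candidates inside the data's bounding box
            cand_fit = clustering_objective(cand, data, k)
            if cand_fit < fit[i]:  # greedy replacement
                pop[i], fit[i] = cand, cand_fit
                if cand_fit < best_fit:
                    best, best_fit = cand.copy(), cand_fit
    return best.reshape(k, data.shape[1]), best_fit


if __name__ == "__main__":
    data = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
    centroids, fitness = toy_hyperbolic_search(data, k=2)
    print(centroids, fitness)
```

For this four-point toy dataset, the lowest achievable objective is 4.0 (one centroid anywhere on the segment between each pair of nearby points), which gives a reference for judging the search result. Greedy replacement keeps each candidate's fitness non-increasing, trading some diversity for simplicity.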