Global-Local Self-Attention-Based Long Short-Term Memory with Optimization Algorithm for Speaker Identification

Bibliographic Details
Main Authors:	Pravin Marotrao Ghate, Bhagvat D. Jadhav, Shriram Sadashiv Kulkarni, Pravin Balaso Chopade, Prabhakar N. Kota
Format:	Article
Language:	English
Published:	IIUM Press, International Islamic University Malaysia 2025-01-01
Series:	International Islamic University Malaysia Engineering Journal
Subjects:	Exponential Neighborhood – Grey Wolf Optimization; Global-Local Self-Attention; Long Short-Term Memory; Mel Frequency Cepstral Coefficient; Speaker Identification
Online Access:	https://journals.iium.edu.my/ejournal/index.php/iiumej/article/view/3386

author	Pravin Marotrao Ghate, Bhagvat D. Jadhav, Shriram Sadashiv Kulkarni, Pravin Balaso Chopade, Prabhakar N. Kota
author_sort	Pravin Marotrao Ghate
collection	DOAJ
description	Speaker identification (SI) involves recognizing a speaker from a group of unknown speakers, while speaker verification (SV) determines whether a given voice sample belongs to a particular person. The main challenges in SI are session variability, background noise, and insufficient information. To mitigate these limitations, this research proposes a Global-Local Self-Attention (GLSA)-based Long Short-Term Memory (LSTM) network with Exponential Neighborhood – Grey Wolf Optimization (EN-GWO) for effective speaker identification on the TIMIT and VoxCeleb 1 datasets. GLSA is incorporated into the LSTM so that the model focuses on the relevant data, and EN-GWO tunes the hyperparameters to improve identification performance. The GLSA-LSTM with EN-GWO method achieves an accuracy of 99.36% on TIMIT and 93.45% on VoxCeleb 1, and is compared against the SincNet and Generative Adversarial Network (SincGAN) model and a Hybrid Neural Network – Support Vector Machine (NN-SVM).
format	Article
id	doaj-art-33e2878cbc8c46f5a673c45952bc70ad
institution	Kabale University
issn	1511-788X 2289-7860
language	English
publishDate	2025-01-01
publisher	IIUM Press, International Islamic University Malaysia
record_format	Article
series	International Islamic University Malaysia Engineering Journal
spelling	doaj-art-33e2878cbc8c46f5a673c45952bc70ad 2025-01-10T12:40:39Z eng IIUM Press, International Islamic University Malaysia; International Islamic University Malaysia Engineering Journal; ISSN 1511-788X, 2289-7860; 2025-01-01; vol. 26, no. 1; doi:10.31436/iiumej.v26i1.3386; Global-Local Self-Attention-Based Long Short-Term Memory with Optimization Algorithm for Speaker Identification; Pravin Marotrao Ghate 0; Bhagvat D. Jadhav 1 https://orcid.org/0000-0002-1393-6823; Shriram Sadashiv Kulkarni 2 https://orcid.org/0000-0001-8584-8171; Pravin Balaso Chopade 3; Prabhakar N. Kota 4 https://orcid.org/0000-0002-7537-8433; JSPM's Rajarshi Shahu College of Engineering; JSPM's Rajarshi Shahu College of Engineering; Sinhgad Academy of Engineering; M.E.S College of Engineering; M.E.S College of Engineering; https://journals.iium.edu.my/ejournal/index.php/iiumej/article/view/3386; Exponential Neighborhood – Grey Wolf Optimization; Global-Local Self-Attention; Long Short-Term Memory; Mel Frequency Cepstral Coefficient; Speaker Identification
title	Global-Local Self-Attention-Based Long Short-Term Memory with Optimization Algorithm for Speaker Identification
topic	Exponential Neighborhood – Grey Wolf Optimization; Global-Local Self-Attention; Long Short-Term Memory; Mel Frequency Cepstral Coefficient; Speaker Identification
url	https://journals.iium.edu.my/ejournal/index.php/iiumej/article/view/3386
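The description field above states that the LSTM's hyperparameters are tuned with EN-GWO. This record does not describe the exponential-neighborhood modification, so the following is only a minimal sketch of the standard Grey Wolf Optimizer (GWO) that EN-GWO builds on, assuming a box-bounded search space and an objective to minimize. The names `grey_wolf_optimize` and `toy_validation_error`, the two hyperparameters (log10 learning rate and LSTM hidden units), and their ranges are hypothetical illustrations, not the authors' setup.

```python
import random
from typing import Callable, List, Sequence, Tuple

Position = List[float]


def grey_wolf_optimize(
    objective: Callable[[Position], float],
    bounds: Sequence[Tuple[float, float]],
    n_wolves: int = 20,
    n_iters: int = 100,
    seed: int = 0,
) -> Tuple[Position, float]:
    """Minimize `objective` inside a box using the standard Grey Wolf Optimizer.

    Plain GWO only: the Exponential Neighborhood (EN) modification used in the
    paper is not reproduced here.
    """
    if n_wolves < 3:
        raise ValueError("GWO needs at least three wolves (alpha, beta, delta)")
    rng = random.Random(seed)
    dim = len(bounds)

    def clip(x: Position) -> Position:
        # Keep every coordinate inside its [low, high] bound.
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]

    # Best-ever (score, position) pairs: alpha, beta, delta, lowest score first.
    leaders: List[Tuple[float, Position]] = []

    def evaluate_pack(pack: List[Position]) -> None:
        for w in pack:
            leaders.append((objective(w), w[:]))
        leaders.sort(key=lambda pair: pair[0])
        del leaders[3:]

    # Random initial pack spread uniformly over the search box.
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]

    for t in range(n_iters):
        evaluate_pack(wolves)
        # a decreases linearly from 2 to 0: large |A| explores, small |A| exploits.
        a = 2.0 - 2.0 * t / n_iters
        for i in range(n_wolves):
            new_pos = []
            for d in range(dim):
                estimate = 0.0
                for _, leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    D = abs(C * leader[d] - wolves[i][d])
                    estimate += leader[d] - A * D
                # New coordinate = mean of the moves guided by alpha, beta and delta.
                new_pos.append(estimate / len(leaders))
            wolves[i] = clip(new_pos)

    evaluate_pack(wolves)  # score the final positions as well
    best_score, best_pos = leaders[0]
    return best_pos, best_score


def toy_validation_error(x: Position) -> float:
    # Hypothetical stand-in for the validation error of a trained model,
    # smallest at log10(learning rate) = -3 and 128 hidden units.
    log_lr, hidden = x
    return (log_lr + 3.0) ** 2 + ((hidden - 128.0) / 64.0) ** 2


search_box = [(-5.0, -1.0), (16.0, 512.0)]  # log10(lr), LSTM hidden units
best_x, best_err = grey_wolf_optimize(toy_validation_error, search_box)
print(f"log10(lr)={best_x[0]:.3f}  hidden={round(best_x[1])}  error={best_err:.2e}")
```

In a real tuning run the objective would train and validate the network, which is far more expensive per call, so the pack size and iteration count would likely need to be much smaller, and integer hyperparameters such as the hidden size would be rounded before use.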

Similar Items