Global-Local Self-Attention-Based Long Short-Term Memory with Optimization Algorithm for Speaker Identification

Bibliographic Details
Main Authors: Pravin Marotrao Ghate, Bhagvat D. Jadhav, Shriram Sadashiv Kulkarni, Pravin Balaso Chopade, Prabhakar N. Kota
Format: Article
Language: English
Published: IIUM Press, International Islamic University Malaysia 2025-01-01
Series: International Islamic University Malaysia Engineering Journal
Online Access: https://journals.iium.edu.my/ejournal/index.php/iiumej/article/view/3386
author Pravin Marotrao Ghate
Bhagvat D. Jadhav
Shriram Sadashiv Kulkarni
Pravin Balaso Chopade
Prabhakar N. Kota
collection DOAJ
description Speaker identification (SI) involves recognizing a speaker from a group of unknown speakers, while speaker verification (SV) determines whether a given voice sample belongs to a particular person. The main challenges in SI are session variability, background noise, and insufficient information. To mitigate these limitations, this research proposes a Global-Local Self-Attention (GLSA) based Long Short-Term Memory (LSTM) network with Exponential Neighborhood – Grey Wolf Optimization (EN-GWO) for effective speaker identification on the TIMIT and VoxCeleb 1 datasets. GLSA is incorporated into the LSTM so that the model focuses on the relevant information, and the hyperparameters are tuned with EN-GWO, which improves speaker identification performance. The GLSA-LSTM with EN-GWO method achieves an accuracy of 99.36% on the TIMIT dataset and 93.45% on the VoxCeleb 1 dataset, and is compared against SincNet combined with a Generative Adversarial Network (SincGAN) and a Hybrid Neural Network – Support Vector Machine (NN-SVM).
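The abstract states only that EN-GWO tunes the LSTM hyperparameters; the exponential-neighborhood modification itself is not described in this record. For orientation, the following is a minimal sketch of the standard Grey Wolf Optimizer (not the authors' EN-GWO variant) minimizing a black-box score over a box of hyperparameter ranges. The function name `grey_wolf_optimize`, the pack size, the iteration count, and the toy loss are assumptions for illustration only.

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def grey_wolf_optimize(
    objective: Callable[[Vector], float],
    bounds: Sequence[Tuple[float, float]],
    n_wolves: int = 20,
    n_iters: int = 100,
    seed: int = 0,
) -> Tuple[Vector, float]:
    """Minimise `objective` inside box `bounds` with the standard Grey Wolf Optimizer.

    Sketch only: this is plain GWO, not the EN-GWO variant named in the record.
    """
    rng = random.Random(seed)
    dim = len(bounds)

    def clip(value: float, d: int) -> float:
        lo, hi = bounds[d]
        return min(max(value, lo), hi)

    # Random initial pack, uniformly inside the search box.
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    # Best-so-far alpha, beta, delta stored as (score, position), lowest score first.
    leaders: List[Tuple[float, Vector]] = []

    def record(score: float, pos: Vector) -> None:
        leaders.append((score, pos[:]))
        leaders.sort(key=lambda item: item[0])
        del leaders[3:]

    for wolf in wolves:
        record(objective(wolf), wolf)

    for t in range(n_iters):
        a = 2.0 - 2.0 * t / n_iters  # shrinks linearly from 2 towards 0
        for i in range(n_wolves):
            new_pos = []
            for d in range(dim):
                total = 0.0
                for _, leader in leaders:
                    A = 2.0 * a * rng.random() - a  # exploration/exploitation coefficient
                    C = 2.0 * rng.random()          # random emphasis on the leader
                    D = abs(C * leader[d] - wolves[i][d])
                    total += leader[d] - A * D
                # Move to the average of the leader-guided positions, kept inside the box.
                new_pos.append(clip(total / len(leaders), d))
            wolves[i] = new_pos
            record(objective(new_pos), new_pos)

    best_score, best_pos = leaders[0]
    return best_pos, best_score


if __name__ == "__main__":
    # Hypothetical stand-in for "validation error as a function of two hyperparameters".
    toy_loss = lambda p: (p[0] - 0.3) ** 2 + (p[1] - 2.0) ** 2
    pos, score = grey_wolf_optimize(toy_loss, bounds=[(0.0, 1.0), (0.0, 5.0)])
    print(pos, score)
```

In an actual tuning run, `objective` would train and validate the model for a given hyperparameter vector (for example, learning rate and hidden size) and return a validation error; that wrapper is not shown because the record gives no details of it.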
id doaj-art-33e2878cbc8c46f5a673c45952bc70ad
institution Kabale University
issn 1511-788X
2289-7860
doi 10.31436/iiumej.v26i1.3386
volume 26
issue 1
affiliation Pravin Marotrao Ghate: JSPM's Rajarshi Shahu College of Engineering
Bhagvat D. Jadhav (ORCID https://orcid.org/0000-0002-1393-6823): JSPM's Rajarshi Shahu College of Engineering
Shriram Sadashiv Kulkarni (ORCID https://orcid.org/0000-0001-8584-8171): Sinhgad Academy of Engineering
Pravin Balaso Chopade: M.E.S College of Engineering
Prabhakar N. Kota (ORCID https://orcid.org/0000-0002-7537-8433): M.E.S College of Engineering
topic Exponential Neighborhood – Grey Wolf Optimization
Global-Local Self-Attention
Long Short-Term Memory
Mel Frequency Cepstral Coefficient
Speaker Identification