Automated speech therapy through personalized pronunciation correction using reinforcement learning and large language models

Bibliographic Details
Main Authors:	Ritika Lakshminarayanan, Ayesha Shaik, Ananthakrishnan Balasundaram
Format:	Article
Language:	English
Published:	Elsevier 2025-03-01
Series:	Results in Engineering
Subjects:	Automatic speech recognition; Reinforcement learning; Proximal policy optimization; Large language model; Phonetic transcription; Speech synthesis markup language
Online Access:	http://www.sciencedirect.com/science/article/pii/S2590123025000313

collection	DOAJ
description	Traditional approaches to pronunciation correction often struggle with personalization, adaptability, and consistent feedback. This study introduces an AI-powered system that integrates Reinforcement Learning (RL) and Large Language Models (LLMs) to address these limitations. The system employs a custom Proximal Policy Optimization (PPO) algorithm for precise pronunciation evaluation and a Large Language Model to deliver detailed, encouraging, user-specific feedback. It was evaluated on the CMU Sphinx Dictionary dataset, a foundational phonetic resource, together with dynamically generated user-specific session data used for personalized feedback and model refinement. Further validation on datasets including TIMIT, LibriTTS, SpeechOcean762, and the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) enabled direct comparison with contemporary methods. The results indicate that the system handles diverse phonetic variations robustly. Although it was tested primarily on English data, its modular architecture supports adaptation to other languages and dialects through language-specific phonetic datasets. On the CMU Sphinx dataset, the system achieved 97.9% phoneme-level accuracy, 87.7% word-level accuracy, 95.2% syllable count accuracy, and 89.4% perfect accuracy. These results underscore the potential of RL and LLM techniques to make pronunciation correction systems more personalized and effective. All findings are quantitatively validated and documented.
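The record reports phoneme-level and syllable count accuracy but does not define either metric. As an illustration only, the Python sketch below shows one common way to score a learner's phoneme sequence against a reference and to count syllables from ARPAbet transcriptions of the kind found in the CMU pronouncing dictionary used with CMU Sphinx. The definition of phoneme accuracy as one minus the normalized edit distance, and every function name here (`edit_distance`, `phoneme_accuracy`, `syllable_count`, `strip_stress`), are assumptions for this sketch, not the authors' method.

```python
# Hypothetical metric helpers; NOT the paper's implementation.
# Assumption: phoneme-level accuracy = 1 - (edit distance / reference length),
# computed over ARPAbet phoneme sequences (CMUdict style, e.g. "IY1").
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance (substitutions, insertions, deletions) between token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,          # deletion
                          curr[j - 1] + 1,      # insertion
                          prev[j - 1] + cost)   # substitution or match
        prev = curr
    return prev[-1]


def strip_stress(phones: Sequence[str]) -> list[str]:
    """Drop CMUdict stress digits (0, 1, 2) so 'AH0' and 'AH1' compare as equal."""
    return [p.rstrip("012") for p in phones]


def phoneme_accuracy(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Assumed definition: 1 - normalized edit distance, floored at 0."""
    ref_s, hyp_s = strip_stress(ref), strip_stress(hyp)
    if not ref_s:
        return 1.0 if not hyp_s else 0.0
    return max(0.0, 1.0 - edit_distance(ref_s, hyp_s) / len(ref_s))


def syllable_count(phones: Sequence[str]) -> int:
    """In CMUdict ARPAbet every vowel carries a stress digit, so counting
    digit-suffixed phonemes counts vowel nuclei, i.e. syllables."""
    return sum(1 for p in phones if p and p[-1].isdigit())


if __name__ == "__main__":
    # ARPAbet transcription of "speech" and a hypothetical attempt
    # in which the final consonant CH is realized as SH.
    reference = ["S", "P", "IY1", "CH"]
    attempt = ["S", "P", "IY1", "SH"]
    print(phoneme_accuracy(reference, attempt))  # 0.75

    # ARPAbet transcription of "pronunciation" (five vowel nuclei).
    word = ["P", "R", "AH0", "N", "AH2", "N", "S", "IY0", "EY1", "SH", "AH0", "N"]
    print(syllable_count(word))  # 5
```

Because stress digits are stripped before comparison, a lexical-stress error does not count as a phoneme error in this sketch; a stricter evaluator could compare the raw symbols instead.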
id	doaj-art-4d3c5a63f38149059168a1e6391097b4
institution	Kabale University
issn	2590-1230
volume	25
article_number	103943
affiliations	Ritika Lakshminarayanan: School of Computer Science and Engineering, Vellore Institute of Technology, Chennai 600127, India; Ayesha Shaik (corresponding author): Centre for Cyber Physical Systems, Vellore Institute of Technology, Chennai 600127, India; Ananthakrishnan Balasundaram: Centre for Cyber Physical Systems, Vellore Institute of Technology, Chennai 600127, India
