Computing nasalance with MFCCs and Convolutional Neural Networks.

Nasalance is a valuable clinical biomarker for hypernasality. It is computed as the ratio of acoustic energy emitted through the nose to the total energy emitted through the mouth and nose (eNasalance). A new approach is proposed to compute nasalance using Convolutional Neural Networks (CNNs) traine...
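The energy-based definition in the summary above (nasal energy divided by the combined oral and nasal energy) can be illustrated with a minimal sketch. It assumes two time-aligned sample sequences, one per Nasometer channel, and measures energy as the sum of squared amplitudes. The function names and the toy sample values are hypothetical and are not taken from the article; any filtering or scaling a clinical device might apply is left out.

```python
def energy(samples):
    """Sum of squared amplitudes of one channel."""
    return sum(s * s for s in samples)


def e_nasalance(nasal, oral):
    """Energy-based nasalance: nasal energy over total (nasal + oral) energy.

    Returns a value between 0 and 1. Raises ValueError when both
    channels are silent, because the ratio is then undefined.
    """
    nasal_energy = energy(nasal)
    total_energy = nasal_energy + energy(oral)
    if total_energy == 0:
        raise ValueError("both channels are silent; nasalance is undefined")
    return nasal_energy / total_energy


# Toy frames (hypothetical values, not data from the study).
nasal_frame = [0.1, -0.2, 0.1, 0.0]
oral_frame = [0.3, -0.4, 0.2, 0.1]
score = e_nasalance(nasal_frame, oral_frame)
print(f"eNasalance = {score:.3f}")
```

A value near 0 means almost all energy came from the oral channel, and a value near 1 means almost all of it came from the nasal channel.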

Bibliographic Details
Main Authors:	Andrés Lozano, Enrique Nava, María Dolores García Méndez, Ignacio Moreno-Torres
Format:	Article
Language:	English
Published:	Public Library of Science (PLoS) 2024-01-01
Series:	PLoS ONE
Online Access:	https://doi.org/10.1371/journal.pone.0315452

_version_	1841555526575456256
author	Andrés Lozano Enrique Nava María Dolores García Méndez Ignacio Moreno-Torres
author_sort	Andrés Lozano
collection	DOAJ
description	Nasalance is a valuable clinical biomarker for hypernasality. It is computed as the ratio of acoustic energy emitted through the nose to the total energy emitted through the mouth and nose (eNasalance). A new approach is proposed to compute nasalance using Convolutional Neural Networks (CNNs) trained with Mel-Frequency Cepstrum Coefficients (mfccNasalance). mfccNasalance is evaluated by examining its accuracy: 1) when the training and test data are from the same or from different dialects; 2) with test data that differ in dynamicity (e.g. rapidly produced diadochokinetic syllables versus short words); and 3) using multiple CNN configurations (i.e. kernel shape and use of 1 × 1 pointwise convolution). Dual-channel Nasometer speech data were recorded from healthy speakers of different dialects: Costa Rica (more nasal, +) and Spain and Chile (less nasal, -). The inputs to the CNN models were sequences of 39 MFCC vectors computed from 250 ms moving windows. The test data were recorded in Spain and included short words (-dynamic), sentences (+dynamic), and diadochokinetic syllables (+dynamic). The accuracy of a CNN model was defined as the Spearman correlation between the mfccNasalance for that model and the perceptual nasality scores of human experts. In the same-dialect condition, mfccNasalance was more accurate than eNasalance independently of the CNN configuration; using a 1 × 1 kernel resulted in increased accuracy for +dynamic utterances (p < .001), though not for -dynamic utterances. The kernel shape had a significant impact for -dynamic utterances only (p < .001). In the different-dialect condition, the scores were significantly less accurate than in the same-dialect condition, particularly for Costa Rica-trained models. We conclude that mfccNasalance is a flexible and useful alternative to eNasalance. Future studies should explore how to optimize mfccNasalance by selecting the most adequate CNN model as a function of the dynamicity of the target speech data.
format	Article
id	doaj-art-96f352eb6be3464c98689792aa92c119
institution	Kabale University
issn	1932-6203
language	English
publishDate	2024-01-01
publisher	Public Library of Science (PLoS)
record_format	Article
series	PLoS ONE
spelling	doaj-art-96f352eb6be3464c98689792aa92c119; 2025-01-08T05:32:18Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2024-01-01; 19(12): e0315452; 10.1371/journal.pone.0315452; https://doi.org/10.1371/journal.pone.0315452
title	Computing nasalance with MFCCs and Convolutional Neural Networks.
url	https://doi.org/10.1371/journal.pone.0315452
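The description defines a model's accuracy as the Spearman correlation between its mfccNasalance values and the perceptual nasality scores of human experts. The sketch below shows one way such a rank correlation could be computed in plain Python: each list is converted to ranks (tied values share the mean of their ranks) and the Pearson correlation of the ranks is returned. The function names, the example scores, and the rating scale are hypothetical and are not taken from the article.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-utterance values, not data from the study.
model_scores = [0.12, 0.35, 0.33, 0.80, 0.55]   # model output per utterance
expert_ratings = [1, 2, 3, 5, 4]                # perceptual nasality ratings
rho = spearman(model_scores, expert_ratings)
print(f"Spearman rho = {rho:.2f}")
```

Because only the ordering of the values matters, this measure does not require the model output and the expert ratings to be on the same scale.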

Similar Items