Efficient Gated Convolutional Recurrent Neural Networks for Real-Time Speech Enhancement

Bibliographic Details
Main Authors:	Fazal-E-Wahab, Zhongfu Ye, Nasir Saleem, Hamza Ali, Imad Ali
Format:	Article
Language:	English
Published:	Universidad Internacional de La Rioja (UNIR) 2025-01-01
Series:	International Journal of Interactive Multimedia and Artificial Intelligence
Subjects:	convolutional gated recurrent unit (convolutional GRU), deep learning, intelligibility, long short-term memory (LSTM), speech enhancement
Online Access:	https://www.ijimai.org/journal/bibcite/reference/3324

collection	DOAJ
description	Deep learning (DL) networks have become powerful alternatives for speech enhancement, achieving excellent results in speech quality, intelligibility, and background noise suppression. Because of their high computational load, most DL models for speech enhancement are difficult to implement for real-time processing, and formulating resource-efficient, compact networks is challenging. To address this problem, we propose a resource-efficient convolutional recurrent network that learns the complex ratio mask for real-time speech enhancement. A convolutional encoder-decoder and gated recurrent units (GRUs) are integrated into the convolutional recurrent network architecture, yielding a causal system suitable for real-time speech processing. Parallel GRU grouping and efficient skip connections are used to keep the network compact. The causal encoder-decoder is composed of five convolutional (Conv2D) and deconvolutional (Deconv2D) layers. Leaky rectified linear unit (leaky ReLU) activation is applied to all layers except the output layer, where softplus activation is used to constrain the network output to positive values. Batch normalization is applied after every convolution (or deconvolution) and before the activation. Different noise types and speakers can be used in training and testing. Experiments on the LibriSpeech dataset show that the proposed real-time approach improves objective perceptual quality and intelligibility with far fewer trainable parameters than existing LSTM and GRU models. The proposed model obtained average scores of 83.53% STOI and 2.52 PESQ. Quality and intelligibility improved by 31.61% and 17.18%, respectively, over noisy speech.
id	doaj-art-2d97090989654187b23f20ecf8dd03b3
institution	Kabale University
issn	1989-1660
doi	10.9781/ijimai.2023.05.007
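
The description above credits parallel GRU grouping for part of the network's compactness. The record does not say how the grouping is done, so the following is only a minimal sketch under a common assumption: the input and hidden vectors are split into equal, independent groups, each handled by its own smaller GRU. The function names and the layer size of 256 are hypothetical and are not taken from the paper.

```python
def gru_params(input_size: int, hidden_size: int) -> int:
    """Trainable parameters in one GRU layer, assuming one bias vector per gate."""
    # A GRU has three gate blocks (reset, update, candidate). Each block has an
    # input weight matrix (hidden x input), a recurrent weight matrix
    # (hidden x hidden), and a bias vector (hidden).
    # Note: some frameworks keep two bias vectors per gate, which adds
    # 3 * hidden_size to this count.
    per_gate = hidden_size * input_size + hidden_size * hidden_size + hidden_size
    return 3 * per_gate


def grouped_gru_params(input_size: int, hidden_size: int, groups: int) -> int:
    """Parameters when the layer is split into `groups` independent GRUs.

    Assumption (not from the source): each group sees input_size / groups
    features and owns hidden_size / groups hidden units.
    """
    if groups < 1 or input_size % groups or hidden_size % groups:
        raise ValueError("groups must evenly divide input_size and hidden_size")
    return groups * gru_params(input_size // groups, hidden_size // groups)


if __name__ == "__main__":
    size = 256  # hypothetical layer width, chosen only for illustration
    for g in (1, 2, 4, 8):
        print(f"groups={g}: {grouped_gru_params(size, size, g)} parameters")
```

Under this assumption the weight matrices shrink by roughly the number of groups while the bias count stays the same, which is why grouping is a cheap way to cut the recurrent part of a convolutional recurrent network.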

