TSDCA-BA: An Ultra-Lightweight Speech Enhancement Model for Real-Time Hearing Aids with Multi-Scale STFT Fusion
| Main Authors: | , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-07-01 |
| Series: | Applied Sciences |
| Subjects: | |
| Online Access: | https://www.mdpi.com/2076-3417/15/15/8183 |
| Summary: | Lightweight speech denoising models have made remarkable progress in improving both speech quality and computational efficiency. However, most models rely on long temporal windows as input, which limits their applicability in low-latency, real-time scenarios on edge devices. To address this challenge, we propose a lightweight hybrid module that combines Temporal Statistics Enhancement (TSE), Squeeze-and-Excitation-based Dual Convolutional Attention (SDCA), and Band-wise Attention (BA). The TSE module enhances single-frame spectral features by concatenating statistical descriptors (mean, standard deviation, maximum, and minimum), thereby capturing richer local information without relying on temporal context. The combined SDCA and BA module integrates a simplified residual structure and channel attention, while its BA component further strengthens the representation of critical frequency bands through band-wise partitioning and differentiated weighting. The proposed model requires only 0.22 million multiply-accumulate operations (MMACs) and contains a total of 112.3 K parameters, making it well suited for low-latency, real-time speech enhancement applications. Experimental results demonstrate that, among lightweight models with fewer than 200 K parameters, the proposed approach outperforms most existing methods in both denoising performance and computational efficiency, significantly reducing processing overhead. Furthermore, real-device deployment on an improved hearing aid confirms an inference latency as low as 2 milliseconds, validating its practical potential for real-time edge applications. |
| ISSN: | 2076-3417 |
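
The two ideas named in the summary, appending per-frame statistics and weighting frequency bands differently, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes the statistics are taken over the frequency bins of one magnitude-spectrum frame, and it uses fixed band weights where the model would presumably learn them. The function names `tse_features` and `bandwise_weight` are hypothetical.

```python
import numpy as np


def tse_features(frame_mag: np.ndarray) -> np.ndarray:
    """Append mean, std, max and min of a single frame to the frame itself.

    Assumption: the descriptors are computed over the frequency bins of one
    magnitude-spectrum frame, so no neighbouring frames are needed.
    """
    stats = np.array(
        [frame_mag.mean(), frame_mag.std(), frame_mag.max(), frame_mag.min()]
    )
    return np.concatenate([frame_mag, stats])


def bandwise_weight(frame_mag: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Split the bins into len(weights) contiguous bands and scale each band.

    Assumption: fixed weights stand in for attention weights that a trained
    model would produce.
    """
    bands = np.array_split(frame_mag, len(weights))
    return np.concatenate([w * band for w, band in zip(weights, bands)])


# Usage on a toy 4-bin frame (illustrative values only).
frame = np.array([1.0, 2.0, 3.0, 4.0])
enhanced = tse_features(frame)  # 4 bins + 4 descriptors
reweighted = bandwise_weight(frame, np.array([0.5, 2.0]))  # 2 bands
```

Because every quantity depends on the current frame alone, a step of this kind adds no look-ahead, which matches the low-latency motivation given in the summary.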