End-to-End Multi-Speaker FastSpeech2 With Hierarchical Decoder

Multi-speaker text-to-speech (TTS) systems play a crucial role in different applications, such as personalized voice assistants, audiobooks, and multilingual speech synthesis. These systems aim to generate high-quality, natural-sounding speech while preserving the distinct characteristics of differe...

Bibliographic Details
Main Authors:	Majid Adibian, Hossein Zeinali
Format:	Article
Language:	English
Published:	IEEE 2025-01-01
Series:	IEEE Access
Subjects:	Neural text-to-speech; multi-speaker speech synthesis; end-to-end deep learning models; speaker adaptation in TTS; non-autoregressive speech generation
Online Access:	https://ieeexplore.ieee.org/document/11080147/

Similar Items

ECE-TTS: A Zero-Shot Emotion Text-to-Speech Model with Simplified and Precise Control
by: Shixiong Liang, et al.
Published: (2025-05-01)

Assessment of the Speech Material Usability for Forensic Speaker Identification by Voice and Sounding Speech
by: T. N. Svirava, et al.
Published: (2025-04-01)

In Memoriam Professor Wojciech Majewski
by: Andrzej Bogdan DOBRUCKI
Published: (2021-08-01)

Assessing the effectiveness of diarization algorithms in Costa Rican children-adult speech according to age group and gender
by: Alejandro Chacón-Vargas, et al.
Published: (2022-11-01)

Privacy-Preserving Deep Speaker Separation for Smartphone-Based Passive Speech Assessment
by: Apiwat Ditthapron, et al.
Published: (2021-01-01)

Speaker Authentication Using Vector Quantization
by: Bushra Q. Al-Abudi, et al.
Published: (2009-12-01)

CONDITIONS FOR SUCCESS IN APOLOGY SPEECH GENRE
by: Oleksandra M. Shumiatska
Published: (2021-12-01)

End-to-end feature fusion for jointly optimized speech enhancement and automatic speech recognition
by: Mohamed Medani, et al.
Published: (2025-07-01)

Cross-linguistic rhythmic patterns in Persian-English bilingual speakers: Implications for speaker recognition
by: Homa Asadi, et al.
Published: (2024-12-01)

Application of Fischer semi discriminant analysis for speaker diarization in Costa Rican radio broadcasts
by: Roberto Sánchez Cárdenas, et al.
Published: (2022-11-01)

End-to-End Mandarin Speech Reconstruction Based on Ultrasound Tongue Images Using Deep Learning
by: Fengji Li, et al.
Published: (2025-01-01)

An end-to-end text-to-speech system for vehicle-mounted devices
by: LUO Xiao, et al.
Published: (2023-11-01)

Contrastive analysis of politeness strategies in refusal speech acts: A study of Sundanese and Batak language
by: Rossy Halimatun Rosyidah
Published: (2025-01-01)

The Influence of Language Experience on Speech Perception: Heritage Spanish Speaker Perception of Contrastive and Allophonic Consonants
by: Amanda Boomershine, et al.
Published: (2025-04-01)

Learning to Maximize Speech Quality Directly Using MOS Prediction for Neural Text-to-Speech
by: Yeunju Choi, et al.
Published: (2022-01-01)

Crowd Speaker Identification Methodologies, Datasets And Features: Review
by: Husam Alasadi, et al.
Published: (2024-12-01)

Comparative Analysis of GPT-4 and LLaMA 3.2 Integration With Speech Processing Models for Enhancing Human–Robot Interaction and Motion Control in Real-World Applications
by: Sheeba Uruj, et al.
Published: (2025-01-01)

Automatic development of speech-in-noise hearing tests using machine learning
by: Sigrid Polspoel, et al.
Published: (2025-04-01)

A method for synthetic speech detection using local phase quantization
by: Jia XU, et al.
Published: (2024-02-01)

Leveraging Low-Rank Adaptation for Parameter-Efficient Fine-Tuning in Multi-Speaker Adaptive Text-to-Speech Synthesis
by: Changi Hong, et al.
Published: (2024-01-01)

Components of Speaker’s Rhetoric in Terms of Interpretation of the Text of Public Speaking (based on the Speech of W. Churchill “Their Finest Hour”)
by: T. N. Zubakina
Published: (2020-02-01)

Speech Emotion Recognition Using Transfer Learning: Integration of Advanced Speaker Embeddings and Image Recognition Models
by: Maros Jakubec, et al.
Published: (2024-10-01)

Primacy of mouth over eyes to perceive audiovisual Mandarin lexical tones
by: Biao Zeng, et al.
Published: (2023-11-01)

The Effect of Voice over IP Transmission Degradations on MAP-EM-GMM Speaker Verification Performance
by: Waldemar MACIEJKO
Published: (2015-08-01)

The analysis of transformer end-to-end model in Real-time interactive scene based on speech recognition technology
by: Ping Li, et al.
Published: (2025-05-01)

Relative Applicability of Diverse Automatic Speech Recognition Platforms for Transcription of Psychiatric Treatment Sessions
by: Rana Zeeshan, et al.
Published: (2025-01-01)

Advancing automatic speech recognition for low-resource Ghanaian languages: Audio datasets for Akan, Ewe, Dagbani, Dagaare, and Ikposo
by: Isaac Wiafe, et al.
Published: (2025-08-01)

Using casual speech phonology in synthetic speech
by: Linda SHOCKEY
Published: (2014-04-01)

Some comments about the existing theory of sound with comparison to the experimental research of vector effects in real-life acoustic near fields
by: Paweł MRÓWKA, et al.
Published: (2014-09-01)

L’énonciation aphorisante dans l’article de presse : une syntaxe sous contrôle(s)
by: Grégoire LACAZE
Published: (2015-06-01)

From Vision to Voice: A Multi-Modal Assistive Framework for the Physically Impaired
by: Suhas Bhat, et al.
Published: (2025-01-01)

End-to-end scene text detection and recognition algorithm based on Transformer decoders
by: Jinzhi ZHENG, et al.
Published: (2023-05-01)

Speaking out for speakers: a guide for and analysis of robot speaker design
by: Nnamdi Nwagwu, et al.
Published: (2024-11-01)

DeepLASD countermeasure for logical access audio spoofing
by: Hamed Al-Tairi, et al.
Published: (2025-07-01)

NAIA: A Multi-Technology Virtual Assistant for Boosting Academic Environments—A Case Study
by: Adrian Pabon Mendoza, et al.
Published: (2025-01-01)

Dense-Fusion2Net a more efficient and lightweight short speech speaker recognition system with time-frequency channel attention
by: Fei Deng, et al.
Published: (2025-03-01)

EMPHATIC APOLOGY IN GERMAN LINGUACULTURE
by: Oleksandra M. Shumiatska
Published: (2019-12-01)

Quand la foule devient peuple … avec Léon Gambetta
by: Aude Dontenwille-Gerbaud
Published: (2010-09-01)

Gender and Speech Disfluency Production: A Psycholinguistic Analysis on Turkish Speakers
by: Ayşe Altıparmak, et al.
Published: (2018-10-01)