The perception of code-switched vs. monolingual sentences in TTS voices


Bibliographic Details
Main Authors: Tyler Méndez Kline, Georgia Zellou
Format: Article
Language: English
Published: Frontiers Media S.A. 2025-08-01
Series: Frontiers in Computer Science
Online Access: https://www.frontiersin.org/articles/10.3389/fcomp.2025.1565604/full
Description
Summary: This study examines the intelligibility of English and Spanish lexical items in code-switched utterances across different text-to-speech (TTS) synthesis methods. Using stimuli generated with neural and concatenative TTS, 49 Spanish-English bilingual participants listened to 96 sentences, mixed with noise, and typed the phrase-final keyword. Half of the sentences contained English-Spanish code-switches (with equal numbers of English and Spanish target keywords), and half were monolingual (half English, half Spanish). Accuracy was coded as a binary outcome (correct or incorrect word identification). Results show that intelligibility is lower (1) when the target words are produced in Spanish and (2) in code-switched conditions. These results contrast with previous work showing intelligibility differences between TTS conditions. Moreover, the lower intelligibility of Spanish target words and of code-switched sentences motivates improving voice-AI speech to accommodate common bilingual practices.
ISSN: 2624-9898
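
Note on the design described in the summary: the 96 sentences split evenly into code-switched and monolingual sets, and each set splits evenly by target-keyword language, which gives 24 sentences per cell. The Python sketch below works through that arithmetic and shows one plausible way to code accuracy as a binary outcome. It is an illustration only: the function names, the exact-match scoring rule, and the sample trials are assumptions, not the authors' materials or analysis, and the summary does not say how the two TTS methods were distributed across these cells.

```python
from collections import defaultdict

N_SENTENCES = 96  # total sentences per listener, as stated in the summary


def design_cells(n_sentences=N_SENTENCES):
    """Split sentences by sentence type, then by target-keyword language."""
    per_type = n_sentences // 2   # half code-switched, half monolingual
    per_cell = per_type // 2      # half English targets, half Spanish targets
    return {
        (sentence_type, target_lang): per_cell
        for sentence_type in ("code-switched", "monolingual")
        for target_lang in ("English", "Spanish")
    }


def score(response, target):
    """Binary accuracy: 1 if the typed keyword matches the target, else 0.

    Assumption: case-insensitive exact match after trimming whitespace.
    The study's actual scoring criteria are not given in the summary.
    """
    return int(response.strip().lower() == target.strip().lower())


def accuracy_by_cell(trials):
    """Proportion correct per (sentence type, target language) cell.

    Each trial is (sentence_type, target_lang, target_word, typed_response).
    """
    hits = defaultdict(int)
    counts = defaultdict(int)
    for sentence_type, target_lang, target, response in trials:
        key = (sentence_type, target_lang)
        hits[key] += score(response, target)
        counts[key] += 1
    return {key: hits[key] / counts[key] for key in counts}


# Made-up trials for illustration only (not data from the study).
example_trials = [
    ("monolingual", "English", "house", "house"),
    ("monolingual", "Spanish", "casa", "cosa"),
    ("code-switched", "English", "table", "Table "),
    ("code-switched", "Spanish", "mesa", "mesa"),
    ("code-switched", "Spanish", "perro", "pero"),
]

if __name__ == "__main__":
    print(design_cells())
    print(accuracy_by_cell(example_trials))
```

Per-cell proportions like these are only a descriptive summary; the summary does not specify the statistical model the authors fitted to the binary responses.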