Multimodal fusion-powered English speaking robot

Introduction: Speech recognition and multimodal learning are two critical areas in machine learning. Current multimodal speech recognition systems often encounter challenges such as high computational demands and model complexity. Methods: To overcome these issues, we propose a novel framework-EnglishAL-...

Bibliographic Details
Main Author:	Ruiying Pan
Format:	Article
Language:	English
Published:	Frontiers Media S.A. 2024-11-01
Series:	Frontiers in Neurorobotics
Subjects:	ALBEF; Neural Machine Translation (NMT); cross-attention mechanism; multimodal robot; speech recognition
Online Access:	https://www.frontiersin.org/articles/10.3389/fnbot.2024.1478181/full

_version_	1846166976496402432
author	Ruiying Pan
author_facet	Ruiying Pan
author_sort	Ruiying Pan
collection	DOAJ
description	Introduction: Speech recognition and multimodal learning are two critical areas in machine learning. Current multimodal speech recognition systems often encounter challenges such as high computational demands and model complexity. Methods: To overcome these issues, we propose a novel framework, EnglishAL-Net, a Multimodal Fusion-powered English Speaking Robot. This framework leverages the ALBEF model, optimizing it for real-time speech and multimodal interaction, and incorporates a newly designed text and image editor to fuse visual and textual information. The robot processes dynamic spoken input through the integration of Neural Machine Translation (NMT), enhancing its ability to understand and respond to spoken language. Results and discussion: In the experimental section, we constructed a dataset containing various scenarios and oral instructions for testing. The results show that compared to traditional unimodal processing methods, our model significantly improves both language understanding accuracy and response time. This research not only enhances the performance of multimodal interaction in robots but also opens up new possibilities for applications of robotic technology in education, rescue, customer service, and other fields, holding significant theoretical and practical value.
format	Article
id	doaj-art-7efe7699a41747c2aa1a2e8f2c47167d
institution	Kabale University
issn	1662-5218
language	English
publishDate	2024-11-01
publisher	Frontiers Media S.A.
record_format	Article
series	Frontiers in Neurorobotics
spelling	doaj-art-7efe7699a41747c2aa1a2e8f2c47167d; 2024-11-15T06:13:38Z; eng; Frontiers Media S.A.; Frontiers in Neurorobotics; 1662-5218; 2024-11-01; 18; 10.3389/fnbot.2024.1478181; 1478181; Multimodal fusion-powered English speaking robot; Ruiying Pan; Introduction: Speech recognition and multimodal learning are two critical areas in machine learning. Current multimodal speech recognition systems often encounter challenges such as high computational demands and model complexity. Methods: To overcome these issues, we propose a novel framework, EnglishAL-Net, a Multimodal Fusion-powered English Speaking Robot. This framework leverages the ALBEF model, optimizing it for real-time speech and multimodal interaction, and incorporates a newly designed text and image editor to fuse visual and textual information. The robot processes dynamic spoken input through the integration of Neural Machine Translation (NMT), enhancing its ability to understand and respond to spoken language. Results and discussion: In the experimental section, we constructed a dataset containing various scenarios and oral instructions for testing. The results show that compared to traditional unimodal processing methods, our model significantly improves both language understanding accuracy and response time. This research not only enhances the performance of multimodal interaction in robots but also opens up new possibilities for applications of robotic technology in education, rescue, customer service, and other fields, holding significant theoretical and practical value.; https://www.frontiersin.org/articles/10.3389/fnbot.2024.1478181/full; ALBEF; Neural Machine Translation (NMT); cross-attention mechanism; multimodal robot; speech recognition
spellingShingle	Ruiying Pan Multimodal fusion-powered English speaking robot Frontiers in Neurorobotics ALBEF Neural Machine Translation (NMT) cross-attention mechanism multimodal robot speech recognition
title	Multimodal fusion-powered English speaking robot
title_full	Multimodal fusion-powered English speaking robot
title_fullStr	Multimodal fusion-powered English speaking robot
title_full_unstemmed	Multimodal fusion-powered English speaking robot
title_short	Multimodal fusion-powered English speaking robot
title_sort	multimodal fusion powered english speaking robot
topic	ALBEF; Neural Machine Translation (NMT); cross-attention mechanism; multimodal robot; speech recognition
url	https://www.frontiersin.org/articles/10.3389/fnbot.2024.1478181/full
work_keys_str_mv	AT ruiyingpan multimodalfusionpoweredenglishspeakingrobot

Similar Items