Multimodal fusion-powered English speaking robot
Introduction: Speech recognition and multimodal learning are two critical areas in machine learning. Current multimodal speech recognition systems often encounter challenges such as high computational demands and model complexity. Methods: To overcome these issues, we propose a novel framework-EnglishAL-...
Main Author: | Ruiying Pan |
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A. 2024-11-01 |
Series: | Frontiers in Neurorobotics |
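Subjects: | ALBEF; Neural Machine Translation (NMT); cross-attention mechanism; multimodal robot; speech recognition |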
Online Access: | https://www.frontiersin.org/articles/10.3389/fnbot.2024.1478181/full |
_version_ | 1846166976496402432 |
---|---|
author | Ruiying Pan |
author_facet | Ruiying Pan |
author_sort | Ruiying Pan |
collection | DOAJ |
description | Introduction: Speech recognition and multimodal learning are two critical areas in machine learning. Current multimodal speech recognition systems often encounter challenges such as high computational demands and model complexity. Methods: To overcome these issues, we propose a novel framework, EnglishAL-Net, a Multimodal Fusion-powered English Speaking Robot. This framework leverages the ALBEF model, optimizing it for real-time speech and multimodal interaction, and incorporates a newly designed text and image editor to fuse visual and textual information. The robot processes dynamic spoken input through the integration of Neural Machine Translation (NMT), enhancing its ability to understand and respond to spoken language. Results and discussion: In the experimental section, we constructed a dataset containing various scenarios and oral instructions for testing. The results show that compared to traditional unimodal processing methods, our model significantly improves both language understanding accuracy and response time. This research not only enhances the performance of multimodal interaction in robots but also opens up new possibilities for applications of robotic technology in education, rescue, customer service, and other fields, holding significant theoretical and practical value. |
format | Article |
id | doaj-art-7efe7699a41747c2aa1a2e8f2c47167d |
institution | Kabale University |
issn | 1662-5218 |
language | English |
publishDate | 2024-11-01 |
publisher | Frontiers Media S.A. |
record_format | Article |
series | Frontiers in Neurorobotics |
spelling | doaj-art-7efe7699a41747c2aa1a2e8f2c47167d; 2024-11-15T06:13:38Z; eng; Frontiers Media S.A.; Frontiers in Neurorobotics; 1662-5218; 2024-11-01; 18; 10.3389/fnbot.2024.1478181; 1478181; Multimodal fusion-powered English speaking robot; Ruiying Pan; Introduction: Speech recognition and multimodal learning are two critical areas in machine learning. Current multimodal speech recognition systems often encounter challenges such as high computational demands and model complexity. Methods: To overcome these issues, we propose a novel framework, EnglishAL-Net, a Multimodal Fusion-powered English Speaking Robot. This framework leverages the ALBEF model, optimizing it for real-time speech and multimodal interaction, and incorporates a newly designed text and image editor to fuse visual and textual information. The robot processes dynamic spoken input through the integration of Neural Machine Translation (NMT), enhancing its ability to understand and respond to spoken language. Results and discussion: In the experimental section, we constructed a dataset containing various scenarios and oral instructions for testing. The results show that compared to traditional unimodal processing methods, our model significantly improves both language understanding accuracy and response time. This research not only enhances the performance of multimodal interaction in robots but also opens up new possibilities for applications of robotic technology in education, rescue, customer service, and other fields, holding significant theoretical and practical value.; https://www.frontiersin.org/articles/10.3389/fnbot.2024.1478181/full; ALBEF; Neural Machine Translation (NMT); cross-attention mechanism; multimodal robot; speech recognition |
spellingShingle | Ruiying Pan; Multimodal fusion-powered English speaking robot; Frontiers in Neurorobotics; ALBEF; Neural Machine Translation (NMT); cross-attention mechanism; multimodal robot; speech recognition |
title | Multimodal fusion-powered English speaking robot |
title_full | Multimodal fusion-powered English speaking robot |
title_fullStr | Multimodal fusion-powered English speaking robot |
title_full_unstemmed | Multimodal fusion-powered English speaking robot |
title_short | Multimodal fusion-powered English speaking robot |
title_sort | multimodal fusion powered english speaking robot |
topic | ALBEF; Neural Machine Translation (NMT); cross-attention mechanism; multimodal robot; speech recognition |
url | https://www.frontiersin.org/articles/10.3389/fnbot.2024.1478181/full |
work_keys_str_mv | AT ruiyingpan multimodalfusionpoweredenglishspeakingrobot |