A Multi-Modal Attentive Framework That Can Interpret Text (MMAT)
Deep learning algorithms have demonstrated exceptional performance on various computer vision and natural language processing tasks. However, for machines to learn information signals, they must understand and have enough reasoning power to respond to general questions based on the linguistic featur...
| Main Authors: | Vijay Kumari, Sarthak Gupta, Yashvardhan Sharma, Lavika Goel |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/11072709/ |
Similar Items
- Envisioning Answers: Unleashing Deep Learning for Visual Question Answering in Artistic Images
  by: Erfan Zolghadriha, et al.
  Published: (2024-03-01)
- Designing and Evaluating a Dual-Stream Transformer-Based Architecture for Visual Question Answering
  by: Faheem Shehzad, et al.
  Published: (2024-01-01)
- A Semantic Weight Adaptive Model Based on Visual Question Answering
  by: Li Huimin, et al.
  Published: (2025-01-01)
- Seeing and Reasoning: A Simple Deep Learning Approach to Visual Question Answering
  by: Rufai Yusuf Zakari, et al.
  Published: (2025-04-01)
- Hierarchical Modeling for Medical Visual Question Answering with Cross-Attention Fusion
  by: Junkai Zhang, et al.
  Published: (2025-04-01)