Designing and Evaluating a Dual-Stream Transformer-Based Architecture for Visual Question Answering
In the realm of Visual Question Answering, accurate answers often hinge on the harmonious fusion of textual and visual elements. While these complex architectures are effective, they typically come with a hefty price tag: a large number of parameters that demand significant processing power and leng...
Main Authors: | Faheem Shehzad, Aniello Minutolo, Massimo Esposito |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2024-01-01 |
Series: | IEEE Access |
Subjects: | |
Online Access: | https://ieeexplore.ieee.org/document/10811881/ |
Similar Items
- Answer Distillation Network With Bi-Text-Image Attention for Medical Visual Question Answering
  by: Hongfang Gong, et al.
  Published: (2025-01-01)
- Prompting Large Language Models with Knowledge-Injection for Knowledge-Based Visual Question Answering
  by: Zhongjian Hu, et al.
  Published: (2024-09-01)
- Enhancing students’ participation through question and answer on SMAN 2 Sungai Kakap Kubu Raya
  by: Clarry Sada, et al.
  Published: (2024-02-01)
- Visual Question Answering in Robotic Surgery: A Comprehensive Review
  by: Di Ding, et al.
  Published: (2025-01-01)
- cLegal-QA: a Chinese legal question answering with natural language generation methods
  by: Yizhen Wang, et al.
  Published: (2024-12-01)