Designing and Evaluating a Dual-Stream Transformer-Based Architecture for Visual Question Answering

In the realm of Visual Question Answering, accurate answers often hinge on the harmonious fusion of textual and visual elements. While these complex architectures are effective, they typically come with a hefty price tag: a large number of parameters that demand significant processing power and leng...

Bibliographic Details
Main Authors:	Faheem Shehzad, Aniello Minutolo, Massimo Esposito
Format:	Article
Language:	English
Published:	IEEE 2024-01-01
Series:	IEEE Access
Subjects:	Visual question answering (VQA); transformer models; natural language processing; dual-stream architecture; multimodal question answering; attention mechanisms
Online Access:	https://ieeexplore.ieee.org/document/10811881/

Similar Items

Answer Distillation Network With Bi-Text-Image Attention for Medical Visual Question Answering
by: Hongfang Gong, et al.
Published: (2025-01-01)

Prompting Large Language Models with Knowledge-Injection for Knowledge-Based Visual Question Answering
by: Zhongjian Hu, et al.
Published: (2024-09-01)

Enhancing students’ participation through question and answer on SMAN 2 Sungai Kakap Kubu Raya
by: Clarry Sada, et al.
Published: (2024-02-01)

Visual Question Answering in Robotic Surgery: A Comprehensive Review
by: Di Ding, et al.
Published: (2025-01-01)

cLegal-QA: a Chinese legal question answering with natural language generation methods
by: Yizhen Wang, et al.
Published: (2024-12-01)

QAR (QUESTION ANSWER RELATIONSHIP) AS AN ALTERNATIVE STRATEGY TO TEACH READING
by: Sa’dulloh Muzammil
Published: (2017-01-01)

Application of improved multi-model fusion technology in customer service answering system
by: Guangmin WANG, et al.
Published: (2018-12-01)

Evaluating the effectiveness of prompt engineering for knowledge graph question answering
by: Catherine Kosten, et al.
Published: (2025-01-01)

Expert Detection In Question Answer Communities
by: Hamed Salimian, et al.
Published: (2022-01-01)

Rhetorical questions as aggressive, friendly or sarcastic/ironical questions with imposed answers
by: Špago Džemal
Published: (2020-12-01)

The battle of question formats: a comparative study of retrieval practice using very short answer questions and multiple choice questions
by: Elise V. van Wijk, et al.
Published: (2024-12-01)

Few-shot cybersecurity event detection method by data augmentation with prompting question answering
by: TANG Mengmeng, et al.
Published: (2024-08-01)

Assessing the performance of zero-shot visual question answering in multimodal large language models for 12-lead ECG image interpretation
by: Tomohisa Seki, et al.
Published: (2025-02-01)

AI thousand questions and answers based on large model accurately serve the personalized needs of teachers and students
by: ZHANG Long, et al.
Published: (2024-11-01)

Knowledge Graphs as a source of trust for LLM-powered enterprise question answering
by: Juan Sequeda, et al.
Published: (2025-05-01)

Systematic review of question answering over knowledge bases
by: Arnaldo Pereira, et al.
Published: (2022-02-01)

Research on a traditional Chinese medicine case-based question-answering system integrating large language models and knowledge graphs
by: Yuchen Duan, et al.
Published: (2025-01-01)

Exploring the informational elements of opinion answers: the case of the Russo-Ukrainian war
by: Reijo Savolainen
Published: (2023-06-01)

Statistical Learning for Semantic Parsing: A Survey
by: Qile Zhu, et al.
Published: (2019-12-01)

Two new nonrandomized response models for surveys on sensitive topics
by: Andreas Quatember
Published: (2025-01-01)

Research and practice of AI-based teacher-student service platform: a case study of Nanjing Audit University
by: WU Xin, et al.
Published: (2024-11-01)

Question–Answer Methodology for Vulnerable Source Code Review via Prototype-Based Model-Agnostic Meta-Learning
by: Pablo Corona-Fraga, et al.
Published: (2025-01-01)

Soru-Cevap İlişkisi Stratejisi Öğretiminin Dördüncü Sınıf Öğrencilerinin Okuduğunu Anlama ve Soru Sorma Düzeylerine Etkisi
by: Mehmet Can, et al.
Published: (2025-01-01)

Building a Framework for Visual Question Answering Systems
by: Maya Abu Hamoud, et al.
Published: (2025-01-01)

Праектаванне беларуска- і рускамоўных натуральна-маўленчых інтэрфейсаў для даведкавых сістэм
by: S. A. Hetsevich, et al.
Published: (2021-12-01)

Research and application practice of knowledge graph technology system for telecom-operators
by: Dongming ZHAO
Published: (2022-08-01)

Explorando el potencial de la inteligencia artificial en traumatología: respuestas conversacionales a preguntas específicas
by: F. Canillas del Rey, et al.
Published: (2025-01-01)

Application of Generative Artificial Intelligence Models for Accurate Prescription Label Identification and Information Retrieval for the Elderly in Northern East of Thailand
by: Parinya Thetbanthad, et al.
Published: (2025-01-01)

Development of local knowledge base application using retrieval augmented generation technology
by: ZHU Junyi, et al.
Published: (2024-11-01)

TEACHERS’ QUESTIONS IN INDONESIAN EFL CLASSROOM
by: Ahmadi, et al.
Published: (2020-08-01)

An Answer Verification Approach to Objective Questions in Intelligent Tutoring Systems
by: Wenzu Li
Published: (2023-03-01)

HOW TO ANSWER CHILDREN QUESTIONS
by: O. Brenifier
Published: (2016-03-01)

Spectrum knowledge graph: an intelligent engine facing future spectrum management
by: Jiachen SUN, et al.
Published: (2021-05-01)

Japanese Short Answer Grading for Japanese Language Learners Using the Contextual Representation of BERT
by: Dyah Lalita Luhurkinanti, et al.
Published: (2025-01-01)

Rhetorical questions or rhetorical uses of questions?
by: Špago Džemal
Published: (2016-12-01)

Designing a model for the development of questioning skills based on the school context in primary school students
by: Motahare Khosravi Rad, et al.
Published: (2024-09-01)

A systematic evaluation of GPT-4V's multimodal capability for chest X-ray image analysis
by: Yunyi Liu, et al.
Published: (2024-12-01)

Question Sequences and Salience in TED Talks
by: Michele Cardo, et al.
Published: (2024-08-01)

ChatGPT-4 Omni’s superiority in answering multiple-choice oral radiology questions
by: Melek Tassoker
Published: (2025-02-01)

The Secondary Purposes of Rhetorical Questions in the Holy Quran
by: محمود خورسندی, et al.
Published: (2012-06-01)