Designing and Evaluating a Dual-Stream Transformer-Based Architecture for Visual Question Answering