BVQA: Connecting Language and Vision Through Multimodal Attention for Open-Ended Question Answering