Vision-language models for medical report generation and visual question answering: a review