MVR: Synergizing Large and Vision Transformer for Multimodal Natural Language-Driven Vehicle Retrieval