Visual language transformer framework for multimodal dance performance evaluation and progression monitoring