Deep learning driven multi-scale spatiotemporal fusion dance spectrum generation network: A method based on human pose fusion