Text this: Video recommendation method based on the fusion of autoencoders and multi-modal data