Text this: Residual Vision Transformer and Adaptive Fusion Autoencoders for Monocular Depth Estimation