Text this: Supervised pyramid network based on semantic consistency for object detection