Text this: Video Detection and Fusion Tracking for the Targets in Traffic Scenario