Fusing Events and Frames with Coordinate Attention Gated Recurrent Unit for Monocular Depth Estimation