An integrated framework for multi-granular explanation of video summarization