HitIct: Chinese corpus for the evaluation of lossless compression algorithms