HitIct: Chinese corpus for the evaluation of lossless compression algorithms

Bibliographic Details
Main Authors: CHANG Wei-ling¹, YUN Xiao-chun², FANG Bin-xing¹, WANG Shu-peng²
Format: Article
Language: Chinese (zho)
Published: Editorial Department of Journal on Communications, 2009-01-01
Series: Tongxin Xuebao (Journal on Communications)
Online Access: http://www.joconline.com.cn/zh/article/74651782/
Description
Summary: HitIct, a Chinese corpus of ANSI-coded text for the evaluation of lossless compression algorithms, was proposed. Following the principles of application representativeness, complementarity, and openness, a large number of candidate files were collected from the Internet. The average compression ratio, average correlation coefficient, compression-ratio correlation coefficient, and standard deviation were then used to select the files that best indicate the overall performance of compression algorithms. Experimental results show that the resulting collection is representative and stable, and that it can serve as a supplementary test set alongside the main benchmark for comparing compression methods.
ISSN: 1000-436X
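The summary names the selection statistics but not their exact definitions. What follows is a minimal sketch under loudly stated assumptions, not the authors' procedure. Python standard-library codecs (zlib, bz2, lzma) stand in for the algorithms under test. Compression ratio is taken as original size divided by compressed size. A file's representativeness is approximated by the Pearson correlation between its per-algorithm ratios and the pool-wide mean ratio of each algorithm, and standard deviation is used only as a tie-breaker. The helpers `file_metrics`, `select_files`, `pearson`, and the metric keys are hypothetical names chosen for illustration. Because ANSI text on Chinese-locale Windows normally uses code page 936 (GBK), sample text would be produced with Python's `gbk` codec.

```python
import bz2
import lzma
import statistics
import zlib
from typing import Callable, Dict, List

# Stand-in codecs from the Python standard library (assumption: the paper's
# actual set of evaluated algorithms is not given in the summary).
COMPRESSORS: Dict[str, Callable[[bytes], bytes]] = {
    "zlib": lambda data: zlib.compress(data, 9),
    "bz2": lambda data: bz2.compress(data, 9),
    "lzma": lambda data: lzma.compress(data),
}


def compression_ratio(data: bytes, compress: Callable[[bytes], bytes]) -> float:
    """Original size divided by compressed size; larger means better compression."""
    return len(data) / len(compress(data))


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def file_metrics(files: Dict[str, bytes]) -> Dict[str, Dict[str, float]]:
    """Per-file statistics over all stand-in compressors (hypothetical helper)."""
    names = list(COMPRESSORS)
    # ratios[f][i] is the compression ratio of file f under compressor names[i].
    ratios = {
        f: [compression_ratio(data, COMPRESSORS[n]) for n in names]
        for f, data in files.items()
    }
    # Pool-wide profile: mean ratio of each compressor over every candidate file.
    profile = [statistics.fmean(r[i] for r in ratios.values()) for i in range(len(names))]
    return {
        f: {
            "avg_ratio": statistics.fmean(r),       # average compression ratio
            "std_dev": statistics.pstdev(r),        # spread across compressors
            "corr_with_pool": pearson(r, profile),  # agreement with the pool profile
        }
        for f, r in ratios.items()
    }


def select_files(metrics: Dict[str, Dict[str, float]], k: int) -> List[str]:
    """Keep the k files whose ratios best track the pool profile (assumed rule);
    ties are broken in favor of the smaller standard deviation."""
    ranked = sorted(
        metrics,
        key=lambda f: (-metrics[f]["corr_with_pool"], metrics[f]["std_dev"]),
    )
    return ranked[:k]
```

To try it on real candidates, read each file with `open(path, "rb")` into a dict of name to bytes, call `file_metrics`, then `select_files(metrics, k)`. GBK-encoded files can be passed unchanged, since the compressors operate on raw bytes. Correlating against a pool profile rewards files that rank the algorithms the same way the whole candidate set does, which is one plausible reading of "most accurate indication of overall performance".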