Study on Chinese spam filtering system based on Bayes algorithm

Bibliographic Details
Main Authors: Haoran LIU, Pan DING, Changjiang GUO, Jinfeng CHANG, Jingchuang CUI
Format: Article
Language: Chinese (zho)
Published: Editorial Department of Journal on Communications 2018-12-01
Series: Tongxin xuebao
Online Access: http://www.joconline.com.cn/zh/article/doi/10.11959/j.issn.1000-436x.2018281/
Description
Summary: To address the high feature dimensionality of Chinese spam filtering systems, a TF-IDF feature extraction algorithm based on central word extension is proposed. The algorithm improves the expressive capacity of the nodes in the network and reduces the feature dimension. In addition, a three-layer structure model based on the GWO_GA structure learning algorithm is proposed to extend the limits of text features and increase their diversity. The new structure learning algorithm relaxes the conditional independence assumption on feature attributes. A fine classification layer is added between the class layer and the feature layer to increase feature coverage. Experiments show that the three-layer Bayesian network algorithm, which combines TF-IDF feature extraction based on central word extension with GWO_GA structure learning, improves the performance of Chinese spam filtering.
ISSN: 1000-436X
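
Illustrative sketch: the summary builds on two standard ingredients, TF-IDF feature weighting and a Bayes classifier whose conditional independence assumption the proposed model relaxes. The Python below shows only that plain baseline, as a rough aid to reading the summary. It is not the authors' method: the central word extension, the three-layer network, and the GWO_GA structure learning are not reproduced, since this record does not describe them in detail. All function names and the toy data are hypothetical, and the English tokens stand in for Chinese text that is assumed to have been word-segmented already.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def select_features(docs, k):
    """Reduce dimensionality: keep the k terms with the highest total TF-IDF."""
    totals = Counter()
    for doc_weights in tf_idf(docs):
        totals.update(doc_weights)  # adds the per-document weights term by term
    return {term for term, _ in totals.most_common(k)}


def train_nb(docs, labels, vocab):
    """Multinomial naive Bayes with Laplace smoothing over the selected vocab."""
    classes = set(labels)
    prior = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    counts = {c: Counter() for c in classes}
    for doc, label in zip(docs, labels):
        counts[label].update(t for t in doc if t in vocab)
    loglik = {}
    for c in classes:
        total = sum(counts[c].values()) + len(vocab)
        loglik[c] = {t: math.log((counts[c][t] + 1) / total) for t in vocab}
    return prior, loglik


def classify(doc, prior, loglik):
    """Pick the class with the highest log posterior.

    Summing per-term log likelihoods is exactly the conditional
    independence assumption: terms are treated as independent given the class.
    Terms outside the selected vocabulary are ignored.
    """
    scores = {
        c: prior[c] + sum(loglik[c][t] for t in doc if t in loglik[c])
        for c in prior
    }
    return max(scores, key=scores.get)


# Toy, made-up corpus (assumption: tokens are already segmented words).
docs = [
    ["hello", "free", "invoice", "discount"],
    ["hello", "discount", "invoice", "click"],
    ["hello", "meeting", "report", "schedule"],
    ["hello", "report", "project", "meeting"],
]
labels = ["spam", "spam", "ham", "ham"]

# "hello" occurs in every document, so its IDF is zero and it is dropped.
vocab = select_features(docs, k=8)
prior, loglik = train_nb(docs, labels, vocab)
print(classify(["hello", "invoice", "discount", "meeting"], prior, loglik))
```

In this sketch every term contributes to the score independently of the others. A Bayesian network with learned structure, as described in the summary, would instead allow dependencies between feature nodes.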