Design and implementation of spam filtering system based on topic model

Bibliographic Details
Main Authors: Xiaohuai KOU, Hua CHENG
Format: Article
Language: zho
Published: Beijing Xintong Media Co., Ltd 2017-11-01
Series: Dianxin kexue
Subjects: text classification; spam; topic model; Bayesian theory
Online Access: http://www.telecomsci.com/zh/article/doi/10.11959/j.issn.1000-0801.2017313/
_version_ 1841530228814381056
author Xiaohuai KOU
Hua CHENG
author_facet Xiaohuai KOU
Hua CHENG
author_sort Xiaohuai KOU
collection DOAJ
description Spam filtering technology plays a key role in many areas, including information security, transmission efficiency, and automatic information classification. Spam degrades the user experience and can cause unnecessary losses of money and time. This work examines the shortcomings of existing spam filtering techniques and proposes a naive Bayes spam classification method based on multiple keywords. To identify the topics of a message, a latent Dirichlet allocation (LDA) topic model is used to extract the message's topics and keywords, and Word2Vec is then used to find synonyms and related words for each keyword, extending the keyword set. For classification, the prior probabilities of the words are estimated from the training dataset by statistical learning. From the extended keyword set and these probabilities, the joint probability of a topic and a message is derived with Bayes' formula and used as the basis for the spam decision. The resulting topic-model-based spam filtering system is simple and easy to apply. Comparative experiments against other typical spam filtering methods indicate that both the topic-model-based classification method and the improved Word2Vec-based method effectively improve the accuracy of spam filtering.
format Article
id doaj-art-9a6f026e69da46448b95970071bd8202
institution Kabale University
issn 1000-0801
language zho
publishDate 2017-11-01
publisher Beijing Xintong Media Co., Ltd
record_format Article
series Dianxin kexue
spellingShingle Xiaohuai KOU
Hua CHENG
Design and implementation of spam filtering system based on topic model
Dianxin kexue
text classification
spam
topic model
Bayesian theory
title Design and implementation of spam filtering system based on topic model
title_full Design and implementation of spam filtering system based on topic model
title_fullStr Design and implementation of spam filtering system based on topic model
title_full_unstemmed Design and implementation of spam filtering system based on topic model
title_short Design and implementation of spam filtering system based on topic model
title_sort design and implementation of spam filtering system based on topic model
topic text classification
spam
topic model
Bayesian theory
url http://www.telecomsci.com/zh/article/doi/10.11959/j.issn.1000-0801.2017313/
work_keys_str_mv AT xiaohuaikou designandimplementationofspamfilteringsystembasedontopicmodel
AT huacheng designandimplementationofspamfilteringsystembasedontopicmodel
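
The classification step summarized in the description field can be illustrated with a minimal Python sketch. It is not the authors' implementation. Everything in it is an assumption made for illustration: the toy messages in TRAIN, the seed keywords (which in the article come from an LDA topic model that is omitted here), and the hand-written RELATED table, which stands in for Word2Vec nearest-neighbour lookups. The sketch extends the seed keywords with related words, estimates class priors and Laplace-smoothed word probabilities over the extended keyword set, and picks the class with the highest log joint probability.

```python
import math
from collections import Counter

# Hypothetical toy training data (tokenized messages with labels).
TRAIN = [
    (["win", "free", "prize", "now"], "spam"),
    (["free", "cash", "offer", "click"], "spam"),
    (["meeting", "schedule", "project", "report"], "ham"),
    (["project", "deadline", "report", "review"], "ham"),
]

# Assumed stand-in for Word2Vec neighbours: a hand-written lookup table.
RELATED = {"free": ["cash", "prize"], "project": ["report", "deadline"]}


def expand_keywords(seeds, related):
    """Extend the seed keywords with their related words."""
    expanded = set(seeds)
    for word in seeds:
        expanded.update(related.get(word, []))
    return expanded


def train(messages, keywords):
    """Estimate class priors and smoothed P(word | class) over the keyword set."""
    class_counts = Counter(label for _, label in messages)
    word_counts = {label: Counter() for label in class_counts}
    for tokens, label in messages:
        word_counts[label].update(t for t in tokens if t in keywords)
    total = sum(class_counts.values())
    priors = {c: n / total for c, n in class_counts.items()}
    likelihoods = {}
    for c, counts in word_counts.items():
        # Laplace (add-one) smoothing so unseen keywords keep non-zero probability.
        denom = sum(counts.values()) + len(keywords)
        likelihoods[c] = {w: (counts[w] + 1) / denom for w in keywords}
    return priors, likelihoods


def classify(tokens, priors, likelihoods):
    """Return the class with the highest log joint probability.

    Tokens outside the keyword set are ignored.
    """
    scores = {}
    for c, prior in priors.items():
        score = math.log(prior)
        for t in tokens:
            if t in likelihoods[c]:
                score += math.log(likelihoods[c][t])
        scores[c] = score
    return max(scores, key=scores.get)


# Seed keywords are assumed given; the article obtains them from LDA.
keywords = expand_keywords(["free", "project"], RELATED)
priors, likelihoods = train(TRAIN, keywords)
```

With this toy data, `classify(["free", "prize", "today"], priors, likelihoods)` selects "spam" and `classify(["project", "report"], priors, likelihoods)` selects "ham". Working in log space avoids numeric underflow when many keyword probabilities are multiplied.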