Friend-Guard Textfooler Attack on Text Classification System

Deep neural networks provide good performance for image classification, text classification, speech classification, and pattern analysis. However, such networks are vulnerable to adversarial examples. An adversarial example is a sample created by adding a small amount of noise to the original data; although the change is imperceptible to humans, the sample is misclassified by a deep neural network. Most studies on adversarial examples have focused on images, but research is expanding to the field of text. Textual adversarial examples can be useful in situations where friend and enemy models coexist, as in a military scenario. In such a setting, a message can be crafted as an adversarial example that presents no apparent grammatical or semantic problems to a human reader, yet is correctly classified by the friend model and misclassified by the enemy model. In this paper, I propose a “friend-guard” textual adversarial example for a text classification system. Unlike existing methods for generating image adversarial examples, the proposed method creates adversarial examples that are misclassified by an enemy model but correctly classified by a friend model, while preserving the meaning and grammar of the original sentence by replacing important words with suitable substitutions. Experiments were conducted on a movie review dataset using the TensorFlow library. The results show that the proposed method generates adversarial examples that the friend model classifies correctly with 88.2% accuracy while the enemy model classifies correctly with only 26.1% accuracy.

Saved in:
Bibliographic Details
Main Author: Hyun Kwon
Format: Article
Language: English
Published: IEEE, 2025-01-01
Series: IEEE Access
Subjects: Machine learning; text classification; text adversarial example; evasion attack; deep neural network (DNN)
Online Access: https://ieeexplore.ieee.org/document/9432814/
author Hyun Kwon
author_facet Hyun Kwon
author_sort Hyun Kwon
collection DOAJ
description Deep neural networks provide good performance for image classification, text classification, speech classification, and pattern analysis. However, such networks are vulnerable to adversarial examples. An adversarial example is a sample created by adding a small amount of noise to the original data; although the change is imperceptible to humans, the sample is misclassified by a deep neural network. Most studies on adversarial examples have focused on images, but research is expanding to the field of text. Textual adversarial examples can be useful in situations where friend and enemy models coexist, as in a military scenario. In such a setting, a message can be crafted as an adversarial example that presents no apparent grammatical or semantic problems to a human reader, yet is correctly classified by the friend model and misclassified by the enemy model. In this paper, I propose a “friend-guard” textual adversarial example for a text classification system. Unlike existing methods for generating image adversarial examples, the proposed method creates adversarial examples that are misclassified by an enemy model but correctly classified by a friend model, while preserving the meaning and grammar of the original sentence by replacing important words with suitable substitutions. Experiments were conducted on a movie review dataset using the TensorFlow library. The results show that the proposed method generates adversarial examples that the friend model classifies correctly with 88.2% accuracy while the enemy model classifies correctly with only 26.1% accuracy.
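
The description above only sketches the attack at a high level: rank words by importance, substitute them, and accept a substitution only when the friend model still predicts the original label while the enemy model is pushed toward an error. The following Python sketch is a rough illustration of that friend-guard, TextFooler-style loop, not the paper's actual implementation; the classifier callables friend_predict and enemy_predict, the synonym dictionary, and the deletion-based importance heuristic are all assumptions introduced here for illustration.

from typing import Callable, Dict, List, Tuple

# A classifier is modeled as a callable returning (predicted_label, confidence).
Classifier = Callable[[str], Tuple[int, float]]

def word_importance(words: List[str], enemy_predict: Classifier) -> List[int]:
    # Rank positions by how much deleting each word lowers the enemy model's
    # confidence (a common TextFooler-style importance proxy).
    _, base_conf = enemy_predict(" ".join(words))
    scores = []
    for i in range(len(words)):
        _, conf = enemy_predict(" ".join(words[:i] + words[i + 1:]))
        scores.append(base_conf - conf)
    return sorted(range(len(words)), key=lambda i: scores[i], reverse=True)

def friend_guard_attack(sentence: str,
                        true_label: int,
                        friend_predict: Classifier,
                        enemy_predict: Classifier,
                        synonyms: Dict[str, List[str]]) -> str:
    # Greedily replace important words with synonyms, keeping a swap only if
    # the friend model still predicts the true label; stop once the enemy
    # model is fooled.
    words = sentence.split()
    for i in word_importance(words, enemy_predict):
        for candidate in synonyms.get(words[i].lower(), []):
            trial = words[:i] + [candidate] + words[i + 1:]
            text = " ".join(trial)
            friend_label, _ = friend_predict(text)
            if friend_label != true_label:
                continue  # friend-guard condition: never break the friend model
            words = trial  # substitution accepted
            enemy_label, _ = enemy_predict(text)
            if enemy_label != true_label:
                return text  # enemy fooled while the friend stays correct
            break  # keep this word's swap, move to the next important word
    return " ".join(words)  # best effort if the enemy was never fooled

With real models, friend_predict and enemy_predict would wrap the two trained TensorFlow classifiers, and a proper synonym source plus a semantic-similarity check would replace the plain dictionary lookup; the acceptance rule above mirrors the abstract's requirement that fooling the enemy model must never come at the cost of the friend model's prediction.
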
format Article
id doaj-art-3e8df493b59f46249f19ecb3cb3f34c5
institution Kabale University
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-3e8df493b59f46249f19ecb3cb3f34c5 | 2025-01-14T00:02:27Z | eng | IEEE | IEEE Access (ISSN 2169-3536) | 2025-01-01 | vol. 13, pp. 3841-3848 | doi:10.1109/ACCESS.2021.3080680 | IEEE document 9432814 | Friend-Guard Textfooler Attack on Text Classification System | Hyun Kwon (https://orcid.org/0000-0003-1169-9892), Department of Artificial Intelligence and Data Science, Korea Military Academy, Seoul, South Korea | https://ieeexplore.ieee.org/document/9432814/ | Machine learning; text classification; text adversarial example; evasion attack; deep neural network (DNN)
spellingShingle Hyun Kwon
Friend-Guard Textfooler Attack on Text Classification System
IEEE Access
Machine learning
text classification
text adversarial example
evasion attack
deep neural network (DNN)
title Friend-Guard Textfooler Attack on Text Classification System
title_full Friend-Guard Textfooler Attack on Text Classification System
title_fullStr Friend-Guard Textfooler Attack on Text Classification System
title_full_unstemmed Friend-Guard Textfooler Attack on Text Classification System
title_short Friend-Guard Textfooler Attack on Text Classification System
title_sort friend guard textfooler attack on text classification system
topic Machine learning
text classification
text adversarial example
evasion attack
deep neural network (DNN)
url https://ieeexplore.ieee.org/document/9432814/
work_keys_str_mv AT hyunkwon friendguardtextfoolerattackontextclassificationsystem