Friend-Guard Textfooler Attack on Text Classification System

Deep neural networks provide good performance for image classification, text classification, speech classification, and pattern analysis. However, such networks are vulnerable to adversarial examples. An adversarial example is a sample created by adding a small amount of noise to the original data; although the change is imperceptible to humans, the sample is misclassified by a deep neural network. Most studies on adversarial examples have focused on images, but research is expanding to the field of text. Textual adversarial examples can be useful in situations where friend and enemy models coexist, as in a military scenario. In such a setting, a message can be crafted as an adversarial example that presents no apparent grammatical or semantic problems to a human reader, yet is correctly classified by the friend model and misclassified by the enemy model. In this paper, I propose a “friend-guard” textual adversarial example for a text classification system. Unlike existing methods for generating image adversarial examples, the proposed method creates adversarial examples that are misclassified by an enemy model but correctly classified by a friend model, while preserving the meaning and grammar of the original sentence by replacing important words with suitable substitutions. Experiments were conducted on a movie review dataset using the TensorFlow library. The results show that the proposed method generates adversarial examples that the friend model classifies correctly with 88.2% accuracy while the enemy model classifies correctly with only 26.1% accuracy.

Saved in:
Bibliographic Details
Main Author: Hyun Kwon
Format: Article
Language: English
Published: IEEE, 2025-01-01
Series: IEEE Access
Subjects: Machine learning; text classification; text adversarial example; evasion attack; deep neural network (DNN)
Online Access: https://ieeexplore.ieee.org/document/9432814/
author Hyun Kwon
author_facet Hyun Kwon
author_sort Hyun Kwon
collection DOAJ
description Deep neural networks provide good performance for image classification, text classification, speech classification, and pattern analysis. However, such networks are vulnerable to adversarial examples. An adversarial example is a sample created by adding a small amount of noise to the original data; although the change is imperceptible to humans, the sample is misclassified by a deep neural network. Most studies on adversarial examples have focused on images, but research is expanding to the field of text. Textual adversarial examples can be useful in situations where friend and enemy models coexist, as in a military scenario. In such a setting, a message can be crafted as an adversarial example that presents no apparent grammatical or semantic problems to a human reader, yet is correctly classified by the friend model and misclassified by the enemy model. In this paper, I propose a “friend-guard” textual adversarial example for a text classification system. Unlike existing methods for generating image adversarial examples, the proposed method creates adversarial examples that are misclassified by an enemy model but correctly classified by a friend model, while preserving the meaning and grammar of the original sentence by replacing important words with suitable substitutions. Experiments were conducted on a movie review dataset using the TensorFlow library. The results show that the proposed method generates adversarial examples that the friend model classifies correctly with 88.2% accuracy while the enemy model classifies correctly with only 26.1% accuracy.
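
The description above only sketches the attack at a high level: rank words by importance, substitute them, and accept a substitution only when the friend model still predicts the original label while the enemy model is pushed toward an error. The following Python sketch is a rough illustration of that friend-guard, TextFooler-style loop, not the paper's actual implementation; the classifier callables friend_predict and enemy_predict, the synonym dictionary, and the deletion-based importance heuristic are all assumptions introduced here for illustration.

from typing import Callable, Dict, List, Tuple

# A classifier is modeled as a callable returning (predicted_label, confidence).
Classifier = Callable[[str], Tuple[int, float]]

def word_importance(words: List[str], enemy_predict: Classifier) -> List[int]:
    # Rank positions by how much deleting each word lowers the enemy model's
    # confidence (a common TextFooler-style importance proxy).
    _, base_conf = enemy_predict(" ".join(words))
    scores = []
    for i in range(len(words)):
        _, conf = enemy_predict(" ".join(words[:i] + words[i + 1:]))
        scores.append(base_conf - conf)
    return sorted(range(len(words)), key=lambda i: scores[i], reverse=True)

def friend_guard_attack(sentence: str,
                        true_label: int,
                        friend_predict: Classifier,
                        enemy_predict: Classifier,
                        synonyms: Dict[str, List[str]]) -> str:
    # Greedily replace important words with synonyms, keeping a swap only if
    # the friend model still predicts the true label; stop once the enemy
    # model is fooled.
    words = sentence.split()
    for i in word_importance(words, enemy_predict):
        for candidate in synonyms.get(words[i].lower(), []):
            trial = words[:i] + [candidate] + words[i + 1:]
            text = " ".join(trial)
            friend_label, _ = friend_predict(text)
            if friend_label != true_label:
                continue  # friend-guard condition: never break the friend model
            words = trial  # substitution accepted
            enemy_label, _ = enemy_predict(text)
            if enemy_label != true_label:
                return text  # enemy fooled while the friend stays correct
            break  # keep this word's swap, move to the next important word
    return " ".join(words)  # best effort if the enemy was never fooled

With real models, friend_predict and enemy_predict would wrap the two trained TensorFlow classifiers, and a proper synonym source plus a semantic-similarity check would replace the plain dictionary lookup; the acceptance rule above mirrors the abstract's requirement that fooling the enemy model must never come at the cost of the friend model's prediction.
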
format Article
id doaj-art-3e8df493b59f46249f19ecb3cb3f34c5
institution Kabale University
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-3e8df493b59f46249f19ecb3cb3f34c5 | 2025-01-14T00:02:27Z | eng | IEEE | IEEE Access (ISSN 2169-3536) | 2025-01-01 | vol. 13, pp. 3841-3848 | doi:10.1109/ACCESS.2021.3080680 | IEEE document 9432814 | Friend-Guard Textfooler Attack on Text Classification System | Hyun Kwon (https://orcid.org/0000-0003-1169-9892), Department of Artificial Intelligence and Data Science, Korea Military Academy, Seoul, South Korea | https://ieeexplore.ieee.org/document/9432814/ | Machine learning; text classification; text adversarial example; evasion attack; deep neural network (DNN)
spellingShingle Hyun Kwon
Friend-Guard Textfooler Attack on Text Classification System
IEEE Access
Machine learning
text classification
text adversarial example
evasion attack
deep neural network (DNN)
title Friend-Guard Textfooler Attack on Text Classification System
title_full Friend-Guard Textfooler Attack on Text Classification System
title_fullStr Friend-Guard Textfooler Attack on Text Classification System
title_full_unstemmed Friend-Guard Textfooler Attack on Text Classification System
title_short Friend-Guard Textfooler Attack on Text Classification System
title_sort friend guard textfooler attack on text classification system
topic Machine learning
text classification
text adversarial example
evasion attack
deep neural network (DNN)
url https://ieeexplore.ieee.org/document/9432814/
work_keys_str_mv AT hyunkwon friendguardtextfoolerattackontextclassificationsystem