Chinese adversarial text generation method based on punctuation insertion

Bibliographic Details
Main Authors: ZHANG Qian, YAN Qiao
Format: Article
Language: English
Published: POSTS&TELECOM PRESS Co., LTD 2025-04-01
Series: 网络与信息安全学报 (Chinese Journal of Network and Information Security)
Online Access: http://www.cjnis.com.cn/thesisDetails#10.11959/j.issn.2096-109x.2025026
Description
Summary: The susceptibility of natural language processing models to adversarial texts remains a significant concern. Existing methods for generating Chinese adversarial texts mainly replace characters with visually similar or homophonic ones. Against robust pre-trained models, however, these methods need more perturbations, which reduces fluency and readability and yields low-quality adversarial texts. Symbol insertion methods designed for English adversarial texts are not fully applicable to Chinese, and in a black-box scenario the lack of prior knowledge makes high-quality adversarial texts difficult to generate. The article proposes a punctuation-based method for generating adversarial texts for Chinese text classification tasks. Under a black-box setting, the method combines a novel part-of-speech importance calculation with punctuation insertion to form a character-level perturbation approach suited to Chinese. Experiments show that, for text classification, the proposed method significantly improves the attack success rate on LSTM and BERT models trained on two real-world datasets. The method also avoids directly destroying the original sentences and preserves their meaning: in the tests it achieved a semantic similarity of up to 97%, significantly better than the baseline methods.
ISSN: 2096-109X
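
Illustrative sketch: The summary describes a black-box attack that ranks words by a part-of-speech importance measure and then inserts punctuation as a character-level perturbation. The record does not give the article's actual formula, tag set, or choice of punctuation, so the following Python sketch is only one plausible reading under stated assumptions: the POS_WEIGHT table, the leave-one-out scoring, the full-width comma, and the keyword-based toy_positive_score model are all hypothetical stand-ins, not the authors' method. Tokens are assumed to arrive already segmented and tagged.

```python
from typing import Callable, List, Optional, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag)

# Hypothetical POS weights (assumption): the record does not state the real ones.
POS_WEIGHT = {"a": 1.5, "v": 1.2, "n": 1.0, "d": 0.8}


def toy_positive_score(text: str) -> float:
    """Stand-in black-box model: a keyword-based score for the 'positive' class."""
    pos = sum(text.count(w) for w in ("喜欢", "好看", "精彩"))
    neg = sum(text.count(w) for w in ("讨厌", "无聊", "糟糕"))
    return (pos + 0.4) / (pos + neg + 1)


def rank_tokens(tokens: List[Token], score: Callable[[str], float]) -> List[int]:
    """Order token indices by POS-weighted score drop when the token is removed."""
    base = score("".join(w for w, _ in tokens))
    ranked = []
    for i, (_, pos) in enumerate(tokens):
        without = "".join(w for j, (w, _) in enumerate(tokens) if j != i)
        drop = base - score(without)
        ranked.append((drop * POS_WEIGHT.get(pos, 1.0), i))
    ranked.sort(reverse=True)
    return [i for _, i in ranked]


def punctuation_attack(
    tokens: List[Token],
    score: Callable[[str], float],
    mark: str = "，",
    threshold: float = 0.5,
) -> Optional[str]:
    """Insert `mark` inside the most important words until the score falls
    below `threshold`; return the adversarial text, or None if none is found.
    Only model outputs are queried, which matches a black-box setting."""
    words = [w for w, _ in tokens]
    for i in rank_tokens(tokens, score):
        if len(words[i]) < 2:
            continue  # a single character cannot be split by an inserted mark
        mid = len(words[i]) // 2
        words[i] = words[i][:mid] + mark + words[i][mid:]
        candidate = "".join(words)
        if score(candidate) < threshold:
            return candidate
    return None


example = [("我", "r"), ("非常", "d"), ("喜欢", "v"), ("这部", "r"), ("电影", "n")]
print(punctuation_attack(example, toy_positive_score))
```

Because the perturbation only inserts a punctuation mark, removing the mark recovers the original sentence, which is consistent with the summary's claim that the original text is not directly destroyed. A real evaluation would replace the toy scorer with queries to an LSTM or BERT classifier and would measure semantic similarity separately.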