NLP neural network copyright protection based on black box watermark
With the rapid development of natural language processing techniques, the use of language models in text classification and sentiment analysis has been increasing. However, language models are susceptible to piracy and redistribution by adversaries, posing a serious threat to the intellectual propert...
Main Authors: | Long DAI, Jing ZHANG, Xuefeng FAN, Xiaoyi ZHOU |
---|---|
Format: | Article |
Language: | English |
Published: | POSTS&TELECOM PRESS Co., LTD, 2023-02-01 |
Series: | 网络与信息安全学报 (Chinese Journal of Network and Information Security) |
Subjects: | |
Online Access: | http://www.cjnis.com.cn/thesisDetails#10.11959/j.issn.2096-109x.2023009 |
author | Long DAI, Jing ZHANG, Xuefeng FAN, Xiaoyi ZHOU |
---|---|
collection | DOAJ |
description | With the rapid development of natural language processing techniques, language models are increasingly used for text classification and sentiment analysis. However, language models are susceptible to piracy and redistribution by adversaries, which poses a serious threat to the intellectual property of model owners. Researchers have therefore been designing protection mechanisms that identify the copyright information of language models. However, existing watermarking schemes for text classification models cannot be associated with the owner's identity, are not robust enough, and cannot regenerate trigger sets. To solve these problems, a black-box watermarking scheme for text classification tasks was proposed that can verify model ownership remotely and quickly. The model owner's copyright message and secret key were processed with a Hash-based Message Authentication Code (HMAC); the resulting message digest resists forgery and offers high security. A number of text samples were randomly selected from each category of the original training set and combined with the digest to construct the trigger set, and the watermark was then embedded in the language model during training. To evaluate the scheme, watermarks were embedded in three common language models on the IMDB movie review and CNews text classification datasets. The experimental results show that watermark verification accuracy reaches 100% without affecting the original model's performance. Even under common attacks such as model fine-tuning and pruning, the scheme remains strongly robust and resists forgery attacks. Meanwhile, embedding the watermark does not affect the model's convergence time, and embedding efficiency is high. |
format | Article |
id | doaj-art-2c9d1707a8a54ec6992d75c02988e71a |
institution | Kabale University |
issn | 2096-109X |
language | English |
publishDate | 2023-02-01 |
publisher | POSTS&TELECOM PRESS Co., LTD |
record_format | Article |
series | 网络与信息安全学报 (Chinese Journal of Network and Information Security) |
title | NLP neural network copyright protection based on black box watermark |
topic | natural language processing, text classification, copyright protection, language model, black box watermarking |
url | http://www.cjnis.com.cn/thesisDetails#10.11959/j.issn.2096-109x.2023009 |
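The abstract outlines three steps: an HMAC digest computed from the owner's copyright message and key, a trigger set built by combining that digest with text samples drawn from each training class, and black-box ownership verification by querying the suspect model. The Python sketch below illustrates these steps under stated assumptions and is not the authors' implementation. SHA-256 as the HMAC hash, appending the first 16 hex characters of the digest to each sampled text, the digest-derived label shift, and the 0.9 verification threshold are all hypothetical choices. The embedding step itself (training the classifier on the original data plus the trigger set) is not shown.

```python
import hashlib
import hmac
import random
from typing import Callable, Dict, List, Sequence, Tuple


def owner_digest(key: bytes, copyright_message: str) -> str:
    """HMAC of the owner's copyright message under the owner's secret key.

    Assumption: SHA-256 is used as the underlying hash.
    """
    return hmac.new(key, copyright_message.encode("utf-8"), hashlib.sha256).hexdigest()


def build_trigger_set(
    dataset: Sequence[Tuple[str, int]],
    digest: str,
    num_classes: int,
    per_class: int,
    seed: int = 0,
) -> List[Tuple[str, int]]:
    """Hypothetical trigger-set construction (assumes num_classes >= 2).

    Samples up to `per_class` texts from every class, appends a digest
    fragment to each, and relabels it with a digest-derived shifted label.
    """
    rng = random.Random(seed)
    by_class: Dict[int, List[str]] = {c: [] for c in range(num_classes)}
    for text, label in dataset:
        by_class[label].append(text)
    # Nonzero shift derived from the digest, so a trigger label never
    # equals the sample's true label.
    shift = int(digest, 16) % (num_classes - 1) + 1
    trigger_set: List[Tuple[str, int]] = []
    for label, texts in by_class.items():
        for text in rng.sample(texts, min(per_class, len(texts))):
            trigger_set.append((f"{text} {digest[:16]}", (label + shift) % num_classes))
    return trigger_set


def verify_ownership(
    predict: Callable[[str], int],
    trigger_set: Sequence[Tuple[str, int]],
    threshold: float = 0.9,
) -> Tuple[float, bool]:
    """Black-box check: query the suspect model on the trigger set only."""
    hits = sum(1 for text, target in trigger_set if predict(text) == target)
    accuracy = hits / len(trigger_set)
    return accuracy, accuracy >= threshold


if __name__ == "__main__":
    digest = owner_digest(b"owner-secret-key", "Copyright (c) Example Owner")
    data = [(f"review {i}", i % 2) for i in range(20)]
    triggers = build_trigger_set(data, digest, num_classes=2, per_class=3)
    memorised = dict(triggers)

    def watermarked_model(text: str) -> int:
        # Stand-in for a model that has learned the trigger set.
        return memorised.get(text, 0)

    print(verify_ownership(watermarked_model, triggers))  # (1.0, True)
```

Because the digest depends on the secret key, only the key holder can regenerate the same trigger set, which matches the abstract's point that HMAC-based triggers tie the watermark to the owner's identity and resist forgery.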