NLP neural network copyright protection based on black box watermark

With the rapid development of natural language processing techniques, the use of language models in text classification and sentiment analysis has been increasing. However, language models are susceptible to piracy and redistribution by adversaries, posing a serious threat to the intellectual propert...

Bibliographic Details
Main Authors: Long DAI, Jing ZHANG, Xuefeng FAN, Xiaoyi ZHOU
Format: Article
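Language: English
Published: POSTS&TELECOM PRESS Co., LTD 2023-02-01
Series: 网络与信息安全学报 (Chinese Journal of Network and Information Security)
Subjects: natural language processing; text classification; copyright protection; language model; black box watermarking
Online Access: http://www.cjnis.com.cn/thesisDetails#10.11959/j.issn.2096-109x.2023009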
collection DOAJ
description With the rapid development of natural language processing techniques, language models are increasingly used in text classification and sentiment analysis. However, language models are susceptible to piracy and redistribution by adversaries, which poses a serious threat to the intellectual property of model owners. Researchers have therefore been designing protection mechanisms that identify the copyright information of language models. Existing watermarking schemes for text classification language models, however, cannot be associated with the owner's identity, are not robust enough, and cannot regenerate trigger sets. To solve these problems, a black-box watermarking scheme for text classification tasks is proposed that can verify model ownership remotely and quickly. A message digest is computed from the model owner's copyright message and key using a Hash-based Message Authentication Code (HMAC); the digest resists forgery and offers high security. A certain amount of text data is randomly selected from each category of the original training set, the digest is combined with the text data to construct the trigger set, and the watermark is embedded in the language model during training. To evaluate the scheme, watermarks were embedded in three common language models on the IMDB movie reviews and CNews text classification datasets. The experimental results show that watermark verification accuracy can reach 100% without affecting the original model. Even under common attacks such as model fine-tuning and pruning, the scheme shows strong robustness, and it resists forgery attacks. Meanwhile, embedding the watermark does not affect the convergence time of the model, and embedding efficiency is high.
format Article
id doaj-art-2c9d1707a8a54ec6992d75c02988e71a
institution Kabale University
issn 2096-109X
language English
publishDate 2023-02-01
publisher POSTS&TELECOM PRESS Co., LTD
record_format Article
series 网络与信息安全学报
spelling doaj-art-2c9d1707a8a54ec6992d75c02988e71a 2025-01-15T03:16:31Z eng POSTS&TELECOM PRESS Co., LTD 网络与信息安全学报 2096-109X 2023-02-01 914014959577416
title NLP neural network copyright protection based on black box watermark
topic natural language processing
text classification
copyright protection
language model
black box watermarking
url http://www.cjnis.com.cn/thesisDetails#10.11959/j.issn.2096-109x.2023009
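
The trigger-set construction summarized in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record does not say which hash HMAC uses, how the digest is combined with the text, or how trigger labels are assigned. HMAC-SHA256, prepending a short digest fragment, shifting each sample to the next class label, and seeding the sampler from the digest (so the owner can regenerate the set) are all assumptions made here, and the function names `owner_digest` and `build_trigger_set` are hypothetical.

```python
import hashlib
import hmac
import random


def owner_digest(key: bytes, copyright_message: str) -> str:
    """Hex HMAC digest of the copyright message under the owner's key.

    SHA-256 is an assumption; the record only names HMAC.
    """
    return hmac.new(key, copyright_message.encode("utf-8"), hashlib.sha256).hexdigest()


def build_trigger_set(train_set, digest, per_class=2, tag_len=8):
    """Build (text, label) trigger samples from a list of (text, label) pairs.

    Assumptions (not from the source): the sampler is seeded with the digest,
    the first `tag_len` hex characters of the digest are prepended to each
    selected text, and each sample is relabeled to the next class in sorted order.
    """
    rng = random.Random(digest)  # same digest -> same selection
    by_label = {}
    for text, label in train_set:
        by_label.setdefault(label, []).append(text)

    labels = sorted(by_label)
    tag = digest[:tag_len]
    triggers = []
    for i, label in enumerate(labels):
        texts = by_label[label]
        target = labels[(i + 1) % len(labels)]  # hypothetical relabeling rule
        for text in rng.sample(texts, min(per_class, len(texts))):
            triggers.append((f"{tag} {text}", target))
    return triggers


if __name__ == "__main__":
    digest = owner_digest(b"owner-secret-key", "Copyright: example owner, 2023")
    toy_train = [
        ("a wonderful film", 1), ("great acting", 1), ("loved every minute", 1),
        ("a dull plot", 0), ("terrible pacing", 0), ("would not recommend", 0),
    ]
    for sample in build_trigger_set(toy_train, digest):
        print(sample)
```

In such a sketch, ownership would be checked remotely by querying the suspect model with the trigger texts and comparing its predictions with the trigger labels; only someone holding the key can reproduce the digest and hence the trigger set.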