Privacy-preserving data compression scheme for k-anonymity model based on Huffman coding
The k-anonymity model is widely used as a data anonymization technique for privacy protection during the data release phase. However, with the advent of the big data era, the generation of vast amounts of data poses challenges to data storage. Moreover, it is not feasible to expand the storage space in...
Main Authors: | Yue YU, Xianzheng LIN, Weihai LI, Nenghai YU |
---|---|
Format: | Article |
Language: | English |
Published: | POSTS&TELECOM PRESS Co., LTD, 2023-08-01 |
Series: | 网络与信息安全学报 |
Subjects: | k-anonymity model; privacy preservation; data compression storage; Huffman coding |
Online Access: | http://www.cjnis.com.cn/thesisDetails#10.11959/j.issn.2096-109x.2023054 |
_version_ | 1841529633579728896 |
---|---|
author | Yue YU, Xianzheng LIN, Weihai LI, Nenghai YU |
collection | DOAJ |
description | The k-anonymity model is widely used as a data anonymization technique for privacy protection during the data release phase. However, with the advent of the big data era, the generation of vast amounts of data poses challenges to data storage. Moreover, it is not feasible to expand storage space indefinitely through hardware upgrades, since memory is costly and storage space is limited. Data compression techniques can therefore reduce storage costs and communication overhead. To reduce the storage space of the data generated by anonymization techniques in the data publishing phase, a compression scheme was proposed for both the original data and the anonymized data of the k-anonymity model. For the original data, the difference between the original data and the anonymized data was calculated according to the set rules and the pre-defined generalization level. Huffman coding compression was then applied to the difference data according to its frequency characteristics. By storing the difference data, the original data can be obtained indirectly, thus reducing the storage space of the original data. For the anonymized data, the generalization rules of the model or the pre-defined generalization hierarchy relations usually make the data highly repetitive. The larger the value of k, the more generalized and repetitive the anonymized data becomes. Huffman coding compression was therefore designed and implemented for the anonymized data to reduce storage space. The experimental results show that the proposed scheme can significantly reduce the compression rate of both the original data and the anonymized data of the k-anonymity model. Across five models and various k-value settings, the proposed scheme reduces the compression rate of raw and anonymized data by 72.2% and 64.2% on average compared to the Windows 11 zip tool. |
format | Article |
id | doaj-art-a2d30fc92a5e44deaa8995788fe57f7d |
institution | Kabale University |
issn | 2096-109X |
language | English |
publishDate | 2023-08-01 |
publisher | POSTS&TELECOM PRESS Co., LTD |
record_format | Article |
series | 网络与信息安全学报 |
title | Privacy-preserving data compression scheme for k-anonymity model based on Huffman coding |
topic | k-anonymity model; privacy preservation; data compression storage; Huffman coding |
url | http://www.cjnis.com.cn/thesisDetails#10.11959/j.issn.2096-109x.2023054 |
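The description above outlines two ideas: Huffman-coding the highly repetitive anonymized values, and storing only a difference from which the original values can be recovered. The Python sketch below illustrates both on a toy age column. It is not the authors' implementation: the 10-year age ranges, the "offset from the lower bound" difference rule, and all function and variable names are assumptions made for illustration only.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a prefix code {symbol: bitstring} from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:
        # Degenerate case: a single distinct symbol still needs one bit.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: partial code}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        # Merge the two lightest subtrees, prefixing their codes with 0 / 1.
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(symbols, code):
    return "".join(code[s] for s in symbols)


def decode(bits, code):
    inverse = {v: k for k, v in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:  # prefix-free, so the first match is the symbol
            out.append(inverse[current])
            current = ""
    return out


# Hypothetical generalization rule: replace an age by its 10-year range.
def lower_bound(age):
    return age // 10 * 10


ages = [23, 27, 25, 31, 38, 34, 22, 29]
anonymized = [f"{lower_bound(a)}-{lower_bound(a) + 9}" for a in ages]
# Hypothetical difference rule: offset of the original value inside its range.
offsets = [a - lower_bound(a) for a in ages]

code_anon = huffman_code(anonymized)
bits_anon = encode(anonymized, code_anon)
code_diff = huffman_code(offsets)
bits_diff = encode(offsets, code_diff)

# The original column is recovered from the anonymized range plus the offset.
restored = [int(r.split("-")[0]) + d
            for r, d in zip(decode(bits_anon, code_anon),
                            decode(bits_diff, code_diff))]
```

In this toy column the anonymized data has only two distinct values, so each record costs a single bit. This mirrors the observation in the description that more generalized data is more repetitive and therefore compresses better. A real scheme would also need to store the code table alongside the bitstream.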