Privacy-preserving data compression scheme for k-anonymity model based on Huffman coding

The k-anonymity model is widely used as a data anonymization technique for privacy protection during the data release phase. With the advent of the big data era, however, the generation of vast amounts of data poses challenges to data storage. It is not feasible to expand the storage space in...

Bibliographic Details
Main Authors: Yue YU, Xianzheng LIN, Weihai LI, Nenghai YU
Format: Article
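Language: English
Published: POSTS&TELECOM PRESS Co., LTD 2023-08-01
Series: 网络与信息安全学报 (Chinese Journal of Network and Information Security)
Subjects: k-anonymity model; privacy preservation; data compression storage; Huffman coding
Online Access: http://www.cjnis.com.cn/thesisDetails#10.11959/j.issn.2096-109x.2023054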
author Yue YU
Xianzheng LIN
Weihai LI
Nenghai YU
collection DOAJ
description The k-anonymity model is widely used as a data anonymization technique for privacy protection during the data release phase. With the advent of the big data era, however, the generation of vast amounts of data poses challenges to data storage. Expanding storage space indefinitely through hardware upgrades is not feasible, since memory is costly and storage space is limited. Data compression techniques can therefore reduce storage costs and communication overhead. To reduce the storage space of the data generated by anonymization techniques in the data publishing phase, a compression scheme was proposed for both the original data and the anonymized data of the k-anonymity model. For the original data, the difference between the original data and the anonymized data was calculated according to the set rules and the pre-defined generalization level, and Huffman coding compression was applied to the difference data according to its frequency characteristics. By storing the difference data, the original data can be obtained indirectly, thus reducing the storage space of the original data. For the anonymized data, the values usually have high repeatability owing to the generalization rules of the model or the pre-defined generalization hierarchy relations; the larger the value of k, the more generalized and repetitive the anonymized data become. Huffman coding compression was therefore designed and implemented for the anonymized data to reduce storage space. The experimental results show that the proposed scheme can significantly reduce the compression rate of both the original data and the anonymized data of the k-anonymity model. Across five models and various k-value settings, the proposed scheme reduces the compression rate of raw and anonymized data by 72.2% and 64.2% on average, respectively, compared to the Windows 11 zip tool.
format Article
id doaj-art-a2d30fc92a5e44deaa8995788fe57f7d
institution Kabale University
issn 2096-109X
language English
publishDate 2023-08-01
publisher POSTS&TELECOM PRESS Co., LTD
record_format Article
series 网络与信息安全学报
title Privacy-preserving data compression scheme for k-anonymity model based on Huffman coding
topic k-anonymity model
privacy preservation
data compression storage
Huffman coding
url http://www.cjnis.com.cn/thesisDetails#10.11959/j.issn.2096-109x.2023054
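
The scheme summarized in the description field can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record does not give the paper's "set rules", so the sketch assumes a single numeric attribute (age) generalized into fixed 10-year intervals, and takes the "difference data" to be the offset of each original value within its interval. The names `WIDTH`, `generalize`, `difference`, and `restore` are hypothetical. Both the anonymized column and the difference column are Huffman-coded by symbol frequency, and the original values are recovered from the two decoded streams.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(symbols):
    """Build a prefix code {symbol: bitstring} from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): "0"}
    tie = count()  # unique tie-breaker so the heap never compares dicts
    heap = [(n, next(tie), {s: ""}) for s, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes.
        n1, _, c1 = heapq.heappop(heap)
        n2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c1.items()}
        merged.update({s: "1" + b for s, b in c2.items()})
        heapq.heappush(heap, (n1 + n2, next(tie), merged))
    return heap[0][2]


def encode(symbols, code):
    return "".join(code[s] for s in symbols)


def decode(bits, code):
    # Greedy matching is safe because a Huffman code is prefix-free.
    inverse = {b: s for s, b in code.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    return out


WIDTH = 10  # assumed generalization level: ages -> 10-year intervals


def generalize(age):
    """Anonymized value: the interval that contains the age."""
    low = age // WIDTH * WIDTH
    return f"{low}-{low + WIDTH - 1}"


def difference(age):
    """Assumed difference data: offset of the age inside its interval."""
    return age % WIDTH


def restore(interval, diff):
    """Recover the original age from anonymized value plus difference."""
    return int(interval.split("-")[0]) + diff


ages = [23, 27, 21, 35, 31, 37, 23, 27, 45, 41]  # made-up sample data
anonymized = [generalize(a) for a in ages]
diffs = [difference(a) for a in ages]

anon_code = huffman_code(anonymized)
diff_code = huffman_code(diffs)
anon_bits = encode(anonymized, anon_code)
diff_bits = encode(diffs, diff_code)

recovered = [
    restore(interval, diff)
    for interval, diff in zip(decode(anon_bits, anon_code),
                              decode(diff_bits, diff_code))
]
```

In this toy input the anonymized column has only three distinct intervals, so the most frequent one receives a one-bit code. This mirrors the record's observation that stronger generalization (larger k) makes the anonymized data more repetitive and therefore more compressible. A practical version would also have to store the code tables and pack the bit strings into bytes, which the sketch omits.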