Deduplication algorithm based on condensed nearest neighbor rule for deduplication metadata
Building an effective in-memory deduplication index, which reduces disk accesses and speeds up chunk fingerprint lookup, is a major challenge for deduplication algorithms in massive data environments. Because deduplication data sets contain many highly similar samples, a deduplication algorithm...
Main Authors: | Wen-bin YAO, Peng-di YE, Xiao-yong LI, Jing-kun CHANG |
---|---|
Format: | Article |
Language: | zho |
Published: | Editorial Department of Journal on Communications, 2015-08-01 |
Series: | Tongxin xuebao |
Subjects: | deduplication; deduplication metadata; condensed nearest neighbor rule |
Online Access: | http://www.joconline.com.cn/zh/article/doi/10.11959/j.issn.1000-436x.2015226/ |
_version_ | 1841539637139472384 |
---|---|
author | Wen-bin YAO; Peng-di YE; Xiao-yong LI; Jing-kun CHANG |
author_facet | Wen-bin YAO; Peng-di YE; Xiao-yong LI; Jing-kun CHANG |
author_sort | Wen-bin YAO |
collection | DOAJ |
description | Building an effective in-memory deduplication index, which reduces disk accesses and speeds up chunk fingerprint lookup, is a major challenge for deduplication algorithms in massive data environments. Because deduplication data sets contain many highly similar samples, a deduplication algorithm based on the condensed nearest neighbor rule, called Dedup<sup>2</sup>, is proposed. Dedup<sup>2</sup> uses a clustering algorithm to divide the original deduplication metadata into several categories. Within these categories, it applies the condensed nearest neighbor rule to remove the most similar data from the deduplication metadata, which yields a subset of the metadata. New data objects are then deduplicated against this subset according to the principle of data similarity. Experimental results show that Dedup<sup>2</sup> can reduce the size of the deduplication data set by more than 50% while maintaining a similar deduplication ratio. |
format | Article |
id | doaj-art-2340c113a4ae4a3c8b6c89a3164065ae |
institution | Kabale University |
issn | 1000-436X |
language | zho |
publishDate | 2015-08-01 |
publisher | Editorial Department of Journal on Communications |
record_format | Article |
series | Tongxin xuebao |
title | Deduplication algorithm based on condensed nearest neighbor rule for deduplication metadata |
topic | deduplication; deduplication metadata; condensed nearest neighbor rule |
url | http://www.joconline.com.cn/zh/article/doi/10.11959/j.issn.1000-436x.2015226/ |
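
The abstract describes reducing clustered deduplication metadata with the condensed nearest neighbor rule. The sketch below illustrates only the generic rule (Hart's condensed nearest neighbor) on toy data; it is not the authors' Dedup<sup>2</sup> implementation. The 2-D feature vectors, the Euclidean distance, and the names `nearest_label` and `condense` are assumptions made for illustration, and the cluster labels are assumed to come from a prior clustering step that is not shown.

```python
import math


def nearest_label(point, store):
    """Return the label of the stored sample closest to `point` (1-NN)."""
    closest = min(store, key=lambda item: math.dist(point, item[0]))
    return closest[1]


def condense(samples):
    """Condensed nearest neighbor rule over (vector, label) pairs.

    Starts from the first sample and keeps adding any sample that the
    current store misclassifies under 1-NN, until a full pass adds nothing.
    """
    if not samples:
        return []
    store = [samples[0]]
    in_store = {0}  # indices already kept, so no sample is added twice
    changed = True
    while changed:
        changed = False
        for i, (vec, label) in enumerate(samples):
            if i in in_store:
                continue
            if nearest_label(vec, store) != label:
                store.append((vec, label))
                in_store.add(i)
                changed = True
    return store


# Hypothetical metadata features, already grouped into two clusters (0 and 1).
samples = [
    ((0.0, 0.0), 0), ((0.1, 0.0), 0), ((0.0, 0.1), 0), ((0.1, 0.1), 0),
    ((5.0, 5.0), 1), ((5.1, 5.0), 1), ((5.0, 5.1), 1),
]
condensed = condense(samples)
print(len(samples), "->", len(condensed))
```

On data with many near-identical samples per cluster, the retained subset can be much smaller than the input while still assigning every original sample to its own cluster under the nearest-neighbor rule, which is the property the abstract relies on.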