Deduplication algorithm based on condensed nearest neighbor rule for deduplication metadata

Building an effective in-memory deduplication index, which can reduce the number of disk accesses and speed up chunk fingerprint lookup, is a major challenge for deduplication algorithms in massive data environments. Because deduplication data sets contain many highly similar samples, a deduplication algorithm...
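The full description in this record outlines two steps: cluster the deduplication metadata into categories, then apply the condensed nearest neighbor rule to keep only a representative subset. The record does not give the paper's features, distance measure, or clustering algorithm, so the following is only a minimal sketch of Hart's condensed nearest neighbor rule under stated assumptions: each metadata entry is assumed to be a numeric feature vector, cluster labels are assumed to be already available, and Euclidean distance is assumed. The names `condense` and `nearest_kept` are hypothetical and do not come from the paper.

```python
from math import dist  # Euclidean distance, Python 3.8+


def condense(samples, labels):
    """Condensed nearest neighbor rule (Hart): return indices of a subset
    that still assigns every sample to its own cluster under 1-NN."""
    keep = [0]  # seed the retained set with the first sample
    changed = True
    while changed:
        changed = False
        for i, x in enumerate(samples):
            if i in keep:
                continue
            # Nearest retained sample to x.
            nearest = min(keep, key=lambda j: dist(x, samples[j]))
            # Retain x only if the current subset would misassign it.
            if labels[nearest] != labels[i]:
                keep.append(i)
                changed = True
    return keep


def nearest_kept(x, samples, keep):
    """Index of the retained sample most similar to a new object x."""
    return min(keep, key=lambda j: dist(x, samples[j]))


# Assumed toy data: two tight clusters of hypothetical feature vectors.
samples = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
           (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
labels = [0, 0, 0, 1, 1, 1]  # assumed output of a prior clustering step

kept = condense(samples, labels)
match = nearest_kept((4.9, 5.2), samples, kept)
print(kept, labels[match])
```

In this toy case the highly similar samples inside each cluster are dropped and one representative per cluster is retained, which is the kind of metadata reduction the description refers to; a new object is then compared only against the retained subset.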

Bibliographic Details
Main Authors:	Wen-bin YAO, Peng-di YE, Xiao-yong LI, Jing-kun CHANG
Format:	Article
Language:	zho
Published:	Editorial Department of Journal on Communications 2015-08-01
Series:	Tongxin xuebao
Subjects:	deduplication; deduplication metadata; condensed nearest neighbor rule
Online Access:	http://www.joconline.com.cn/zh/article/doi/10.11959/j.issn.1000-436x.2015226/

collection	DOAJ
description	Building an effective in-memory deduplication index, which can reduce the number of disk accesses and speed up chunk fingerprint lookup, is a major challenge for deduplication algorithms in massive data environments. Because deduplication data sets contain many highly similar samples, a deduplication algorithm based on the condensed nearest neighbor rule, called Dedup², is proposed. Dedup² uses a clustering algorithm to divide the original deduplication metadata into several categories. Based on these categories, it applies the condensed nearest neighbor rule to remove the most similar data from the deduplication metadata, yielding a subset of the metadata. New data objects are then deduplicated against this subset according to the principle of data similarity. Experimental results show that Dedup² can reduce the size of the deduplication data set by more than 50% while maintaining a similar deduplication ratio.
id	doaj-art-2340c113a4ae4a3c8b6c89a3164065ae
institution	Kabale University
issn	1000-436X

Similar Items