A Parallel ETL Tool Based on an Improved Chain-MapReduce Framework

Bibliographic Details
Main Authors: Bin Wu, Xinguang Liu
Format: Article
Language: Chinese (zho)
Published: Beijing Xintong Media Co., Ltd 2013-12-01
Series: Dianxin kexue
Online Access: http://www.telecomsci.com/zh/article/doi/10.3969/j.issn.1000-0801.2013.12.001/
Collection: DOAJ
Description: This paper first reviews related work on parallel ETL and common methods for handling multiple MapReduce jobs. It then presents an improved chain-MapReduce framework and a parallel ETL tool designed on top of it. Several ETL optimization rules are proposed that make the ETL process generate fewer MapReduce jobs, avoiding unnecessary I/O and network cost. The tool is evaluated on real queries and large real-world datasets; compared with Hive, it reduces execution time by 10% to 20% on average.
Institution: Kabale University
ISSN: 1000-0801
Subjects: improved chain-MapReduce; ETL; optimization rule