A Parallel ETL Tool Based on an Improved Chain-MapReduce Framework
Related work on parallel ETL and common methods for handling multiple MapReduce jobs are introduced. An improved chain-MapReduce framework is then presented, and a parallel ETL tool is designed on top of it. Several optimization rules on ETL which will make the ETL process generate les...
Main Authors: | Bin Wu, Xinguang Liu |
---|---|
Format: | Article |
Language: | Chinese |
Published: | Beijing Xintong Media Co., Ltd, 2013-12-01 |
Series: | Dianxin kexue |
Subjects: | improved chain-MapReduce; ETL; optimization rule |
Online Access: | http://www.telecomsci.com/zh/article/doi/10.3969/j.issn.1000-0801.2013.12.001/ |
_version_ | 1841529270594174976 |
---|---|
author | Bin Wu Xinguang Liu |
author_facet | Bin Wu Xinguang Liu |
author_sort | Bin Wu |
collection | DOAJ |
description | Related work on parallel ETL and common methods for handling multiple MapReduce jobs are introduced. An improved chain-MapReduce framework is then presented, and a parallel ETL tool is designed on top of it. Several ETL optimization rules are presented that let the ETL process generate fewer MapReduce jobs, avoiding unnecessary I/O and network cost. The ETL tool is evaluated on real queries and large real-world datasets. Compared with Hive, the tool reduces execution time by 10% to 20% on average. |
format | Article |
id | doaj-art-81fc6c48a4ea4e8381482957c93db5f9 |
institution | Kabale University |
issn | 1000-0801 |
language | zho |
publishDate | 2013-12-01 |
publisher | Beijing Xintong Media Co., Ltd |
record_format | Article |
series | Dianxin kexue |
spelling | doaj-art-81fc6c48a4ea4e8381482957c93db5f9; 2025-01-15T03:20:54Z; zho; Beijing Xintong Media Co., Ltd; Dianxin kexue; 1000-0801; 2013-12-01; 291859624529; A Parallel ETL Tool Based on an Improved Chain-MapReduce Framework; Bin Wu; Xinguang Liu; Related work on parallel ETL and common methods for handling multiple MapReduce jobs are introduced. An improved chain-MapReduce framework is then presented, and a parallel ETL tool is designed on top of it. Several ETL optimization rules are presented that let the ETL process generate fewer MapReduce jobs, avoiding unnecessary I/O and network cost. The ETL tool is evaluated on real queries and large real-world datasets. Compared with Hive, the tool reduces execution time by 10% to 20% on average.; http://www.telecomsci.com/zh/article/doi/10.3969/j.issn.1000-0801.2013.12.001/; improved chain-MapReduce; ETL; optimization rule |
spellingShingle | Bin Wu Xinguang Liu A Parallel ETL Tool Based on an Improved Chain-MapReduce Framework Dianxin kexue improved chain-MapReduce ETL optimization rule |
title | A Parallel ETL Tool Based on an Improved Chain-MapReduce Framework |
title_full | A Parallel ETL Tool Based on an Improved Chain-MapReduce Framework |
title_fullStr | A Parallel ETL Tool Based on an Improved Chain-MapReduce Framework |
title_full_unstemmed | A Parallel ETL Tool Based on an Improved Chain-MapReduce Framework |
title_short | A Parallel ETL Tool Based on an Improved Chain-MapReduce Framework |
title_sort | parallel etl tool based on an improved chain mapreduce framework |
topic | improved chain-MapReduce ETL optimization rule |
url | http://www.telecomsci.com/zh/article/doi/10.3969/j.issn.1000-0801.2013.12.001/ |
work_keys_str_mv | AT binwu aparalleletltoolbasedonanimprovedchainmapreduceframework AT xinguangliu aparalleletltoolbasedonanimprovedchainmapreduceframework AT binwu paralleletltoolbasedonanimprovedchainmapreduceframework AT xinguangliu paralleletltoolbasedonanimprovedchainmapreduceframework |