Information extraction from massive Web pages based on node property and text content
To address the problem of extracting valuable information from massive Web pages in big data environments, a novel information extraction method based on node property and text content for massive Web pages was put forward. Web pages were converted into a document object model (DOM) tree, and a pruning...
Main Authors: | Hai-yan WANG, Pan CAO |
---|---|
Format: | Article |
Language: | zho |
Published: | Editorial Department of Journal on Communications, 2016-10-01 |
Series: | Tongxin xuebao |
Subjects: | |
Online Access: | http://www.joconline.com.cn/zh/article/doi/10.11959/j.issn.1000-436x.2016190/ |
Similar Items
- Applying MapReduce frameworks to a virtualization platform for Deep Web data source discovery
  by: XIN Jie, et al.
  Published: (2011-01-01)
- A Large Scale Network Traffic Analysis System Design Based on the MapReduce Platform
  by: Hong Tang
  Published: (2013-12-01)
- Comparison of Open-Source Distributed Computing Framework for Big Data
  by: Ai Fang, et al.
  Published: (2015-07-01)
- A Parallel ETL Tool Based on an Improved Chain-MapReduce Framework
  by: Bin Wu, et al.
  Published: (2013-12-01)
- Diffluent Internet Traffic and Characteristics Computation Based on Hadoop
  by: Yong Liu, et al.
  Published: (2014-12-01)