Information extraction from massive Web pages based on node property and text content

To address the problem of extracting valuable information from massive Web pages in big data environments, a novel information extraction method based on node property and text content for massive Web pages was put forward. Web pages were converted into a document object model (DOM) tree, and a pruning...

Bibliographic Details
Main Authors:	Hai-yan WANG, Pan CAO
Format:	Article
Language:	zho
Published:	Editorial Department of Journal on Communications 2016-10-01
Series:	Tongxin xuebao
Subjects:	Web information extraction; MapReduce; DOM tree
Online Access:	http://www.joconline.com.cn/zh/article/doi/10.11959/j.issn.1000-436x.2016190/

Similar Items

Applying MapReduce frameworks to a virtualization platform for Deep Web data source discovery
by: XIN Jie, et al.
Published: (2011-01-01)

A Large Scale Network Traffic Analysis System Design Based on the MapReduce Platform
by: Hong Tang
Published: (2013-12-01)

Comparison of Open-Source Distributed Computing Framework for Big Data
by: Ai Fang, et al.
Published: (2015-07-01)

A Parallel ETL Tool Based on an Improved Chain-MapReduce Framework
by: Bin Wu, et al.
Published: (2013-12-01)

Diffluent Internet Traffic and Characteristics Computation Based on Hadoop
by: Yong Liu, et al.
Published: (2014-12-01)

CC-MRSJ: Cache Conscious Star Join Algorithm on Hadoop Platform
by: Guoliang Zhou, et al.
Published: (2013-10-01)

Temperature aware energy-efficient task scheduling strategies for mapreduce
by: Bin LIAO, et al.
Published: (2016-01-01)

A distributed high efficiency similarity matrix computation method based on users’ mobile network access location
by: Yuan WANG, et al.
Published: (2018-05-01)

Design and application research on data service platform for big data
by: Yun-feng LIU, et al.
Published: (2013-09-01)

Continuous Skyline Queries Based on MapReduce
by: Guanmin Shan, et al.
Published: (2014-05-01)

Evaluating MapReduce for seismic data processing using a practical application
by: Chang-hai ZHAO, et al.
Published: (2012-11-01)

Securely redundant scheduling policy for MapReduce based on dynamic domains partition
by: Qing-ni SHEN, et al.
Published: (2014-01-01)

Stochastic algorithm for HDFS data theft detection based on MapReduce
by: Yuanzhao GAO, et al.
Published: (2018-10-01)

An AkNN Algorithm for High-Dimensional Big Data
by: Zhongwei Wang, et al.
Published: (2015-07-01)

k-means clustering method preserving differential privacy in MapReduce framework
by: Hong-cheng LI, et al.
Published: (2016-02-01)

Stochastic gradient descent algorithm preserving differential privacy in MapReduce framework
by: Yihan YU, et al.
Published: (2018-01-01)

HATAY İLİ ANTAKYA İLÇESİNDE YAŞAYAN DOMLARIN MÜZİK KÜLTÜRLERİ
by: Timur Vural, et al.
Published: (2018-08-01)

Research on real-time fusion method of multi-source heterogeneous flight trajectory data stream
by: Zhuxi ZHANG, et al.
Published: (2020-09-01)

An Adaptive Subspace Similarity Search Approach
by: Jianxin Ren, et al.
Published: (2015-07-01)

Some features of alt texts associated with images in Web pages
by: Timothy C. Craven
Published: (2006-01-01)

Research and application of prediction model based on ensemble BP neural network
by: Huimin ZHAO, et al.
Published: (2016-02-01)

Mathematical Model and Algorithm for Accurate Main Content Extraction From News Websites
by: Hamza Salem, et al.
Published: (2025-01-01)

Domy zdrowotno-wypoczynkowe Stowarzyszenia Chrześcijańsko–Narodowego Nauczycielstwa Szkół Powszechnych w okresie międzywojennym
by: Joanna Zagdańska
Published: (2018-03-01)

Study of high-speed malicious Web page detection system based on two-step classifier
by: Zheng-qi WANG, et al.
Published: (2017-08-01)

Behandling af betinget dømte alkoholmisbrugere.
by: Hellmut Sørensen, et al.
Published: (1954-06-01)

Summary of Large-Scale Graph Partitioning Algorithms
by: Jinfeng Xu, et al.
Published: (2014-07-01)

A longitudinal study of Web pages continued: a consideration of document persistence. Web documents, Half-life, Linkrot, Persistence, Web citations
by: Wallace Koehler
Published: (2004-01-01)

A dynamic detection method based on Web crawler and page code behavior for XSS vulnerability
by: Yi LIU, et al.
Published: (2016-03-01)

Web Pages Ranking Algorithms: A Survey
by: Ayad Abdulrahman
Published: (2021-07-01)

Collaborative defending scheme against malicious Web pages based on social trust
by: Xin LIU, et al.
Published: (2012-12-01)

Exploring Cartographic Differences in Web Map Applications: Evaluating Design, Scale, and Usability
by: Jakub Zejdlik, et al.
Published: (2024-12-01)

Slow task scheduling algorithm based on node identification
by: Yun-fei CUI, et al.
Published: (2014-07-01)

Automatic extraction and geographization of urban traffic event based on natural Chinese text data
by: Chenyu Hu, et al.
Published: (2025-12-01)

Web information seeking by pages. World Wide Web, Information seeking, Personal development, Navigation
by: Jarkko Kari
Published: (2004-01-01)

Evolution of the Web: from Web 1.0 to 4.0
by: Asaad Khaleel Ibrahim
Published: (2021-06-01)

Digital libraries and World Wide Web sites and page persistence.
by: Wallace Koehler
Published: (1999-01-01)

What is the title of a Web page? A study of Webography practice
by: Timothy C. Craven
Published: (2002-01-01)

Big Data Analytics for Healthcare Industry: Impact, Applications, and Tools
by: Sunil Kumar, et al.
Published: (2019-03-01)

Intelligent and Adaptive Web Data Extraction System Using Convolutional and Long Short-Term Memory Deep Learning Networks
by: Sudhir Kumar Patnaik, et al.
Published: (2021-12-01)

Dom jako miejsce kształtowania świadomości ekologicznej Polaków
by: Monika Podkowińska, et al.
Published: (2024-12-01)