A Distributed Data-Crawling Technology for Microblog API

Bibliographic Details
Main Authors: Shunhua Chen, Xiaotong Wang, Zhifeng Hao, Ruichu Cai, Xiaojun Xiao, Yu Lu
Format: Article
Language: Chinese (zho)
Published: Beijing Xintong Media Co., Ltd 2013-08-01
Series: Dianxin kexue
Online Access: http://www.telecomsci.com/zh/article/doi/10.3969/j.issn.1000-0801.2013.08.025/
Description
Summary: As more and more users adopt microblogs, there is strong interest in mining patterns from microblog data. Efficiently collecting that data from the service provider is one of the main challenges. To address this issue, a distributed crawling solution based on the microblog API is presented. The solution simulates microblog login, obtains authorization automatically, and controls the API invocation frequency with a task controller. A time-trigger method backed by an in-memory database is also proposed to avoid redundant data duplication and improve the efficiency of the system. In the distributed framework, crawling tasks can be assigned to distributed clients independently, which makes the crawling procedure highly scalable and flexible. The feasibility of the crawler technology is verified on a Sina microblog instance.
ISSN: 1000-0801
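
Illustrative sketch (not from the article): the summary mentions a task controller that limits how often the API is invoked, and an in-memory store used to avoid duplicated data. The record does not describe how either is implemented, so the Python below is only a minimal sketch under assumptions. The class name `TaskController`, its parameters, the sliding-window quota policy, and the `"id"` record field are all hypothetical. A plain Python set stands in for the in-memory database, and the article's time-trigger logic is not reproduced.

```python
import time
from collections import deque


class TaskController:
    """Hypothetical controller: caps API calls per time window and
    skips records whose IDs were already stored."""

    def __init__(self, max_calls, window_seconds, clock=time.monotonic):
        self.max_calls = max_calls            # assumed quota per window
        self.window_seconds = window_seconds  # assumed window length
        self.clock = clock                    # injectable for testing
        self.call_times = deque()             # timestamps of recent calls
        self.seen_ids = set()                 # stand-in for the in-memory database

    def try_acquire(self):
        """Return True and record a call if the window still has quota."""
        now = self.clock()
        # Drop timestamps that have left the sliding window.
        while self.call_times and now - self.call_times[0] >= self.window_seconds:
            self.call_times.popleft()
        if len(self.call_times) < self.max_calls:
            self.call_times.append(now)
            return True
        return False

    def filter_new(self, records):
        """Keep only records whose 'id' has not been stored before."""
        fresh = []
        for record in records:
            if record["id"] not in self.seen_ids:
                self.seen_ids.add(record["id"])
                fresh.append(record)
        return fresh
```

In a distributed setup like the one the summary describes, each crawling client would presumably check `try_acquire()` before issuing an API request and pass the response through `filter_new()` before storing it. Sharing the quota and the seen-ID set across clients would need a shared store, which this single-process sketch does not cover.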