A Distributed Data-Crawling Technology for Microblog API

As more and more users adopt microblogs, there is growing interest in mining patterns from microblog data. Collecting that data efficiently from the service provider is one of the main challenges. To address this issue, a distributed crawling solution based on the microblog API is presented. The d...
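
The full description in this record mentions a task controller that limits how often the API is invoked and an in-memory database used to avoid duplicate data. The sketch below illustrates those two ideas only in general terms. It is not the authors' implementation: the names `TaskController` and `SeenStore`, the sliding-window policy, and the use of a plain Python set in place of a memory database are all assumptions made for illustration, and the paper's time-trigger method is not reproduced here.

```python
from collections import deque


class TaskController:
    """Hypothetical controller: permits at most max_calls per sliding window."""

    def __init__(self, max_calls, window_seconds):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self._calls = deque()  # timestamps of recently permitted calls

    def try_acquire(self, now):
        """Return True if an API call may be made at time `now` (seconds)."""
        # Forget calls that have aged out of the window.
        while self._calls and now - self._calls[0] >= self.window_seconds:
            self._calls.popleft()
        if len(self._calls) < self.max_calls:
            self._calls.append(now)
            return True
        return False


class SeenStore:
    """Hypothetical in-process stand-in for the memory database."""

    def __init__(self):
        self._seen = set()

    def add_new(self, post_ids):
        """Record post IDs and return only those not seen before, in order."""
        fresh = []
        for post_id in post_ids:
            if post_id not in self._seen:
                self._seen.add(post_id)
                fresh.append(post_id)
        return fresh
```

A crawling client would call `try_acquire(time.monotonic())` before each API request and pass the returned post IDs through `add_new` so that only unseen posts are stored. Passing the clock value in as an argument keeps the controller easy to test.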

Bibliographic Details
Main Authors: Shunhua Chen, Xiaotong Wang, Zhifeng Hao, Ruichu Cai, Xiaojun Xiao, Yu Lu
Format: Article
Language: Chinese (zho)
Published: Beijing Xintong Media Co., Ltd 2013-08-01
Series: Dianxin kexue
Subjects: Sina microblog; crawling strategy; distributed crawl; microblog API
Online Access: http://www.telecomsci.com/zh/article/doi/10.3969/j.issn.1000-0801.2013.08.025/
author Shunhua Chen
Xiaotong Wang
Zhifeng Hao
Ruichu Cai
Xiaojun Xiao
Yu Lu
collection DOAJ
description As more and more users adopt microblogs, there is growing interest in mining patterns from microblog data. Collecting that data efficiently from the service provider is one of the main challenges. To address this issue, a distributed crawling solution based on the microblog API is presented. The solution simulates microblog login, obtains authorization automatically, and controls the API invocation frequency with a task controller. A time-trigger method backed by an in-memory database is also proposed to avoid redundant duplicate data and improve system efficiency. In the distributed framework, crawling tasks can be assigned to distributed clients independently, which makes the crawling procedure highly scalable and flexible. The feasibility of the crawler was verified on Sina microblog.
format Article
id doaj-art-fc8057ec4e6c4a90a760855fa878e4ff
institution Kabale University
issn 1000-0801
language zho
publishDate 2013-08-01
publisher Beijing Xintong Media Co., Ltd
record_format Article
series Dianxin kexue
title A Distributed Data-Crawling Technology for Microblog API
topic Sina microblog
crawling strategy
distributed crawl
microblog API
url http://www.telecomsci.com/zh/article/doi/10.3969/j.issn.1000-0801.2013.08.025/