A Distributed Data-Crawling Technology for Microblog API
As more and more users begin to use microblogs, people are eager to mine interesting patterns from microblog data. Efficiently collecting data from the service provider is one of the main challenges. To address this issue, a distributed crawling solution based on the microblog API is presented. The d...
Main Authors: | Shunhua Chen, Xiaotong Wang, Zhifeng Hao, Ruichu Cai, Xiaojun Xiao, Yu Lu |
---|---|
Format: | Article |
Language: | zho |
Published: | Beijing Xintong Media Co., Ltd, 2013-08-01 |
Series: | Dianxin kexue |
Subjects: | Sina microblog; crawling strategy; distributed crawl; microblog API |
Online Access: | http://www.telecomsci.com/zh/article/doi/10.3969/j.issn.1000-0801.2013.08.025/ |
_version_ | 1841529303869685760 |
---|---|
author | Shunhua Chen, Xiaotong Wang, Zhifeng Hao, Ruichu Cai, Xiaojun Xiao, Yu Lu |
author_facet | Shunhua Chen, Xiaotong Wang, Zhifeng Hao, Ruichu Cai, Xiaojun Xiao, Yu Lu |
author_sort | Shunhua Chen |
collection | DOAJ |
description | As more and more users begin to use microblogs, people are eager to mine interesting patterns from microblog data. Efficiently collecting data from the service provider is one of the main challenges. To address this issue, a distributed crawling solution based on the microblog API is presented. The solution simulates microblog login, obtains authorization automatically, and controls the API invocation frequency with a task controller. A time-trigger method backed by an in-memory database is also proposed to avoid collecting redundant duplicate data and to improve the efficiency of the system. In the distributed framework, crawling tasks can be assigned to distributed clients independently, which ensures high scalability and flexibility of the crawling procedure. The feasibility of the crawler was verified on a Sina microblog instance. |
format | Article |
id | doaj-art-fc8057ec4e6c4a90a760855fa878e4ff |
institution | Kabale University |
issn | 1000-0801 |
language | zho |
publishDate | 2013-08-01 |
publisher | Beijing Xintong Media Co., Ltd |
record_format | Article |
series | Dianxin kexue |
title | A Distributed Data-Crawling Technology for Microblog API |
topic | Sina microblog; crawling strategy; distributed crawl; microblog API |
url | http://www.telecomsci.com/zh/article/doi/10.3969/j.issn.1000-0801.2013.08.025/ |
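
The description above mentions a task controller that limits how often the API is invoked, and a time-trigger method with an in-memory database that avoids re-collecting duplicate data. The record gives no implementation details, so the following is only a minimal sketch of how such a controller might look. The class name `TaskController`, its methods, the sliding-window rate limit, and the assumption that post IDs are increasing integers are all hypothetical and are not taken from the article.

```python
import heapq
from typing import Callable, Dict, List, Optional, Tuple


class TaskController:
    """Hypothetical controller: releases due crawl tasks under an API call cap."""

    def __init__(self, max_calls: int, window_seconds: float,
                 clock: Callable[[], float]) -> None:
        self.max_calls = max_calls            # allowed API calls per window
        self.window_seconds = window_seconds  # length of the sliding window
        self.clock = clock                    # injectable time source
        self._call_times: List[float] = []        # timestamps of recent calls
        self._due: List[Tuple[float, str]] = []   # min-heap of (due_time, user_id)
        # Plain dict standing in for the in-memory database (assumption):
        # user_id -> newest post ID already collected.
        self._last_seen: Dict[str, int] = {}

    def schedule(self, user_id: str, due_time: float) -> None:
        """Time trigger: register when this user's timeline should be crawled."""
        heapq.heappush(self._due, (due_time, user_id))

    def _calls_left(self) -> int:
        """Drop calls that fell out of the window and count remaining quota."""
        cutoff = self.clock() - self.window_seconds
        self._call_times = [t for t in self._call_times if t > cutoff]
        return self.max_calls - len(self._call_times)

    def next_task(self) -> Optional[str]:
        """Return a due user_id if the rate limit allows a call, else None."""
        now = self.clock()
        if not self._due or self._due[0][0] > now or self._calls_left() <= 0:
            return None
        _, user_id = heapq.heappop(self._due)
        self._call_times.append(now)
        return user_id

    def filter_new(self, user_id: str, post_ids: List[int]) -> List[int]:
        """Keep only posts newer than the last one stored for this user."""
        last = self._last_seen.get(user_id, -1)
        fresh = [p for p in post_ids if p > last]
        if fresh:
            self._last_seen[user_id] = max(fresh)
        return fresh
```

In a distributed setting, each client could ask a shared controller for `next_task()`, call the API for the returned user, and pass the returned post IDs through `filter_new()` before storing them. Passing the clock in as a parameter keeps the sketch testable without real waiting.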