Research and design of a distributed high-performance web crawler based on a cloud platform
Main Authors: | , , |
---|---|
Format: | Article |
Language: | zho |
Published: | Beijing Xintong Media Co., Ltd, 2017-08-01 |
Series: | Dianxin kexue |
Subjects: | |
Online Access: | http://www.telecomsci.com/zh/article/doi/10.11959/j.issn.1000-0801.2017234/ |
Summary: | With the arrival of the big data era, data has become the most valuable resource. Web crawler technology, as an important means of external data collection, has become a standard tool for data analysis. A high-performance, convenient cloud-based crawler architecture is introduced. The overall architecture of the crawler, its distributed design, and the design of each sub-module are described in detail. Each module of the crawler is encapsulated in Docker, and Kubernetes is used for resource scheduling and management of the cluster. For performance optimization, an MD5 deduplication tree algorithm, DNS optimization and asynchronous I/O are adopted. Experimental results show that the crawler has clear performance advantages over the unoptimized scheme. |
---|---|
ISSN: | 1000-0801 |
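
The summary names an MD5-based deduplication tree as one of the optimizations, but this record does not describe how it works. The Python sketch below is only one plausible reading: each URL is hashed with MD5 and the 32 hex digits of the digest are stored as a path in a trie, so a URL counts as already seen when its full path exists. The class name `Md5DedupTree`, the method `add` and the example URLs are hypothetical and do not come from the article.

```python
import hashlib


class Md5DedupTree:
    """Hypothetical URL-deduplication trie keyed on MD5 hex digits."""

    def __init__(self):
        self.root = {}   # nested dicts: one level per hex digit
        self.size = 0    # number of distinct digests stored

    def add(self, url: str) -> bool:
        """Insert url; return True if it was new, False if already seen."""
        digest = hashlib.md5(url.encode("utf-8")).hexdigest()
        node = self.root
        is_new = False
        for ch in digest:
            if ch not in node:
                # A missing branch means this digest was never inserted.
                node[ch] = {}
                is_new = True
            node = node[ch]
        if is_new:
            self.size += 1
        return is_new


if __name__ == "__main__":
    tree = Md5DedupTree()
    for url in ["http://example.com/a", "http://example.com/a", "http://example.com/b"]:
        print(url, "new" if tree.add(url) else "duplicate")
```

Because every MD5 hex digest has the same length, a fully existing path can only mean the same digest was inserted before. A production crawler would likely need a more compact or shared structure than nested dictionaries; this sketch only illustrates the lookup idea.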