Research on a real-time receiving scheme of streaming data

Bibliographic Details
Main Authors: Xiaoyan ZHANG, Zhihao LIU, Xiaofeng DU, Tianbo LU
Format: Article
Language: Chinese (zho)
Published: Editorial Department of Journal on Communications 2022-04-01
Series: Tongxin xuebao
Online Access: http://www.joconline.com.cn/zh/article/doi/10.11959/j.issn.1000-436x.2022080/
Description
Summary: This paper addresses a common scenario in modern data warehouse systems: receiving a large volume of streaming data, joining it with existing data on disk, and storing the result in the warehouse. Building on existing research, a more efficient data receiving scheme is proposed that sets disk paging rationally and uses cache modules to spread the disk I/O load. A consistent hash function is introduced to extend the scheme to distributed environments, yielding the D-CACHEJOIN algorithm. The cost model of the algorithm is derived theoretically, and simulation experiments are run on data that follow a Zipfian distribution. The results show that the proposed algorithm is more efficient than existing algorithms in scenarios close to real applications, and that it can be extended quickly and easily to distributed environments.
ISSN: 1000-436X
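
The summary names three ingredients: a cache in front of the disk-resident data, a consistent hash function for distribution, and Zipfian test data. The sketch below shows how these could fit together. It is not the paper's D-CACHEJOIN algorithm, whose details are not given in this record. All class and function names (`ConsistentHashRing`, `CachedJoinNode`, `simulate`) are hypothetical, the cache policy is a plain LRU chosen for illustration, and a dictionary stands in for the on-disk relation.

```python
import bisect
import hashlib
import random
from collections import OrderedDict


def stable_hash(value: str) -> int:
    """Deterministic 64-bit hash (the built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class ConsistentHashRing:
    """Consistent hash ring with virtual nodes (illustrative, not from the paper)."""

    def __init__(self, nodes, vnodes=64):
        self._ring = sorted(
            (stable_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key) -> str:
        # First ring point clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect_right(self._points, stable_hash(str(key)))
        return self._ring[idx % len(self._ring)][1]


class CachedJoinNode:
    """Joins stream tuples with a 'disk' relation, keeping hot keys in an LRU cache."""

    def __init__(self, disk_relation, cache_size):
        self.disk = disk_relation      # assumption: dict stands in for disk pages
        self.cache = OrderedDict()
        self.cache_size = cache_size
        self.disk_reads = 0

    def join(self, key, payload):
        if key in self.cache:          # cache hit: no disk access needed
            self.cache.move_to_end(key)
            return (key, payload, self.cache[key])
        self.disk_reads += 1           # cache miss: simulated disk lookup
        row = self.disk.get(key)
        if row is None:
            return None
        self.cache[key] = row
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)   # evict least recently used key
        return (key, payload, row)


def simulate(num_keys=1000, num_tuples=5000,
             nodes=("node-a", "node-b", "node-c"),
             cache_size=50, skew=1.0, seed=7):
    """Route Zipf-distributed stream keys to nodes and join them there."""
    rng = random.Random(seed)
    ring = ConsistentHashRing(nodes)
    # Partition the stored relation across nodes with the same ring.
    partitions = {node: {} for node in nodes}
    for key in range(num_keys):
        partitions[ring.node_for(key)][key] = f"master-{key}"
    workers = {node: CachedJoinNode(partitions[node], cache_size) for node in nodes}
    # Zipfian weights: the key of rank r is drawn with probability ~ 1 / r**skew.
    weights = [1.0 / (rank ** skew) for rank in range(1, num_keys + 1)]
    keys = rng.choices(range(num_keys), weights=weights, k=num_tuples)
    joined = [workers[ring.node_for(k)].join(k, f"event-{i}")
              for i, k in enumerate(keys)]
    disk_reads = sum(worker.disk_reads for worker in workers.values())
    return joined, disk_reads


if __name__ == "__main__":
    joined, disk_reads = simulate()
    print(f"joined {len(joined)} tuples with {disk_reads} simulated disk reads")
```

Because a Zipfian stream concentrates on a few hot keys, even a small per-node cache absorbs many lookups in this toy setup. The consistent hash ring keeps most keys on the same node when a node is added or removed, which is the usual reason for choosing it when scaling out.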