Application of random forest in big data completion

Bibliographic Details
Main Authors: Zheng WANG, Hua REN, Yanping FANG
Format: Article
Language:zho
Published: Beijing Xintong Media Co., Ltd 2016-12-01
Series:Dianxin kexue
Online Access:http://www.telecomsci.com/zh/article/doi/10.11959/j.issn.1000-0801.2016317/
Description
Summary: Telecom operators hold large volumes of data, but for a variety of reasons the data quality is often poor: many records are incomplete, and some data are missing altogether. Existing data mining requires data that meet quality standards and provide a sufficient sampling proportion. Building on the country's existing log retention system, a template library was designed to verify data integrity. For data that failed to meet the quality requirements, the random forest algorithm was used to find identical or related data, the missing data were completed, and data quality was improved; the template library was then extended through feedback-driven optimization. A data completion subsystem built into the log retention system guarantees and improves end-to-end data quality, completes and enhances both real-time and historical data, and ultimately meets operators' data processing and mining requirements, raising the quality and value of their data.
ISSN:1000-0801
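The summary does not describe how the random forest is applied, so the following is only a hypothetical sketch of the general idea, not the authors' method. It assumes log records are dictionaries of categorical fields in which exactly one field (here an invented `network_type`) may be missing (`None`), while every other field is present. A small random forest (bootstrap samples, a random feature subset at each node, Gini-impurity splits, majority vote) is trained on the complete records and used to fill the incomplete ones. It is written from scratch so it runs with the standard library alone. The field names and values (`apn`, `device`, `network_type`, `cmnet`, ...) and all function names are illustrative assumptions.

```python
import random
from collections import Counter

# Hypothetical sketch: random-forest completion of one missing categorical
# field in log records. Not the authors' implementation.


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(rows, labels, features, rng, max_depth, n_sub):
    """Grow a decision tree whose internal nodes test row[feature] == value."""
    majority = Counter(labels).most_common(1)[0][0]
    if max_depth == 0 or len(set(labels)) == 1:
        return majority  # leaf: predict the most common label
    best = None  # (weighted impurity, feature, value)
    # Random-forest step: only a random subset of features is tried per node.
    for f in rng.sample(features, min(n_sub, len(features))):
        for v in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] == v]
            right = [y for r, y in zip(rows, labels) if r[f] != v]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, v)
    if best is None:
        return majority  # no useful split among the sampled features
    _, f, v = best
    in_left = [r[f] == v for r in rows]
    children = []
    for side in (True, False):
        sub_rows = [r for r, m in zip(rows, in_left) if m == side]
        sub_labels = [y for y, m in zip(labels, in_left) if m == side]
        children.append(build_tree(sub_rows, sub_labels, features, rng, max_depth - 1, n_sub))
    return (f, v, children[0], children[1])


def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf's label."""
    while isinstance(node, tuple):
        f, v, left, right = node
        node = left if row.get(f) == v else right
    return node


def fit_forest(rows, labels, features, n_trees=25, max_depth=4, seed=0):
    """Train n_trees trees, each on a bootstrap sample of the complete rows."""
    rng = random.Random(seed)
    n_sub = max(1, int(len(features) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                 features, rng, max_depth, n_sub))
    return forest


def complete_field(records, target):
    """Return a new list where records with target None are replaced by filled copies.

    Assumes every other field is present in every record.
    """
    features = sorted({k for r in records for k in r} - {target})
    known = [r for r in records if r.get(target) is not None]
    forest = fit_forest(known, [r[target] for r in known], features)
    filled = []
    for r in records:
        if r.get(target) is None:
            votes = Counter(predict_tree(t, r) for t in forest)
            r = {**r, target: votes.most_common(1)[0][0]}
        filled.append(r)
    return filled


# Usage with invented field names and values.
records = (
    [{"apn": "cmnet", "device": "smartphone", "network_type": "4G"} for _ in range(8)]
    + [{"apn": "cmwap", "device": "feature_phone", "network_type": "3G"} for _ in range(8)]
    + [{"apn": "cmnet", "device": "smartphone", "network_type": None},
       {"apn": "cmwap", "device": "feature_phone", "network_type": None}]
)
filled = complete_field(records, "network_type")
print([r["network_type"] for r in filled[-2:]])
```

In a production setting a library implementation such as scikit-learn's `RandomForestClassifier` would normally replace the hand-written trees. The template-library integrity check and the feedback loop that extends the template library, both mentioned in the summary, are not modeled in this sketch.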