RiskTree: Decision trees for asset and process risk assessment quantification in big data platforms

Currently, big data platforms are widely applied across various industries. These platforms are characterized by large scale, diverse forms, high update frequency, and rapid data flow, making it challenging to directly apply existing risk quantification methods to them. Additionally, the composition...

Bibliographic Details
Main Authors: Zhan Haomou, Yang Jiawei, Guo Zhenyang, Cao Jin, Zhang Dong, Zhao Xingwen, You Wei, Li Hui
Format: Article
Language: English
Published: EDP Sciences 2024-01-01
Series: Security and Safety
Subjects: big data platform; quantitative risk assessment; machine learning
Online Access: https://sands.edpsciences.org/articles/sands/full_html/2024/01/sands20240012/sands20240012.html
collection DOAJ
description Currently, big data platforms are widely applied across various industries. These platforms are characterized by large scale, diverse forms, high update frequency, and rapid data flow, making it challenging to directly apply existing risk quantification methods to them. Additionally, the composition of big data platforms varies among enterprises due to factors such as industry, economic capability, and technical proficiency. To address this, we first developed a risk quantification assessment process tailored to different types of big data platforms, taking into account relevant laws, regulations, and standards. Subsequently, we developed RiskTree, a risk quantification system for big data platforms, which supports automated detection of configuration files, traffic, and vulnerabilities. For situations where automated detection is not feasible or permitted, we provide a customized questionnaire system to collect assets and data processing procedures. We utilize a knowledge graph (KG) to integrate and analyze the collected data. Finally, we apply a random forest algorithm to compute risk index weights, risk values, and risk levels, enabling the quantification of risks on big data platforms. To validate the proposed process, we conducted experiments on an educational big data platform. The results demonstrate that the risk index system presented in this paper objectively and comprehensively reflects the risks faced by big data platforms. Furthermore, the proposed risk assessment process not only effectively identifies and quantifies risks but also provides highly interpretable evaluation results.
format Article
id doaj-art-0f0c1d8900a74b43bd4d09e572d4af65
institution Kabale University
issn 2826-1275
language English
publishDate 2024-01-01
publisher EDP Sciences
record_format Article
series Security and Safety
spelling doaj-art-0f0c1d8900a74b43bd4d09e572d4af65
record updated 2025-01-08T11:21:32Z
volume 3
article number 2024009
doi 10.1051/sands/2024009
publisher id sands20240012
author_details Zhan Haomou (https://orcid.org/0009-0002-6680-7694), School of Cyber Engineering, Xidian University
Yang Jiawei, School of Cyber Engineering, Xidian University
Guo Zhenyang (https://orcid.org/0000-0003-3519-2674), School of Cyber Engineering, Xidian University
Cao Jin (https://orcid.org/0000-0003-1372-7252), School of Cyber Engineering, Xidian University
Zhang Dong, National Computer Network Emergency Response Technical Team/Coordination Center of China (CNCERT/CC)
Zhao Xingwen, School of Cyber Engineering, Xidian University
You Wei, School of Cyber Engineering, Xidian University
Li Hui (https://orcid.org/0000-0001-8310-7169), School of Cyber Engineering, Xidian University
topic big data platform
quantitative risk assessment
machine learning