Discovering latent themes in aviation safety reports using text mining and network analytics

Aviation accidents, referring to unexpected and undesirable events involving aircraft, often cause great damage to property and human life. Learning from historical accidents is pivotal for improving safety in aviation. However, aviation accidents are typically documented and stored as unstructured...

Bibliographic Details
Main Authors: Yingying Xing, Yutong Wu, Shiwen Zhang, Ling Wang, Haoyuan Cui, Bo Jia, Hongwei Wang
Format: Article
Language: English
Published: KeAi Communications Co., Ltd. 2024-12-01
Series: International Journal of Transportation Science and Technology
Subjects: Aviation safety; Aviation accident report; Text mining; Topic modeling; Network analysis
Online Access: http://www.sciencedirect.com/science/article/pii/S2046043024000297
author Yingying Xing
Yutong Wu
Shiwen Zhang
Ling Wang
Haoyuan Cui
Bo Jia
Hongwei Wang
collection DOAJ
description Aviation accidents, defined as unexpected and undesirable events involving aircraft, often cause great damage to property and human life. Learning from historical accidents is pivotal for improving aviation safety. However, aviation accidents are typically documented and stored as unstructured or semi-structured free text, which makes such data difficult to analyze. This study presents a novel framework that combines text mining and network analytics techniques to analyze aviation accident reports automatically. The framework comprises four steps: (1) transforming unstructured aviation safety report texts into structured numeric matrices using TF-IDF weighting; (2) identifying aviation accident topics using a structural topic model (STM); (3) constructing a word co-occurrence network (WCN) to determine the interrelations between aviation safety risk factors; and (4) quantitative keyword analysis to pinpoint key causal factors in aviation safety events. The proposed framework is validated on aviation accident reports collected by the National Transportation Safety Board (NTSB). The results indicate that STM partitions topics at a finer granularity and distinguishes between similar events better than traditional latent Dirichlet allocation (LDA). Among the identified topics, “Fuel and Power” and “En-route Phase” have the highest occurrence rates according to STM. Additionally, “Aircraft Crash” is the most prevalent topic in accidents that resulted in fatal injuries, whereas the “Landing phase” is the most prevalent topic in accidents with non-fatal injuries. Based on the WCN, three centrality measures highlight “inspection of equipment” and “take off” as the most important risk factors in aviation safety. The proposed framework provides a comprehensive solution for in-depth analysis of aviation safety reports, offering decision support for aviation safety management and accident prevention, thereby reducing risks and strengthening safety measures.
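As a rough illustration of steps (1), (3) and (4) described above, the following sketch builds TF-IDF weights, a word co-occurrence network, and one centrality score using only the Python standard library. It is not the authors' implementation. The toy narratives, the function names, the `tf * log(N / df)` weighting variant, and the choice of degree centrality are all assumptions made for illustration; the abstract does not name the three centrality measures it uses, and the STM step is omitted here.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy narratives; NOT taken from the NTSB reports used in the article.
REPORTS = [
    "engine lost power after fuel exhaustion during cruise",
    "pilot lost control during landing in gusting crosswind",
    "fuel contamination caused engine power loss after take off",
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tf_idf(docs):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append(
            {t: (c / len(toks)) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return rows


def cooccurrence(docs):
    """Count, for each unordered term pair, the documents that contain both terms."""
    edges = Counter()
    for d in docs:
        for a, b in combinations(sorted(set(tokenize(d))), 2):
            edges[(a, b)] += 1
    return edges


def degree_centrality(edges):
    """Normalised degree: number of distinct neighbours divided by (nodes - 1)."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    n = len(neighbours)
    return {t: len(nb) / (n - 1) for t, nb in neighbours.items()}


weights = tf_idf(REPORTS)
edges = cooccurrence(REPORTS)
centrality = degree_centrality(edges)

# Terms ranked by centrality, ties broken alphabetically.
ranked = sorted(centrality, key=lambda t: (-centrality[t], t))
print(ranked[:5])
```

In this sketch a term that appears in several reports alongside many different words ranks highly, which mirrors the idea of using network centrality to surface key risk factors. A real pipeline would also need stop-word removal and phrase detection before the network is built.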
format Article
id doaj-art-b508b9cc97a44fdbad9762b09ecfc5c8
institution Kabale University
issn 2046-0430
language English
publishDate 2024-12-01
publisher KeAi Communications Co., Ltd.
record_format Article
series International Journal of Transportation Science and Technology
spelling doaj-art-b508b9cc97a44fdbad9762b09ecfc5c8
2024-12-30T04:15:39Z
eng
KeAi Communications Co., Ltd.
International Journal of Transportation Science and Technology
2046-0430
2024-12-01
volume 16, pages 292-316
Discovering latent themes in aviation safety reports using text mining and network analytics
Yingying Xing (The Key Laboratory of Road and Traffic Engineering of Ministry of Education, Tongji University, Shanghai, 201804, China)
Yutong Wu (The Key Laboratory of Road and Traffic Engineering of Ministry of Education, Tongji University, Shanghai, 201804, China)
Shiwen Zhang (Institute of Safety Operation Research Institute, China Eastern Technology Application R&D Center, Shanghai 201707, China)
Ling Wang (The Key Laboratory of Road and Traffic Engineering of Ministry of Education, Tongji University, Shanghai, 201804, China; corresponding author)
Haoyuan Cui (The Key Laboratory of Road and Traffic Engineering of Ministry of Education, Tongji University, Shanghai, 201804, China)
Bo Jia (Institute of Safety Operation Research Institute, China Eastern Technology Application R&D Center, Shanghai 201707, China)
Hongwei Wang (Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR), Singapore 138632, Singapore)
title Discovering latent themes in aviation safety reports using text mining and network analytics
topic Aviation safety
Aviation accident report
Text mining
Topic modeling
Network analysis
url http://www.sciencedirect.com/science/article/pii/S2046043024000297