Machine learning reveals CAT gene as a novel potential diagnostic and prognostic biomarker in non-small cell lung cancer

Bibliographic Details
Main Authors: Yi Tian, Wen-ya Zhao, Yi-ru Liu, Wen-wen Song, Qiao-xin Lin, Yan-na Gong, Yi-ting Deng, Dian-na Gu, Ling Tian
Format: Article
Language: English
Published: Springer 2024-12-01
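Series: Discover Oncology
Subjects: Machine learning; Catalase gene; Biomarker; Aging-related genes; Age heterogeneity; Non-small cell lung cancer
Online Access: https://doi.org/10.1007/s12672-024-01670-1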
author Yi Tian
Wen-ya Zhao
Yi-ru Liu
Wen-wen Song
Qiao-xin Lin
Yan-na Gong
Yi-ting Deng
Dian-na Gu
Ling Tian
collection DOAJ
description Abstract
Background: Non-small cell lung cancer (NSCLC) is one of the most prevalent forms of lung cancer, with a five-year survival rate of 21.7%. Biomarkers that can inform tumor diagnosis and prognosis are urgently needed, particularly ones that apply across different age groups. Here, we apply machine learning methods to analyze biomarker applicability across age groups in NSCLC.
Methods: Studies have shown a higher incidence of NSCLC in people over 40 years of age and, because of dataset limitations, individuals under 40 were not included in this study. To approximate the human aging model as closely as possible, we collected NSCLC samples from the UCSC Xena database according to patient age and divided them into three groups: 40–60, 60–80, and over 80 years old. We then used four machine learning methods (Random Forest, LASSO regression, XGBoost, and GBM) to identify gene sets with significant diagnostic value in each age group. By intersecting these sets, we identified the optimal gene and assessed its prognostic significance in NSCLC. The diagnostic value of the CAT gene was then validated in public datasets from three regions: GSE32863, GSE43458, GSE68571, GSE10072, and GSE63459 from the Americas; GSE30219 and GSE102511 from Europe; and GSE31210 and GSE19804 from Asia. In addition, immunohistochemical staining was performed on an independent cohort from a tissue microarray, and cell culture and RT-qPCR were used for external validation.
Results: The machine learning analysis identified the catalase (CAT) gene. Individuals with high CAT expression had improved survival rates and elevated immune scores. We further found that the CAT gene synergizes with multiple components of neutrophils, including TLRs, FcRn, and the selective GEF of Rho-family GTPases. We also identified a potential immune checkpoint, TNFSF15, that is applicable to the human aging model. The diagnostic value of the CAT gene was validated in datasets from the Americas, Europe, and Asia. External RT-qPCR validation showed that CAT expression was higher in BEAS-2B cells than in A549 cells. In an independent human cohort, CAT was expressed at low levels in lung cancer tissues. Higher CAT levels were also associated with improved survival in the 40–60 and 60–80 age groups.
Conclusions: In our analysis of the NSCLC database, we identified the CAT gene, which holds promise for diagnostic and prognostic applications in the context of human aging. It may also offer insights into the age-related heterogeneity of NSCLC.
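The intersection step described in the Methods can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the function name `consensus_genes`, the method labels, and the toy gene sets (including the `GENE_*` placeholders) are hypothetical and are not results from the study. The sketch assumes each method has already produced a set of selected genes for each age group.

```python
def consensus_genes(selected):
    """Intersect gene sets across methods, then across age groups.

    `selected` maps age group -> {method name -> iterable of gene symbols}.
    Returns (per_group, overall): the genes chosen by every method within
    each age group, and the genes shared by all age groups.
    """
    if not selected:
        return {}, set()

    per_group = {}
    for group, by_method in selected.items():
        gene_sets = [set(genes) for genes in by_method.values()]
        # A gene is kept for a group only if every method selected it.
        per_group[group] = set.intersection(*gene_sets) if gene_sets else set()

    # A gene is kept overall only if it survives in every age group.
    overall = set.intersection(*per_group.values())
    return per_group, overall


# Toy input: placeholder gene names, NOT data from the study.
toy_selection = {
    "40-60": {
        "random_forest": {"CAT", "GENE_A", "GENE_B"},
        "lasso": {"CAT", "GENE_A"},
        "xgboost": {"CAT", "GENE_B"},
        "gbm": {"CAT", "GENE_A", "GENE_C"},
    },
    "60-80": {
        "random_forest": {"CAT", "GENE_D"},
        "lasso": {"CAT", "GENE_D", "GENE_E"},
        "xgboost": {"CAT", "GENE_D"},
        "gbm": {"CAT", "GENE_D"},
    },
    "over-80": {
        "random_forest": {"CAT", "GENE_F"},
        "lasso": {"CAT"},
        "xgboost": {"CAT", "GENE_F"},
        "gbm": {"CAT", "GENE_G"},
    },
}

per_group, overall = consensus_genes(toy_selection)
print(per_group["60-80"])  # genes all four methods agree on in that group
print(overall)             # genes shared by every age group
```

Intersecting within each group first keeps the per-group consensus available for inspection, which may be useful when a gene is diagnostic in only some age groups.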
format Article
id doaj-art-7a85ac9969fd452fbd7c7d96d8b7921c
institution Kabale University
issn 2730-6011
language English
publishDate 2024-12-01
publisher Springer
record_format Article
series Discover Oncology
spelling doaj-art-7a85ac9969fd452fbd7c7d96d8b7921c
2024-12-22T12:35:22Z
eng
Springer
Discover Oncology
2730-6011
2024-12-01
151118
10.1007/s12672-024-01670-1
Machine learning reveals CAT gene as a novel potential diagnostic and prognostic biomarker in non-small cell lung cancer
Yi Tian (Department of Central Laboratory, Shanghai Chest Hospital, Shanghai Jiao Tong University School of Medicine)
Wen-ya Zhao (Department of Central Laboratory, Shanghai Chest Hospital, Shanghai Jiao Tong University School of Medicine)
Yi-ru Liu (Department of Medical Oncology, The First Affiliated Hospital of Wenzhou Medical University)
Wen-wen Song (Department of Medical Oncology, The First Affiliated Hospital of Wenzhou Medical University)
Qiao-xin Lin (Department of Medical Oncology, The First Affiliated Hospital of Wenzhou Medical University)
Yan-na Gong (Department of Medical Oncology, The First Affiliated Hospital of Wenzhou Medical University)
Yi-ting Deng (Department of Medical Oncology, The First Affiliated Hospital of Wenzhou Medical University)
Dian-na Gu (Department of Medical Oncology, The First Affiliated Hospital of Wenzhou Medical University)
Ling Tian (Department of Central Laboratory, Shanghai Chest Hospital, Shanghai Jiao Tong University School of Medicine)
https://doi.org/10.1007/s12672-024-01670-1
Machine learning
Catalase gene
Biomarker
Aging-related genes
Age heterogeneity
Non-small cell lung cancer
title Machine learning reveals CAT gene as a novel potential diagnostic and prognostic biomarker in non-small cell lung cancer
topic Machine learning
Catalase gene
Biomarker
Aging-related genes
Age heterogeneity
Non-small cell lung cancer
url https://doi.org/10.1007/s12672-024-01670-1