Machine learning reveals CAT gene as a novel potential diagnostic and prognostic biomarker in non-small cell lung cancer
Abstract Background Non-small cell lung cancer (NSCLC) represents one of the most prevalent forms of lung cancer, with a five-year survival rate of 21.7%. There is an urgent need to identify pertinent biomarkers to inform the diagnosis and prognosis of tumors, particularly those that can be applied...
| Main Authors: | Yi Tian, Wen-ya Zhao, Yi-ru Liu, Wen-wen Song, Qiao-xin Lin, Yan-na Gong, Yi-ting Deng, Dian-na Gu, Ling Tian |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Springer, 2024-12-01 |
| Series: | Discover Oncology |
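| Subjects: | Machine learning; Catalase gene; Biomarker; Aging-related genes; Age heterogeneity; Non-small cell lung cancer |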
| Online Access: | https://doi.org/10.1007/s12672-024-01670-1 |
| _version_ | 1846112420409376768 |
|---|---|
| author | Yi Tian; Wen-ya Zhao; Yi-ru Liu; Wen-wen Song; Qiao-xin Lin; Yan-na Gong; Yi-ting Deng; Dian-na Gu; Ling Tian |
| author_sort | Yi Tian |
| collection | DOAJ |
| description | Abstract. Background: Non-small cell lung cancer (NSCLC) is one of the most prevalent forms of lung cancer, with a five-year survival rate of 21.7%. There is an urgent need to identify biomarkers that inform the diagnosis and prognosis of tumors, particularly biomarkers that can be applied to different age groups. Here, we apply machine learning methods to analyze biomarker applicability across different age groups in NSCLC. Methods: Studies have shown a higher incidence of NSCLC in people over 40 years of age; because of data set limitations, individuals under 40 years of age were not included in this study. To simulate the human aging model as closely as possible, we gathered NSCLC samples from the UCSC Xena database based on patient age information and categorized them into three groups: 40–60, 60–80, and over 80 years old. We then employed four machine learning methods (Random Forest, LASSO regression analysis, XGBoost, and GBM) to identify gene sets with significant diagnostic value for each age group. By taking the intersection of these sets, we identified the optimal gene and assessed its prognostic significance in NSCLC. The diagnostic value of the CAT gene was then validated using global public databases, including the GSE32863, GSE43458, GSE68571, GSE10072, and GSE63459 datasets from the Americas, the GSE30219 and GSE102511 datasets from Europe, and the GSE31210 and GSE19804 datasets from Asia. Furthermore, immunohistochemical staining was performed in an independent cohort from a tissue microarray, and cell culture and RT-qPCR were employed for external validation. Results: The machine learning methods identified the catalase (CAT) gene. Individuals with high expression of the CAT gene had improved survival rates and elevated immune scores. We further found that the CAT gene synergizes with multiple components of neutrophils, including TLRs, FcRn, and the selective GEF of Rho-family GTPases. In addition, we identified a potential immune checkpoint, TNFSF15, which is applicable to the human aging model. We validated the diagnostic value of the CAT gene using databases covering the Americas, Europe, and Asia. External RT-qPCR validation showed that CAT expression was higher in BEAS-2B than in A549. In an independent human cohort, CAT was expressed at low levels in lung cancer tissues. In addition, higher CAT levels were associated with improved survival in the 40–60 and 60–80 age groups. Conclusions: In our analysis of the NSCLC database, we pinpointed the CAT gene, which holds promise for diagnostic and prognostic applications in the context of human aging. Furthermore, it may offer insights into addressing the age-related heterogeneity of NSCLC. |
| format | Article |
| id | doaj-art-7a85ac9969fd452fbd7c7d96d8b7921c |
| institution | Kabale University |
| issn | 2730-6011 |
| language | English |
| publishDate | 2024-12-01 |
| publisher | Springer |
| record_format | Article |
| series | Discover Oncology |
| spelling | doaj-art-7a85ac9969fd452fbd7c7d96d8b7921c; 2024-12-22T12:35:22Z; eng; Springer; Discover Oncology; 2730-6011; 2024-12-01; 151118; 10.1007/s12672-024-01670-1; Machine learning reveals CAT gene as a novel potential diagnostic and prognostic biomarker in non-small cell lung cancer; Authors: Yi Tian [0], Wen-ya Zhao [1], Yi-ru Liu [2], Wen-wen Song [3], Qiao-xin Lin [4], Yan-na Gong [5], Yi-ting Deng [6], Dian-na Gu [7], Ling Tian [8]; Affiliations: [0] Department of Central Laboratory, Shanghai Chest Hospital, Shanghai Jiao Tong University School of Medicine; [1] Department of Central Laboratory, Shanghai Chest Hospital, Shanghai Jiao Tong University School of Medicine; [2] Department of Medical Oncology, The First Affiliated Hospital of Wenzhou Medical University; [3] Department of Medical Oncology, The First Affiliated Hospital of Wenzhou Medical University; [4] Department of Medical Oncology, The First Affiliated Hospital of Wenzhou Medical University; [5] Department of Medical Oncology, The First Affiliated Hospital of Wenzhou Medical University; [6] Department of Medical Oncology, The First Affiliated Hospital of Wenzhou Medical University; [7] Department of Medical Oncology, The First Affiliated Hospital of Wenzhou Medical University; [8] Department of Central Laboratory, Shanghai Chest Hospital, Shanghai Jiao Tong University School of Medicine; https://doi.org/10.1007/s12672-024-01670-1; Machine learning; Catalase gene; Biomarker; Aging-related genes; Age heterogeneity; Non-small cell lung cancer |
| title | Machine learning reveals CAT gene as a novel potential diagnostic and prognostic biomarker in non-small cell lung cancer |
| topic | Machine learning; Catalase gene; Biomarker; Aging-related genes; Age heterogeneity; Non-small cell lung cancer |
| url | https://doi.org/10.1007/s12672-024-01670-1 |
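
The selection step described in the abstract (four methods each nominate a gene set per age group, and the sets are then intersected) might be sketched as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the gene sets in `SELECTED` are invented toy values, `GENE_A` through `GENE_D` are hypothetical placeholders, and the function names are made up for this example.

```python
# Toy illustration of consensus gene selection by set intersection.
# All gene sets below are invented placeholders, NOT results from the paper.
SELECTED = {
    "40-60": {
        "random_forest": {"CAT", "GENE_A", "GENE_B"},
        "lasso": {"CAT", "GENE_B", "GENE_C"},
        "xgboost": {"CAT", "GENE_A", "GENE_B"},
        "gbm": {"CAT", "GENE_B"},
    },
    "60-80": {
        "random_forest": {"CAT", "GENE_C"},
        "lasso": {"CAT", "GENE_C", "GENE_D"},
        "xgboost": {"CAT", "GENE_D"},
        "gbm": {"CAT", "GENE_C"},
    },
    "over-80": {
        "random_forest": {"CAT", "GENE_B", "GENE_D"},
        "lasso": {"CAT", "GENE_B"},
        "xgboost": {"CAT", "GENE_B", "GENE_D"},
        "gbm": {"CAT", "GENE_B"},
    },
}


def consensus_genes(per_method):
    """Return the genes nominated by every method within one age group."""
    return set.intersection(*per_method.values())


def shared_across_groups(selected):
    """Return the genes that are in the consensus set of every age group."""
    per_group = [consensus_genes(methods) for methods in selected.values()]
    return set.intersection(*per_group)


if __name__ == "__main__":
    for group, methods in SELECTED.items():
        print(group, sorted(consensus_genes(methods)))
    print("shared:", sorted(shared_across_groups(SELECTED)))
```

Intersecting first within each age group and then across groups keeps the per-group consensus sets available for inspection; intersecting all twelve sets at once would give the same final answer. Both helpers assume at least one method and one age group are present, since `set.intersection` needs at least one set to operate on.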