Continuous prediction for tumor mutation burden based on transcriptional data in gastrointestinal cancers

Bibliographic Details
Main Authors: Beibei Hu, Guohui Yin, Jialin Zhu, Yi Bai, Xuren Sun
Format: Article
Language: English
Published: BMC, 2024-12-01
Series: BMC Medical Informatics and Decision Making
Online Access:https://doi.org/10.1186/s12911-024-02794-8
Collection: DOAJ

Abstract
Background: Tumor mutation burden (TMB) is considered a biomarker for the use of immune checkpoint inhibitors (ICIs), but measuring TMB by next-generation sequencing, whether by whole-exome sequencing (WES) or cancer gene panels (CGP), is costly. Here, we used transcriptome data from The Cancer Genome Atlas (TCGA) to construct a model for predicting TMB in gastrointestinal tumors.
Methods: Transcriptome, somatic mutation, and clinical data were obtained from TCGA for four gastrointestinal tumors: esophageal cancer (ESCA), stomach adenocarcinoma (STAD), colon adenocarcinoma (COAD), and rectal adenocarcinoma (READ). Using R, we visualized the somatic mutation data, performed functional enrichment analysis of differentially expressed genes (DEGs) and gene set enrichment analysis (GSEA), and evaluated the clinical value of TMB. Finally, we constructed a deep neural network (DNN) model for TMB prediction.
Results: Visualization of the somatic mutation data summarized the mutation classifications, the frequency of each mutation type, and the most frequently mutated genes. GSEA showed enrichment of CD4+/CD8+ T cells in the high-TMB group and activation of tumor-suppressing pathways. Single-sample GSEA (ssGSEA) showed that the high-TMB group had higher infiltration levels of multiple immune cell types. In addition, the distribution of TMB was related to clinical parameters such as age, M stage, N stage, AJCC stage, and overall survival (OS). After model optimization with a genetic algorithm, the Pearson correlation coefficient r between predicted and actual values was 0.98, 0.82, and 0.92 in the training, validation, and testing sets, respectively, and the coefficient of determination R² was 0.95, 0.82, and 0.7, respectively.
Conclusion: TMB correlates with clinicopathological parameters in gastrointestinal carcinoma, and patients with high TMB have higher levels of immune infiltration. In addition, the DNN model based on 31 genes predicts TMB in gastrointestinal tumors with high accuracy.
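The abstract treats TMB as a continuous target and scores the predictions with Pearson's r and the coefficient of determination R². The following is a minimal sketch of those quantities, assuming NumPy. The 38 Mb exome size, the helper names, and the toy values are hypothetical illustrations, not parameters or data from the study, and the sketch does not reproduce the authors' DNN or genetic-algorithm pipeline.

```python
import numpy as np

# ASSUMPTION (hypothetical): ~38 Mb of exome territory, a value often used
# for TCGA whole-exome data; the denominator used in the study is not stated.
EXOME_SIZE_MB = 38.0


def tmb_per_mb(mutation_counts, exome_size_mb=EXOME_SIZE_MB):
    """TMB as somatic mutations per megabase of sequenced territory."""
    return np.asarray(mutation_counts, dtype=float) / exome_size_mb


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between actual and predicted TMB."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.corrcoef(y_true, y_pred)[0, 1])


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)           # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total sum of squares
    return float(1.0 - ss_res / ss_tot)


# Hypothetical toy values, not data from the study.
actual = tmb_per_mb([38, 76, 190, 380])   # 1, 2, 5 and 10 mutations/Mb
predicted = np.array([1.5, 2.0, 4.0, 11.0])

print(round(pearson_r(actual, predicted), 3))
print(round(r_squared(actual, predicted), 3))
```

The two metrics answer different questions: r measures how well the predictions track the actual values up to a linear rescaling, while R² also penalizes systematic offset or scale error. For that reason, R² computed on raw predictions in this way cannot exceed r².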
ISSN: 1472-6947
Institution: Kabale University
Author affiliations:
Beibei Hu, Jialin Zhu, Yi Bai, Xuren Sun: Department of Gastroenterology, First Affiliated Hospital of China Medical University
Guohui Yin: Key Laboratory of Traffic Safety On Track (Central South University), Ministry of Education, School of Traffic and Transportation Engineering, Central South University
Keywords: Gastrointestinal cancers; Tumor mutation burden; Deep neural network; Transcriptome; Prediction model