High-accuracy prediction of mutations in nine genes in lung adenocarcinoma via two-stage multi-instance learning on large-scale whole-slide images

Bibliographic Details
Main Authors: Lingyu Zhao, Na Zhao, Ruiqi Zhong, Yiru Niu, Ziyi Chang, Peng Su, Zhihui Wang, Lifang Cui, Bei Wang, Huang Chen, Xiaowen Wang, Xiangbing Kong, Baolin Du, Fei Ren, Dingrong Zhong
Format: Article
Language: English
Published: BMC 2025-06-01
Series: Diagnostic Pathology
Online Access: https://doi.org/10.1186/s13000-025-01663-w
Description
Summary:
Background: Lung cancer is a prevalent malignant neoplasm. Traditional genetic testing methods are limited by high costs and lengthy procedures. Predicting clinically relevant genetic mutations from histopathological images could speed up the identification of genetic mutations in clinical settings.
Methods: We collected 2,221 slides from 1,999 patients diagnosed with lung adenocarcinoma. The data comprise whole-slide images, mutation status for EGFR, KRAS, ALK, HER2, and other rare genes (ROS1, RET, BRAF, PIK3CA, NRAS), and related clinical information. The self-supervised model DINO and the two-stage multi-instance network GAMIL were used to identify mutation status in nine genes linked to tumorigenesis and cancer progression. Model performance was compared against a foundation model (UNI) and classification models (CLAM and Inception v3), evaluated on external datasets (TCGA and other medical institutions), and compared with human pathologists.
Results: Our approach outperformed the CLAM and Inception v3 models, achieving AUC values of 0.825 to 0.987 for predicting gene mutations. On the external test datasets, AUC values ranged from 0.516 to 0.843. For EGFR mutation prediction, GAMIL achieved an AUC of 0.810, significantly higher than the average AUC of 0.508 achieved by pathologists.
Conclusion: The GAMIL models show strong performance in delineating tumor regions in lung adenocarcinoma and in predicting gene mutations. These models have substantial potential to improve the efficiency of molecular testing and to open new pathways for personalized treatment.
Trial registration: Not applicable.
ISSN: 1746-1596
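
Note on the AUC values in the summary: the AUC can be read as the probability that a randomly chosen mutation-positive case receives a higher score than a randomly chosen mutation-negative case, so a value near 0.5 (such as the pathologists' average of 0.508) is close to chance. The sketch below illustrates that definition only. The function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the article or its data.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores.

    Counts the fraction of (positive, negative) pairs in which the
    positive case scores higher; tied pairs count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = mutation present, 0 = mutation absent.
labels = [1, 1, 1, 0, 0, 0]
informative_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]  # mostly ranks positives first
constant_scores = [0.5] * 6                           # carries no ranking information

auc_informative = roc_auc(labels, informative_scores)
auc_constant = roc_auc(labels, constant_scores)
print(auc_informative, auc_constant)
```

In the toy data, 8 of the 9 positive-negative pairs are ranked correctly by the informative scores, while the constant scores tie on every pair and land exactly at the chance level of 0.5.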