Screening and validation of long non-coding RNAs associated with colorectal cancer based on random forest and LASSO regression algorithm

Bibliographic Details
Main Authors: Yujia Zhao, Qian Li, Xintong Cui, Zhiyu Zhang, Yong You, Xiaowen Hou, Yan Wang, Xu Feng
Format: Article
Language: English
Published: Springer 2025-07-01
Series: Discover Oncology
Online Access: https://doi.org/10.1007/s12672-025-03048-3
Description
Summary:
Objective: Colorectal cancer (CRC) ranks third among malignancies in global disease burden and has the second-highest mortality rate of all malignancies worldwide. Long non-coding RNAs (lncRNAs) are a recently recognized class of regulatory RNAs that play a crucial role in the initiation and progression of colorectal cancer. Applying bioinformatics and machine learning methods to identify novel lncRNA biomarkers for CRC is therefore of potential value.
Methods: RNA-seq data from colorectal cancer and normal colorectal tissue were downloaded from the GEO database. Random forest (RF) and least absolute shrinkage and selection operator (LASSO) regression models were constructed to screen lncRNAs closely associated with CRC, and their screening performance was evaluated. The regulatory targets of the selected lncRNAs were predicted, and a lncRNA-miRNA-mRNA competing endogenous RNA (ceRNA) regulatory network was constructed. Quantitative real-time PCR (qRT-PCR) was used to verify the expression of the lncRNAs in CRC tissues and adjacent tissues and to examine their association with the clinical features of CRC patients.
Result: A total of 3028 CRC-related lncRNAs were initially retrieved from the GEO database, and differential expression analysis narrowed these to 55 differentially expressed lncRNAs (DE lncRNAs). These were screened further with RF and LASSO, and the lncRNAs selected by both methods were taken as the key lncRNAs of CRC. Five key lncRNAs were identified (NCAL1, CRNDE, HMGA1P4, EPIST and MT1JP): compared with normal colorectal tissues, NCAL1, CRNDE and HMGA1P4 were upregulated in CRC, whereas EPIST and MT1JP were downregulated. The expression of the five key lncRNAs was validated, and each had an area under the ROC curve (AUC) greater than 0.7, indicating good screening performance. Because CRNDE had previously been studied by members of this research group, it was not investigated further. Four lncRNAs were predicted to interact with 16 miRNAs and 57 mRNAs. The four key lncRNAs NCAL1, HMGA1P4, EPIST and MT1JP were then verified experimentally: qRT-PCR showed that their expression differed significantly between CRC tissues and adjacent tissues (p < 0.001).
Conclusion: We identified five lncRNAs that may be closely related to colorectal cancer: NCAL1, CRNDE, HMGA1P4, EPIST and MT1JP. Of these, NCAL1, HMGA1P4, EPIST and MT1JP may serve as candidate biomarkers for colorectal cancer.
ISSN: 2730-6011
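The summary describes keeping only the lncRNAs selected by both RF and LASSO, then checking that each candidate reaches an AUC above 0.7. The following is a minimal sketch of that selection logic in plain Python, written under stated assumptions and not taken from the authors' code. It assumes that RF importance scores and LASSO coefficients have already been computed by some fitted model; the summary does not say which software or settings were used. The RF cut-off `top_k`, the helper names (`select_key_lncrnas`, `auc_filter`), the gene names `lnc_A` to `lnc_D`, and all expression values are hypothetical. AUC is computed with the Mann-Whitney formulation. A down-regulated lncRNA is scored in the reverse direction by taking `max(auc, 1 - auc)`, which is also an assumption.

```python
from typing import Dict, List, Sequence


def mann_whitney_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the probability that a random tumor sample (label 1) scores
    higher than a random normal sample (label 0); ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one tumor and one normal sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def select_key_lncrnas(
    rf_importance: Dict[str, float],
    lasso_coef: Dict[str, float],
    top_k: int,
) -> List[str]:
    """Hypothetical helper: intersect the RF top-k (ranked by importance)
    with the set of lncRNAs that keep a non-zero LASSO coefficient."""
    ranked = sorted(rf_importance, key=lambda g: rf_importance[g], reverse=True)
    rf_selected = set(ranked[:top_k])
    lasso_selected = {g for g, c in lasso_coef.items() if c != 0.0}
    return sorted(rf_selected & lasso_selected)


def auc_filter(
    expr: Dict[str, Sequence[float]],
    labels: Sequence[int],
    genes: Sequence[str],
    threshold: float = 0.7,
) -> Dict[str, float]:
    """Hypothetical helper: keep genes whose AUC exceeds `threshold`.

    Assumption: down-regulated lncRNAs are scored in the reverse direction,
    so the reported AUC is max(auc, 1 - auc)."""
    kept = {}
    for g in genes:
        auc = mann_whitney_auc(expr[g], labels)
        auc = max(auc, 1.0 - auc)
        if auc > threshold:
            kept[g] = auc
    return kept


# Hypothetical toy inputs: 4 tumor (1) and 4 normal (0) samples.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
rf_importance = {"lnc_A": 0.30, "lnc_B": 0.25, "lnc_C": 0.20, "lnc_D": 0.05}
lasso_coef = {"lnc_A": 0.8, "lnc_B": 0.0, "lnc_C": -0.6, "lnc_D": 0.4}
expr = {
    "lnc_A": [5.1, 4.8, 6.0, 5.5, 2.0, 2.5, 3.1, 1.9],  # higher in tumor
    "lnc_C": [1.0, 1.2, 0.8, 2.6, 2.5, 3.0, 2.2, 2.8],  # lower in tumor
}

key = select_key_lncrnas(rf_importance, lasso_coef, top_k=3)
print(key)
print(auc_filter(expr, labels, key))
```

With `top_k=3`, `lnc_B` is dropped by LASSO and `lnc_D` by the RF cut-off, so only `lnc_A` and `lnc_C` pass both filters. `lnc_C` is lower in the toy tumor samples. Its raw AUC is therefore 0.125, and its direction-agnostic value is 0.875. Intersecting the two selectors is a conservative choice: a gene must be informative both as a split variable in the trees and as a sparse linear predictor.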