Prediction of transcript isoforms and identification of tissue-specific genes in cucumber

Bibliographic Details
Main Authors: Wenjiao Wang, Chengcheng Shen, Xinqiang Wen, Anqi Li, Qi Gao, Zhaoying Xu, Yuping Wei, Yushun Li, Dailu Guan, Bin Liu
Format: Article
Language: English
Published: BMC 2025-01-01
Series: BMC Genomics
Subjects: Cucumber; RNA-seq; Transcript isoform; Tissue-specific
Online Access: https://doi.org/10.1186/s12864-025-11212-w
collection DOAJ
description Abstract. Background: Identification of global transcriptional events is crucial for genome annotation, as accurate annotation enhances the efficiency and comparability of genomic information across species. However, transcript annotation in the cucumber genome still needs improvement, and many transcriptional events have not been well studied. Results: We collected 1,904 high-quality public cucumber transcriptome samples from the National Center for Biotechnology Information (NCBI) to identify and annotate transcript isoforms in the cucumber genome. Over 44.26 billion Q30 clean reads were mapped to the cucumber genome, with an average mapping rate of 92.75%. Transcriptome assembly identified 151,453 transcripts spanning 20,442 loci. Of these, 12.7% exactly matched annotated genes in the cucumber reference genome, and more than 80% were classified as novel isoforms. Approximately 96.6% of these novel isoforms originated from known gene loci, while around 3.3% were derived from novel gene loci. Coding potential prediction identified 4,543 long non-coding RNAs (lncRNAs) across 3,376 loci. Building on these results, we identified tissue-specific transcripts in 10 tissues: 1,655 annotated genes and 4,214 predicted transcripts were classified as tissue-specific. The root had the highest number of tissue-specific transcripts, followed by the shoot apex. Subsequent selective pressure analysis revealed that tissue-specific regions experienced stronger directional selection than non-specific regions. Conclusions: By analyzing thousands of published transcriptome datasets, we identified abundant transcriptional events and tissue-specific transcripts in cucumber. This study adds substantial value to the public data and offers insights for further exploration of a more comprehensive tissue regulatory network in cucumber.
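The abstract does not say how tissue specificity was scored across the 10 tissues. Purely as an illustration, and not as the authors' method, one widely used metric is the tau (τ) index, which ranges from 0 (uniform expression in all tissues) to 1 (expression in a single tissue). The sketch below assumes per-tissue expression values (for example mean TPM, possibly log-transformed) are already computed; the function names `tau_index` and `specific_tissue` and the 0.8 cutoff are hypothetical.

```python
from typing import Dict, Optional, Sequence


def tau_index(expression: Sequence[float]) -> float:
    """Tissue-specificity index tau: 0 = uniform across tissues, 1 = one tissue only.

    Assumes non-negative expression values, e.g. TPM or log2(TPM + 1).
    """
    n = len(expression)
    if n < 2:
        raise ValueError("tau needs expression values for at least two tissues")
    peak = max(expression)
    if peak <= 0:
        # Not expressed in any tissue: treated here as non-specific (an assumption).
        return 0.0
    # Average shortfall of each tissue relative to the highest-expressing tissue.
    return sum(1.0 - x / peak for x in expression) / (n - 1)


def specific_tissue(expr_by_tissue: Dict[str, float],
                    threshold: float = 0.8) -> Optional[str]:
    """Return the dominant tissue if tau >= threshold (hypothetical cutoff), else None."""
    if tau_index(list(expr_by_tissue.values())) >= threshold:
        # The tissue with the highest expression is reported as the specific one.
        return max(expr_by_tissue, key=expr_by_tissue.__getitem__)
    return None
```

For example, `specific_tissue({"root": 9.0, "leaf": 0.0, "fruit": 0.0})` returns `"root"` (tau = 1.0), whereas `{"root": 4.0, "leaf": 2.0, "fruit": 0.0}` gives tau = 0.75 and returns `None` under the assumed 0.8 cutoff.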
institution Kabale University
issn 1471-2164
affiliations Wenjiao Wang, Chengcheng Shen, Xinqiang Wen, Anqi Li, Qi Gao, Zhaoying Xu, Yuping Wei: College of Horticulture, Shanxi Agricultural University
Yushun Li, Bin Liu: Hami-melon Research Center, Xinjiang Academy of Agricultural Sciences
Dailu Guan: Department of Animal Science, University of California Davis
volume 26
issue 1
topic Cucumber
RNA-seq
Transcript isoform
Tissue-specific