Poly(A)-DG: A deep-learning-based domain generalization method to identify cross-species Poly(A) signal without prior knowledge from target species.

In eukaryotes, polyadenylation (poly(A)) is an essential process during mRNA maturation. Identifying the cis-determinants of the poly(A) signal (PAS) on the DNA sequence is key to understanding the mechanism of translation regulation and mRNA metabolism. Although machine learning methods were widely us...

Bibliographic Details
Main Authors: Yumin Zheng, Haohan Wang, Yang Zhang, Xin Gao, Eric P Xing, Min Xu
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2020-11-01
Series: PLoS Computational Biology
Online Access: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1008297&type=printable
Collection: DOAJ
Description: In eukaryotes, polyadenylation (poly(A)) is an essential process during mRNA maturation. Identifying the cis-determinants of the poly(A) signal (PAS) on the DNA sequence is key to understanding the mechanism of translation regulation and mRNA metabolism. Although machine learning methods have been widely used to identify PAS computationally, their need for large amounts of annotated data hinders their application to species without experimental PAS data. Cross-species PAS identification, which makes it possible to predict PAS in species absent from the training data, is therefore a promising direction. In this work, we propose a novel deep learning method named Poly(A)-DG for cross-species PAS identification. Poly(A)-DG consists of a Convolutional Neural Network-Multilayer Perceptron (CNN-MLP) network and a domain generalization technique. It learns PAS patterns from the training species and identifies PAS in target species without re-training. To test our method, we use four species, build cross-species training sets from two of them, and evaluate performance on the remaining ones. Moreover, we test our method under insufficient and imbalanced training data and demonstrate that Poly(A)-DG not only outperforms state-of-the-art methods but also maintains relatively high accuracy with a smaller or imbalanced training set.
Record ID: doaj-art-bab2dcfbd2c042b09ed2b8a7a81f2f08
Institution: Kabale University
ISSN: 1553-734X, 1553-7358
Citation: PLoS Computational Biology 16(11): e1008297 (2020-11-01). DOI: 10.1371/journal.pcbi.1008297