Parallel algorithm for sensitive sequence recognition from long-read genome data with high error rate

Bibliographic Details
Main Authors:	Cheng ZHONG, Hui SUN
Format:	Article
Language:	zho
Published:	Editorial Department of Journal on Communications, 2023-02-01
Series:	Tongxin xuebao
Subjects:	sensitive sequence recognition; filtering; similarity calculation; sequence alignment; parallel computing
Online Access:	http://www.joconline.com.cn/zh/article/doi/10.11959/j.issn.1000-436x.2023009/

Description
Summary:	Existing algorithms have difficulty effectively identifying sensitive sequences in long-read genomic data with a high error rate. To address this problem, a recognition algorithm based on hybrid CPU and GPU parallel computing, called CGPU-F3SR, was proposed. First, the long reads in the genomic data were partitioned into multiple short reads, and a Bloom filter was used to avoid repeated calculation on the same short reads. Second, a k-mer coding strategy was used to extract the error information of all short reads in parallel, and recognition accuracy was improved by refining the sequence similarity calculation model. Finally, the CPU and GPU were used cooperatively and in parallel to accelerate the short-read similarity calculation and thereby improve recognition efficiency. As a result, two types of sensitive sequences, short tandem repeats and disease-related sequences, can be identified efficiently and accurately from high-error-rate long-read genomic data. Experimental results on recognizing sensitive sequences from long-read genomic data with read lengths of 100 to 400 kbp show that, compared with an existing parallel algorithm, the proposed CPU/GPU parallel algorithm CGPU-F3SR increases the average recognition accuracy and precision by 7.77% and 43.07%, respectively, reduces the average false positive rate by 7.41%, and increases the average recognition throughput by 2.44 times.
ISSN:	1000-436X
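The summary describes a pipeline of partitioning long reads into short reads, skipping repeated short reads with a Bloom filter, and scoring k-mer-encoded short reads against a sensitive sequence. The following is a minimal CPU-only sketch of that flow under stated assumptions; it is not the authors' implementation. All names (`split_long_read`, `kmer_codes`, `scan_long_read`, and so on) and parameters (window length, step, k, threshold) are hypothetical. The Bloom filter uses SHA-256 with double hashing as an assumed hash scheme, k-mer Jaccard similarity stands in for the paper's improved similarity model, and the CPU/GPU cooperative scheduling is not shown.

```python
import hashlib
import math

# Hypothetical sketch: every name here is illustrative, not from CGPU-F3SR.
# The paper's improved similarity model and its CPU/GPU scheduling are NOT
# reproduced; k-mer Jaccard similarity is used as a stand-in.

_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


class BloomFilter:
    """Bit-array Bloom filter; k bit positions derived by double hashing of SHA-256."""

    def __init__(self, expected_items: int, fp_rate: float = 0.01) -> None:
        n = max(1, expected_items)
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.m = max(8, int(-n * math.log(fp_rate) / math.log(2) ** 2))
        self.k = max(1, round(self.m / n * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str) -> list[int]:
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # odd step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, item: str) -> bool:
        # False positives are possible; false negatives are not.
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(item))


def split_long_read(read: str, length: int, step: int) -> list[str]:
    """Cut a long read into short reads of `length` bases, one every `step` bases.
    Bases past the last full window are dropped (a sketch simplification)."""
    return [read[i:i + length] for i in range(0, max(1, len(read) - length + 1), step)]


def kmer_codes(seq: str, k: int) -> set[int]:
    """2-bit encode every k-mer (A=0, C=1, G=2, T=3); k-mers spanning non-ACGT bases are skipped."""
    mask = (1 << (2 * k)) - 1
    codes: set[int] = set()
    value = valid = 0
    for base in seq.upper():
        c = _CODE.get(base)
        if c is None:  # e.g. 'N': restart the rolling k-mer
            value = valid = 0
            continue
        value = ((value << 2) | c) & mask
        valid += 1
        if valid >= k:
            codes.add(value)
    return codes


def kmer_jaccard(a: set[int], b: set[int]) -> float:
    """Jaccard similarity of two k-mer code sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def scan_long_read(read: str, target: str, window: int, step: int,
                   k: int, threshold: float) -> list[tuple[int, float]]:
    """Return (start, score) for distinct short reads similar to `target`.
    Repeated short reads are skipped via the Bloom filter, so only the first
    occurrence of an identical short read is scored and reported."""
    target_kmers = kmer_codes(target, k)
    seen = BloomFilter(expected_items=len(read) // step + 1)
    hits = []
    for idx, short in enumerate(split_long_read(read, window, step)):
        if short in seen:  # probably scored already (rarely a false positive)
            continue
        seen.add(short)
        score = kmer_jaccard(kmer_codes(short, k), target_kmers)
        if score >= threshold:
            hits.append((idx * step, score))
    return hits


if __name__ == "__main__":
    # Toy read: a CAG short tandem repeat flanked by homopolymer runs.
    read = "A" * 30 + "CAG" * 20 + "T" * 30
    print(scan_long_read(read, "CAG" * 10, window=30, step=15, k=5, threshold=0.8))
```

In the toy example, the windows starting at positions 45 and 60 are identical to the one at position 30, so the filter skips them instead of rescoring them; a fuller implementation would still need to record those positions. The 2-bit encoding packs each k-mer into one integer, which keeps set operations cheap and maps naturally onto the kind of data-parallel scoring the summary assigns to the GPU.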
