Parallel algorithm for sensitive sequence recognition from long-read genome data with high error rate

Bibliographic Details
Main Authors: Cheng ZHONG, Hui SUN
Format: Article
Language: Chinese (zho)
Published: Editorial Department of Journal on Communications 2023-02-01
Series: Tongxin xuebao
Online Access: http://www.joconline.com.cn/zh/article/doi/10.11959/j.issn.1000-436x.2023009/
Description
Summary: Existing algorithms struggle to identify sensitive sequences effectively in long-read genomic data with a high error rate. To address this, a recognition algorithm based on hybrid CPU and GPU parallel computing, called CGPU-F3SR, is proposed. First, each long read in the genomic data is partitioned into multiple short reads, and a Bloom filter is used to avoid repeated computation on the same short read. Second, a k-mer coding strategy is used to extract the error information of all short reads in parallel, and recognition accuracy is improved by refining the sequence similarity calculation model. Finally, the CPU and GPU cooperate in parallel to accelerate the short-read similarity calculation and thereby improve recognition efficiency. As a result, two types of sensitive sequences, short tandem repeats and disease-related sequences, can be identified efficiently and accurately in long-read genome data with a high error rate. Experiments on recognizing sensitive sequences in long-read genomic data with read lengths of 100~400 kbp show that, compared with existing parallel algorithms, the proposed CPU/GPU parallel algorithm CGPU-F3SR increases average recognition accuracy and precision by 7.77% and 43.07% respectively, reduces the average false positive rate by 7.41%, and increases average recognition throughput by 2.44 times.
ISSN:1000-436X
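
The first two steps described in the summary (partitioning a long read into short reads, skipping short reads already seen via a Bloom filter, and encoding k-mers) can be illustrated with a minimal CPU-only sketch. This is not the authors' CGPU-F3SR implementation: the names BloomFilter, split_read, unique_short_reads and encode_kmers, the 2-bit base encoding, the fixed-length non-overlapping partitioning, and the filter size are all assumptions made for illustration, and the GPU similarity stage is not shown.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter (hypothetical sizing; not taken from the paper)."""

    def __init__(self, num_bits=1 << 16, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting BLAKE2b with the hash index.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=bytes([seed])
            ).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def split_read(long_read, short_len):
    """Cut a long read into consecutive, non-overlapping short reads (assumed scheme)."""
    return [long_read[i:i + short_len] for i in range(0, len(long_read), short_len)]


def unique_short_reads(long_read, short_len, bloom):
    """Return only short reads the Bloom filter has not reported as seen."""
    fresh = []
    for piece in split_read(long_read, short_len):
        if piece in bloom:
            continue  # probably seen before; a Bloom filter may give false positives
        bloom.add(piece)
        fresh.append(piece)
    return fresh


_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed 2-bit encoding


def encode_kmers(seq, k):
    """Encode every k-mer of seq as an integer; k-mers with non-ACGT bases are skipped."""
    codes = []
    for i in range(len(seq) - k + 1):
        value = 0
        for base in seq[i:i + k]:
            if base not in _CODE:
                break
            value = (value << 2) | _CODE[base]
        else:
            codes.append(value)
    return codes
```

In this sketch the Bloom filter trades a small chance of wrongly skipping a new short read for constant memory, which is the usual reason to prefer it over an exact set when the number of short reads is very large.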