Parallel algorithm for sensitive sequence recognition from long-read genome data with high error rate
Main Authors: | , |
---|---|
Format: | Article |
Language: | zho |
Published: | Editorial Department of Journal on Communications, 2023-02-01 |
Series: | Tongxin xuebao |
Subjects: | |
Online Access: | http://www.joconline.com.cn/zh/article/doi/10.11959/j.issn.1000-436x.2023009/ |
Summary: | To address the difficulty that existing algorithms have in effectively identifying sensitive sequences in long-read genomic data with a high error rate, a recognition algorithm based on hybrid CPU and GPU parallel computing, called CGPU-F3SR, is proposed. First, the long reads in the genomic data are partitioned into multiple short reads, and a Bloom filter mechanism is used to avoid repeated computation on the short reads. Second, a k-mer coding strategy is used to extract the error information of all short reads in parallel, and recognition accuracy is improved by refining the sequence similarity calculation model. Finally, the CPU and GPU cooperate in parallel to accelerate the short-read similarity calculation and thereby improve recognition efficiency. As a result, both types of sensitive sequences, short tandem repeats and disease-related sequences, can be identified efficiently and accurately from long-read genomic data with a high error rate. Experimental results on recognizing sensitive sequences in long reads of 100~400 kbp each show that, compared with existing parallel algorithms, the average recognition accuracy and precision of the proposed CPU/GPU parallel algorithm CGPU-F3SR are increased by 7.77% and 43.07% respectively, its average false positive rate is reduced by 7.41%, and its average recognition throughput is increased by 2.44 times. |
---|---|
ISSN: | 1000-436X |
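
The first two steps in the summary (partitioning a long read into short reads, skipping repeats with a Bloom filter, and comparing k-mer content) can be illustrated with a minimal CPU-only sketch. This is not the CGPU-F3SR implementation: the fragment length, the value of k, the threshold, the SHA-256-based hash functions, and the use of plain Jaccard similarity in place of the paper's improved similarity model are all assumptions made for illustration, and every function name here is hypothetical. The GPU stage is omitted.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter backed by a bytearray of bits (illustrative only)."""

    def __init__(self, num_bits=1 << 16, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting a SHA-256 hash (assumed scheme).
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def split_read(long_read, fragment_len):
    """Partition a long read into consecutive, non-overlapping short reads."""
    return [long_read[i:i + fragment_len]
            for i in range(0, len(long_read), fragment_len)]


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_sensitive(long_read, target, fragment_len=12, k=4, threshold=0.5):
    """Return (offset, fragment, score) for short reads similar to target."""
    seen = BloomFilter()
    target_kmers = kmers(target, k)
    hits = []
    for idx, frag in enumerate(split_read(long_read, fragment_len)):
        if frag in seen:
            # Probably seen before: skip the repeated similarity computation.
            continue
        seen.add(frag)
        score = jaccard(kmers(frag, k), target_kmers)
        if score >= threshold:
            hits.append((idx * fragment_len, frag, score))
    return hits
```

Calling `find_sensitive(read, target)` returns one entry per distinct short read whose k-mer set overlaps the target's by at least the threshold. Because a Bloom filter can report false positives, a fragment may occasionally be skipped without having been scored; a production design would need to size the filter for an acceptable error rate or cache scores for fragments already seen.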