A generative adversarial network for multiple reads reconstruction in DNA storage

Abstract DNA storage is widely considered a promising solution to the data explosion problem. However, the synthesis, PCR, and sequencing processes usually produce erroneous reads containing base insertions, deletions, and substitutions. This situation is even more serious in the 3rd ge...

Bibliographic Details
Main Authors: Xiaodong Zheng, Ranze Xie, Xiangyu Yao, Yanqing Su, Ling Chu, Peng Xu, Wenbin Liu
Format: Article
Language: English
Published: Nature Portfolio 2024-12-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-024-83806-5
author Xiaodong Zheng
Ranze Xie
Xiangyu Yao
Yanqing Su
Ling Chu
Peng Xu
Wenbin Liu
collection DOAJ
description Abstract DNA storage is widely considered a promising solution to the data explosion problem. However, the synthesis, PCR, and sequencing processes usually produce erroneous reads containing base insertions, deletions, and substitutions. This situation is even more serious in third-generation sequencing technologies. Unlike previous error-correction and multiple sequence alignment methods, we first transform the multiple reads into a noisy image, and then construct a conditional generative adversarial network to produce a “smooth” image that corresponds to the consensus sequence. Results on two real datasets demonstrate that our model can completely reconstruct the tested sequences at error rates as high as 5.9%. This means that the proposed DNA-GAN can be applied in third-generation nanopore sequencing environments, whereas transformer-based models have only been tested on next-generation sequencing datasets. Furthermore, DNA-GAN exhibits excellent robustness even when as many as 20% of the clusters are contaminated with irrelevant reads. To the best of our knowledge, this work is the first to use a GAN for multi-read reconstruction in DNA-based storage.
format Article
id doaj-art-a82a33f1c88c48dc8e4a994892fab0c8
institution Kabale University
issn 2045-2322
language English
publishDate 2024-12-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj-art-a82a33f1c88c48dc8e4a994892fab0c8; 2025-01-05T12:28:09Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2024-12-01; 14; 1; 1; 10; 10.1038/s41598-024-83806-5; A generative adversarial network for multiple reads reconstruction in DNA storage; Xiaodong Zheng, Ranze Xie, Xiangyu Yao, Yanqing Su, Ling Chu, Peng Xu, Wenbin Liu (all: Institution of Computational Science and Technology, Guangzhou University); https://doi.org/10.1038/s41598-024-83806-5
title A generative adversarial network for multiple reads reconstruction in DNA storage
url https://doi.org/10.1038/s41598-024-83806-5