SurVIndel2: improving copy number variant calling from next-generation sequencing using hidden split reads
Abstract Deletions and tandem duplications (commonly called CNVs) represent the majority of structural variations in a human genome. They can be identified using short reads, but because they frequently occur in repetitive regions, existing methods fail to detect most of them. This is because CNVs i...
| Main Authors: | Ramesh Rajaby, Wing-Kin Sung |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2024-12-01 |
| Series: | Nature Communications |
| Online Access: | https://doi.org/10.1038/s41467-024-53087-7 |
| _version_ | 1849220628940324864 |
|---|---|
| author | Ramesh Rajaby Wing-Kin Sung |
| author_facet | Ramesh Rajaby Wing-Kin Sung |
| author_sort | Ramesh Rajaby |
| collection | DOAJ |
| description | Abstract: Deletions and tandem duplications (commonly called CNVs) represent the majority of structural variations in a human genome. They can be identified using short reads, but because they frequently occur in repetitive regions, existing methods fail to detect most of them. This is because CNVs in repetitive regions often do not produce the evidence needed by existing short-read-based callers (split reads, discordant pairs or read depth change). Here, we introduce a new short-read-based CNV caller named SurVIndel2. SurVIndel2 builds on statistical techniques we previously developed, but also employs a novel type of evidence, hidden split reads, that can uncover many CNVs missed by existing algorithms. We use public benchmarks to show that SurVIndel2 outperforms other popular callers, on both human and non-human datasets. Then, we demonstrate the practical utility of the method by generating a catalogue of CNVs for the 1000 Genomes Project that contains hundreds of thousands of CNVs missing from the most recent public catalogue. We also show that SurVIndel2 is able to complement small indels predicted by Google DeepVariant, and that the two tools used in tandem produce a remarkably complete catalogue of variants in an individual. Finally, we characterise how the limitations of current sequencing technologies contribute significantly to the missing CNVs. |
| format | Article |
| id | doaj-art-13a792885ae24f3184e46d22f7419c22 |
| institution | Kabale University |
| issn | 2041-1723 |
| language | English |
| publishDate | 2024-12-01 |
| publisher | Nature Portfolio |
| record_format | Article |
| series | Nature Communications |
| spelling | doaj-art-13a792885ae24f3184e46d22f7419c22; 2024-12-08T12:37:16Z; eng; Nature Portfolio; Nature Communications; 2041-1723; 2024-12-01; 15; 1; 1; 16; 10.1038/s41467-024-53087-7; SurVIndel2: improving copy number variant calling from next-generation sequencing using hidden split reads; Ramesh Rajaby [0]; Wing-Kin Sung [1]; [0] Department of Chemical Pathology, The Chinese University of Hong Kong; [1] Department of Chemical Pathology, The Chinese University of Hong Kong; Abstract: Deletions and tandem duplications (commonly called CNVs) represent the majority of structural variations in a human genome. They can be identified using short reads, but because they frequently occur in repetitive regions, existing methods fail to detect most of them. This is because CNVs in repetitive regions often do not produce the evidence needed by existing short-read-based callers (split reads, discordant pairs or read depth change). Here, we introduce a new short-read-based CNV caller named SurVIndel2. SurVIndel2 builds on statistical techniques we previously developed, but also employs a novel type of evidence, hidden split reads, that can uncover many CNVs missed by existing algorithms. We use public benchmarks to show that SurVIndel2 outperforms other popular callers, on both human and non-human datasets. Then, we demonstrate the practical utility of the method by generating a catalogue of CNVs for the 1000 Genomes Project that contains hundreds of thousands of CNVs missing from the most recent public catalogue. We also show that SurVIndel2 is able to complement small indels predicted by Google DeepVariant, and that the two tools used in tandem produce a remarkably complete catalogue of variants in an individual. Finally, we characterise how the limitations of current sequencing technologies contribute significantly to the missing CNVs.; https://doi.org/10.1038/s41467-024-53087-7 |
| spellingShingle | Ramesh Rajaby Wing-Kin Sung SurVIndel2: improving copy number variant calling from next-generation sequencing using hidden split reads Nature Communications |
| title | SurVIndel2: improving copy number variant calling from next-generation sequencing using hidden split reads |
| title_full | SurVIndel2: improving copy number variant calling from next-generation sequencing using hidden split reads |
| title_fullStr | SurVIndel2: improving copy number variant calling from next-generation sequencing using hidden split reads |
| title_full_unstemmed | SurVIndel2: improving copy number variant calling from next-generation sequencing using hidden split reads |
| title_short | SurVIndel2: improving copy number variant calling from next-generation sequencing using hidden split reads |
| title_sort | survindel2 improving copy number variant calling from next generation sequencing using hidden split reads |
| url | https://doi.org/10.1038/s41467-024-53087-7 |
| work_keys_str_mv | AT rameshrajaby survindel2improvingcopynumbervariantcallingfromnextgenerationsequencingusinghiddensplitreads AT wingkinsung survindel2improvingcopynumbervariantcallingfromnextgenerationsequencingusinghiddensplitreads |