TRFill: synergistic use of HiFi and Hi-C sequencing enables accurate assembly of tandem repeats for population-level analysis

Bibliographic Details
Main Authors: Huaming Wen, Jinbao Yang, Xianjia Zhao, Xingbin Wang, Jiawei Lei, Yanchun Li, Wenjie Du, Dongxi Li, Yun Xu, Stefano Lonardi, Weihua Pan
Format: Article
Language: English
Published: BMC 2025-07-01
Series: Genome Biology
Online Access: https://doi.org/10.1186/s13059-025-03685-5
Description
Summary: The highly repetitive content of eukaryotic genomes, including long tandem repeats, segmental duplications, and centromeres, makes haplotype-resolved genome assembly hard. Repeat sequences introduce gaps or mis-joins in the assemblies. We introduce TRFill, a novel algorithm that can close the gaps in a draft chromosome-level assembly using exclusively PacBio HiFi and Hi-C data. Experimental results on human centromeres and tomato subtelomeres show that TRFill can improve the completeness and correctness of about two-thirds of the tandem repeats. We also show that the improved completeness of subtelomeric tandem repeats in the tomato pangenome enables a population-level analysis of these complex repeats.
ISSN: 1474-760X