Genotyping TOMM40′523 poly-T polymorphisms using whole-genome sequencing
Summary: The TOMM40′523 poly-T repeat polymorphism (rs10524523) has been associated with cognitive decline and Alzheimer's disease (AD) progression. Challenges in processing whole-genome sequencing (WGS) data traditionally require additional PCR and targeted sequencing assays to genotype these...
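| Main Authors: | Ricardo A. Vialle, Lei Yu, Yan Li, Roberto T. Raittz, Jose M. Farfel, Philip L. De Jager, Julie A. Schneider, Lisa L. Barnes, Shinya Tasaki, David A. Bennett |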
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Elsevier, 2025-10-01 |
| Series: | HGG Advances |
| Subjects: | TOMM40; APOE; machine learning; XGBoost; WGS |
| Online Access: | http://www.sciencedirect.com/science/article/pii/S2666247725000910 |
| _version_ | 1849229226074439680 |
|---|---|
| author | Ricardo A. Vialle; Lei Yu; Yan Li; Roberto T. Raittz; Jose M. Farfel; Philip L. De Jager; Julie A. Schneider; Lisa L. Barnes; Shinya Tasaki; David A. Bennett |
| author_facet | Ricardo A. Vialle; Lei Yu; Yan Li; Roberto T. Raittz; Jose M. Farfel; Philip L. De Jager; Julie A. Schneider; Lisa L. Barnes; Shinya Tasaki; David A. Bennett |
| author_sort | Ricardo A. Vialle |
| collection | DOAJ |
| description | Summary: The TOMM40′523 poly-T repeat polymorphism (rs10524523) has been associated with cognitive decline and Alzheimer's disease (AD) progression. Challenges in processing whole-genome sequencing (WGS) data traditionally require additional PCR and targeted sequencing assays to genotype these polymorphisms. We introduce a computational pipeline that integrates multiple short tandem repeat (STR) detection tools in an ensemble machine learning model using XGBoost. Using a sample of 1,202 participants from 4 cohort studies, we benchmarked our method against PCR-based measures. Our ensemble model outperformed individual STR tools, improving repeat length estimation accuracy (R² = 0.92) and achieving an accuracy rate of 93.2% compared with PCR-derived genotypes. Additionally, we validated our WGS-derived genotypes by replicating previously reported associations between TOMM40′523 variants and cognitive decline. Our computational genotyping tool is a scalable and reliable alternative to PCR-based assays, enabling broader investigations of TOMM40 variation in studies with WGS data. |
| format | Article |
| id | doaj-art-42b9808aa29842e3a2ca7f2c857c8b65 |
| institution | Kabale University |
| issn | 2666-2477 |
| language | English |
| publishDate | 2025-10-01 |
| publisher | Elsevier |
| record_format | Article |
| series | HGG Advances |
| spelling | doaj-art-42b9808aa29842e3a2ca7f2c857c8b65; 2025-08-22T04:58:18Z; eng; Elsevier; HGG Advances; 2666-2477; 2025-10-01; 6; 4; 100488; 10.1016/j.xhgg.2025.100488; Genotyping TOMM40′523 poly-T polymorphisms using whole-genome sequencing; Ricardo A. Vialle [Rush Alzheimer’s Disease Center, Rush University Medical Center, 1750 West Harrison Street, Chicago, IL 60612, USA; Graduate Program in Bioinformatics, Professional and Technical Education Sector (SEPT), Universidade Federal do Paraná (UFPR), R. Dr. Alcides Vieira Arcoverde 1225, Curitiba, Paraná 81520-260, Brazil; Corresponding author]; Lei Yu [Rush Alzheimer’s Disease Center, Rush University Medical Center, 1750 West Harrison Street, Chicago, IL 60612, USA]; Yan Li [Rush Alzheimer’s Disease Center, Rush University Medical Center, 1750 West Harrison Street, Chicago, IL 60612, USA]; Roberto T. Raittz [Graduate Program in Bioinformatics, Professional and Technical Education Sector (SEPT), Universidade Federal do Paraná (UFPR), R. Dr. Alcides Vieira Arcoverde 1225, Curitiba, Paraná 81520-260, Brazil; Department of Biochemistry, Universidade Federal do Paraná (UFPR), Avenida Coronel Francisco H. dos Santos, 100, Curitiba, Paraná 81531-980, Brazil]; Jose M. Farfel [Rush Alzheimer’s Disease Center, Rush University Medical Center, 1750 West Harrison Street, Chicago, IL 60612, USA]; Philip L. De Jager [Center for Translational and Computational Neuroimmunology, Department of Neurology, Columbia University Irving Medical Center, 630 West 168th Street, PH 19, New York, NY 10032, USA]; Julie A. Schneider [Rush Alzheimer’s Disease Center, Rush University Medical Center, 1750 West Harrison Street, Chicago, IL 60612, USA]; Lisa L. Barnes [Rush Alzheimer’s Disease Center, Rush University Medical Center, 1750 West Harrison Street, Chicago, IL 60612, USA]; Shinya Tasaki [Rush Alzheimer’s Disease Center, Rush University Medical Center, 1750 West Harrison Street, Chicago, IL 60612, USA]; David A. Bennett [Rush Alzheimer’s Disease Center, Rush University Medical Center, 1750 West Harrison Street, Chicago, IL 60612, USA]; http://www.sciencedirect.com/science/article/pii/S2666247725000910; TOMM40; APOE; machine learning; XGBoost; WGS |
| spellingShingle | Ricardo A. Vialle; Lei Yu; Yan Li; Roberto T. Raittz; Jose M. Farfel; Philip L. De Jager; Julie A. Schneider; Lisa L. Barnes; Shinya Tasaki; David A. Bennett; Genotyping TOMM40′523 poly-T polymorphisms using whole-genome sequencing; HGG Advances; TOMM40; APOE; machine learning; XGBoost; WGS |
| title | Genotyping TOMM40′523 poly-T polymorphisms using whole-genome sequencing |
| title_full | Genotyping TOMM40′523 poly-T polymorphisms using whole-genome sequencing |
| title_fullStr | Genotyping TOMM40′523 poly-T polymorphisms using whole-genome sequencing |
| title_full_unstemmed | Genotyping TOMM40′523 poly-T polymorphisms using whole-genome sequencing |
| title_short | Genotyping TOMM40′523 poly-T polymorphisms using whole-genome sequencing |
| title_sort | genotyping tomm40 523 poly t polymorphisms using whole genome sequencing |
| topic | TOMM40; APOE; machine learning; XGBoost; WGS |
| url | http://www.sciencedirect.com/science/article/pii/S2666247725000910 |
| work_keys_str_mv | AT ricardoavialle genotypingtomm40523polytpolymorphismsusingwholegenomesequencing AT leiyu genotypingtomm40523polytpolymorphismsusingwholegenomesequencing AT yanli genotypingtomm40523polytpolymorphismsusingwholegenomesequencing AT robertotraittz genotypingtomm40523polytpolymorphismsusingwholegenomesequencing AT josemfarfel genotypingtomm40523polytpolymorphismsusingwholegenomesequencing AT philipldejager genotypingtomm40523polytpolymorphismsusingwholegenomesequencing AT julieaschneider genotypingtomm40523polytpolymorphismsusingwholegenomesequencing AT lisalbarnes genotypingtomm40523polytpolymorphismsusingwholegenomesequencing AT shinyatasaki genotypingtomm40523polytpolymorphismsusingwholegenomesequencing AT davidabennett genotypingtomm40523polytpolymorphismsusingwholegenomesequencing |