Genotyping TOMM40′523 poly-T polymorphisms using whole-genome sequencing

Bibliographic Details
Main Authors: Ricardo A. Vialle, Lei Yu, Yan Li, Roberto T. Raittz, Jose M. Farfel, Philip L. De Jager, Julie A. Schneider, Lisa L. Barnes, Shinya Tasaki, David A. Bennett
Format: Article
Language: English
Published: Elsevier 2025-10-01
Series: HGG Advances
Subjects: TOMM40, APOE, machine learning, XGBoost, WGS
Online Access: http://www.sciencedirect.com/science/article/pii/S2666247725000910
_version_ 1849229226074439680
author Ricardo A. Vialle
Lei Yu
Yan Li
Roberto T. Raittz
Jose M. Farfel
Philip L. De Jager
Julie A. Schneider
Lisa L. Barnes
Shinya Tasaki
David A. Bennett
author_sort Ricardo A. Vialle
collection DOAJ
description Summary: The TOMM40′523 poly-T repeat polymorphism (rs10524523) has been associated with cognitive decline and Alzheimer's disease (AD) progression. Challenges in processing whole-genome sequencing (WGS) data traditionally require additional PCR and targeted sequencing assays to genotype these polymorphisms. We introduce a computational pipeline that integrates multiple short tandem repeat (STR) detection tools in an ensemble machine learning model using XGBoost. Using a sample of 1,202 participants from 4 cohort studies, we benchmarked our method against PCR-based measures. Our ensemble model outperformed individual STR tools, improving repeat length estimation accuracy (R² = 0.92) and achieving an accuracy rate of 93.2% compared with PCR-derived genotypes. Additionally, we validated our WGS-derived genotypes by replicating previously reported associations between TOMM40′523 variants and cognitive decline. Our computational genotyping tool is a scalable and reliable alternative to PCR-based assays, enabling broader investigations of TOMM40 variation in studies with WGS data.
format Article
id doaj-art-42b9808aa29842e3a2ca7f2c857c8b65
institution Kabale University
issn 2666-2477
language English
publishDate 2025-10-01
publisher Elsevier
record_format Article
series HGG Advances
spelling doaj-art-42b9808aa29842e3a2ca7f2c857c8b65
2025-08-22T04:58:18Z
eng
Elsevier, HGG Advances, ISSN 2666-2477, 2025-10-01, volume 6, issue 4, article 100488
DOI 10.1016/j.xhgg.2025.100488
Genotyping TOMM40′523 poly-T polymorphisms using whole-genome sequencing
Ricardo A. Vialle (corresponding author): Rush Alzheimer’s Disease Center, Rush University Medical Center, 1750 West Harrison Street, Chicago, IL 60612, USA; Graduate Program in Bioinformatics, Professional and Technical Education Sector (SEPT), Universidade Federal do Paraná (UFPR), R. Dr. Alcides Vieira Arcoverde 1225, Curitiba, Paraná 81520-260, Brazil
Lei Yu: Rush Alzheimer’s Disease Center, Rush University Medical Center, 1750 West Harrison Street, Chicago, IL 60612, USA
Yan Li: Rush Alzheimer’s Disease Center, Rush University Medical Center, 1750 West Harrison Street, Chicago, IL 60612, USA
Roberto T. Raittz: Graduate Program in Bioinformatics, Professional and Technical Education Sector (SEPT), Universidade Federal do Paraná (UFPR), R. Dr. Alcides Vieira Arcoverde 1225, Curitiba, Paraná 81520-260, Brazil; Department of Biochemistry, Universidade Federal do Paraná (UFPR), Avenida Coronel Francisco H. dos Santos, 100, Curitiba, Paraná 81531-980, Brazil
Jose M. Farfel: Rush Alzheimer’s Disease Center, Rush University Medical Center, 1750 West Harrison Street, Chicago, IL 60612, USA
Philip L. De Jager: Center for Translational and Computational Neuroimmunology, Department of Neurology, Columbia University Irving Medical Center, 630 West 168th Street, PH 19, New York, NY 10032, USA
Julie A. Schneider: Rush Alzheimer’s Disease Center, Rush University Medical Center, 1750 West Harrison Street, Chicago, IL 60612, USA
Lisa L. Barnes: Rush Alzheimer’s Disease Center, Rush University Medical Center, 1750 West Harrison Street, Chicago, IL 60612, USA
Shinya Tasaki: Rush Alzheimer’s Disease Center, Rush University Medical Center, 1750 West Harrison Street, Chicago, IL 60612, USA
David A. Bennett: Rush Alzheimer’s Disease Center, Rush University Medical Center, 1750 West Harrison Street, Chicago, IL 60612, USA
title Genotyping TOMM40′523 poly-T polymorphisms using whole-genome sequencing
topic TOMM40
APOE
machine learning
XGBoost
WGS
url http://www.sciencedirect.com/science/article/pii/S2666247725000910
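
The summary reports two agreement measures against PCR: repeat-length agreement (R²) and genotype accuracy. The sketch below shows one way such measures might be computed; it is not the authors' pipeline. Everything in it is an assumption for illustration: the function names, the toy allele lengths, the choice of the coefficient of determination for R², and the Short/Long/Very Long length thresholds (a commonly used convention that this record does not state).

```python
from statistics import mean

# Assumed length classes for the poly-T repeat (illustrative convention,
# not taken from this record): S <= 19 T, L = 20-29 T, VL >= 30 T.
CLASS_ORDER = {"S": 0, "L": 1, "VL": 2}


def length_class(n_t):
    """Map a poly-T repeat length to an assumed S / L / VL class."""
    if n_t <= 19:
        return "S"
    if n_t <= 29:
        return "L"
    return "VL"


def genotype(allele_lengths):
    """Unordered genotype label for a pair of alleles, e.g. (16, 34) -> 'S/VL'."""
    classes = sorted((length_class(n) for n in allele_lengths),
                     key=CLASS_ORDER.__getitem__)
    return "/".join(classes)


def r_squared(reference, predicted):
    """Coefficient of determination of predicted against reference lengths."""
    ref_mean = mean(reference)
    ss_tot = sum((r - ref_mean) ** 2 for r in reference)
    ss_res = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    return 1.0 - ss_res / ss_tot


def genotype_accuracy(pcr_pairs, wgs_pairs):
    """Fraction of samples whose WGS genotype label matches the PCR label."""
    hits = sum(genotype(a) == genotype(b) for a, b in zip(pcr_pairs, wgs_pairs))
    return hits / len(pcr_pairs)


# Hypothetical toy data: (allele 1, allele 2) repeat lengths per sample.
pcr = [(16, 34), (16, 16), (28, 35), (15, 29)]
wgs = [(16, 33), (16, 17), (30, 35), (15, 29)]

# Flatten the sorted allele pairs so the alleles line up sample by sample.
pcr_flat = [n for pair in pcr for n in sorted(pair)]
wgs_flat = [n for pair in wgs for n in sorted(pair)]

r2 = r_squared(pcr_flat, wgs_flat)
acc = genotype_accuracy(pcr, wgs)
print(f"R^2 = {r2:.3f}, genotype accuracy = {acc:.2f}")
```

In this toy data the third sample is called 30 T by the hypothetical WGS estimate where PCR gives 28 T. That is a small length error, but it crosses the assumed L/VL boundary, so it counts as a genotype mismatch. This is why the two measures can diverge and why the summary reports both.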