Deep learning captures the effect of epistasis in multifactorial diseases

Background: Polygenic risk score (PRS) prediction is widely used to assess the risk of diagnosis and progression of many diseases. Routinely, the weights of individual SNPs are estimated by a linear regression model that assumes an independent and linear contribution of each SNP to the phenotype. Howev...

Bibliographic Details
Main Authors: Vladislav Perelygin, Alexey Kamelin, Nikita Syzrantsev, Layal Shaheen, Anna Kim, Nikolay Plotnikov, Anna Ilinskaya, Valery Ilinsky, Alexander Rakitko, Maria Poptsova
Format: Article
Language: English
Published: Frontiers Media S.A. 2025-01-01
Series: Frontiers in Medicine
Subjects: polygenic risk score; multifactorial diseases; epistasis; obesity; type 1 diabetes; psoriasis
Online Access: https://www.frontiersin.org/articles/10.3389/fmed.2024.1479717/full
author Vladislav Perelygin
Alexey Kamelin
Nikita Syzrantsev
Layal Shaheen
Anna Kim
Nikolay Plotnikov
Anna Ilinskaya
Valery Ilinsky
Alexander Rakitko
Maria Poptsova
collection DOAJ
description Background: Polygenic risk score (PRS) prediction is widely used to assess the risk of diagnosis and progression of many diseases. Routinely, the weights of individual SNPs are estimated by a linear regression model that assumes an independent and linear contribution of each SNP to the phenotype. However, for complex multifactorial diseases such as Alzheimer’s disease, diabetes, cardiovascular disease, cancer, and others, the association between individual SNPs and disease could be non-linear due to epistatic interactions. The aim of the present study is to explore the power of non-linear machine learning algorithms and deep learning models to predict the risk of multifactorial diseases with epistasis.
Methods: Simulated data with 2- and 3-locus interactions were generated using GAMETES, and three models of epistasis were tested: additive, multiplicative, and threshold. Penetrance tables were generated using the PyTOXO package. The machine learning methods were a multilayer perceptron (MLP), a convolutional neural network (CNN), a recurrent neural network (RNN), Lasso regression, random forest, and gradient boosting. Model performance was assessed using accuracy, AUC-ROC, AUC-PR, recall, precision, and F1 score.
Results: First, we tested ensemble tree methods and deep neural networks against the Lasso linear regression model on simulated data with different types and strengths of epistasis. As the strength of the epistatic effect increased, non-linear models significantly outperformed the linear model. The higher performance of non-linear models over the linear model was then confirmed on real genetic data for multifactorial phenotypes such as obesity, type 1 diabetes, and psoriasis. Among the non-linear models, gradient boosting was the best model for obesity and psoriasis, while deep learning methods significantly outperformed linear approaches for type 1 diabetes.
Conclusion: Overall, our study underscores the efficacy of non-linear models and deep learning approaches in more accurately accounting for the effects of epistasis in simulations with specific configurations and in the context of certain diseases.
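The three epistasis models named in the abstract can be illustrated with a small penetrance-table sketch. This is not the PyTOXO or GAMETES implementation, and the parameterization is an assumption made only for illustration: `baseline` (risk without interaction) and `theta` (effect size) are hypothetical parameters, and the function name `penetrance_table` is invented here. Each genotype is a tuple of minor-allele counts (0, 1, or 2) per locus, and the interaction is assumed to apply only when every locus carries at least one minor allele.

```python
from itertools import product


def penetrance_table(model, baseline=0.05, theta=1.0, n_loci=2):
    """Return {genotype: disease probability} for one illustrative epistasis model.

    Assumed (hypothetical) parameterization, not taken from the source:
      additive:       baseline * (1 + theta) ** (sum of minor-allele counts)
      multiplicative: baseline * (1 + theta) ** (product of minor-allele counts)
      threshold:      baseline * (1 + theta)
    The elevated risk applies only if every locus has at least one minor allele.
    """
    table = {}
    for genotype in product((0, 1, 2), repeat=n_loci):
        if not all(count > 0 for count in genotype):
            # No interaction: at least one locus has no minor allele.
            risk = baseline
        elif model == "additive":
            risk = baseline * (1 + theta) ** sum(genotype)
        elif model == "multiplicative":
            exponent = 1
            for count in genotype:
                exponent *= count
            risk = baseline * (1 + theta) ** exponent
        elif model == "threshold":
            risk = baseline * (1 + theta)
        else:
            raise ValueError(f"unknown model: {model}")
        # A penetrance is a probability, so cap it at 1.
        table[genotype] = min(risk, 1.0)
    return table


if __name__ == "__main__":
    for name in ("additive", "multiplicative", "threshold"):
        table = penetrance_table(name)
        print(name)
        for a in (0, 1, 2):
            print("  ", [round(table[(a, b)], 3) for b in (0, 1, 2)])
```

Under these assumed definitions the risk depends on the joint genotype rather than on each locus separately, which is the kind of signal a model with only per-SNP linear weights cannot represent exactly.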
format Article
id doaj-art-f7cb4ed1449d4d199f152b7b272c9c0a
institution Kabale University
issn 2296-858X
language English
publishDate 2025-01-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Medicine
spelling doaj-art-f7cb4ed1449d4d199f152b7b272c9c0a
2025-01-07T05:24:05Z
eng
Frontiers Media S.A.
Frontiers in Medicine
2296-858X
2025-01-01
11
10.3389/fmed.2024.1479717
1479717
Deep learning captures the effect of epistasis in multifactorial diseases
Vladislav Perelygin: International Laboratory of Bioinformatics, AI and Digital Sciences Institute, Faculty of Computer Science, HSE University, Moscow, Russia
Alexey Kamelin: International Laboratory of Bioinformatics, AI and Digital Sciences Institute, Faculty of Computer Science, HSE University, Moscow, Russia; Genotek Ltd., Moscow, Russia
Nikita Syzrantsev: Genotek Ltd., Moscow, Russia
Layal Shaheen: Genotek Ltd., Moscow, Russia; Phystech School of Biological and Medical Physics, Moscow Institute of Physics and Technology, Moscow, Russia
Anna Kim: Genotek Ltd., Moscow, Russia
Nikolay Plotnikov: Genotek Ltd., Moscow, Russia
Anna Ilinskaya: Eligens SIA, Mārupe, Latvia
Valery Ilinsky: Eligens SIA, Mārupe, Latvia
Alexander Rakitko: International Laboratory of Bioinformatics, AI and Digital Sciences Institute, Faculty of Computer Science, HSE University, Moscow, Russia; Genotek Ltd., Moscow, Russia
Maria Poptsova: International Laboratory of Bioinformatics, AI and Digital Sciences Institute, Faculty of Computer Science, HSE University, Moscow, Russia
https://www.frontiersin.org/articles/10.3389/fmed.2024.1479717/full
polygenic risk score; multifactorial diseases; epistasis; obesity; type 1 diabetes; psoriasis
title Deep learning captures the effect of epistasis in multifactorial diseases
topic polygenic risk score
multifactorial diseases
epistasis
obesity
type 1 diabetes
psoriasis
url https://www.frontiersin.org/articles/10.3389/fmed.2024.1479717/full