Supervised machine learning reveals introgressed loci in the genomes of Drosophila simulans and D. sechellia.

Hybridization and gene flow between species appear to be common. Even though it is clear that hybridization is widespread across all surveyed taxonomic groups, the magnitude and consequences of introgression are still largely unknown. Thus it is crucial to develop the statistical machinery required...

Bibliographic Details
Main Authors: Daniel R Schrider, Julien Ayroles, Daniel R Matute, Andrew D Kern
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2018-04-01
Series: PLoS Genetics
Online Access: https://journals.plos.org/plosgenetics/article/file?id=10.1371/journal.pgen.1007341&type=printable
author Daniel R Schrider
Julien Ayroles
Daniel R Matute
Andrew D Kern
collection DOAJ
description Hybridization and gene flow between species appear to be common. Even though it is clear that hybridization is widespread across all surveyed taxonomic groups, the magnitude and consequences of introgression are still largely unknown. Thus it is crucial to develop the statistical machinery required to uncover which genomic regions have recently acquired haplotypes via introgression from a sister population. We developed a novel machine learning framework, called FILET (Finding Introgressed Loci via Extra-Trees), capable of revealing genomic introgression with far greater power than competing methods. FILET works by combining information from a number of population genetic summary statistics, including several new statistics that we introduce, that capture patterns of variation across two populations. We show that FILET is able to identify loci that have experienced gene flow between related species with high accuracy, and in most situations can correctly infer which population was the donor and which was the recipient. Here we describe a data set of outbred diploid Drosophila sechellia genomes, and combine them with data from D. simulans to examine recent introgression between these species using FILET. Although we find that these populations may have split more recently than previously appreciated, FILET confirms that there has indeed been appreciable recent introgression (some of which might have been adaptive) between these species, and reveals that this gene flow is primarily in the direction of D. simulans to D. sechellia.
format Article
id doaj-art-12e9ab4401e245468fea8f17a3ebf529
institution Kabale University
issn 1553-7390
1553-7404
language English
publishDate 2018-04-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS Genetics
spelling doaj-art-12e9ab4401e245468fea8f17a3ebf529; 2025-01-16T05:31:14Z; eng; Public Library of Science (PLoS); PLoS Genetics; ISSN 1553-7390, 1553-7404; 2018-04-01; vol. 14, no. 4, e1007341; doi:10.1371/journal.pgen.1007341
title Supervised machine learning reveals introgressed loci in the genomes of Drosophila simulans and D. sechellia.
url https://journals.plos.org/plosgenetics/article/file?id=10.1371/journal.pgen.1007341&type=printable