Sex identification in rainbow trout using genomic information and machine learning

Bibliographic Details
Main Authors: Andrei A. Kudinov, Antti Kause
Format: Article
Language: English
Published: BMC 2024-12-01
Series: Genetics Selection Evolution
Online Access:https://doi.org/10.1186/s12711-024-00944-0
collection DOAJ
description Sex identification in farmed fish is important for the management of fish stocks and breeding programs, but identification based on visual characteristics is typically difficult or impossible in juvenile or sexually immature fish. The amount of genomic data obtained from farmed fish is growing rapidly with the implementation of genomic selection in aquaculture. Compared with mammals and birds, ray-finned fishes exhibit a greater diversity of sex determination systems and lack conserved sex-determining genomic regions. A group of genomic markers on a standard genotyping array has been reported to be potentially linked with sex determination in rainbow trout. However, the set of markers suitable for sex identification may vary between populations. Sex identification from genomic data is usually performed using probabilistic methods, for which the suitable markers must be known beforehand. In our study, we demonstrated the use of Extreme Gradient Boosting (XGBoost), a supervised machine learning method from the gradient boosting framework, to predict sex from unimputed genomic data when the suitability of the markers was unknown a priori. The accuracy of the method was assessed using four simulated datasets with different genotyping error rates and one real dataset from the Finnish Rainbow Trout Breeding Program. The method showed high prediction quality on both simulated and real datasets. For simulated datasets with low (5%) and high (50%) genotyping error rates, the accuracies were 1.0 and 0.60, respectively. In the real data, the method achieved a prediction accuracy of 98%, which is suitable for routine use.
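The abstract describes predicting sex with Extreme Gradient Boosting from unimputed genotypes when the informative markers are not known in advance. The sketch below is a minimal, dependency-free illustration of that idea under our own assumptions; it is not the authors' code, data, or simulation design. It simulates 0/1/2 SNP genotypes in which a hypothetical block of markers is sex-linked (females assumed homozygous 0, males heterozygous 1), adds random genotyping errors and missing calls (coded -1 and left unimputed), and then fits a hand-written second-order gradient-boosting model of depth-1 trees using XGBoost's regularized leaf-weight formula and split gain (up to a constant factor, with no minimum-split-loss penalty). The names `simulate`, `fit_boosted_stumps`, `MISSING` and all parameter values are hypothetical.

```python
import math
import random

MISSING = -1  # hypothetical code for an uncalled genotype, left unimputed


def simulate(n, n_markers, n_sex_linked, error_rate, missing_rate, seed):
    """Simulate 0/1/2 genotypes; the first n_sex_linked markers are sex-linked.

    Assumption (illustrative only): at sex-linked markers females (y=0) carry
    genotype 0 and males (y=1) genotype 1; all other markers are random noise.
    """
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        sex = rng.randint(0, 1)
        row = []
        for j in range(n_markers):
            g = sex if j < n_sex_linked else rng.randint(0, 2)
            if rng.random() < error_rate:
                g = rng.randint(0, 2)  # genotyping error: uniformly random call
            if rng.random() < missing_rate:
                g = MISSING  # failed call, deliberately not imputed
            row.append(g)
        X.append(row)
        y.append(sex)
    return X, y


def sigmoid(z):
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_boosted_stumps(X, y, n_rounds=20, eta=0.3, lam=1.0):
    """Second-order gradient boosting of depth-1 trees on the logistic loss."""
    values = [MISSING, 0, 1, 2]
    base = 0.0  # initial margin, i.e. probability 0.5
    margins = [base] * len(y)
    stumps = []  # (marker, threshold, left_weight, right_weight)
    for _ in range(n_rounds):
        # Gradient and Hessian of the logistic loss with respect to the margin.
        p = [sigmoid(m) for m in margins]
        g = [pi - yi for pi, yi in zip(p, y)]
        h = [pi * (1.0 - pi) for pi in p]
        G, H = sum(g), sum(h)
        best = None
        for j in range(len(X[0])):
            # Sum gradients and Hessians per genotype value of marker j.
            gs = {v: 0.0 for v in values}
            hs = {v: 0.0 for v in values}
            for row, gi, hi in zip(X, g, h):
                gs[row[j]] += gi
                hs[row[j]] += hi
            gl = hl = 0.0
            for t in values[:-1]:  # rows with x <= t go left, x > t go right
                gl += gs[t]
                hl += hs[t]
                gr, hr = G - gl, H - hl
                gain = gl * gl / (hl + lam) + gr * gr / (hr + lam) - G * G / (H + lam)
                if best is None or gain > best[0]:
                    best = (gain, j, t, -gl / (hl + lam), -gr / (hr + lam))
        _, j, t, wl, wr = best
        wl, wr = eta * wl, eta * wr  # shrink leaf weights by the learning rate
        stumps.append((j, t, wl, wr))
        margins = [m + (wl if row[j] <= t else wr) for m, row in zip(margins, X)]
    return base, stumps


def predict(model, X):
    base, stumps = model
    preds = []
    for row in X:
        m = base + sum(wl if row[j] <= t else wr for j, t, wl, wr in stumps)
        preds.append(1 if m > 0.0 else 0)  # 1 = male, 0 = female
    return preds


def accuracy(pred, y):
    return sum(p == t for p, t in zip(pred, y)) / len(y)


if __name__ == "__main__":
    for err in (0.05, 0.5):
        X, y = simulate(600, 50, 5, err, 0.02, seed=1)
        model = fit_boosted_stumps(X[:400], y[:400])
        print(err, accuracy(predict(model, X[400:]), y[400:]))
```

Because every marker is scored at every boosting round, the sex-linked markers are selected without being specified a priori. Accuracy should fall as `error_rate` rises, but the values printed by this toy simulation are not comparable to the accuracies reported in the study. In practice one would use the `xgboost` library itself, which learns a default split direction for missing values instead of treating them as the lowest genotype, as this sketch does.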
id doaj-art-36be5029bbee4d0ea1338ac7a36c2a2c
institution Kabale University
issn 1297-9686
affiliation Natural Resources Institute Finland (Andrei A. Kudinov, Antti Kause)