SyntheVAEiser: augmenting traditional machine learning methods with VAE-based gene expression sample generation for improved cancer subtype predictions

Bibliographic Details
Main Authors:	Brian Karlberg, Raphael Kirchgaessner, Jordan Lee, Matthew Peterkort, Liam Beckman, Jeremy Goecks, Kyle Ellrott
Format:	Article
Language:	English
Published:	BMC 2024-12-01
Series:	Genome Biology
Subjects:	Sample synthesis; Synthetic data; Data augmentation; Generative modeling; Feature engineering; Cancer subtyping
Online Access:	https://doi.org/10.1186/s13059-024-03431-3

Description
Summary:	The accuracy of machine learning methods is often limited by the amount of available training data. We propose to improve machine learning training regimes by augmenting datasets with synthetically generated samples. We present a method for synthesizing gene expression samples and test the system’s ability to improve the accuracy of categorical prediction of cancer subtypes. We developed SyntheVAEiser, a variational autoencoder-based tool that was trained and tested on over 8000 cancer samples. We show that this technique can be used to augment machine learning tasks and improve recognition of underrepresented cohorts.
ISSN:	1474-760X

Similar Items