Human essential gene identification based on feature fusion and feature screening

Abstract Essential genes are necessary to sustain the life of a species under adequate nutritional conditions. These genes have attracted significant attention for their potential as drug targets, especially in developing broad‐spectrum antibacterial drugs. However, studying essential genes remains...

Bibliographic Details
Main Authors: Zhao‐Yue Zhang, Yue‐Er Fan, Cheng‐Bing Huang, Meng‐Ze Du
Format: Article
Language: English
Published: Wiley 2024-12-01
Series: IET Systems Biology
Subjects: bioinformatics; essential gene; feature selection; neural nets
Online Access: https://doi.org/10.1049/syb2.12105
_version_ 1846110632428961792
author Zhao‐Yue Zhang
Yue‐Er Fan
Cheng‐Bing Huang
Meng‐Ze Du
collection DOAJ
description Abstract: Essential genes are necessary to sustain the life of a species under adequate nutritional conditions. These genes have attracted significant attention for their potential as drug targets, especially in developing broad‐spectrum antibacterial drugs. However, studying essential genes remains challenging because they vary under specific environmental conditions. In this study, the authors aim to develop a powerful prediction model for identifying essential genes in humans. The authors first obtained essential gene data from human cancer cell lines and characterised the gene sequences using seven feature encoding methods, including Kmer, the Composition of K‐spaced Nucleic Acid Pairs (CKSNAP), and Z‐curve. Subsequently, feature fusion and feature optimisation strategies were employed to select the most impactful features. Finally, machine learning algorithms were applied to construct the prediction models, and their performance was evaluated. The best single‐feature‐based model achieved an area under the Receiver Operating Characteristic curve (AUC) of 0.830. After these features were fused and filtered, the best classical machine learning model achieved an AUC of 0.823, while the deep learning model reached 0.860. The results show that, compared with using individual features, feature fusion and feature optimisation strategies significantly improved model performance. Moreover, the study provides an advantageous method for essential gene identification compared with other methods.
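The encoding and fusion steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`kmer_freq`, `cksnap`, `z_curve`, `fuse`), the parameter choices (k = 2, gap = 1), and the single-nucleotide form of the Z-curve are assumptions made for illustration. Only three of the seven encodings are shown, and fusion is taken here to mean simple concatenation of the per-encoding vectors; the feature screening and model training steps are not shown.

```python
from itertools import product

BASES = "ACGT"


def kmer_freq(seq, k=2):
    """Normalised frequency of each overlapping k-mer, in lexicographic order."""
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip windows containing N or other symbols
            counts[word] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def cksnap(seq, gap=1):
    """Frequency of nucleotide pairs separated by `gap` intervening positions."""
    pairs = ["".join(p) for p in product(BASES, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    total = 0
    for i in range(len(seq) - gap - 1):
        pair = seq[i] + seq[i + gap + 1]
        if pair in counts:
            counts[pair] += 1
            total += 1
    return [counts[p] / total if total else 0.0 for p in pairs]


def z_curve(seq):
    """Three Z-curve components computed from single-nucleotide frequencies."""
    n = sum(seq.count(b) for b in BASES)
    if n == 0:
        return [0.0, 0.0, 0.0]
    a, c, g, t = (seq.count(b) / n for b in BASES)
    return [(a + g) - (c + t),   # purine vs pyrimidine
            (a + c) - (g + t),   # amino vs keto
            (a + t) - (g + c)]   # weak vs strong hydrogen bonding


def fuse(seq):
    """Feature fusion by concatenating the three encodings (16 + 16 + 3 values)."""
    seq = seq.upper()
    return kmer_freq(seq, 2) + cksnap(seq, 1) + z_curve(seq)


if __name__ == "__main__":
    vector = fuse("ATGCGTAC")  # toy sequence, not real gene data
    print(len(vector))
```

With these settings each sequence maps to a fixed-length vector regardless of gene length, which is what allows the fused features to be passed to a feature selection step and then to a classifier.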
format Article
id doaj-art-b0b78c89d4a84d1e9b344f243a931161
institution Kabale University
issn 1751-8849
1751-8857
language English
publishDate 2024-12-01
publisher Wiley
record_format Article
series IET Systems Biology
spelling doaj-art-b0b78c89d4a84d1e9b344f243a931161; 2024-12-23T18:41:56Z; eng; Wiley; IET Systems Biology; ISSN 1751-8849, 1751-8857; 2024-12-01; vol. 18, no. 6, pp. 227-237; 10.1049/syb2.12105
Human essential gene identification based on feature fusion and feature screening
Zhao‐Yue Zhang (School of Healthcare Technology, Chengdu Neusoft University, Chengdu, China)
Yue‐Er Fan (School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China)
Cheng‐Bing Huang (School of Computer Science and Technology, ABa Teachers University, Chengdu, China)
Meng‐Ze Du (School of Healthcare Technology, Chengdu Neusoft University, Chengdu, China)
https://doi.org/10.1049/syb2.12105
bioinformatics; essential gene; feature selection; neural nets
title Human essential gene identification based on feature fusion and feature screening
topic bioinformatics
essential gene
feature selection
neural nets
url https://doi.org/10.1049/syb2.12105