Deciphering the proteome of Escherichia coli K-12: Integrating transcriptomics and machine learning to annotate hypothetical proteins
Omics technologies have led to the discovery of a vast number of proteins that are expressed but have no functional annotation, the so-called hypothetical proteins (HPs). Even in the best-studied model organism, Escherichia coli K-12, over 2% of the proteome remains uncharacterized. This knowledge gap...
| Main Authors: | Sagarika Chakraborty, Zachary Ardern, Habibu Aliyu, Anne-Kristin Kaster |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Elsevier, 2025-01-01 |
| Series: | Computational and Structural Biotechnology Journal |
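| Subjects: | Artificial intelligence; Big omics data; Functional annotation of proteins; Functional dark matter; Independent Component Analysis (ICA) |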
| Online Access: | http://www.sciencedirect.com/science/article/pii/S2001037025003009 |
| _version_ | 1849236243198509056 |
|---|---|
| author | Sagarika Chakraborty Zachary Ardern Habibu Aliyu Anne-Kristin Kaster |
| author_sort | Sagarika Chakraborty |
| collection | DOAJ |
| description | Omics technologies have led to the discovery of a vast number of proteins that are expressed but have no functional annotation, the so-called hypothetical proteins (HPs). Even in the best-studied model organism, Escherichia coli K-12, over 2% of the proteome remains uncharacterized. This knowledge gap widens further for microbial dark matter. However, knowing the functions of proteins is crucial for elucidating cellular and metabolic processes and for harnessing biotechnological potential. Here, we employed machine learning to decipher the transcriptional regulatory network of E. coli K-12, together with other in silico tools, to assign functions to uncharacterized HPs. As proof of concept, we further provide experimental validation of the in silico predicted functions for three HP-encoding genes (yhdN, yeaC and ydgH) by analyzing the growth patterns of deletion mutants compared to the wild type, as well as their transcriptional responses to specific conditions. This study demonstrates that combining Big Omics Data with Artificial Intelligence and experimental controls is a powerful approach to illuminate functional dark matter. |
| format | Article |
| id | doaj-art-072f7fe0576b4685a43eaeb1ea177e61 |
| institution | Kabale University |
| issn | 2001-0370 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | Elsevier |
| record_format | Article |
| series | Computational and Structural Biotechnology Journal |
| spelling | doaj-art-072f7fe0576b4685a43eaeb1ea177e61; 2025-08-20T04:02:23Z; eng; Elsevier; Computational and Structural Biotechnology Journal; ISSN 2001-0370; 2025-01-01; vol. 27, pp. 3565-3578; DOI 10.1016/j.csbj.2025.07.036. Author affiliations: Sagarika Chakraborty (Institute for Biological Interfaces 5 (IBG-5), Biotechnology and Microbial Genetics, Karlsruhe Institute of Technology (KIT), Hermann-von-Helmholtz-Platz 1, Eggenstein-Leopoldshafen 76344, Germany); Zachary Ardern (IBG-5, KIT, Eggenstein-Leopoldshafen, Germany; Wellcome Trust Sanger Institute, Hinxton, Saffron Walden CB10 1RQ, United Kingdom); Habibu Aliyu (IBG-5, KIT, Eggenstein-Leopoldshafen, Germany); Anne-Kristin Kaster, corresponding author (IBG-5, KIT, Eggenstein-Leopoldshafen, Germany; Institute for Applied Biosciences (IAB), Karlsruhe Institute of Technology (KIT), Kaiserstraße 12, Karlsruhe 76131, Germany) |
| title | Deciphering the proteome of Escherichia coli K-12: Integrating transcriptomics and machine learning to annotate hypothetical proteins |
| topic | Artificial intelligence Big omics data Functional annotation of proteins Functional dark matter Independent Component Analysis (ICA) |
| url | http://www.sciencedirect.com/science/article/pii/S2001037025003009 |