Technical note: Towards atmospheric compound identification in chemical ionization mass spectrometry with pesticide standards and machine learning

Bibliographic Details
Main Authors: F. Bortolussi, H. Sandström, F. Partovi, J. Mikkilä, P. Rinke, M. Rissanen
Format: Article
Language: English
Published: Copernicus Publications 2025-01-01
Series: Atmospheric Chemistry and Physics
Online Access: https://acp.copernicus.org/articles/25/685/2025/acp-25-685-2025.pdf
collection DOAJ
description Chemical ionization mass spectrometry (CIMS) is widely used in atmospheric chemistry studies. However, due to the complex interactions between reagent ions and target compounds, chemical understanding remains limited and compound identification difficult. In this study, we apply machine learning to a reference dataset of pesticides in two standard solutions to build a model that can provide insights from CIMS analyses in atmospheric science. The CIMS measurements were performed with an Orbitrap mass spectrometer coupled to a thermal desorption multi-scheme chemical ionization inlet unit (TD-MION-MS) with both negative and positive ionization modes utilizing Br⁻, O₂⁻, H₃O⁺ and (CH₃)₂COH⁺ (AceH⁺) as reagent ions. We then trained two machine learning methods on these data: (1) random forest (RF) for classifying if a pesticide can be detected with CIMS and (2) kernel ridge regression (KRR) for predicting the expected CIMS signals. We compared their performance on five different representations of the molecular structure: the topological fingerprint (TopFP), the molecular access system keys (MACCS), a custom descriptor based on standard molecular properties (RDKitPROP), the Coulomb matrix (CM) and the many-body tensor representation (MBTR). The results indicate that MACCS outperforms the other descriptors. Our best classification model reaches a prediction accuracy of 0.85 ± 0.02 and a receiver operating characteristic curve area of 0.91 ± 0.01. Our best regression model reaches an accuracy of 0.44 ± 0.03 logarithmic units of the signal intensity. Subsequent feature importance analysis of the classifiers reveals that the most important sub-structures are NH and OH for the negative ionization schemes and nitrogen-containing groups for the positive ionization schemes.
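The three figures of merit quoted in the description (classification accuracy, area under the receiver operating characteristic curve, and a regression error in logarithmic units) can be illustrated with a minimal sketch. This is not the authors' code: the function names are hypothetical, and treating the regression figure as a mean absolute error on base-10 logarithms of the signal intensity is an assumption, since the record does not define it.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of binary labels (1 = detected, 0 = not detected) predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """ROC AUC computed as the probability that a randomly chosen positive
    example receives a higher score than a randomly chosen negative one
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def log10_mae(measured, predicted):
    """Mean absolute error between log10 intensities.
    ASSUMPTION: base-10 logarithm and MAE; the record only says
    'logarithmic units of the signal intensity'."""
    errors = [abs(math.log10(m) - math.log10(p))
              for m, p in zip(measured, predicted)]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Made-up toy values for illustration only, not data from the study.
    labels = [1, 1, 0, 1, 0, 0]
    scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]
    predictions = [1 if s >= 0.5 else 0 for s in scores]
    print(accuracy(labels, predictions))
    print(roc_auc(labels, scores))
    print(log10_mae([1e3, 1e4, 1e5], [1e3, 1e5, 1e4]))
```

Under this reading, an error of 0.44 logarithmic units would correspond to predicted intensities that are typically off by a factor of about 10^0.44, roughly 2.8.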
id doaj-art-4195e9ff18dd4b3b9f8c5cff5804cff2
institution Kabale University
issn 1680-7316
1680-7324
spelling doaj-art-4195e9ff18dd4b3b9f8c5cff5804cff2; 2025-01-17T14:07:43Z; eng; Copernicus Publications; Atmospheric Chemistry and Physics; ISSN 1680-7316, 1680-7324; 2025-01-01; vol. 25, pp. 685-704; doi 10.5194/acp-25-685-2025
Author affiliations:
F. Bortolussi: Department of Chemistry, University of Helsinki, 00560 Helsinki, Finland
H. Sandström: Department of Applied Physics, Aalto University, Espoo, Finland
F. Partovi: Aerosol Physics Laboratory, Physics Unit, Tampere University, 33720 Tampere, Finland; Karsa Ltd., A. I. Virtasen aukio 1, 00560 Helsinki, Finland
J. Mikkilä: Karsa Ltd., A. I. Virtasen aukio 1, 00560 Helsinki, Finland
P. Rinke: Department of Applied Physics, Aalto University, Espoo, Finland; Physics Department, TUM School of Natural Sciences, Technical University of Munich, Garching, Germany; Atomistic Modelling Center, Munich Data Science Institute, Technical University of Munich, Garching, Germany; Munich Center for Machine Learning (MCML), Munich, Germany
M. Rissanen: Department of Chemistry, University of Helsinki, 00560 Helsinki, Finland; Aerosol Physics Laboratory, Physics Unit, Tampere University, 33720 Tampere, Finland