Comparative analysis of BERT-based and generative large language models for detecting suicidal ideation: a performance evaluation study
Abstract: Artificial intelligence can detect suicidal ideation manifestations in texts. Studies demonstrate that BERT-based models achieve better performance in text classification problems. Large language models (LLMs) answer free-text queries without being specifically trained. This work aims to c...
Saved in:
| Main Authors: | Adonias Caetano de Oliveira, Renato Freitas Bessa, Ariel Soares Teles |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Escola Nacional de Saúde Pública, Fundação Oswaldo Cruz, 2024-11-01 |
| Series: | Cadernos de Saúde Pública |
| Subjects: | Suicide; Suicidal Ideation; Artificial Intelligence; Natural Language Processing |
| Online Access: | http://www.scielo.br/scielo.php?script=sci_arttext&pid=S0102-311X2024001001413&lng=en&tlng=en |
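The abstract describes labeling each test sentence from a chatbot's free-text answer under zero-shot prompting. A minimal sketch of that labeling step in Python (the prompt wording and response parsing are illustrative assumptions, not the study's actual protocol):

```python
def build_prompt(sentence: str) -> str:
    # Zero-shot prompt: only a task description, no labeled examples.
    # The wording is a hypothetical reconstruction, not the authors' prompt.
    return (
        "Does the following sentence express suicidal ideation? "
        "Answer only 'yes' or 'no'.\n\n"
        f"Sentence: {sentence}"
    )

def label_from_response(response: str) -> int:
    # Map the chatbot's free-text answer to a binary label:
    # 1 = suicidal ideation present, 0 = absent.
    normalized = response.strip().lower()
    return 1 if normalized.startswith("yes") else 0
```

In the study, each of the 100 test sentences would be sent through such a prompt to each chatbot (Bard, Bing/GPT-4, ChatGPT-3.5), and the parsed label compared against the psychologists' annotation.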
| _version_ | 1846119123156729856 |
|---|---|
| author | Adonias Caetano de Oliveira Renato Freitas Bessa Ariel Soares Teles |
| author_facet | Adonias Caetano de Oliveira Renato Freitas Bessa Ariel Soares Teles |
| author_sort | Adonias Caetano de Oliveira |
| collection | DOAJ |
| description | Abstract: Artificial intelligence can detect suicidal ideation manifestations in texts. Studies demonstrate that BERT-based models achieve better performance in text classification problems. Large language models (LLMs) answer free-text queries without being specifically trained. This work aims to compare the performance of three variations of BERT models and LLMs (Google Bard, Microsoft Bing/GPT-4, and OpenAI ChatGPT-3.5) for identifying suicidal ideation in nonclinical texts written in Brazilian Portuguese. A dataset labeled by psychologists consisted of 2,691 sentences without suicidal ideation and 1,097 with suicidal ideation, of which 100 sentences were selected for testing. We applied data preprocessing techniques, hyperparameter optimization, and hold-out validation for training and testing the BERT models. When evaluating the LLMs, we used zero-shot prompt engineering. Each test sentence was labeled as containing suicidal ideation or not, according to the chatbot’s response. Bing/GPT-4 achieved the best performance, with 98% across all metrics. Fine-tuned BERT models outperformed the other LLMs: BERTimbau-Large performed best with 96% accuracy, followed by BERTimbau-Base with 94% and BERT-Multilingual with 87%. Bard performed the worst, with 62% accuracy, whereas ChatGPT-3.5 achieved 81%. The high recall of the models suggests a low misclassification rate of at-risk patients, which is crucial to prevent missed interventions by professionals. However, despite their potential to support suicidal ideation detection, these models have not been validated in a clinical patient-monitoring setting. Therefore, caution is advised when using the evaluated models as tools to assist healthcare professionals in detecting suicidal ideation. |
| format | Article |
| id | doaj-art-9136fc30d08447c7a908d5223d99b3ba |
| institution | Kabale University |
| issn | 1678-4464 |
| language | English |
| publishDate | 2024-11-01 |
| publisher | Escola Nacional de Saúde Pública, Fundação Oswaldo Cruz |
| record_format | Article |
| series | Cadernos de Saúde Pública |
| spelling | doaj-art-9136fc30d08447c7a908d5223d99b3ba 2024-12-17T07:47:00Z eng Escola Nacional de Saúde Pública, Fundação Oswaldo Cruz Cadernos de Saúde Pública 1678-4464 2024-11-01 40 10 10.1590/0102-311xen028824 Comparative analysis of BERT-based and generative large language models for detecting suicidal ideation: a performance evaluation study Adonias Caetano de Oliveira https://orcid.org/0000-0002-5643-2916 Renato Freitas Bessa https://orcid.org/0009-0005-8989-768X Ariel Soares Teles https://orcid.org/0000-0002-0840-3870 http://www.scielo.br/scielo.php?script=sci_arttext&pid=S0102-311X2024001001413&lng=en&tlng=en Suicide Suicidal Ideation Artificial Intelligence Natural Language Processing |
| spellingShingle | Adonias Caetano de Oliveira Renato Freitas Bessa Ariel Soares Teles Comparative analysis of BERT-based and generative large language models for detecting suicidal ideation: a performance evaluation study Cadernos de Saúde Pública Suicide Suicidal Ideation Artificial Intelligence Natural Language Processing |
| title | Comparative analysis of BERT-based and generative large language models for detecting suicidal ideation: a performance evaluation study |
| title_full | Comparative analysis of BERT-based and generative large language models for detecting suicidal ideation: a performance evaluation study |
| title_fullStr | Comparative analysis of BERT-based and generative large language models for detecting suicidal ideation: a performance evaluation study |
| title_full_unstemmed | Comparative analysis of BERT-based and generative large language models for detecting suicidal ideation: a performance evaluation study |
| title_short | Comparative analysis of BERT-based and generative large language models for detecting suicidal ideation: a performance evaluation study |
| title_sort | comparative analysis of bert based and generative large language models for detecting suicidal ideation a performance evaluation study |
| topic | Suicide Suicidal Ideation Artificial Intelligence Natural Language Processing |
| url | http://www.scielo.br/scielo.php?script=sci_arttext&pid=S0102-311X2024001001413&lng=en&tlng=en |
| work_keys_str_mv | AT adoniascaetanodeoliveira comparativeanalysisofbertbasedandgenerativelargelanguagemodelsfordetectingsuicidalideationaperformanceevaluationstudy AT renatofreitasbessa comparativeanalysisofbertbasedandgenerativelargelanguagemodelsfordetectingsuicidalideationaperformanceevaluationstudy AT arielsoaresteles comparativeanalysisofbertbasedandgenerativelargelanguagemodelsfordetectingsuicidalideationaperformanceevaluationstudy |
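The abstract reports accuracy for each model and stresses recall as the clinically critical metric (a missed at-risk sentence is a false negative). These two metrics can be computed from binary predictions with standard formulas; a self-contained sketch using synthetic labels, not the study's data:

```python
def accuracy(y_true, y_pred):
    # Fraction of sentences labeled correctly.
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def recall(y_true, y_pred, positive=1):
    # Of the sentences that truly contain suicidal ideation, the fraction
    # the model flagged. High recall means few at-risk texts are missed.
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if (tp + fn) else 0.0

# Synthetic example: 4 true positives in 10 sentences, one missed by the model.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
print(accuracy(y_true, y_pred))  # 0.8
print(recall(y_true, y_pred))    # 0.75
```

A model with 98% "across all metrics", as reported for Bing/GPT-4, would have both values near 0.98 on the 100-sentence test set.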