Comparative analysis of BERT-based and generative large language models for detecting suicidal ideation: a performance evaluation study


Bibliographic Details
Main Authors: Adonias Caetano de Oliveira, Renato Freitas Bessa, Ariel Soares Teles
Format: Article
Language: English
Published: Escola Nacional de Saúde Pública, Fundação Oswaldo Cruz 2024-11-01
Series: Cadernos de Saúde Pública
Subjects:
Online Access: http://www.scielo.br/scielo.php?script=sci_arttext&pid=S0102-311X2024001001413&lng=en&tlng=en
_version_ 1846119123156729856
author Adonias Caetano de Oliveira
Renato Freitas Bessa
Ariel Soares Teles
author_facet Adonias Caetano de Oliveira
Renato Freitas Bessa
Ariel Soares Teles
author_sort Adonias Caetano de Oliveira
collection DOAJ
description Abstract: Artificial intelligence can detect manifestations of suicidal ideation in texts. Studies demonstrate that BERT-based models achieve better performance in text classification problems. Large language models (LLMs) answer free-text queries without being specifically trained. This work aims to compare the performance of three variations of BERT models and LLMs (Google Bard, Microsoft Bing/GPT-4, and OpenAI ChatGPT-3.5) for identifying suicidal ideation in nonclinical texts written in Brazilian Portuguese. A dataset labeled by psychologists consisted of 2,691 sentences without suicidal ideation and 1,097 with suicidal ideation, of which 100 sentences were selected for testing. We applied data preprocessing techniques, hyperparameter optimization, and hold-out validation for training and testing the BERT models. When evaluating the LLMs, we used zero-shot prompt engineering. Each test sentence was labeled as containing suicidal ideation or not, according to the chatbot’s response. Bing/GPT-4 achieved the best performance, with 98% across all metrics. Fine-tuned BERT models outperformed the other LLMs: BERTimbau-Large performed best with 96% accuracy, followed by BERTimbau-Base with 94% and BERT-Multilingual with 87%. Bard performed the worst with 62% accuracy, whereas ChatGPT-3.5 achieved 81%. The high recall of the models suggests a low misclassification rate for at-risk patients, which is crucial to prevent missed interventions by professionals. However, despite their potential to support suicidal ideation detection, these models have not been validated in a clinical patient-monitoring setting. Therefore, caution is advised when using the evaluated models as tools to assist healthcare professionals in detecting suicidal ideation.
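The zero-shot labeling procedure described in the abstract can be sketched as follows. This is a minimal illustration only: `ask_chatbot` is a hypothetical stand-in for a call to one of the evaluated chatbots (here stubbed with a keyword check so the sketch runs), and the prompt wording is an assumption, not the one used in the study.

```python
def ask_chatbot(prompt: str) -> str:
    # Hypothetical stub standing in for a real chatbot API call.
    # It fakes a yes/no answer so this sketch is self-contained.
    return "Yes" if "desistir de viver" in prompt else "No"

def label_sentence(sentence: str) -> int:
    """Return 1 if the chatbot's response indicates suicidal ideation, else 0."""
    prompt = (
        "Answer only Yes or No. Does the following sentence, written in "
        f"Brazilian Portuguese, express suicidal ideation?\n\n{sentence}"
    )
    answer = ask_chatbot(prompt).strip().lower()
    return 1 if answer.startswith("yes") else 0

# Two illustrative (invented) test sentences, one per class.
test_sentences = [
    "Hoje o dia foi ótimo no trabalho.",  # no ideation
    "Eu quero desistir de viver.",        # ideation
]
labels = [label_sentence(s) for s in test_sentences]
print(labels)  # → [0, 1]
```

In the study, each of the 100 test sentences was labeled this way from the chatbot's free-text response; the stub above merely makes that loop concrete.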
format Article
id doaj-art-9136fc30d08447c7a908d5223d99b3ba
institution Kabale University
issn 1678-4464
language English
publishDate 2024-11-01
publisher Escola Nacional de Saúde Pública, Fundação Oswaldo Cruz
record_format Article
series Cadernos de Saúde Pública
spelling doaj-art-9136fc30d08447c7a908d5223d99b3ba
2024-12-17T07:47:00Z
eng
Escola Nacional de Saúde Pública, Fundação Oswaldo Cruz
Cadernos de Saúde Pública
1678-4464
2024-11-01
Volume 40, Issue 10
10.1590/0102-311xen028824
Comparative analysis of BERT-based and generative large language models for detecting suicidal ideation: a performance evaluation study
Adonias Caetano de Oliveira https://orcid.org/0000-0002-5643-2916
Renato Freitas Bessa https://orcid.org/0009-0005-8989-768X
Ariel Soares Teles https://orcid.org/0000-0002-0840-3870
http://www.scielo.br/scielo.php?script=sci_arttext&pid=S0102-311X2024001001413&lng=en&tlng=en
Suicide
Suicidal Ideation
Artificial Intelligence
Natural Language Processing
spellingShingle Adonias Caetano de Oliveira
Renato Freitas Bessa
Ariel Soares Teles
Comparative analysis of BERT-based and generative large language models for detecting suicidal ideation: a performance evaluation study
Cadernos de Saúde Pública
Suicide
Suicidal Ideation
Artificial Intelligence
Natural Language Processing
title Comparative analysis of BERT-based and generative large language models for detecting suicidal ideation: a performance evaluation study
title_full Comparative analysis of BERT-based and generative large language models for detecting suicidal ideation: a performance evaluation study
title_fullStr Comparative analysis of BERT-based and generative large language models for detecting suicidal ideation: a performance evaluation study
title_full_unstemmed Comparative analysis of BERT-based and generative large language models for detecting suicidal ideation: a performance evaluation study
title_short Comparative analysis of BERT-based and generative large language models for detecting suicidal ideation: a performance evaluation study
title_sort comparative analysis of bert based and generative large language models for detecting suicidal ideation a performance evaluation study
topic Suicide
Suicidal Ideation
Artificial Intelligence
Natural Language Processing
url http://www.scielo.br/scielo.php?script=sci_arttext&pid=S0102-311X2024001001413&lng=en&tlng=en
work_keys_str_mv AT adoniascaetanodeoliveira comparativeanalysisofbertbasedandgenerativelargelanguagemodelsfordetectingsuicidalideationaperformanceevaluationstudy
AT renatofreitasbessa comparativeanalysisofbertbasedandgenerativelargelanguagemodelsfordetectingsuicidalideationaperformanceevaluationstudy
AT arielsoaresteles comparativeanalysisofbertbasedandgenerativelargelanguagemodelsfordetectingsuicidalideationaperformanceevaluationstudy