Comparative analysis of BERT-based and generative large language models for detecting suicidal ideation: a performance evaluation study
Abstract: Artificial intelligence can detect suicidal ideation manifestations in texts. Studies demonstrate that BERT-based models achieve better performance in text classification problems. Large language models (LLMs) answer free-text queries without being specifically trained. This work aims to c...
Saved in:
| Main Authors: | Adonias Caetano de Oliveira, Renato Freitas Bessa, Ariel Soares Teles |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Escola Nacional de Saúde Pública, Fundação Oswaldo Cruz, 2024-11-01 |
| Series: | Cadernos de Saúde Pública |
| Subjects: | Suicide; Suicidal Ideation; Artificial Intelligence; Natural Language Processing |
| Online Access: | http://www.scielo.br/scielo.php?script=sci_arttext&pid=S0102-311X2024001001413&lng=en&tlng=en |
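The abstract describes labeling each test sentence from a chatbot's free-text answer under zero-shot prompting. A minimal sketch of that labeling step in Python (the prompt wording and response parsing are illustrative assumptions, not the study's actual protocol):

```python
def build_prompt(sentence: str) -> str:
    # Zero-shot prompt: only a task description, no labeled examples.
    # The wording is a hypothetical reconstruction, not the authors' prompt.
    return (
        "Does the following sentence express suicidal ideation? "
        "Answer only 'yes' or 'no'.\n\n"
        f"Sentence: {sentence}"
    )

def label_from_response(response: str) -> int:
    # Map the chatbot's free-text answer to a binary label:
    # 1 = suicidal ideation present, 0 = absent.
    normalized = response.strip().lower()
    return 1 if normalized.startswith("yes") else 0
```

In the study, each of the 100 test sentences would be sent through such a prompt to each chatbot (Bard, Bing/GPT-4, ChatGPT-3.5), and the parsed label compared against the psychologists' annotation.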
| _version_ | 1846119123156729856 |
|---|---|
| author | Adonias Caetano de Oliveira Renato Freitas Bessa Ariel Soares Teles |
| author_facet | Adonias Caetano de Oliveira Renato Freitas Bessa Ariel Soares Teles |
| author_sort | Adonias Caetano de Oliveira |
| collection | DOAJ |
| description | Abstract: Artificial intelligence can detect suicidal ideation manifestations in texts. Studies demonstrate that BERT-based models achieve better performance in text classification problems. Large language models (LLMs) answer free-text queries without being specifically trained. This work aims to compare the performance of three variations of BERT models and LLMs (Google Bard, Microsoft Bing/GPT-4, and OpenAI ChatGPT-3.5) for identifying suicidal ideation in nonclinical texts written in Brazilian Portuguese. A dataset labeled by psychologists consisted of 2,691 sentences without suicidal ideation and 1,097 with suicidal ideation, of which 100 sentences were selected for testing. We applied data preprocessing techniques, hyperparameter optimization, and hold-out validation for training and testing the BERT models. When evaluating the LLMs, we used zero-shot prompt engineering. Each test sentence was labeled as containing suicidal ideation or not, according to the chatbot’s response. Bing/GPT-4 achieved the best performance, with 98% across all metrics. Fine-tuned BERT models outperformed the other LLMs: BERTimbau-Large performed best with 96% accuracy, followed by BERTimbau-Base with 94% and BERT-Multilingual with 87%. Bard performed the worst, with 62% accuracy, whereas ChatGPT-3.5 achieved 81%. The high recall of the models suggests a low misclassification rate of at-risk patients, which is crucial to prevent missed interventions by professionals. However, despite their potential to support suicidal ideation detection, these models have not been validated in a clinical patient-monitoring setting. Therefore, caution is advised when using the evaluated models as tools to assist healthcare professionals in detecting suicidal ideation. |
| format | Article |
| id | doaj-art-9136fc30d08447c7a908d5223d99b3ba |
| institution | Kabale University |
| issn | 1678-4464 |
| language | English |
| publishDate | 2024-11-01 |
| publisher | Escola Nacional de Saúde Pública, Fundação Oswaldo Cruz |
| record_format | Article |
| series | Cadernos de Saúde Pública |
| spelling | doaj-art-9136fc30d08447c7a908d5223d99b3ba 2024-12-17T07:47:00Z eng Escola Nacional de Saúde Pública, Fundação Oswaldo Cruz Cadernos de Saúde Pública 1678-4464 2024-11-01 40 10 10.1590/0102-311xen028824 Comparative analysis of BERT-based and generative large language models for detecting suicidal ideation: a performance evaluation study Adonias Caetano de Oliveira https://orcid.org/0000-0002-5643-2916 Renato Freitas Bessa https://orcid.org/0009-0005-8989-768X Ariel Soares Teles https://orcid.org/0000-0002-0840-3870 http://www.scielo.br/scielo.php?script=sci_arttext&pid=S0102-311X2024001001413&lng=en&tlng=en Suicide Suicidal Ideation Artificial Intelligence Natural Language Processing |
| spellingShingle | Adonias Caetano de Oliveira Renato Freitas Bessa Ariel Soares Teles Comparative analysis of BERT-based and generative large language models for detecting suicidal ideation: a performance evaluation study Cadernos de Saúde Pública Suicide Suicidal Ideation Artificial Intelligence Natural Language Processing |
| title | Comparative analysis of BERT-based and generative large language models for detecting suicidal ideation: a performance evaluation study |
| title_full | Comparative analysis of BERT-based and generative large language models for detecting suicidal ideation: a performance evaluation study |
| title_fullStr | Comparative analysis of BERT-based and generative large language models for detecting suicidal ideation: a performance evaluation study |
| title_full_unstemmed | Comparative analysis of BERT-based and generative large language models for detecting suicidal ideation: a performance evaluation study |
| title_short | Comparative analysis of BERT-based and generative large language models for detecting suicidal ideation: a performance evaluation study |
| title_sort | comparative analysis of bert based and generative large language models for detecting suicidal ideation a performance evaluation study |
| topic | Suicide Suicidal Ideation Artificial Intelligence Natural Language Processing |
| url | http://www.scielo.br/scielo.php?script=sci_arttext&pid=S0102-311X2024001001413&lng=en&tlng=en |
| work_keys_str_mv | AT adoniascaetanodeoliveira comparativeanalysisofbertbasedandgenerativelargelanguagemodelsfordetectingsuicidalideationaperformanceevaluationstudy AT renatofreitasbessa comparativeanalysisofbertbasedandgenerativelargelanguagemodelsfordetectingsuicidalideationaperformanceevaluationstudy AT arielsoaresteles comparativeanalysisofbertbasedandgenerativelargelanguagemodelsfordetectingsuicidalideationaperformanceevaluationstudy |
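The abstract reports accuracy for each model and stresses recall as the clinically critical metric (a missed at-risk sentence is a false negative). These two metrics can be computed from binary predictions with standard formulas; a self-contained sketch using synthetic labels, not the study's data:

```python
def accuracy(y_true, y_pred):
    # Fraction of sentences labeled correctly.
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def recall(y_true, y_pred, positive=1):
    # Of the sentences that truly contain suicidal ideation, the fraction
    # the model flagged. High recall means few at-risk texts are missed.
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if (tp + fn) else 0.0

# Synthetic example: 4 true positives in 10 sentences, one missed by the model.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
print(accuracy(y_true, y_pred))  # 0.8
print(recall(y_true, y_pred))    # 0.75
```

A model with 98% "across all metrics", as reported for Bing/GPT-4, would have both values near 0.98 on the 100-sentence test set.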