Classifying Patient Complaints Using Artificial Intelligence–Powered Large Language Models: Cross-Sectional Study

Bibliographic Details
Main Authors: Sky Wei Chee Koh, Eunice Rui Ning Wong, John Chong Min Tan, Stephanie C C van der Lubbe, Jun Cong Goh, Ethan Sheng Yong Ching, Ian Wen Yih Chia, Si Hui Low, Ping Young Ang, Queenie Quek, Mehul Motani, Jose M Valderas
Format: Article
Language: English
Published: JMIR Publications 2025-08-01
Series: Journal of Medical Internet Research
Online Access: https://www.jmir.org/2025/1/e74231
author Sky Wei Chee Koh
Eunice Rui Ning Wong
John Chong Min Tan
Stephanie C C van der Lubbe
Jun Cong Goh
Ethan Sheng Yong Ching
Ian Wen Yih Chia
Si Hui Low
Ping Young Ang
Queenie Quek
Mehul Motani
Jose M Valderas
collection DOAJ
description Abstract
Background: Patient complaints provide valuable insights into the performance of health care systems, highlighting potential risks not apparent to staff. Patient complaints can drive systemic changes that enhance patient safety. However, manual categorization and analysis pose a huge logistical challenge, hindering the ability to harness the potential of these data.
Objective: This study aims to evaluate the accuracy of artificial intelligence (AI)–powered categorization of patient complaints in primary care based on the Healthcare Complaint Analysis Tool (HCAT) General Practice (GP) taxonomy and assess the importance of advanced large language models (LLMs) in complaint categorization.
Methods: This cross-sectional study analyzed 1816 anonymous patient complaints from 7 public primary care clinics in Singapore. Complaints were first coded by trained human coders using the HCAT (GP) taxonomy through a rigorous process involving independent assessment and consensus discussions. LLMs (GPT-3.5 turbo, GPT-4o mini, and Claude 3.5 Sonnet) were used to validate manual classification. Claude 3.5 Sonnet was further used to identify complaint themes. LLM classifications were assessed for accuracy and consistency with human coding using accuracy and F1 score.
Results: The majority of complaints fell under the HCAT (GP) domain of management (1079/1816, 59.4%), specifically relating to institutional processes (830/1816, 45.7%). Most complaints were of medium severity (994/1816, 54.7%), occurred within the practice (627/1816, 34.5%), and resulted in minimal harm (75.4%). LLMs achieved moderate to good accuracy (58.4%‐95.5%) in HCAT (GP) field classifications, with GPT-4o mini generally outperforming GPT-3.5 turbo, except in severity classification. All 3 LLMs demonstrated moderate concordance rates (average 61.9%‐68.8%) in complaints classification with varying levels of agreement (κ=0.114‐0.623). GPT-4o mini and Claude 3.5 significantly outperformed GPT-3.5 turbo in several fields (P ...).
Conclusions: Our study highlighted the potential of LLMs in classifying patient complaints in primary care using HCAT (GP) taxonomy. While GPT-4o and Claude 3.5 demonstrated promising results, further fine-tuning and model training are required to improve accuracy. Integrating AI into complaint analysis can facilitate proactive identification of systemic issues, ultimately enhancing quality improvement and patient safety. By leveraging LLMs, health care organizations can prioritize complaints and escalate high-risk issues more effectively. Theoretically, this could lead to improved patient care and experience; further research is needed to confirm this potential benefit.
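The abstract reports accuracy, F1, and κ agreement between LLM and human coding but does not say how they were computed. The following is a minimal sketch, assuming single-label classification per complaint, macro-averaged F1, and Cohen's κ; the function names and the six example labels are hypothetical and are not taken from the study.

```python
from collections import Counter


def accuracy(human, model):
    """Fraction of complaints where the model label equals the human label."""
    return sum(h == m for h, m in zip(human, model)) / len(human)


def macro_f1(human, model):
    """Unweighted mean of per-label F1, treating human codes as the reference."""
    labels = sorted(set(human) | set(model))
    scores = []
    for label in labels:
        tp = sum(h == label and m == label for h, m in zip(human, model))
        fp = sum(h != label and m == label for h, m in zip(human, model))
        fn = sum(h == label and m != label for h, m in zip(human, model))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def cohen_kappa(human, model):
    """Chance-corrected agreement between two sets of labels."""
    n = len(human)
    observed = accuracy(human, model)
    human_counts = Counter(human)
    model_counts = Counter(model)
    # Expected agreement if both raters labelled independently
    # according to their own marginal label frequencies.
    expected = sum(human_counts[l] * model_counts[l] for l in human_counts) / n**2
    if expected == 1:
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical domain labels for six complaints (illustrative data only).
human_codes = ["management", "management", "clinical",
               "relationship", "management", "clinical"]
llm_codes = ["management", "clinical", "clinical",
             "relationship", "management", "management"]

print(f"accuracy = {accuracy(human_codes, llm_codes):.3f}")
print(f"macro F1 = {macro_f1(human_codes, llm_codes):.3f}")
print(f"kappa    = {cohen_kappa(human_codes, llm_codes):.3f}")
```

κ is lower than raw accuracy here because it discounts the agreement two coders would reach by chance given how often each uses each label; that is one reason an abstract can report moderate concordance alongside a wide spread of κ values.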
format Article
id doaj-art-e63d30f70a6043a8b57cdf8cb0e67fd2
institution Kabale University
issn 1438-8871
language English
publishDate 2025-08-01
publisher JMIR Publications
record_format Article
series Journal of Medical Internet Research
spelling doaj-art-e63d30f70a6043a8b57cdf8cb0e67fd2
2025-08-20T03:36:31Z
eng
JMIR Publications
Journal of Medical Internet Research
1438-8871
2025-08-01
27
e74231
10.2196/74231
Classifying Patient Complaints Using Artificial Intelligence–Powered Large Language Models: Cross-Sectional Study
Sky Wei Chee Koh http://orcid.org/0000-0003-4520-646X
Eunice Rui Ning Wong http://orcid.org/0009-0002-0041-2390
John Chong Min Tan http://orcid.org/0009-0005-9463-8770
Stephanie C C van der Lubbe http://orcid.org/0000-0002-9723-7017
Jun Cong Goh http://orcid.org/0009-0000-6635-386X
Ethan Sheng Yong Ching http://orcid.org/0009-0008-9745-002X
Ian Wen Yih Chia http://orcid.org/0009-0005-6564-7870
Si Hui Low http://orcid.org/0009-0006-7273-4646
Ping Young Ang http://orcid.org/0009-0001-2623-5516
Queenie Quek http://orcid.org/0009-0006-1209-5213
Mehul Motani http://orcid.org/0000-0003-3262-0207
Jose M Valderas http://orcid.org/0000-0002-9299-1555
https://www.jmir.org/2025/1/e74231
title Classifying Patient Complaints Using Artificial Intelligence–Powered Large Language Models: Cross-Sectional Study
url https://www.jmir.org/2025/1/e74231