Evaluation of a Computer-Based Morphological Analysis Method for Free-Text Responses in the General Medicine In-Training Examination: Algorithm Validation Study
Abstract Background: The General Medicine In-Training Examination (GM-ITE) tests clinical knowledge in a 2-year postgraduate residency program in Japan. In the academic year 2021, as a domain of medical safety, the GM-ITE included questions regarding the diagnosis from medical h...
| Main Authors: | Daiki Yokokawa, Kiyoshi Shikino, Yuji Nishizaki, Sho Fukui, Yasuharu Tokuda |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | JMIR Publications, 2024-12-01 |
| Series: | JMIR Medical Education |
| Online Access: | https://mededu.jmir.org/2024/1/e52068 |
| _version_ | 1846126325909159936 |
|---|---|
| author | Daiki Yokokawa Kiyoshi Shikino Yuji Nishizaki Sho Fukui Yasuharu Tokuda |
| author_facet | Daiki Yokokawa Kiyoshi Shikino Yuji Nishizaki Sho Fukui Yasuharu Tokuda |
| author_sort | Daiki Yokokawa |
| collection | DOAJ |
| description |
Abstract
Background: The General Medicine In-Training Examination (GM-ITE) tests clinical knowledge in a 2-year postgraduate residency program in Japan. In the academic year 2021, as a domain of medical safety, the GM-ITE included questions on diagnosis from medical history and physical findings through video viewing and on case presentation skills. Examinees watched a video or audio recording of a patient examination and provided free-text responses. However, the human cost of scoring free-text answers may limit the implementation of the GM-ITE. A simple morphological analysis and word-matching model can thus be used to score free-text responses.
Objective: This study aimed to compare human versus computer scoring of free-text responses and to qualitatively evaluate the discrepancies between human- and machine-generated scores to assess the efficacy of machine scoring.
Methods: After obtaining consent for participation in the study, the authors used text data from residents who voluntarily answered the GM-ITE patient reproduction video-based questions involving simulated patients. The GM-ITE used video-based questions to simulate a patient’s consultation in the emergency room with a diagnosis of pulmonary embolism following a fracture. Residents provided statements for the case presentation. We obtained human-generated scores by collating the results of 2 independent scorers and machine-generated scores by converting the free-text responses into a word sequence through segmentation and morphological analysis and matching them with a prepared list of correct answers in 2022.
Results: Of the 104 responses collected (63 from postgraduate year 1 and 41 from postgraduate year 2), 39 cases remained for final analysis after invalid responses were excluded. The authors found discrepancies between human and machine scoring in 14 questions (7.2%); some were due to shortcomings in machine scoring that could be resolved by maintaining a list of correct words and dictionaries, whereas others were due to human error.
Conclusions: Machine scoring is comparable to human scoring. It requires a simple program and calibration but can potentially reduce the cost of scoring free-text responses. |
| format | Article |
| id | doaj-art-07af989862c242918d03bbf8537ba3a7 |
| institution | Kabale University |
| issn | 2369-3762 |
| language | English |
| publishDate | 2024-12-01 |
| publisher | JMIR Publications |
| record_format | Article |
| series | JMIR Medical Education |
| spelling | doaj-art-07af989862c242918d03bbf8537ba3a7; 2024-12-12T21:01:04Z; eng; JMIR Publications; JMIR Medical Education; 2369-3762; 2024-12-01; 10; e52068; e52068; 10.2196/52068; Evaluation of a Computer-Based Morphological Analysis Method for Free-Text Responses in the General Medicine In-Training Examination: Algorithm Validation Study; Daiki Yokokawa (http://orcid.org/0000-0003-0944-8664); Kiyoshi Shikino (http://orcid.org/0000-0002-3721-3443); Yuji Nishizaki (http://orcid.org/0000-0002-6964-6702); Sho Fukui (http://orcid.org/0000-0002-3082-1374); Yasuharu Tokuda (http://orcid.org/0000-0002-9325-7934); https://mededu.jmir.org/2024/1/e52068 |
| title | Evaluation of a Computer-Based Morphological Analysis Method for Free-Text Responses in the General Medicine In-Training Examination: Algorithm Validation Study |
| url | https://mededu.jmir.org/2024/1/e52068 |
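
The machine-scoring approach summarized in the Methods (segment the free text into words, reduce them to base forms, then match them against a prepared list of correct answers) can be illustrated with a minimal sketch. This record does not name the analyzer or the answer list the authors used, so everything below is an assumption for illustration: the regex tokenizer and the small `LEMMAS` table stand in for a real morphological analyzer, the example is in English rather than Japanese, and `ANSWER_KEY`, `machine_score`, and `discrepancies` are hypothetical names.

```python
import re

# Hypothetical lemma table standing in for a morphological analyzer's dictionary.
LEMMAS = {"fractured": "fracture", "fractures": "fracture", "emboli": "embolism"}

# Hypothetical answer key: each scoring item maps to the set of accepted words.
ANSWER_KEY = {
    "fracture": {"fracture"},
    "dyspnea": {"dyspnea", "breathlessness"},
    "pulmonary_embolism": {"embolism", "pe"},
}


def tokenize(text):
    """Split free text into lowercase word tokens and reduce them to base forms."""
    words = re.findall(r"[a-z]+", text.lower())
    return [LEMMAS.get(word, word) for word in words]


def machine_score(text, answer_key=ANSWER_KEY):
    """Return {item: 0 or 1}, awarding 1 when any accepted word appears in the text."""
    tokens = set(tokenize(text))
    return {item: int(bool(tokens & accepted)) for item, accepted in answer_key.items()}


def discrepancies(human, machine):
    """List the items on which the human and machine scores differ."""
    return [item for item in human if human[item] != machine.get(item)]


response = "Patient fractured her leg last week and now has breathlessness."
machine = machine_score(response)
human = {"fracture": 1, "dyspnea": 1, "pulmonary_embolism": 1}  # made-up human scores
print(machine)
print(discrepancies(human, machine))
```

In this toy case the human and machine scores disagree on one item, and a reviewer would then decide whether the word list needs another accepted term or the human score was a mistake. These are the two sources of discrepancy the abstract reports.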