Evaluating and Enhancing Japanese Large Language Models for Genetic Counseling Support: Comparative Study of Domain Adaptation and the Development of an Expert-Evaluated Dataset

Bibliographic Details
Main Authors: Takuya Fukushima, Masae Manabe, Shuntaro Yada, Shoko Wakamiya, Akiko Yoshida, Yusaku Urakawa, Akiko Maeda, Shigeyuki Kan, Masayo Takahashi, Eiji Aramaki
Format: Article
Language: English
Published: JMIR Publications, 2025-01-01
Series: JMIR Medical Informatics
Online Access: https://medinform.jmir.org/2025/1/e65047
Collection: DOAJ
Description:
Background: Advances in genetics have underscored a strong association between genetic factors and health outcomes, increasing the demand for genetic counseling services. However, a shortage of qualified genetic counselors poses a significant challenge. Large language models (LLMs) have emerged as a potential means of supporting genetic counseling tasks. Despite this potential, Japanese genetic counseling LLMs (JGCLLMs) remain underexplored. Advancing a JGCLLM-based dialogue system for genetic counseling requires investigating effective domain adaptation methods.
Objective: This study aims to evaluate the current capabilities of, and identify the challenges in developing, a JGCLLM-based dialogue system for genetic counseling. The primary focus is to assess the effectiveness of prompt engineering, retrieval-augmented generation (RAG), and instruction tuning in the context of genetic counseling. A further aim is to establish an expert-evaluated dataset of responses generated by LLMs adapted to Japanese genetic counseling, to support the future development of JGCLLMs.
Methods: Two primary datasets were used in this study: (1) a question-answer (QA) dataset for LLM adaptation and (2) a genetic counseling question dataset for evaluation. The QA dataset included 899 QA pairs covering medical and genetic counseling topics, while the evaluation dataset contained 120 curated questions across 6 genetic counseling categories. Three LLM enhancement techniques (instruction tuning, RAG, and prompt engineering) were applied to a lightweight Japanese LLM to improve its genetic counseling capabilities. The performance of the adapted LLM was evaluated on the 120-question dataset by 2 certified genetic counselors and 1 ophthalmologist (SK, YU, and AY). The evaluation focused on four metrics: (1) inappropriateness of information, (2) sufficiency of information, (3) severity of harm, and (4) alignment with medical consensus.
Results: The evaluation by the certified genetic counselors and the ophthalmologist revealed varied outcomes across methods. RAG showed potential, particularly in improving critical aspects of genetic counseling. In contrast, instruction tuning and prompt engineering produced less favorable outcomes. This evaluation process produced an expert-evaluated dataset of responses generated by LLMs adapted with different combinations of these methods. Error analysis identified key ethical concerns, including inappropriate promotion of prenatal testing, criticism of relatives, and inaccurate probability statements.
Conclusions: RAG demonstrated notable improvements across all evaluation metrics, suggesting that expanding the RAG data could yield further gains. The expert-evaluated dataset developed in this study provides valuable insights for future optimization efforts. However, the ethical issues observed in JGCLLM responses underscore the critical need for ongoing refinement and thorough ethical evaluation before these systems can be implemented in health care settings.
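The abstract reports RAG as the most promising of the three adaptation methods but does not describe the retriever or the prompt format. Purely as an illustration of the general technique, the Python sketch below retrieves the QA pairs most similar to a client's question from a QA corpus (such as the 899-pair adaptation dataset) and prepends them to the prompt. Everything here is a hypothetical assumption rather than the authors' implementation: the `QAPair`, `retrieve`, and `build_prompt` names, the character-bigram Jaccard retriever, the prompt wording, and the value of `k`. The LLM call itself is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QAPair:
    """One question-answer pair from a (hypothetical) RAG corpus."""
    question: str
    answer: str


def char_bigrams(text: str) -> set[str]:
    # Character bigrams need no word tokenizer, which is convenient for
    # Japanese text written without spaces (an assumed design choice).
    s = "".join(text.lower().split())
    return {s[i:i + 2] for i in range(len(s) - 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    # Jaccard similarity: |A ∩ B| / |A ∪ B|, defined as 0 for empty input.
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def retrieve(query: str, corpus: list[QAPair], k: int = 2) -> list[QAPair]:
    # Rank stored questions by bigram overlap with the query; keep the top k.
    q = char_bigrams(query)
    ranked = sorted(
        corpus,
        key=lambda pair: jaccard(q, char_bigrams(pair.question)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, corpus: list[QAPair], k: int = 2) -> str:
    # Prepend the retrieved QA pairs as reference context for the LLM.
    context = "\n\n".join(
        f"Q: {p.question}\nA: {p.answer}" for p in retrieve(query, corpus, k)
    )
    return (
        "Answer the client's question using the reference QA pairs below.\n\n"
        f"{context}\n\nQuestion: {query}\nAnswer:"
    )


if __name__ == "__main__":
    # Placeholder answers stand in for real QA-dataset content.
    demo_corpus = [
        QAPair("What is an autosomal dominant inheritance pattern?", "<answer 1>"),
        QAPair("Can genetic testing be done during pregnancy?", "<answer 2>"),
        QAPair("How is retinitis pigmentosa inherited?", "<answer 3>"),
    ]
    print(build_prompt("How is retinitis pigmentosa passed down in families?", demo_corpus))
```

A production system would more likely use dense embeddings and a vector index; the lexical retriever is used here only to keep the sketch dependency-free.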
ISSN: 2291-9694
Volume: 13, Article e65047
DOI: 10.2196/65047
Author ORCIDs:
Takuya Fukushima: https://orcid.org/0009-0006-9796-9126
Masae Manabe: https://orcid.org/0000-0001-8018-4177
Shuntaro Yada: https://orcid.org/0000-0002-6209-1054
Shoko Wakamiya: https://orcid.org/0000-0002-9371-1340
Akiko Yoshida: https://orcid.org/0009-0007-9914-5021
Yusaku Urakawa: https://orcid.org/0009-0006-2513-4250
Akiko Maeda: https://orcid.org/0009-0009-5935-1183
Shigeyuki Kan: https://orcid.org/0000-0002-1889-5127
Masayo Takahashi: https://orcid.org/0000-0003-1836-6484
Eiji Aramaki: https://orcid.org/0000-0003-0201-3609