Keyword-optimized template insertion for clinical note classification via prompt-based learning

Abstract Background Prompt-based learning involves the addition of prompts (i.e., templates) to the input of pre-trained large language models (PLMs) to adapt them to specific tasks with minimal training. This technique is particularly advantageous in clinical scenarios where the amount of annotate...

Full description

Saved in:
Bibliographic Details
Main Authors: Eugenia Alleva, Isotta Landi, Leslee J. Shaw, Erwin Böttinger, Ipek Ensari, Thomas J. Fuchs
Format: Article
Language:English
Published: BMC 2025-07-01
Series:BMC Medical Informatics and Decision Making
Subjects:
Online Access:https://doi.org/10.1186/s12911-025-03071-y
_version_ 1849334711516659712
author Eugenia Alleva
Isotta Landi
Leslee J. Shaw
Erwin Böttinger
Ipek Ensari
Thomas J. Fuchs
author_facet Eugenia Alleva
Isotta Landi
Leslee J. Shaw
Erwin Böttinger
Ipek Ensari
Thomas J. Fuchs
author_sort Eugenia Alleva
collection DOAJ
description Abstract Background Prompt-based learning involves the addition of prompts (i.e., templates) to the input of pre-trained large language models (PLMs) to adapt them to specific tasks with minimal training. This technique is particularly advantageous in clinical scenarios where the amount of annotated data is limited. This study aims to investigate the impact of template position on model performance and training efficiency in clinical note classification tasks using prompt-based learning, especially in zero- and few-shot settings. Methods We developed a keyword-optimized template insertion method (KOTI) to enhance model performance by strategically placing prompt templates near relevant clinical information within the notes. The method involves defining task-specific keywords, identifying sentences containing these keywords, and inserting the prompt template in their vicinity. We compared KOTI with standard template insertion (STI) methods in which the template is directly appended at the end of the input text. Specifically, we compared STI with naïve tail-truncation (STI-s) and STI with keyword-optimized input truncation (STI-k). Experiments were conducted using two pre-trained encoder models, GatorTron and ClinicalBERT, and two decoder models, BioGPT and ClinicalT5, across five classification tasks, including dysmenorrhea, peripheral vascular disease, depression, osteoarthritis, and smoking status classification. Results Our experiments revealed that the KOTI approach consistently outperformed both STI-s and STI-k in zero-shot and few-shot scenarios for encoder models, with KOTI yielding a significant 24% F1 improvement over STI-k for GatorTron and 8% for ClinicalBERT. Additionally, training with balanced examples further enhanced performance, particularly under few-shot conditions. In contrast, decoder-based models exhibited inconsistent results, with KOTI showing a significant improvement in F1 score over STI-k for BioGPT (+19%) but a significant drop for ClinicalT5 (−18%), suggesting that KOTI is not beneficial across all transformer model architectures. Conclusion Our findings underscore the significance of template position in prompt-based fine-tuning of encoder models and highlight KOTI’s potential to optimize real-world clinical note classification tasks with few training examples.
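The Methods text above names KOTI's three steps: define task-specific keywords, find sentences containing them, and insert the prompt template in their vicinity. The authors' implementation is not reproduced in this record, so the following is only a minimal Python sketch of that idea; the function name, the naive sentence splitter, the fallback behavior, and the character-based truncation are all assumptions, not the paper's method.

```python
import re

def koti_insert(note, keywords, template, max_chars=2000):
    """Sketch of keyword-optimized template insertion (KOTI):
    place the prompt template right after the first sentence that
    contains a task keyword, so the template sits near the relevant
    clinical information rather than at the tail of the note.
    Details beyond the abstract's description are assumptions."""
    # Naive sentence split on terminal punctuation; a real clinical
    # pipeline would use a proper sentence tokenizer.
    sentences = re.split(r'(?<=[.!?])\s+', note)
    hit = next(
        (i for i, s in enumerate(sentences)
         if any(k.lower() in s.lower() for k in keywords)),
        None,
    )
    if hit is None:
        # Fallback to standard template insertion (STI):
        # append the template at the end of the input.
        return (note + " " + template)[:max_chars]
    # Insert the template immediately after the keyword sentence, so
    # it survives downstream truncation to the model's input limit.
    out = sentences[:hit + 1] + [template] + sentences[hit + 1:]
    return " ".join(out)[:max_chars]

# Hypothetical usage with an encoder-style [MASK] template:
note = ("Patient reports severe cramping with menses. "
        "Denies smoking. Takes ibuprofen as needed.")
print(koti_insert(note, ["menses", "cramping"],
                  "Does the patient have dysmenorrhea? [MASK]"))
```

The key design point, per the abstract, is that appending the template at the tail (STI) can leave it far from the evidence, or truncated away entirely in long notes, whereas keyword-guided placement keeps template and evidence adjacent.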
format Article
id doaj-art-a3d5a90b6b7a42dd8b47e4b0c92b2da2
institution Kabale University
issn 1472-6947
language English
publishDate 2025-07-01
publisher BMC
record_format Article
series BMC Medical Informatics and Decision Making
spelling doaj-art-a3d5a90b6b7a42dd8b47e4b0c92b2da22025-08-20T03:45:30ZengBMCBMC Medical Informatics and Decision Making1472-69472025-07-0125111110.1186/s12911-025-03071-yKeyword-optimized template insertion for clinical note classification via prompt-based learningEugenia Alleva0Isotta Landi1Leslee J. Shaw2Erwin Böttinger3Ipek Ensari4Thomas J. Fuchs5Windreich Department of Artificial Intelligence and Human Health at Mount Sinai, Icahn School of Medicine at Mount SinaiInstitute for Personalized Medicine, Icahn School of Medicine at Mount SinaiBlavatnik Family Women’s Health Research Institute, Icahn School of Medicine at Mount SinaiHasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount SinaiWindreich Department of Artificial Intelligence and Human Health at Mount Sinai, Icahn School of Medicine at Mount SinaiWindreich Department of Artificial Intelligence and Human Health at Mount Sinai, Icahn School of Medicine at Mount SinaiAbstract Background Prompt-based learning involves the addition of prompts (i.e., templates) to the input of pre-trained large language models (PLMs) to adapt them to specific tasks with minimal training. This technique is particularly advantageous in clinical scenarios where the amount of annotated data is limited. This study aims to investigate the impact of template position on model performance and training efficiency in clinical note classification tasks using prompt-based learning, especially in zero- and few-shot settings. Methods We developed a keyword-optimized template insertion method (KOTI) to enhance model performance by strategically placing prompt templates near relevant clinical information within the notes. The method involves defining task-specific keywords, identifying sentences containing these keywords, and inserting the prompt template in their vicinity. We compared KOTI with standard template insertion (STI) methods in which the template is directly appended at the end of the input text. Specifically, we compared STI with naïve tail-truncation (STI-s) and STI with keyword-optimized input truncation (STI-k). Experiments were conducted using two pre-trained encoder models, GatorTron and ClinicalBERT, and two decoder models, BioGPT and ClinicalT5, across five classification tasks, including dysmenorrhea, peripheral vascular disease, depression, osteoarthritis, and smoking status classification. Results Our experiments revealed that the KOTI approach consistently outperformed both STI-s and STI-k in zero-shot and few-shot scenarios for encoder models, with KOTI yielding a significant 24% F1 improvement over STI-k for GatorTron and 8% for ClinicalBERT. Additionally, training with balanced examples further enhanced performance, particularly under few-shot conditions. In contrast, decoder-based models exhibited inconsistent results, with KOTI showing a significant improvement in F1 score over STI-k for BioGPT (+19%) but a significant drop for ClinicalT5 (−18%), suggesting that KOTI is not beneficial across all transformer model architectures. Conclusion Our findings underscore the significance of template position in prompt-based fine-tuning of encoder models and highlight KOTI’s potential to optimize real-world clinical note classification tasks with few training examples.https://doi.org/10.1186/s12911-025-03071-yNLPEncodersInformation extractionDysmenorrheaGatortronPrompt
spellingShingle Eugenia Alleva
Isotta Landi
Leslee J. Shaw
Erwin Böttinger
Ipek Ensari
Thomas J. Fuchs
Keyword-optimized template insertion for clinical note classification via prompt-based learning
BMC Medical Informatics and Decision Making
NLP
Encoders
Information extraction
Dysmenorrhea
Gatortron
Prompt
title Keyword-optimized template insertion for clinical note classification via prompt-based learning
title_full Keyword-optimized template insertion for clinical note classification via prompt-based learning
title_fullStr Keyword-optimized template insertion for clinical note classification via prompt-based learning
title_full_unstemmed Keyword-optimized template insertion for clinical note classification via prompt-based learning
title_short Keyword-optimized template insertion for clinical note classification via prompt-based learning
title_sort keyword optimized template insertion for clinical note classification via prompt based learning
topic NLP
Encoders
Information extraction
Dysmenorrhea
Gatortron
Prompt
url https://doi.org/10.1186/s12911-025-03071-y
work_keys_str_mv AT eugeniaalleva keywordoptimizedtemplateinsertionforclinicalnoteclassificationviapromptbasedlearning
AT isottalandi keywordoptimizedtemplateinsertionforclinicalnoteclassificationviapromptbasedlearning
AT lesleejshaw keywordoptimizedtemplateinsertionforclinicalnoteclassificationviapromptbasedlearning
AT erwinbottinger keywordoptimizedtemplateinsertionforclinicalnoteclassificationviapromptbasedlearning
AT ipekensari keywordoptimizedtemplateinsertionforclinicalnoteclassificationviapromptbasedlearning
AT thomasjfuchs keywordoptimizedtemplateinsertionforclinicalnoteclassificationviapromptbasedlearning