Keyword-optimized template insertion for clinical note classification via prompt-based learning

Abstract Background Prompt-based learning involves the addition of prompts (i.e., templates) to the input of pre-trained large language models (PLMs) to adapt them to specific tasks with minimal training. This technique is particularly advantageous in clinical scenarios where the amount of annotate...

Full description

Saved in:
Bibliographic Details
Main Authors: Eugenia Alleva, Isotta Landi, Leslee J. Shaw, Erwin Böttinger, Ipek Ensari, Thomas J. Fuchs
Format: Article
Language:English
Published: BMC 2025-07-01
Series:BMC Medical Informatics and Decision Making
Subjects:
Online Access:https://doi.org/10.1186/s12911-025-03071-y
_version_ 1849334711516659712
author Eugenia Alleva
Isotta Landi
Leslee J. Shaw
Erwin Böttinger
Ipek Ensari
Thomas J. Fuchs
author_facet Eugenia Alleva
Isotta Landi
Leslee J. Shaw
Erwin Böttinger
Ipek Ensari
Thomas J. Fuchs
author_sort Eugenia Alleva
collection DOAJ
description Abstract Background Prompt-based learning involves the addition of prompts (i.e., templates) to the input of pre-trained large language models (PLMs) to adapt them to specific tasks with minimal training. This technique is particularly advantageous in clinical scenarios where the amount of annotated data is limited. This study aims to investigate the impact of template position on model performance and training efficiency in clinical note classification tasks using prompt-based learning, especially in zero- and few-shot settings. Methods We developed a keyword-optimized template insertion method (KOTI) to enhance model performance by strategically placing prompt templates near relevant clinical information within the notes. The method involves defining task-specific keywords, identifying sentences containing these keywords, and inserting the prompt template in their vicinity. We compared KOTI with standard template insertion (STI) methods in which the template is directly appended at the end of the input text. Specifically, we compared STI with naïve tail-truncation (STI-s) and STI with keyword-optimized input truncation (STI-k). Experiments were conducted using two pre-trained encoder models, GatorTron and ClinicalBERT, and two decoder models, BioGPT and ClinicalT5, across five classification tasks, including dysmenorrhea, peripheral vascular disease, depression, osteoarthritis, and smoking status classification. Results Our experiments revealed that the KOTI approach consistently outperformed both STI-s and STI-k in zero-shot and few-shot scenarios for encoder models, with KOTI yielding a significant 24% F1 improvement over STI-k for GatorTron and 8% for ClinicalBERT. Additionally, training with balanced examples further enhanced performance, particularly under few-shot conditions. In contrast, decoder-based models exhibited inconsistent results, with KOTI showing a significant improvement in F1 score over STI-k for BioGPT (+19%) but a significant drop for ClinicalT5 (−18%), suggesting that KOTI is not beneficial across all transformer model architectures. Conclusion Our findings underscore the significance of template position in prompt-based fine-tuning of encoder models and highlight KOTI’s potential to optimize real-world clinical note classification tasks with few training examples.
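The Methods text above names KOTI's three steps: define task-specific keywords, find sentences containing them, and insert the prompt template in their vicinity. The authors' implementation is not reproduced in this record, so the following is only a minimal Python sketch of that idea; the function name, the naive sentence splitter, the fallback behavior, and the character-based truncation are all assumptions, not the paper's method.

```python
import re

def koti_insert(note, keywords, template, max_chars=2000):
    """Sketch of keyword-optimized template insertion (KOTI):
    place the prompt template right after the first sentence that
    contains a task keyword, so the template sits near the relevant
    clinical information rather than at the tail of the note.
    Details beyond the abstract's description are assumptions."""
    # Naive sentence split on terminal punctuation; a real clinical
    # pipeline would use a proper sentence tokenizer.
    sentences = re.split(r'(?<=[.!?])\s+', note)
    hit = next(
        (i for i, s in enumerate(sentences)
         if any(k.lower() in s.lower() for k in keywords)),
        None,
    )
    if hit is None:
        # Fallback to standard template insertion (STI):
        # append the template at the end of the input.
        return (note + " " + template)[:max_chars]
    # Insert the template immediately after the keyword sentence, so
    # it survives downstream truncation to the model's input limit.
    out = sentences[:hit + 1] + [template] + sentences[hit + 1:]
    return " ".join(out)[:max_chars]

# Hypothetical usage with an encoder-style [MASK] template:
note = ("Patient reports severe cramping with menses. "
        "Denies smoking. Takes ibuprofen as needed.")
print(koti_insert(note, ["menses", "cramping"],
                  "Does the patient have dysmenorrhea? [MASK]"))
```

The key design point, per the abstract, is that appending the template at the tail (STI) can leave it far from the evidence, or truncated away entirely in long notes, whereas keyword-guided placement keeps template and evidence adjacent.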
format Article
id doaj-art-a3d5a90b6b7a42dd8b47e4b0c92b2da2
institution Kabale University
issn 1472-6947
language English
publishDate 2025-07-01
publisher BMC
record_format Article
series BMC Medical Informatics and Decision Making
spelling doaj-art-a3d5a90b6b7a42dd8b47e4b0c92b2da22025-08-20T03:45:30ZengBMCBMC Medical Informatics and Decision Making1472-69472025-07-0125111110.1186/s12911-025-03071-yKeyword-optimized template insertion for clinical note classification via prompt-based learningEugenia Alleva0Isotta Landi1Leslee J. Shaw2Erwin Böttinger3Ipek Ensari4Thomas J. Fuchs5Windreich Department of Artificial Intelligence and Human Health at Mount Sinai, Icahn School of Medicine at Mount SinaiInstitute for Personalized Medicine, Icahn School of Medicine at Mount SinaiBlavatnik Family Women’s Health Research Institute, Icahn School of Medicine at Mount SinaiHasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount SinaiWindreich Department of Artificial Intelligence and Human Health at Mount Sinai, Icahn School of Medicine at Mount SinaiWindreich Department of Artificial Intelligence and Human Health at Mount Sinai, Icahn School of Medicine at Mount SinaiAbstract Background Prompt-based learning involves the addition of prompts (i.e., templates) to the input of pre-trained large language models (PLMs) to adapt them to specific tasks with minimal training. This technique is particularly advantageous in clinical scenarios where the amount of annotated data is limited. This study aims to investigate the impact of template position on model performance and training efficiency in clinical note classification tasks using prompt-based learning, especially in zero- and few-shot settings. Methods We developed a keyword-optimized template insertion method (KOTI) to enhance model performance by strategically placing prompt templates near relevant clinical information within the notes. The method involves defining task-specific keywords, identifying sentences containing these keywords, and inserting the prompt template in their vicinity. We compared KOTI with standard template insertion (STI) methods in which the template is directly appended at the end of the input text. Specifically, we compared STI with naïve tail-truncation (STI-s) and STI with keyword-optimized input truncation (STI-k). Experiments were conducted using two pre-trained encoder models, GatorTron and ClinicalBERT, and two decoder models, BioGPT and ClinicalT5, across five classification tasks, including dysmenorrhea, peripheral vascular disease, depression, osteoarthritis, and smoking status classification. Results Our experiments revealed that the KOTI approach consistently outperformed both STI-s and STI-k in zero-shot and few-shot scenarios for encoder models, with KOTI yielding a significant 24% F1 improvement over STI-k for GatorTron and 8% for ClinicalBERT. Additionally, training with balanced examples further enhanced performance, particularly under few-shot conditions. In contrast, decoder-based models exhibited inconsistent results, with KOTI showing a significant improvement in F1 score over STI-k for BioGPT (+19%) but a significant drop for ClinicalT5 (−18%), suggesting that KOTI is not beneficial across all transformer model architectures. Conclusion Our findings underscore the significance of template position in prompt-based fine-tuning of encoder models and highlight KOTI’s potential to optimize real-world clinical note classification tasks with few training examples.https://doi.org/10.1186/s12911-025-03071-yNLPEncodersInformation extractionDysmenorrheaGatortronPrompt
spellingShingle Eugenia Alleva
Isotta Landi
Leslee J. Shaw
Erwin Böttinger
Ipek Ensari
Thomas J. Fuchs
Keyword-optimized template insertion for clinical note classification via prompt-based learning
BMC Medical Informatics and Decision Making
NLP
Encoders
Information extraction
Dysmenorrhea
Gatortron
Prompt
title Keyword-optimized template insertion for clinical note classification via prompt-based learning
title_full Keyword-optimized template insertion for clinical note classification via prompt-based learning
title_fullStr Keyword-optimized template insertion for clinical note classification via prompt-based learning
title_full_unstemmed Keyword-optimized template insertion for clinical note classification via prompt-based learning
title_short Keyword-optimized template insertion for clinical note classification via prompt-based learning
title_sort keyword optimized template insertion for clinical note classification via prompt based learning
topic NLP
Encoders
Information extraction
Dysmenorrhea
Gatortron
Prompt
url https://doi.org/10.1186/s12911-025-03071-y
work_keys_str_mv AT eugeniaalleva keywordoptimizedtemplateinsertionforclinicalnoteclassificationviapromptbasedlearning
AT isottalandi keywordoptimizedtemplateinsertionforclinicalnoteclassificationviapromptbasedlearning
AT lesleejshaw keywordoptimizedtemplateinsertionforclinicalnoteclassificationviapromptbasedlearning
AT erwinbottinger keywordoptimizedtemplateinsertionforclinicalnoteclassificationviapromptbasedlearning
AT ipekensari keywordoptimizedtemplateinsertionforclinicalnoteclassificationviapromptbasedlearning
AT thomasjfuchs keywordoptimizedtemplateinsertionforclinicalnoteclassificationviapromptbasedlearning