Enhancing semantical text understanding with fine-tuned large language models: A case study on Quora Question Pair duplicate identification.

Bibliographic Details
Main Authors: Sifei Han, Lingyun Shi, Fuchiang Rich Tsui
Format: Article
Language: English
Published: Public Library of Science (PLoS), 2025-01-01
Series: PLoS ONE
ISSN: 1932-6203
Online Access: https://doi.org/10.1371/journal.pone.0317042
Description: Semantic text understanding is of great importance in natural language processing (NLP), and numerous datasets, such as Quora Question Pairs (QQP), have been devised for this purpose. In our previous study, we developed a Siamese Convolutional Neural Network (S-CNN) that achieved an F1 score of 82.02% (95% CI: 81.83%-82.20%). Given the growing attention toward large language models (LLMs) such as ChatGPT, we explored their effectiveness on text similarity tasks. In this research, we used five pretrained LLMs, applied several adaptation approaches (prompt engineering, n-shot learning, and supervised fine-tuning with low-rank adaptation [LoRA]), and compared their performance using the F1 score. To ensure a fair comparison, we followed the design and dataset of our previous study, employing 10-fold cross-validation for supervised model training and evaluation. In a secondary study, we introduced a more recent 70B-parameter LLM, compared it with the 7B model on the GLUE benchmark, and fine-tuned both models on the same corpus. The fine-tuned 7B-parameter LLaMA model (qLLaMA_LoRA-7B), trained on a 100,000-pair QQP corpus, yielded the best results, achieving an F1 score of 84.9% (95% CI: 84.13%-85.67%); it outperformed Alpaca_LoRA-65B, which was fine-tuned from LLaMA-65B (F1: 64.98% [64.72%-65.25%]; P<0.01), and represented roughly a 3% improvement over our previously published best model, S-CNN. The fine-tuned LLaMA3.1-70B (qLLaMA3.1_LoRA-70B; F1: 74.4%) outperformed qLLaMA_LoRA-7B (F1: 71.9%) on the GLUE benchmark. The study demonstrates an effective LLM fine-tuning framework and highlights the importance of fine-tuning LLMs for improved performance. Task-specific supervised fine-tuning outperformed larger pretrained models with or without n-shot learning, and fine-tuning a larger LLM further improved performance over fine-tuning a smaller one. Our LLM-based fine-tuning framework may improve performance on various document similarity tasks, such as matching resumes with job descriptions, recommending subject-matter experts, or identifying potential reviewers for grant proposals or manuscript submissions.
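
The abstract describes supervised LoRA fine-tuning of a LLaMA-family model on QQP question pairs. The snippet below is a minimal, illustrative sketch of that kind of setup using the Hugging Face transformers, datasets, and peft libraries; it is not the authors' released code, and the checkpoint name (huggyllama/llama-7b), the prompt template, and the LoRA and training hyperparameters are assumptions made only for illustration.

    # Illustrative sketch only: supervised LoRA fine-tuning of a LLaMA-family model
    # for QQP duplicate detection, framed as instruction-style causal-LM training.
    # The checkpoint, prompt wording, and hyperparameters are assumptions.
    from datasets import load_dataset
    from peft import LoraConfig, TaskType, get_peft_model
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              Trainer, TrainingArguments)

    BASE_MODEL = "huggyllama/llama-7b"  # assumption: any LLaMA-family checkpoint

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    tokenizer.pad_token = tokenizer.eos_token
    # In practice the base weights would be loaded in reduced precision or 4-bit
    # (as the "q" prefix in qLLaMA_LoRA suggests) to fit in GPU memory.
    model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

    # Attach low-rank adapters; only these small matrices are updated during training.
    lora_cfg = LoraConfig(
        task_type=TaskType.CAUSAL_LM,
        r=8, lora_alpha=16, lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],  # common choice for LLaMA attention blocks
    )
    model = get_peft_model(model, lora_cfg)

    def to_example(row):
        # Hypothetical instruction template; the paper's exact prompt is not reproduced.
        answer = "duplicate" if row["label"] == 1 else "not duplicate"
        text = ("Decide whether the two questions ask the same thing.\n"
                f"Question 1: {row['question1']}\n"
                f"Question 2: {row['question2']}\n"
                f"Answer: {answer}")
        tokens = tokenizer(text, truncation=True, max_length=256, padding="max_length")
        # Mask padding positions so they do not contribute to the loss.
        tokens["labels"] = [t if t != tokenizer.pad_token_id else -100
                            for t in tokens["input_ids"]]
        return tokens

    qqp = load_dataset("glue", "qqp", split="train[:100000]")  # 100k pairs, as in the abstract
    train_ds = qqp.map(to_example, remove_columns=qqp.column_names)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="qqp-lora", per_device_train_batch_size=8,
                               num_train_epochs=1, learning_rate=2e-4,
                               fp16=True, logging_steps=100),
        train_dataset=train_ds,
    )
    trainer.train()
    model.save_pretrained("qqp-lora-adapter")  # saves only the small adapter weights

Evaluation in this style would generate the answer span for held-out question pairs and score it against the gold labels with the F1 score; the 95% confidence intervals reported in the abstract come from the authors' 10-fold cross-validation design, not from this sketch.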