Enhancing semantical text understanding with fine-tuned large language models: A case study on Quora Question Pair duplicate identification.
Semantical text understanding holds significant importance in natural language processing (NLP). Numerous datasets, such as Quora Question Pairs (QQP), have been devised for this purpose. In our previous study, we developed a Siamese Convolutional Neural Network (S-CNN) that achieved an F1 score of...
Main Authors: Sifei Han, Lingyun Shi, Fuchiang Rich Tsui
Format: Article
Language: English
Published: Public Library of Science (PLoS), 2025-01-01
Series: PLoS ONE
Online Access: https://doi.org/10.1371/journal.pone.0317042
_version_ | 1841533208216207360 |
author | Sifei Han; Lingyun Shi; Fuchiang Rich Tsui
author_facet | Sifei Han; Lingyun Shi; Fuchiang Rich Tsui
author_sort | Sifei Han |
collection | DOAJ |
description | Semantical text understanding holds significant importance in natural language processing (NLP). Numerous datasets, such as Quora Question Pairs (QQP), have been devised for this purpose. In our previous study, we developed a Siamese Convolutional Neural Network (S-CNN) that achieved an F1 score of 82.02% (95% C.I.: 81.83%-82.20%). Given the growing attention toward large language models (LLMs) like ChatGPT, we aimed to explore their effectiveness in text similarity tasks. In this research, we leveraged five pretrained LLMs, applied several fine-tuning approaches (prompt engineering, n-shot learning, and supervised learning with low-rank adaptation [LoRA]), and compared their performance using the F1 score. To ensure a fair comparison, we followed our previous study's design and dataset, employing 10-fold cross-validation for supervised model training and evaluation. Additionally, we conducted a secondary study introducing a recent, larger LLM with 70B parameters and comparing it with the 7B model on the GLUE benchmark; both models were fine-tuned on the same corpus. The fine-tuned LLaMA model with 7B parameters (qLLaMA_LoRA-7B), trained on a 100,000-pair QQP corpus, yielded the best results, achieving an F1 score of 84.9% (95% C.I.: 84.13%-85.67%), which outperformed Alpaca_LoRA-65B (fine-tuned from LLaMA-65B) (F1: 64.98% [64.72%-65.25%]; P<0.01) and showed a 3% improvement over our previously published best model, S-CNN. The fine-tuned LLaMA3.1-70B (qLLaMA3.1_LoRA-70B) (F1: 74.4%) outperformed qLLaMA_LoRA-7B (F1: 71.9%) on the GLUE benchmark. The study demonstrated an effective LLM fine-tuning framework and highlights the importance of fine-tuning LLMs for improved performance.
Our task-specific supervised fine-tuning improved LLM performance compared to larger pretrained models with or without n-shot learning; moreover, fine-tuning a larger LLM further improved performance compared to fine-tuning a smaller one. Our LLM-based fine-tuning framework may improve various document similarity tasks, such as matching resumes with job descriptions, recommending subject-matter experts, or identifying potential reviewers for grant proposals or manuscript submissions. |
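The description above reports F1 scores together with 95% confidence intervals computed over 10-fold cross-validation. As a minimal sketch of how such numbers can be produced (this is not the authors' code; the per-fold values and the percentile-bootstrap method are illustrative assumptions), the metric and interval could be computed like this:

```python
import random

def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall, from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def bootstrap_ci(per_fold_f1, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile-bootstrap 95% CI over per-fold F1 scores."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        # Resample the folds with replacement and record the mean F1.
        sample = [rng.choice(per_fold_f1) for _ in per_fold_f1]
        means.append(sum(sample) / len(sample))
    means.sort()
    lo = means[int(alpha / 2 * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

# Hypothetical per-fold F1 values from a 10-fold cross-validation.
folds = [0.848, 0.851, 0.846, 0.853, 0.850,
         0.847, 0.852, 0.849, 0.845, 0.854]
low, high = bootstrap_ci(folds)
print(f"mean F1 = {sum(folds) / len(folds):.3f}, "
      f"95% CI = [{low:.3f}, {high:.3f}]")
```

The published paper does not state which interval method was used; a bootstrap over folds is just one common way to obtain a 95% C.I. of the kind quoted.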
format | Article |
id | doaj-art-04c03fbf6484410787b955c3f2985c53 |
institution | Kabale University |
issn | 1932-6203 |
language | English |
publishDate | 2025-01-01 |
publisher | Public Library of Science (PLoS) |
record_format | Article |
series | PLoS ONE |
spelling | doaj-art-04c03fbf6484410787b955c3f2985c532025-01-17T05:31:19ZengPublic Library of Science (PLoS)PLoS ONE1932-62032025-01-01201e031704210.1371/journal.pone.0317042Enhancing semantical text understanding with fine-tuned large language models: A case study on Quora Question Pair duplicate identification.Sifei HanLingyun ShiFuchiang Rich TsuiSemantical text understanding holds significant importance in natural language processing (NLP). Numerous datasets, such as Quora Question Pairs (QQP), have been devised for this purpose. In our previous study, we developed a Siamese Convolutional Neural Network (S-CNN) that achieved an F1 score of 82.02% (95% C.I.: 81.83%-82.20%). Given the growing attention toward large language models (LLMs) like ChatGPT, we aimed to explore their effectiveness in text similarity tasks. In this research, we leveraged 5 pretrained LLMs, conducted various fine-tuning approaches (prompt engineering, n-shot learning, and supervised learning using the low-rank adaptation [LoRA]), and compared their performance using F1 score. To ensure a fair comparison, we followed our previous study's design and dataset by employing a 10-fold cross-validation for supervised model training and evaluation. Additionally, we conducted a secondary study by introducing a recent larger LLM with 70B parameters and comparing it with the 7B model using the GLUE benchmark, and both models were finetuned with the corpus. The fine-tuned LLaMA model with 7B parameters (qLLaMA_LoRA-7B) using 100,000 QQP corpus yielded the best results, achieving an F1 score of 84.9% (95% C.I.: 84.13%-85.67%), which outperformed the Alpaca_LoRA-65B (finetuned based on LLaMA-65B) (F1: 64.98% [64.72%-65.25%]; P<0.01) and had a 3% improvement compared to our previously published best model, S-CNN. The finetuned LLaMA3.1-70B (qLLaMA3.1_LoRA-70B) with 70B parameters (F1: 74.4%) outperformed the qLLaMA_LoRA-7B (F1: 71.9%) using the GLUE benchmark. 
The study demonstrated an effective LLM finetuning framework, which highlights the importance of finetuning LLMs for improved performance. Our task-specific supervised finetuning demonstrated improved LLM performance compared to larger pretrained models with or without n-shot learning; moreover, finetuning a larger LLM further improved performance compared to finetuning a smaller LLM. Our LLM-based finetuning framework may potentially improve various document similarity tasks, such as matching resumes with job descriptions, recommending subject-matter experts, or identifying potential reviewers for grant proposals or manuscript submissions.https://doi.org/10.1371/journal.pone.0317042 |
spellingShingle | Sifei Han Lingyun Shi Fuchiang Rich Tsui Enhancing semantical text understanding with fine-tuned large language models: A case study on Quora Question Pair duplicate identification. PLoS ONE |
title | Enhancing semantical text understanding with fine-tuned large language models: A case study on Quora Question Pair duplicate identification. |
title_full | Enhancing semantical text understanding with fine-tuned large language models: A case study on Quora Question Pair duplicate identification. |
title_fullStr | Enhancing semantical text understanding with fine-tuned large language models: A case study on Quora Question Pair duplicate identification. |
title_full_unstemmed | Enhancing semantical text understanding with fine-tuned large language models: A case study on Quora Question Pair duplicate identification. |
title_short | Enhancing semantical text understanding with fine-tuned large language models: A case study on Quora Question Pair duplicate identification. |
title_sort | enhancing semantical text understanding with fine tuned large language models a case study on quora question pair duplicate identification |
url | https://doi.org/10.1371/journal.pone.0317042 |
work_keys_str_mv | AT sifeihan enhancingsemanticaltextunderstandingwithfinetunedlargelanguagemodelsacasestudyonquoraquestionpairduplicateidentification AT lingyunshi enhancingsemanticaltextunderstandingwithfinetunedlargelanguagemodelsacasestudyonquoraquestionpairduplicateidentification AT fuchiangrichtsui enhancingsemanticaltextunderstandingwithfinetunedlargelanguagemodelsacasestudyonquoraquestionpairduplicateidentification |