Enhancing semantical text understanding with fine-tuned large language models: A case study on Quora Question Pair duplicate identification.
Semantical text understanding holds significant importance in natural language processing (NLP). Numerous datasets, such as Quora Question Pairs (QQP), have been devised for this purpose. In our previous study, we developed a Siamese Convolutional Neural Network (S-CNN) that achieved an F1 score of...
Main Authors: Sifei Han, Lingyun Shi, Fuchiang Rich Tsui
Format: Article
Language: English
Published: Public Library of Science (PLoS), 2025-01-01
Series: PLoS ONE
Online Access: https://doi.org/10.1371/journal.pone.0317042
_version_ | 1841533208216207360 |
author | Sifei Han; Lingyun Shi; Fuchiang Rich Tsui
author_facet | Sifei Han; Lingyun Shi; Fuchiang Rich Tsui
author_sort | Sifei Han |
collection | DOAJ |
description | Semantical text understanding holds significant importance in natural language processing (NLP). Numerous datasets, such as Quora Question Pairs (QQP), have been devised for this purpose. In our previous study, we developed a Siamese Convolutional Neural Network (S-CNN) that achieved an F1 score of 82.02% (95% C.I.: 81.83%-82.20%). Given the growing attention toward large language models (LLMs) like ChatGPT, we aimed to explore their effectiveness in text similarity tasks. In this research, we leveraged five pretrained LLMs, applied several fine-tuning approaches (prompt engineering, n-shot learning, and supervised learning with low-rank adaptation [LoRA]), and compared their performance using the F1 score. To ensure a fair comparison, we followed our previous study's design and dataset, employing 10-fold cross-validation for supervised model training and evaluation. Additionally, we conducted a secondary study introducing a recent, larger LLM with 70B parameters and comparing it with the 7B model on the GLUE benchmark; both models were fine-tuned on the same corpus. The fine-tuned LLaMA model with 7B parameters (qLLaMA_LoRA-7B), trained on a 100,000-pair QQP corpus, yielded the best results, achieving an F1 score of 84.9% (95% C.I.: 84.13%-85.67%), which outperformed Alpaca_LoRA-65B (fine-tuned from LLaMA-65B) (F1: 64.98% [64.72%-65.25%]; P<0.01) and showed a 3% improvement over our previously published best model, S-CNN. The fine-tuned LLaMA3.1-70B (qLLaMA3.1_LoRA-70B) (F1: 74.4%) outperformed qLLaMA_LoRA-7B (F1: 71.9%) on the GLUE benchmark. The study demonstrated an effective LLM fine-tuning framework and highlights the importance of fine-tuning LLMs for improved performance.
Our task-specific supervised fine-tuning improved LLM performance compared to larger pretrained models with or without n-shot learning; moreover, fine-tuning a larger LLM further improved performance compared to fine-tuning a smaller one. Our LLM-based fine-tuning framework may improve various document similarity tasks, such as matching resumes with job descriptions, recommending subject-matter experts, or identifying potential reviewers for grant proposals or manuscript submissions. |
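The description above reports F1 scores together with 95% confidence intervals computed over 10-fold cross-validation. As a minimal sketch of how such numbers can be produced (this is not the authors' code; the per-fold values and the percentile-bootstrap method are illustrative assumptions), the metric and interval could be computed like this:

```python
import random

def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall, from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def bootstrap_ci(per_fold_f1, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile-bootstrap 95% CI over per-fold F1 scores."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        # Resample the folds with replacement and record the mean F1.
        sample = [rng.choice(per_fold_f1) for _ in per_fold_f1]
        means.append(sum(sample) / len(sample))
    means.sort()
    lo = means[int(alpha / 2 * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

# Hypothetical per-fold F1 values from a 10-fold cross-validation.
folds = [0.848, 0.851, 0.846, 0.853, 0.850,
         0.847, 0.852, 0.849, 0.845, 0.854]
low, high = bootstrap_ci(folds)
print(f"mean F1 = {sum(folds) / len(folds):.3f}, "
      f"95% CI = [{low:.3f}, {high:.3f}]")
```

The published paper does not state which interval method was used; a bootstrap over folds is just one common way to obtain a 95% C.I. of the kind quoted.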
format | Article |
id | doaj-art-04c03fbf6484410787b955c3f2985c53 |
institution | Kabale University |
issn | 1932-6203 |
language | English |
publishDate | 2025-01-01 |
publisher | Public Library of Science (PLoS) |
record_format | Article |
series | PLoS ONE |
spelling | doaj-art-04c03fbf6484410787b955c3f2985c532025-01-17T05:31:19ZengPublic Library of Science (PLoS)PLoS ONE1932-62032025-01-01201e031704210.1371/journal.pone.0317042Enhancing semantical text understanding with fine-tuned large language models: A case study on Quora Question Pair duplicate identification.Sifei HanLingyun ShiFuchiang Rich TsuiSemantical text understanding holds significant importance in natural language processing (NLP). Numerous datasets, such as Quora Question Pairs (QQP), have been devised for this purpose. In our previous study, we developed a Siamese Convolutional Neural Network (S-CNN) that achieved an F1 score of 82.02% (95% C.I.: 81.83%-82.20%). Given the growing attention toward large language models (LLMs) like ChatGPT, we aimed to explore their effectiveness in text similarity tasks. In this research, we leveraged 5 pretrained LLMs, conducted various fine-tuning approaches (prompt engineering, n-shot learning, and supervised learning using the low-rank adaptation [LoRA]), and compared their performance using F1 score. To ensure a fair comparison, we followed our previous study's design and dataset by employing a 10-fold cross-validation for supervised model training and evaluation. Additionally, we conducted a secondary study by introducing a recent larger LLM with 70B parameters and comparing it with the 7B model using the GLUE benchmark, and both models were finetuned with the corpus. The fine-tuned LLaMA model with 7B parameters (qLLaMA_LoRA-7B) using 100,000 QQP corpus yielded the best results, achieving an F1 score of 84.9% (95% C.I.: 84.13%-85.67%), which outperformed the Alpaca_LoRA-65B (finetuned based on LLaMA-65B) (F1: 64.98% [64.72%-65.25%]; P<0.01) and had a 3% improvement compared to our previously published best model, S-CNN. The finetuned LLaMA3.1-70B (qLLaMA3.1_LoRA-70B) with 70B parameters (F1: 74.4%) outperformed the qLLaMA_LoRA-7B (F1: 71.9%) using the GLUE benchmark. 
The study demonstrated an effective LLM finetuning framework, which highlights the importance of finetuning LLMs for improved performance. Our task-specific supervised finetuning demonstrated improved LLM performance compared to larger pretrained models with or without n-shot learning; moreover, finetuning a larger LLM further improved performance compared to finetuning a smaller LLM. Our LLM-based finetuning framework may potentially improve various document similarity tasks, such as matching resumes with job descriptions, recommending subject-matter experts, or identifying potential reviewers for grant proposals or manuscript submissions.https://doi.org/10.1371/journal.pone.0317042 |
spellingShingle | Sifei Han Lingyun Shi Fuchiang Rich Tsui Enhancing semantical text understanding with fine-tuned large language models: A case study on Quora Question Pair duplicate identification. PLoS ONE |
title | Enhancing semantical text understanding with fine-tuned large language models: A case study on Quora Question Pair duplicate identification. |
title_full | Enhancing semantical text understanding with fine-tuned large language models: A case study on Quora Question Pair duplicate identification. |
title_fullStr | Enhancing semantical text understanding with fine-tuned large language models: A case study on Quora Question Pair duplicate identification. |
title_full_unstemmed | Enhancing semantical text understanding with fine-tuned large language models: A case study on Quora Question Pair duplicate identification. |
title_short | Enhancing semantical text understanding with fine-tuned large language models: A case study on Quora Question Pair duplicate identification. |
title_sort | enhancing semantical text understanding with fine tuned large language models a case study on quora question pair duplicate identification |
url | https://doi.org/10.1371/journal.pone.0317042 |
work_keys_str_mv | AT sifeihan enhancingsemanticaltextunderstandingwithfinetunedlargelanguagemodelsacasestudyonquoraquestionpairduplicateidentification AT lingyunshi enhancingsemanticaltextunderstandingwithfinetunedlargelanguagemodelsacasestudyonquoraquestionpairduplicateidentification AT fuchiangrichtsui enhancingsemanticaltextunderstandingwithfinetunedlargelanguagemodelsacasestudyonquoraquestionpairduplicateidentification |