Enhancing Persian text summarization through a three-phase fine-tuning and reinforcement learning approach with the mT5 transformer model

Bibliographic Details
Main Authors: Vahid Nejad Mahmood Abadi, Fahimeh Ghasemian (both: Department of Computer Engineering, Faculty of Engineering, Shahid Bahonar University of Kerman)
Format: Article
Language: English
Published: Nature Portfolio, 2025-01-01
Series: Scientific Reports
ISSN: 2045-2322
Collection: DOAJ (Kabale University)
Online Access: https://doi.org/10.1038/s41598-024-78235-3

Abstract

Extracting vital information from ever-growing volumes of text is a formidable challenge; the constant influx of news articles from many agencies takes an enormous amount of time to digest comprehensively. Automatic text summarization, a pivotal and intricate task in natural language processing, offers a viable solution: it condenses pertinent textual content into a shorter form without compromising its underlying meaning. In recent years, transformers have become the dominant approach to this task. This research fine-tunes the mT5-base model on Persian news articles in a three-phase process and then integrates reinforcement learning, via the PPO algorithm, with the fine-tuned model. Finally, we evaluate the model's performance in summarizing Persian texts. The model sets a new benchmark for Persian text summarization, achieving ROUGE scores of 53.17 (ROUGE-1), 37.12 (ROUGE-2), and 44.13 (ROUGE-L): a significant advance in the quality and context-awareness of Persian summaries.
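
The three-phase fine-tuning described in the abstract maps onto standard sequence-to-sequence training. The sketch below, assuming the Hugging Face Transformers and Datasets libraries, shows what one such phase could look like; the data file "persian_news_train.jsonl", its "article"/"summary" columns, and every hyperparameter are illustrative placeholders rather than the authors' published configuration, and each subsequent phase would rerun the same loop on its own data and settings.

    # Minimal seq2seq fine-tuning sketch for mT5-base summarization.
    # Dataset file, column names, and hyperparameters are assumptions.
    from datasets import load_dataset
    from transformers import (
        AutoModelForSeq2SeqLM,
        AutoTokenizer,
        DataCollatorForSeq2Seq,
        Seq2SeqTrainer,
        Seq2SeqTrainingArguments,
    )

    tokenizer = AutoTokenizer.from_pretrained("google/mt5-base")
    model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-base")

    # Hypothetical Persian news corpus with "article" and "summary" fields.
    data = load_dataset("json", data_files={"train": "persian_news_train.jsonl"})

    def preprocess(batch):
        # T5-family models are conventionally prompted with a task prefix.
        enc = tokenizer(["summarize: " + a for a in batch["article"]],
                        max_length=512, truncation=True)
        labels = tokenizer(text_target=batch["summary"],
                           max_length=128, truncation=True)
        enc["labels"] = labels["input_ids"]
        return enc

    train = data["train"].map(preprocess, batched=True,
                              remove_columns=data["train"].column_names)

    args = Seq2SeqTrainingArguments(
        output_dir="mt5-persian-sum-phase1",  # one checkpoint per phase
        per_device_train_batch_size=4,
        learning_rate=3e-4,                   # common T5-family starting point
        num_train_epochs=3,
    )

    Seq2SeqTrainer(
        model=model,
        args=args,
        train_dataset=train,
        data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    ).train()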
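
For the reinforcement-learning stage, one plausible realization uses TRL's PPOTrainer with a value-head seq2seq model, rewarding each sampled summary by its ROUGE-L overlap with the reference. This is a rough sketch under that assumption, written against TRL's 0.x-era API (later releases changed it); the reward design, the checkpoint name "mt5-persian-sum-phase3", and the stand-in batch are illustrative, not the paper's actual setup, and the off-the-shelf "rouge" metric is English-oriented, so a Persian-aware tokenizer would be needed for a faithful reward.

    # PPO on top of the fine-tuned checkpoint, sketched with TRL's 0.x API.
    # The ROUGE-L reward and all names below are illustrative assumptions.
    import torch
    import evaluate
    from transformers import AutoTokenizer
    from trl import AutoModelForSeq2SeqLMWithValueHead, PPOConfig, PPOTrainer

    ckpt = "mt5-persian-sum-phase3"  # hypothetical checkpoint from the last phase
    tokenizer = AutoTokenizer.from_pretrained(ckpt)
    model = AutoModelForSeq2SeqLMWithValueHead.from_pretrained(ckpt)
    ref_model = AutoModelForSeq2SeqLMWithValueHead.from_pretrained(ckpt)  # frozen KL anchor

    rouge = evaluate.load("rouge")
    ppo = PPOTrainer(PPOConfig(batch_size=1, mini_batch_size=1),  # tiny sizes for the sketch
                     model, ref_model, tokenizer)

    batches = [{"article": ["..."], "summary": ["..."]}]  # stand-in for a real dataloader

    for batch in batches:
        queries = [tokenizer("summarize: " + a, return_tensors="pt").input_ids.squeeze(0)
                   for a in batch["article"]]
        responses = [ppo.generate(q, max_new_tokens=128).squeeze(0) for q in queries]
        preds = tokenizer.batch_decode(responses, skip_special_tokens=True)
        # One scalar reward per sample: ROUGE-L against the reference summary.
        rewards = [torch.tensor(rouge.compute(predictions=[p], references=[r])["rougeL"])
                   for p, r in zip(preds, batch["summary"])]
        ppo.step(queries, responses, rewards)  # PPO update toward higher-reward summaries

Keeping a frozen reference model is what constrains the policy through PPO's KL penalty, so the summarizer can chase a higher ROUGE reward without drifting into disfluent output.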