Fine-Tuning Retrieval-Augmented Generation with an Auto-Regressive Language Model for Sentiment Analysis in Financial Reviews

Bibliographic Details
Main Authors: Miehleketo Mathebula, Abiodun Modupe, Vukosi Marivate
Format: Article
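Language: English
Published: MDPI AG 2024-11-01
Series: Applied Sciences
Subjects: large language models; sentiment analysis; retrieval-augmented generation; prompt engineering; conversational fine-tuning; retrieval augmented generation assessment
Online Access: https://www.mdpi.com/2076-3417/14/23/10782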
author Miehleketo Mathebula
Abiodun Modupe
Vukosi Marivate
collection DOAJ
description Sentiment analysis is a well-established task that analyses customer reviews and media headlines to detect the sentiment or polarity of a given text. With the growth of social media and other online platforms, such as Twitter (now branded as X), Facebook, and blogs, the investment community has used it to monitor customer feedback, reviews, and news headlines about financial institutions’ products and services, in order to support business success and prioritise aspects of customer relationship management. Supervised learning algorithms have been widely employed for this task, but their performance is compromised by the brevity of the content and the presence of idiomatic expressions, sound imitations (onomatopoeia), and abbreviations. Additionally, pre-trained large language models (PTLMs) struggle to capture bidirectional contextual knowledge learnt through word dependencies, because the sentence-level representation fails to take broad features into account. We develop a novel framework called language feature extraction and adaptation for reviews (LFEAR), an advanced natural language model that combines retrieval-augmented generation (RAG) with a conversation format for an auto-regressive fine-tuning model (ARFT). This helps to overcome the limitations of lexicon-based tools, whose reliance on pre-defined sentiment lexicons may not fully capture the range of sentiments in natural language, and to address questions on various topics and tasks. LFEAR is fine-tuned on Hellopeter reviews with industry-specific contextual information retrieval, and shows resilience and flexibility across various tasks, including analysing sentiments in reviews of restaurants, movies, politics, and financial products.
The proposed model achieved an average precision score of 98.45%, answer correctness of 93.85%, and context precision of 97.69% based on Retrieval-Augmented Generation Assessment (RAGAS) metrics. The LFEAR model is effective in conducting sentiment analysis across various domains due to its adaptability and scalable inference mechanism. It takes into account the language characteristics and patterns unique to each domain to support accurate sentiment annotation. This is particularly beneficial for stakeholders in the financial sector, such as investors and institutions, including those listed on the Johannesburg Stock Exchange (JSE), which is the primary stock exchange in South Africa and plays a significant role in the country’s financial market. Future work will focus on incorporating a wider range of data sources and improving the system’s ability to express nuanced sentiments, enhancing its usefulness in diverse real-world scenarios.
format Article
id doaj-art-32b3eb95cdd6481bb5a8a4c3ba97f71a
institution Kabale University
issn 2076-3417
language English
publishDate 2024-11-01
publisher MDPI AG
record_format Article
series Applied Sciences
doi 10.3390/app142310782
volume 14
issue 23
article_number 10782
affiliation Department of Computer Science, University of Pretoria, Lynnwood Road, Pretoria 0002, South Africa (all three authors)
title Fine-Tuning Retrieval-Augmented Generation with an Auto-Regressive Language Model for Sentiment Analysis in Financial Reviews
topic large language models
sentiment analysis
retrieval-augmented generation
prompt engineering
conversational fine-tuning
retrieval augmented generation assessment
url https://www.mdpi.com/2076-3417/14/23/10782