Effective Stemmers Using Trie Data Structure for Enhanced Processing of Gujarati Text

Abstract Stemming plays a crucial role in natural language processing and information retrieval. It is challenging for the Gujarati language due to its complex morphology. Several stemming algorithms for the Gujarati language have been developed using rule-based, dictionary-based, or hybrid a...

Bibliographic Details
Main Authors:	Nakul R. Dave, Mayuri A. Mehta, Ketan Kotecha
Format:	Article
Language:	English
Published:	Springer 2024-12-01
Series:	International Journal of Computational Intelligence Systems
Subjects:	Natural language processing; Information retrieval; Stemming; Rule-based stemmer; Suffix-stripping based stemmer
Online Access:	https://doi.org/10.1007/s44196-024-00679-2

_version_	1846112206798716928
author	Nakul R. Dave, Mayuri A. Mehta, Ketan Kotecha
author_sort	Nakul R. Dave
collection	DOAJ
description	Abstract Stemming plays a crucial role in natural language processing and information retrieval. It is challenging for the Gujarati language due to its complex morphology. Several stemming algorithms for the Gujarati language have been developed using rule-based, dictionary-based, or hybrid approaches. However, they are computationally expensive, produce more over-stemming errors, and have limited accuracy. This paper introduces three novel optimized Gujarati stemmers that use a trie data structure to overcome these limitations. The significant contributions of this paper are as follows. First, three optimized Gujarati stemmers, namely Optimized Gujarati Stemmer using Suffix Stripping Approach (OGS_SSA), Optimized Gujarati Stemmer using Rule-Based Approach (OGS_RBA), and Optimized Gujarati Stemmer using Re-parsing Based Approach (OGS_RPA), are proposed. Second, a novel algorithm to create a Gujarati dictionary using the trie data structure is proposed. Third, the proposed stemmers are rigorously assessed using three standard datasets, namely entertainment, health, and agriculture. The performance of the proposed stemmers is measured using precision, recall, F1 score, accuracy, number of stemming errors, and processing time. The results show that OGS_RPA consistently outperforms OGS_SSA and OGS_RBA in precision, recall, F1 score, and accuracy. In addition, it exhibits a lower number of stemming errors. Moreover, the performance of the proposed stemmer is compared with the existing Gujarati hybrid stemmer. The results show a 14–16% improvement in accuracy and less processing time compared to the Gujarati hybrid stemmer. OGS_SSA demonstrates improved processing time, making it a feasible option for applications that prioritize prompt response time. Furthermore, it demonstrates a 10–11% improvement in accuracy and a reduction in processing time compared to the Gujarati hybrid stemmer. OGS_RBA exhibits moderate performance compared to OGS_RPA and OGS_SSA due to its rule-based methodology. However, it shows a 10–13% improvement in accuracy over the Gujarati hybrid stemmer.
format	Article
id	doaj-art-a1cc6b6ce96e42438261a1e8960bc426
institution	Kabale University
issn	1875-6883
language	English
publishDate	2024-12-01
publisher	Springer
record_format	Article
series	International Journal of Computational Intelligence Systems
spelling	doaj-art-a1cc6b6ce96e42438261a1e8960bc426; 2024-12-22T12:46:56Z; eng; Springer; International Journal of Computational Intelligence Systems; 1875-6883; 2024-12-01; volume 17, issue 1, pages 1-21; 10.1007/s44196-024-00679-2; Effective Stemmers Using Trie Data Structure for Enhanced Processing of Gujarati Text; Nakul R. Dave (Department of Computer Engineering, Vishwakarma Government Engineering College); Mayuri A. Mehta (Department of Computer Engineering, Sarvajanik College of Engineering and Technology); Ketan Kotecha (Symbiosis Centre for Applied Artificial Intelligence, Symbiosis Institute of Technology, Symbiosis International University); https://doi.org/10.1007/s44196-024-00679-2
title	Effective Stemmers Using Trie Data Structure for Enhanced Processing of Gujarati Text
topic	Natural language processing; Information retrieval; Stemming; Rule-based stemmer; Suffix-stripping based stemmer
url	https://doi.org/10.1007/s44196-024-00679-2
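
Illustrative note: the abstract describes a Gujarati dictionary stored in a trie and a suffix-stripping stemmer built on it, but this record does not give the algorithms themselves. The following is only a minimal sketch of the general idea, not the authors' OGS_SSA method. The class names, the `stem` function, the four suffixes, and the three dictionary words are all assumptions chosen for illustration.

```python
import unicodedata


class TrieNode:
    """One node of the trie; children are keyed by a single code point."""

    def __init__(self):
        self.children = {}
        self.is_word = False


class Trie:
    """Hypothetical dictionary of known stems stored as a trie."""

    def __init__(self, words=()):
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self.root
        for ch in unicodedata.normalize("NFC", word):
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def contains(self, word):
        node = self.root
        for ch in unicodedata.normalize("NFC", word):
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_word


# Assumed, illustrative suffix list (not taken from the paper), longest first
# so that the longest matching suffix is tried before shorter ones.
SUFFIXES = sorted(["માં", "થી", "ને", "ો"], key=len, reverse=True)


def stem(word, dictionary, suffixes=SUFFIXES):
    """Strip the first suffix whose removal leaves a dictionary word."""
    word = unicodedata.normalize("NFC", word)
    if dictionary.contains(word):
        return word  # already a known stem
    for suffix in suffixes:
        if len(word) > len(suffix) and word.endswith(suffix):
            candidate = word[: -len(suffix)]
            if dictionary.contains(candidate):
                return candidate
    return word  # no validated stem found; leave the word unchanged


# Assumed sample dictionary: house, city, book.
dictionary = Trie(["ઘર", "શહેર", "પુસ્તક"])

for token in ["ઘરમાં", "શહેરથી", "પુસ્તકો", "ઘર"]:
    print(token, "->", stem(token, dictionary))
```

Checking each candidate stem against the dictionary is one simple way to limit over-stemming: a suffix is removed only when the remainder is a known word. A trie lookup takes time proportional to the length of the word, independent of dictionary size.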


Similar Items