Effective Stemmers Using Trie Data Structure for Enhanced Processing of Gujarati Text

Bibliographic Details
Main Authors: Nakul R. Dave, Mayuri A. Mehta, Ketan Kotecha
Format: Article
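Language: English
Published: Springer 2024-12-01
Series: International Journal of Computational Intelligence Systems
Subjects: Natural language processing; Information retrieval; Stemming; Rule-based stemmer; Suffix-stripping-based stemmer
Online Access: https://doi.org/10.1007/s44196-024-00679-2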
author Nakul R. Dave
Mayuri A. Mehta
Ketan Kotecha
collection DOAJ
description Abstract Stemming plays a crucial role in natural language processing and information retrieval. It is challenging for the Gujarati language because of the language's complex morphology. Several stemming algorithms for Gujarati have been developed using rule-based, dictionary-based, or hybrid approaches. However, they are computationally expensive, produce more over-stemming errors, and have limited accuracy. This paper introduces three novel optimized Gujarati stemmers that use a trie data structure to overcome these limitations. The significant contributions of this paper are as follows. First, three optimized Gujarati stemmers are proposed: Optimized Gujarati Stemmer using Suffix Stripping Approach (OGS_SSA), Optimized Gujarati Stemmer using Rule-Based Approach (OGS_RBA), and Optimized Gujarati Stemmer using Re-parsing Based Approach (OGS_RPA). Second, a novel algorithm to create a Gujarati dictionary using the trie data structure is proposed. Third, the proposed stemmers are rigorously assessed using three standard datasets: entertainment, health, and agriculture. Performance is measured using precision, recall, F1 score, accuracy, number of stemming errors, and processing time. The results show that OGS_RPA consistently outperforms OGS_SSA and OGS_RBA in precision, recall, F1 score, and accuracy, and produces fewer stemming errors. The proposed stemmers are also compared with the existing Gujarati hybrid stemmer. OGS_RPA shows a 14–16% improvement in accuracy and lower processing time than the Gujarati hybrid stemmer. OGS_SSA shows improved processing time, making it a feasible option for applications that prioritize prompt response time; it also shows a 10–11% improvement in accuracy and reduced processing time compared with the Gujarati hybrid stemmer. OGS_RBA exhibits moderate performance compared with OGS_RPA and OGS_SSA because of its rule-based methodology, but it still shows a 10–13% improvement in accuracy over the Gujarati hybrid stemmer.
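The abstract does not give the algorithms themselves, so the following is only a minimal sketch of the general idea it names: a dictionary stored in a trie, consulted by a suffix-stripping stemmer. It is not the authors' OGS_SSA, OGS_RBA, or OGS_RPA. The class and function names (`Trie`, `stem`), the two-entry suffix list, and the one-word dictionary are all hypothetical and chosen purely for illustration.

```python
from typing import Dict, Iterable, List


class TrieNode:
    """One node of the trie: child links plus an end-of-word flag."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_word = False


class Trie:
    """Hypothetical dictionary of root words stored character by character."""

    def __init__(self, words: Iterable[str] = ()) -> None:
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def contains(self, word: str) -> bool:
        node = self.root
        for ch in word:
            if ch not in node.children:
                return False
            node = node.children[ch]
        return node.is_word


def stem(word: str, dictionary: Trie, suffixes: List[str]) -> str:
    """Strip the longest listed suffix whose remainder is a dictionary word.

    Assumption: a candidate stem is accepted only if the trie contains it,
    which is one simple way to limit over-stemming.
    """
    if dictionary.contains(word):
        return word
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            candidate = word[: -len(suffix)]
            if dictionary.contains(candidate):
                return candidate
    return word  # no validated stem found; leave the word unchanged


# Illustrative data only: one root word and two common Gujarati postpositions.
dictionary = Trie(["ઘર"])
suffixes = ["માં", "થી"]
print(stem("ઘરમાં", dictionary, suffixes))
```

A trie lookup costs time proportional to the length of the word rather than the size of the dictionary, which is one plausible reason to pair it with repeated candidate-stem checks like the loop above.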
id doaj-art-a1cc6b6ce96e42438261a1e8960bc426
institution Kabale University
issn 1875-6883
spelling doaj-art-a1cc6b6ce96e42438261a1e8960bc426
Timestamp: 2024-12-22T12:46:56Z
Language: eng
Publisher: Springer
Journal: International Journal of Computational Intelligence Systems
ISSN: 1875-6883
Publication date: 2024-12-01
Volume 17, issue 1, pages 1–21
DOI: 10.1007/s44196-024-00679-2
Author affiliations:
Nakul R. Dave: Department of Computer Engineering, Vishwakarma Government Engineering College
Mayuri A. Mehta: Department of Computer Engineering, Sarvajanik College of Engineering and Technology
Ketan Kotecha: Symbiosis Centre for Applied Artificial Intelligence, Symbiosis Institute of Technology, Symbiosis International University
topic Natural language processing
Information retrieval
Stemming
Rule-based stemmer
Suffix-stripping-based stemmer