Information Retrieval Method for the Qur’an based on FastText and Latent Semantic Indexing

Retrieving contextually relevant verses from the Al-Qur'an translation dataset presents significant challenges due to the linguistic richness and semantic variation of the text. This study aims to enhance the accuracy and relevance of information retrieval in the Al-Qur'an translation data...

Bibliographic Details
Main Authors: aziz ramadhan, Fandy Setyo Utomo
Format: Article
Language: Indonesian
Published: Islamic University of Indragiri 2025-05-01
Series: Sistemasi: Jurnal Sistem Informasi
Online Access: https://sistemasi.ftik.unisi.ac.id/index.php/stmsi/article/view/4446
collection DOAJ
description Retrieving contextually relevant verses from the Al-Qur'an translation dataset presents significant challenges due to the linguistic richness and semantic variation of the text. This study aims to enhance the accuracy and relevance of information retrieval in the Al-Qur'an translation dataset by combining Latent Semantic Indexing (LSI) and FastText word embeddings. The proposed method involves several steps: text preprocessing (lowercasing, punctuation removal, stopword elimination, and stemming), tokenization and vocabulary creation, Bag-of-Words (BoW) representation, creation of LSI models, conversion of FastText vectors, and combining LSI and FastText vectors. A similarity index is then created from the combined vectors to process user queries and rank documents based on cosine similarity. Testing on the dataset, consisting of 6236 translated verses from 114 surahs, showed promising results. The combined approach effectively captures both broader semantic structures and detailed word meanings, providing more accurate and contextually relevant search results. Key findings include high similarity scores, with 90% of retrieved verses being highly relevant to the user query, an accuracy improvement to 85%, and enhanced handling of synonyms and morphological variations at 88%. Further development is recommended, including parameter optimization, advanced preprocessing techniques, real-time search optimization, integration of contextual embeddings, and multilingual support to improve search performance and accuracy.
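The pipeline in the description above (preprocessing, Bag-of-Words, LSI, word embeddings, vector combination, cosine ranking) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes NumPy is available. The toy English corpus, the stopword list, the `HybridIndex` class, and all parameter values are hypothetical. Stemming is omitted, and `subword_vector` is only a hashed character n-gram stand-in for real FastText vectors, which would normally come from a trained model.

```python
import re
import zlib
import numpy as np

# Toy corpus standing in for translated verses (invented for illustration).
DOCS = [
    "give charity to the poor and the needy",
    "be patient and pray with patience",
    "those who spend their wealth in charity",
    "seek help through patience and prayer",
]
STOPWORDS = {"the", "and", "to", "with", "in", "who", "their", "be", "those"}

def preprocess(text):
    """Lowercase, strip punctuation, drop stopwords (stemming omitted)."""
    tokens = re.sub(r"[^\w\s]", " ", text.lower()).split()
    return [t for t in tokens if t not in STOPWORDS]

def l2_normalize(m):
    """Scale each row to unit length, leaving all-zero rows untouched."""
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    return m / np.where(norms == 0, 1.0, norms)

def subword_vector(word, dim=16, n=3):
    """Stand-in for FastText: mean of hashed character n-gram vectors."""
    padded = f"<{word}>"
    grams = [padded[i:i + n] for i in range(len(padded) - n + 1)] or [padded]
    vecs = [
        np.random.default_rng(zlib.crc32(g.encode("utf-8"))).standard_normal(dim)
        for g in grams
    ]
    return np.mean(vecs, axis=0)

def embed(tokens, dim=16):
    """Average the word vectors of a token list (zero vector if empty)."""
    if not tokens:
        return np.zeros(dim)
    return np.mean([subword_vector(t, dim) for t in tokens], axis=0)

class HybridIndex:
    """Hypothetical index over concatenated LSI and embedding vectors."""

    def __init__(self, docs, k=2, dim=16):
        self.dim = dim
        tokenized = [preprocess(d) for d in docs]
        words = sorted({t for doc in tokenized for t in doc})
        self.vocab = {w: i for i, w in enumerate(words)}
        bow = np.array([self._bow(t) for t in tokenized])  # docs x terms
        # LSI: truncated SVD of the term-document matrix, keeping k topics.
        u, _, _ = np.linalg.svd(bow.T, full_matrices=False)
        self.u_k = u[:, :k]  # terms x k
        self.doc_vecs = self._combine(bow, tokenized)

    def _bow(self, tokens):
        """Term-count vector over the index vocabulary."""
        v = np.zeros(len(self.vocab))
        for t in tokens:
            if t in self.vocab:
                v[self.vocab[t]] += 1
        return v

    def _combine(self, bow, tokenized):
        """Project BoW rows into LSI space and append embedding vectors."""
        lsi = bow @ self.u_k
        emb = np.array([embed(t, self.dim) for t in tokenized])
        return l2_normalize(np.hstack([l2_normalize(lsi), l2_normalize(emb)]))

    def search(self, query, top_n=3):
        """Return (doc_index, cosine_score) pairs, best match first."""
        tokens = preprocess(query)
        q = self._combine(np.array([self._bow(tokens)]), [tokens])[0]
        scores = self.doc_vecs @ q  # rows are unit length, so this is cosine
        order = np.argsort(-scores)[:top_n]
        return [(int(i), float(scores[i])) for i in order]

index = HybridIndex(DOCS)
results = index.search("charity for the poor")
for doc_id, score in results:
    print(f"{score:.3f}  {DOCS[doc_id]}")
```

Normalizing each half before concatenation gives the LSI and embedding parts equal weight in the final cosine score. This weighting is a design choice of the sketch, and the record does not state how the paper combines the two vectors.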
id doaj-art-e6118c2d868a4a56a9a392f29a132a6a
institution Kabale University
issn 2302-8149
2540-9719
citation Sistemasi: Jurnal Sistem Informasi, vol. 14, no. 3, pp. 1014-1024 (2025-05-01)
doi 10.32520/stmsi.v14i3.4446
affiliation Universitas Amikom Purwokerto (aziz ramadhan, Fandy Setyo Utomo)
topic information retrieval
latent semantic indexing
word embedding
fasttext
al-qur'an