Information Retrieval Method for the Qur’an based on FastText and Latent Semantic Indexing

Retrieving contextually relevant verses from the Al-Qur'an translation dataset presents significant challenges due to the linguistic richness and semantic variation of the text. This study aims to enhance the accuracy and relevance of information retrieval in the Al-Qur'an translation data...

Bibliographic Details
Main Authors:	Aziz Ramadhan, Fandy Setyo Utomo
Format:	Article
Language:	Indonesian
Published:	Islamic University of Indragiri 2025-05-01
Series:	Sistemasi: Jurnal Sistem Informasi
Subjects:	information retrieval; latent semantic indexing; word embedding; FastText; Al-Qur'an
Online Access:	https://sistemasi.ftik.unisi.ac.id/index.php/stmsi/article/view/4446

author	Aziz Ramadhan; Fandy Setyo Utomo
collection	DOAJ
description	Retrieving contextually relevant verses from the Al-Qur'an translation dataset presents significant challenges due to the linguistic richness and semantic variation of the text. This study aims to enhance the accuracy and relevance of information retrieval in the Al-Qur'an translation dataset by combining Latent Semantic Indexing (LSI) and FastText word embeddings. The proposed method involves several steps: text preprocessing (lowercasing, punctuation removal, stopword elimination, and stemming), tokenization and vocabulary creation, Bag-of-Words (BoW) representation, creation of LSI models, conversion of FastText vectors, and combining LSI and FastText vectors. A similarity index is then created from the combined vectors to process user queries and rank documents based on cosine similarity. Testing on the dataset, consisting of 6236 translated verses from 114 surahs, showed promising results. The combined approach effectively captures both broader semantic structures and detailed word meanings, providing more accurate and contextually relevant search results. Key findings include high similarity scores, with 90% of retrieved verses being highly relevant to the user query, an accuracy improvement to 85%, and enhanced handling of synonyms and morphological variations at 88%. Further development is recommended, including parameter optimization, advanced preprocessing techniques, real-time search optimization, integration of contextual embeddings, and multilingual support to improve search performance and accuracy.
format	Article
id	doaj-art-e6118c2d868a4a56a9a392f29a132a6a
institution	Kabale University
issn	2302-8149 2540-9719
language	Indonesian
publishDate	2025-05-01
publisher	Islamic University of Indragiri
record_format	Article
series	Sistemasi: Jurnal Sistem Informasi
spelling	doaj-art-e6118c2d868a4a56a9a392f29a132a6a | 2025-08-26T08:05:46Z | ind | Islamic University of Indragiri | Sistemasi: Jurnal Sistem Informasi | 2302-8149 | 2540-9719 | 2025-05-01 | 14 | 3 | 1014 | 1024 | 10.32520/stmsi.v14i3.4446 | 1079 | Information Retrieval Method for the Qur’an based on FastText and Latent Semantic Indexing | Aziz Ramadhan (Universitas Amikom Purwokerto) | Fandy Setyo Utomo (Universitas Amikom Purwokerto) | https://sistemasi.ftik.unisi.ac.id/index.php/stmsi/article/view/4446 | information retrieval | latent semantic indexing | word embedding | fasttext | al-qur'an
title	Information Retrieval Method for the Qur’an based on FastText and Latent Semantic Indexing
topic	information retrieval; latent semantic indexing; word embedding; FastText; Al-Qur'an
url	https://sistemasi.ftik.unisi.ac.id/index.php/stmsi/article/view/4446
work_keys_str_mv	AT azizramadhan informationretrievalmethodforthequranbasedonfasttextandlatentsemanticindexing AT fandysetyoutomo informationretrievalmethodforthequranbasedonfasttextandlatentsemanticindexing
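
The pipeline summarized in the description field (preprocessing, Bag-of-Words, LSI, word-embedding vectors, vector combination, cosine-similarity ranking) can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the four documents are hypothetical English stand-ins for translated verses, stemming is omitted, LSI is computed as a truncated SVD with NumPy, and real FastText vectors are replaced by a simple stand-in that averages hashed character n-gram vectors. The names `HybridIndex`, `subword_vector`, `n_topics`, and `dim` are illustrative only.

```python
import re
import zlib

import numpy as np

# Hypothetical toy documents standing in for translated verses.
DOCS = [
    "be patient and pray to your lord",
    "those who are patient will be rewarded",
    "give charity to the poor and the needy",
    "feed the poor and help the orphan",
]
STOPWORDS = {"and", "to", "the", "who", "are", "will", "be", "your"}


def preprocess(text):
    """Lowercase, strip punctuation, drop stopwords (stemming omitted here)."""
    tokens = re.sub(r"[^a-z\s]", " ", text.lower()).split()
    return [t for t in tokens if t not in STOPWORDS]


def bow_matrix(token_lists, vocab):
    """Document-term count matrix of shape (n_docs, n_terms)."""
    m = np.zeros((len(token_lists), len(vocab)))
    for i, tokens in enumerate(token_lists):
        for t in tokens:
            if t in vocab:  # out-of-vocabulary query terms are skipped
                m[i, vocab[t]] += 1.0
    return m


def subword_vector(word, dim=32, n=3):
    """Stand-in for a FastText vector: mean of hashed character n-gram vectors."""
    padded = f"<{word}>"
    grams = [padded[i:i + n] for i in range(max(1, len(padded) - n + 1))]
    vecs = []
    for g in grams:
        # Seed from the n-gram so the same n-gram always maps to the same vector.
        rng = np.random.default_rng(zlib.crc32(g.encode("utf-8")))
        vecs.append(rng.standard_normal(dim))
    return np.mean(vecs, axis=0)


def doc_embedding(tokens, dim=32):
    """Average the word vectors of a token list (zero vector if empty)."""
    if not tokens:
        return np.zeros(dim)
    return np.mean([subword_vector(t, dim) for t in tokens], axis=0)


def l2_normalize(m):
    """Scale each row to unit length, leaving all-zero rows unchanged."""
    norms = np.linalg.norm(m, axis=-1, keepdims=True)
    return m / np.where(norms == 0, 1.0, norms)


class HybridIndex:
    """Concatenates LSI and embedding vectors and ranks by cosine similarity."""

    def __init__(self, docs, n_topics=2, dim=32):
        self.dim = dim
        token_lists = [preprocess(d) for d in docs]
        terms = sorted({t for tokens in token_lists for t in tokens})
        self.vocab = {t: i for i, t in enumerate(terms)}
        bow = bow_matrix(token_lists, self.vocab)
        # LSI: keep the top right-singular vectors of the BoW matrix.
        _, _, vt = np.linalg.svd(bow, full_matrices=False)
        self.components = vt[:n_topics]
        lsi = bow @ self.components.T
        emb = np.vstack([doc_embedding(t, dim) for t in token_lists])
        combined = np.hstack([l2_normalize(lsi), l2_normalize(emb)])
        self.doc_vectors = l2_normalize(combined)

    def _encode(self, text):
        tokens = preprocess(text)
        lsi = bow_matrix([tokens], self.vocab) @ self.components.T
        emb = doc_embedding(tokens, self.dim)[None, :]
        return l2_normalize(np.hstack([l2_normalize(lsi), l2_normalize(emb)]))[0]

    def search(self, query, top_k=3):
        """Return (document index, cosine similarity) pairs, best first."""
        scores = self.doc_vectors @ self._encode(query)
        order = np.argsort(-scores)[:top_k]
        return [(int(i), float(scores[i])) for i in order]


if __name__ == "__main__":
    index = HybridIndex(DOCS)
    for doc_id, score in index.search("patience in prayer"):
        print(f"{score:.3f}  {DOCS[doc_id]}")
```

Each half of the combined vector is normalized before concatenation so that neither the LSI part nor the embedding part dominates the cosine score; that weighting is a design choice of this sketch, and the record does not state how the published method balances the two.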

Similar Items