Information Retrieval Method for the Qur’an based on FastText and Latent Semantic Indexing
Retrieving contextually relevant verses from the Al-Qur'an translation dataset presents significant challenges due to the linguistic richness and semantic variation of the text. This study aims to enhance the accuracy and relevance of information retrieval in the Al-Qur'an translation data...
| Main Authors: | Aziz Ramadhan, Fandy Setyo Utomo |
|---|---|
| Format: | Article |
| Language: | Indonesian |
| Published: | Islamic University of Indragiri, 2025-05-01 |
| Series: | Sistemasi: Jurnal Sistem Informasi |
| Subjects: | information retrieval; latent semantic indexing; word embedding; fasttext; al-qur'an |
| Online Access: | https://sistemasi.ftik.unisi.ac.id/index.php/stmsi/article/view/4446 |
| _version_ | 1849222084779048960 |
|---|---|
| author | aziz ramadhan Fandy Setyo Utomo |
| author_facet | aziz ramadhan Fandy Setyo Utomo |
| author_sort | aziz ramadhan |
| collection | DOAJ |
| description | Retrieving contextually relevant verses from the Al-Qur'an translation dataset presents significant challenges due to the linguistic richness and semantic variation of the text. This study aims to enhance the accuracy and relevance of information retrieval in the Al-Qur'an translation dataset by combining Latent Semantic Indexing (LSI) and FastText word embeddings. The proposed method involves several steps: text preprocessing (lowercasing, punctuation removal, stopword elimination, and stemming), tokenization and vocabulary creation, Bag-of-Words (BoW) representation, creation of LSI models, conversion of FastText vectors, and combining LSI and FastText vectors. A similarity index is then created from the combined vectors to process user queries and rank documents based on cosine similarity. Testing on the dataset, consisting of 6236 translated verses from 114 surahs, showed promising results. The combined approach effectively captures both broader semantic structures and detailed word meanings, providing more accurate and contextually relevant search results. Key findings include high similarity scores, with 90% of retrieved verses being highly relevant to the user query, an accuracy improvement to 85%, and enhanced handling of synonyms and morphological variations at 88%. Further development is recommended, including parameter optimization, advanced preprocessing techniques, real-time search optimization, integration of contextual embeddings, and multilingual support to improve search performance and accuracy. |
| format | Article |
| id | doaj-art-e6118c2d868a4a56a9a392f29a132a6a |
| institution | Kabale University |
| issn | 2302-8149 2540-9719 |
| language | Indonesian |
| publishDate | 2025-05-01 |
| publisher | Islamic University of Indragiri |
| record_format | Article |
| series | Sistemasi: Jurnal Sistem Informasi |
| spelling | doaj-art-e6118c2d868a4a56a9a392f29a132a6a; 2025-08-26T08:05:46Z; ind; Islamic University of Indragiri; Sistemasi: Jurnal Sistem Informasi; 2302-8149; 2540-9719; 2025-05-01; 14; 3; 1014; 1024; 10.32520/stmsi.v14i3.4446; 1079; Information Retrieval Method for the Qur’an based on FastText and Latent Semantic Indexing; aziz ramadhan (Universitas Amikom Purwokerto); Fandy Setyo Utomo (Universitas Amikom Purwokerto); https://sistemasi.ftik.unisi.ac.id/index.php/stmsi/article/view/4446; information retrieval; latent semantic indexing; word embedding; fasttext; al-qur'an |
| title | Information Retrieval Method for the Qur’an based on FastText and Latent Semantic Indexing |
| title_sort | information retrieval method for the qur an based on fasttext and latent semantic indexing |
| topic | information retrieval latent semantic indexing word embedding fasttext al-qur'an |
| url | https://sistemasi.ftik.unisi.ac.id/index.php/stmsi/article/view/4446 |
| work_keys_str_mv | AT azizramadhan informationretrievalmethodforthequranbasedonfasttextandlatentsemanticindexing AT fandysetyoutomo informationretrievalmethodforthequranbasedonfasttextandlatentsemanticindexing |
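
The retrieval steps named in the description (preprocessing, Bag-of-Words, combining LSI and FastText vectors, ranking by cosine similarity) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. Several things here are assumptions: the record does not say how the two vectors are combined, so the sketch concatenates the L2-normalized vectors; the stopword list is a toy one; stemming is omitted; and the LSI and FastText vectors are hand-written placeholders standing in for the output of trained models (for example gensim's `LsiModel` and `FastText`).

```python
import math
import re
from collections import Counter

# Toy stopword list (assumption; the record does not give the real one).
STOPWORDS = {"the", "of", "and", "to", "is", "in", "a"}

def preprocess(text):
    """Lowercase, replace punctuation with spaces, drop stopwords (no stemming)."""
    tokens = re.sub(r"[^\w\s]", " ", text.lower()).split()
    return [t for t in tokens if t not in STOPWORDS]

def bag_of_words(tokens, vocab):
    """Count how often each vocabulary word occurs in the token list."""
    counts = Counter(tokens)
    return [counts[w] for w in vocab]

def l2_normalize(vec):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

def combine(lsi_vec, ft_vec):
    """Assumed combination rule: concatenate the two normalized vectors."""
    return l2_normalize(lsi_vec) + l2_normalize(ft_vec)

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def rank(query_vec, doc_vecs):
    """Return (document index, score) pairs, best match first."""
    scores = [(i, cosine(query_vec, d)) for i, d in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Placeholder vectors for three verses (hypothetical values, not model output).
doc_lsi = [[0.9, 0.1], [0.1, 0.9], [0.7, 0.3]]
doc_ft = [[0.8, 0.2, 0.0], [0.0, 0.3, 0.9], [0.6, 0.4, 0.1]]
doc_vecs = [combine(l, f) for l, f in zip(doc_lsi, doc_ft)]

# Placeholder query vectors, combined in the same way as the documents.
query_vec = combine([1.0, 0.0], [0.9, 0.1, 0.0])
ranking = rank(query_vec, doc_vecs)

if __name__ == "__main__":
    print(preprocess("The Mercy of God!"))
    print(ranking)
```

Normalizing each part before concatenation keeps either representation from dominating the cosine score purely because of its scale. A weighted sum or a learned projection would be equally plausible readings of "combining" in the description.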