A multi-dimensional semantic pseudo-relevance feedback framework for information retrieval
Abstract Pre-trained models have garnered significant attention in the field of information retrieval, particularly for improving document ranking. Typically, an initial retrieval step using sparse methods such as BM25 is employed to obtain a set of pseudo-relevant documents, followed by re-ranking...
Main Authors: | Min Pan, Yu Liu, Jinguang Chen, Ellen Anne Huang, Jimmy X. Huang |
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio, 2024-12-01 |
Series: | Scientific Reports |
Subjects: | Information retrieval; Pseudo-relevance feedback; Semantic information |
Online Access: | https://doi.org/10.1038/s41598-024-82871-0 |
_version_ | 1841559453469507584 |
---|---|
author | Min Pan; Yu Liu; Jinguang Chen; Ellen Anne Huang; Jimmy X. Huang |
author_sort | Min Pan |
collection | DOAJ |
description | Abstract Pre-trained models have garnered significant attention in the field of information retrieval, particularly for improving document ranking. Typically, an initial retrieval step using sparse methods such as BM25 is employed to obtain a set of pseudo-relevant documents, followed by re-ranking with a pre-trained model. However, the semantic information captured by pre-trained models from sentences or passages is usually only applied to document ranking, with limited use in query expansion. In fact, the semantic information within pseudo-relevant documents plays a critical role in selecting appropriate query expansion terms. Therefore, this paper proposes a novel approach that leverages pre-trained models to extract multi-dimensional semantic information from pseudo-relevant documents, offering more possibilities for query expansion. First, traditional sparse retrieval methods are used in the initial retrieval stage to ensure efficiency, and term-level weights are calculated based on statistical information. Then, the pre-trained model encodes both the query and the sentences and passages from the documents, extracting sentence-level and passage-level semantic similarities to the query. Finally, these semantic weights are combined with the term-level weights to generate an improved query for the second retrieval round. We conducted experiments on five TREC datasets and a medical dataset, showing improvements in official metrics such as MAP and P@10. The results demonstrate the effectiveness of utilizing multi-dimensional semantic information from pseudo-relevant documents to optimize query expansion. This study offers new insights into how the semantic information of pseudo-relevant documents can be effectively harnessed to enhance retrieval performance. |
format | Article |
id | doaj-art-9c8af92a06164c2a9b35ff37bb3a0818 |
institution | Kabale University |
issn | 2045-2322 |
language | English |
publishDate | 2024-12-01 |
publisher | Nature Portfolio |
record_format | Article |
series | Scientific Reports |
spelling | doaj-art-9c8af92a06164c2a9b35ff37bb3a0818; 2025-01-05T12:29:59Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2024-12-01; 14; 1; 1; 17; 10.1038/s41598-024-82871-0; A multi-dimensional semantic pseudo-relevance feedback framework for information retrieval; Min Pan (College of Computer and Information Engineering, Hubei Normal University); Yu Liu (College of Computer and Information Engineering, Hubei Normal University); Jinguang Chen (School of Electronic Information, Huzhou College); Ellen Anne Huang (Department of Computer Science, Western University); Jimmy X. Huang (Information Retrieval and Knowledge Management Research Lab, School of Information Technology, York University); https://doi.org/10.1038/s41598-024-82871-0; Information retrieval; Pseudo-relevance feedback; Semantic information |
title | A multi-dimensional semantic pseudo-relevance feedback framework for information retrieval |
topic | Information retrieval; Pseudo-relevance feedback; Semantic information |
url | https://doi.org/10.1038/s41598-024-82871-0 |
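
The abstract describes three signals (term-level statistics, sentence-level similarity, passage-level similarity) that are combined into expansion-term weights for a second retrieval round. The toy sketch below illustrates that flow only; it is not the authors' implementation. Every name in it (`tokenize`, `embed`, `cosine`, `expansion_weights`, `expand_query`, `alpha`, `beta`, `gamma`) is hypothetical. A bag-of-words count vector stands in for the pre-trained encoder, raw term frequency stands in for the statistical term weights, blank-line-separated blocks stand in for passages, and the linear combination is an assumed form that the record does not specify.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphanumeric runs (toy tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text):
    """ASSUMPTION: bag-of-words counts stand in for a pre-trained encoder."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def expansion_weights(query, feedback_docs, alpha=0.5, beta=0.25, gamma=0.25):
    """Score candidate expansion terms from pseudo-relevant documents.

    ASSUMPTION: weight = alpha * normalized term frequency
                       + beta  * best sentence-to-query similarity
                       + gamma * best passage-to-query similarity.
    """
    q_vec = embed(query)
    q_terms = set(tokenize(query))
    term_w, sent_w, pass_w = Counter(), Counter(), Counter()
    for doc in feedback_docs:
        # Passages are blank-line-separated blocks (an assumption).
        for passage in (p for p in doc.split("\n\n") if p.strip()):
            p_sim = cosine(q_vec, embed(passage))
            for sentence in re.split(r"(?<=[.!?])\s+", passage):
                s_sim = cosine(q_vec, embed(sentence))
                for term in tokenize(sentence):
                    if term in q_terms:
                        continue  # only new terms can expand the query
                    term_w[term] += 1
                    sent_w[term] = max(sent_w[term], s_sim)
                    pass_w[term] = max(pass_w[term], p_sim)
    max_tf = max(term_w.values(), default=1)
    return {
        t: alpha * term_w[t] / max_tf + beta * sent_w[t] + gamma * pass_w[t]
        for t in term_w
    }


def expand_query(query, feedback_docs, k=3):
    """Append the k highest-weighted terms to the original query terms."""
    weights = expansion_weights(query, feedback_docs)
    ranked = sorted(weights, key=lambda t: (-weights[t], t))
    return tokenize(query) + ranked[:k]


docs = [
    "BM25 ranks documents. Query expansion adds related terms to the query.",
    "Cats sleep all day.",
]
print(expand_query("query expansion", docs))
```

In this sketch, terms that co-occur with the query in a similar sentence or passage outrank terms that are merely frequent, which is the intuition the abstract gives for using semantic information during expansion. A real system would replace `embed` with an actual encoder and use the first-round sparse retrieval results as `feedback_docs`.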