A multi-dimensional semantic pseudo-relevance feedback framework for information retrieval

Bibliographic Details
Main Authors: Min Pan, Yu Liu, Jinguang Chen, Ellen Anne Huang, Jimmy X. Huang
Format: Article
Language: English
Published: Nature Portfolio 2024-12-01
Series: Scientific Reports
Subjects: Information retrieval; Pseudo-relevance feedback; Semantic information
Online Access: https://doi.org/10.1038/s41598-024-82871-0
collection DOAJ
description Abstract Pre-trained models have garnered significant attention in the field of information retrieval, particularly for improving document ranking. Typically, an initial retrieval step using sparse methods such as BM25 is employed to obtain a set of pseudo-relevant documents, followed by re-ranking with a pre-trained model. However, the semantic information captured by pre-trained models from sentences or passages is usually only applied to document ranking, with limited use in query expansion. In fact, the semantic information within pseudo-relevant documents plays a critical role in selecting appropriate query expansion terms. Therefore, this paper proposes a novel approach that leverages pre-trained models to extract multi-dimensional semantic information from pseudo-relevant documents, offering more possibilities for query expansion. First, traditional sparse retrieval methods are used in the initial retrieval stage to ensure efficiency, and term-level weights are calculated based on statistical information. Then, the pre-trained model encodes both the query and the sentences and passages from the documents, extracting sentence-level and passage-level semantic similarities to the query. Finally, these semantic weights are combined with the term-level weights to generate an improved query for the second retrieval round. We conducted experiments on five TREC datasets and a medical dataset, showing improvements in official metrics such as MAP and P@10. The results demonstrate the effectiveness of utilizing multi-dimensional semantic information from pseudo-relevant documents to optimize query expansion. This study offers new insights into how the semantic information of pseudo-relevant documents can be effectively harnessed to enhance retrieval performance.
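The pipeline outlined in the abstract (statistical term weights, sentence-level and passage-level similarity to the query, then a combined weight per candidate expansion term) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the pseudo-relevant documents have already been retrieved (for example by BM25), it substitutes a bag-of-words count vector for the pre-trained encoder, it treats each feedback document as a single passage, and it combines the three signals by linear interpolation. The function names and the parameters `alpha`, `beta`, `gamma`, and `k` are hypothetical and do not come from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text):
    # ASSUMPTION: stand-in for a pre-trained encoder; a bag-of-words count vector.
    return Counter(tokenize(text))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def normalize(scores):
    """Scale scores so that the largest value becomes 1.0."""
    top = max(scores.values(), default=0.0)
    return {t: (s / top if top else 0.0) for t, s in scores.items()}


def unit_weights(query, units):
    """Give each term the best query similarity among the text units containing it."""
    q = embed(query)
    best = {}
    for unit in units:
        sim = cosine(q, embed(unit))
        for term in set(tokenize(unit)):
            best[term] = max(best.get(term, 0.0), sim)
    return normalize(best)


def expand_query(query, feedback_docs, k=3, alpha=0.4, beta=0.3, gamma=0.3):
    """Rank candidate expansion terms by a combined term/sentence/passage weight."""
    # Term level: normalized term frequency over the feedback documents
    # (ASSUMPTION: a simple statistic, with no stopword removal).
    term_w = normalize(Counter(t for d in feedback_docs for t in tokenize(d)))
    # Sentence level: split each document on sentence-final punctuation.
    sentences = [
        s for d in feedback_docs for s in re.split(r"(?<=[.!?])\s+", d) if s.strip()
    ]
    sent_w = unit_weights(query, sentences)
    # Passage level: ASSUMPTION, each feedback document counts as one passage.
    pass_w = unit_weights(query, feedback_docs)
    query_terms = set(tokenize(query))
    combined = {
        t: alpha * term_w[t] + beta * sent_w.get(t, 0.0) + gamma * pass_w.get(t, 0.0)
        for t in term_w
        if t not in query_terms
    }
    # Sort by descending weight, breaking ties alphabetically for determinism.
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


if __name__ == "__main__":
    docs = [
        "Neural ranking models rerank documents. Cats sleep all day.",
        "Query expansion adds terms to neural ranking.",
    ]
    for term, weight in expand_query("neural ranking", docs):
        print(f"{term}\t{weight:.3f}")
```

In this sketch a term that appears only in a sentence unrelated to the query receives no sentence-level credit, which is one way the semantic signal can demote terms that raw frequency alone would treat equally. A real system would replace `embed` with a dense encoder and would tune the interpolation weights on held-out queries.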
id doaj-art-9c8af92a06164c2a9b35ff37bb3a0818
institution Kabale University
issn 2045-2322
spelling doaj-art-9c8af92a06164c2a9b35ff37bb3a0818; 2025-01-05T12:29:59Z; eng; Nature Portfolio; Scientific Reports; ISSN 2045-2322; 2024-12-01; vol. 14, no. 1, pp. 1-17; DOI 10.1038/s41598-024-82871-0
A multi-dimensional semantic pseudo-relevance feedback framework for information retrieval
Min Pan (College of Computer and Information Engineering, Hubei Normal University)
Yu Liu (College of Computer and Information Engineering, Hubei Normal University)
Jinguang Chen (School of Electronic Information, Huzhou College)
Ellen Anne Huang (Department of Computer Science, Western University)
Jimmy X. Huang (Information Retrieval and Knowledge Management Research Lab, School of Information Technology, York University)
https://doi.org/10.1038/s41598-024-82871-0
Information retrieval
Pseudo-relevance feedback
Semantic information
topic Information retrieval
Pseudo-relevance feedback
Semantic information
url https://doi.org/10.1038/s41598-024-82871-0