Unlocking latent features of users and items: empowering multi-modal recommendation systems

Bibliographic Details
Main Authors: Subham Raj, Sriparna Saha
Format: Article
Language: English
Published: Nature Portfolio 2025-07-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-025-95872-4
Description
Summary: Multimedia recommendation has emerged as a pivotal area in contemporary research, propelled by the exponential growth of digital media consumption. In recent years, the proliferation of multimedia content across diverse platforms has necessitated sophisticated recommendation systems to assist users in navigating this vast landscape. Existing research predominantly centers on integrating multimodal features as auxiliary information within user–item interaction models. However, this approach proves inadequate for effective multimedia recommendation. Primarily, it captures collaborative item–item connections only implicitly, via high-order item–user–item associations. Given that items encompass diverse content modalities, we suggest that leveraging latent semantic item–item structures within these multimodal contents could significantly enhance item representations and consequently improve recommendation performance. Existing works also fail to effectively capture user–user affinity in multimedia recommendation, as they focus only on improving the item representation. To this end, we propose a novel framework that captures the latent features of different modalities and also considers user–user affinity to solve the Recommendation System (RecSys) problem. We also include a cold-start study in our experiments. We conducted extensive experiments on three publicly available datasets to demonstrate the efficacy of our framework over the state-of-the-art model.
ISSN: 2045-2322