Viewport prediction with cross modal multiscale transformer for 360° video streaming
| Main Authors: | , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-08-01 |
| Series: | Scientific Reports |
| Subjects: | |
| Online Access: | https://doi.org/10.1038/s41598-025-16011-7 |
| Summary: | Abstract In the realm of immersive video technologies, efficient 360° video streaming remains challenging because of high bandwidth requirements and the dynamic nature of user viewports. Most existing approaches neglect the dependencies between different modalities and rarely account for personal preferences, which leads to inconsistent prediction performance. Here, we present a novel viewport prediction model built on a Cross Modal Multiscale Transformer (CMMST) that integrates user trajectory and video saliency features across different scales. Our approach outperforms baseline methods, maintaining high precision even over extended prediction intervals. By harnessing cross-modal attention mechanisms, CMMST captures intricate user preferences and viewing patterns, offering a promising solution for adaptive streaming in virtual reality and other immersive platforms. The code for this work is available at https://github.com/bbgua85776540/CMMST . |
| ISSN: | 2045-2322 |
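
As a rough illustration of the cross-modal, multiscale fusion described in the summary above, the sketch below cross-attends trajectory tokens to video-saliency tokens at two temporal scales and regresses the next viewport center. All module names, dimensions, and the (yaw, pitch) output head are illustrative assumptions, not the authors' CMMST implementation; see the linked repository for the actual code.

```python
import torch
import torch.nn as nn


class CrossModalAttention(nn.Module):
    """Cross-attention block: trajectory tokens query saliency tokens.

    Hypothetical sketch; dimensions and layer choices are assumptions.
    """

    def __init__(self, dim: int = 128, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, traj: torch.Tensor, sal: torch.Tensor) -> torch.Tensor:
        # traj: (batch, T, dim) head-movement feature tokens
        # sal:  (batch, S, dim) video-saliency feature tokens
        fused, _ = self.attn(query=traj, key=sal, value=sal)
        return self.norm(traj + fused)  # residual fusion


class ViewportPredictor(nn.Module):
    """Toy multiscale predictor: fuse at fine and coarse temporal scales,
    then regress the next viewport center as (yaw, pitch)."""

    def __init__(self, dim: int = 128):
        super().__init__()
        self.fuse_fine = CrossModalAttention(dim)
        self.fuse_coarse = CrossModalAttention(dim)
        self.pool = nn.AvgPool1d(kernel_size=2, stride=2)
        self.head = nn.Linear(2 * dim, 2)

    def forward(self, traj: torch.Tensor, sal: torch.Tensor) -> torch.Tensor:
        fine = self.fuse_fine(traj, sal)
        # Coarse scale: downsample trajectory tokens along the time axis.
        coarse_traj = self.pool(traj.transpose(1, 2)).transpose(1, 2)
        coarse = self.fuse_coarse(coarse_traj, sal)
        feats = torch.cat([fine.mean(dim=1), coarse.mean(dim=1)], dim=-1)
        return self.head(feats)  # predicted (yaw, pitch)


if __name__ == "__main__":
    traj = torch.randn(8, 16, 128)  # 16 past trajectory steps per user
    sal = torch.randn(8, 32, 128)   # 32 saliency tokens per video clip
    print(ViewportPredictor()(traj, sal).shape)  # torch.Size([8, 2])
```

Using the trajectory tokens as queries lets past head movement (a proxy for personal preference) select which saliency regions matter, while the second, downsampled branch captures slower viewing trends; both ideas mirror, in simplified form, the cross-modal and multiscale aspects the abstract describes.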