MuIm: Analyzing Music–Image Correlations from an Artistic Perspective
Cross-modality understanding is essential for AI to tackle complex tasks that require both deterministic and generative capabilities, such as correlating music and visual art. The existing state-of-the-art methods of audio-visual correlation often rely on single-dimension information, focusing either on semantic or emotional attributes, thus failing to capture the full depth of these inherently complex modalities. Addressing this limitation, we introduce a novel approach that perceives music–image correlation as multilayered rather than as a direct one-to-one correspondence. To this end, we present a pioneering dataset with two segments: an artistic segment that pairs music with art based on both emotional and semantic attributes, and a realistic segment that links music with images through affective–semantic layers. In modeling emotional layers for the artistic segment, we found traditional 2D affective models inadequate, prompting us to propose a more interpretable hybrid-emotional rating system that serves both experts and non-experts. For the realistic segment, we utilize a web-based dataset with tags, dividing tag information into semantic and affective components to ensure a balanced and nuanced representation of music–image correlation. We conducted an in-depth statistical analysis and user study to evaluate our dataset’s effectiveness and applicability for AI-driven understanding. This work provides a foundation for advanced explorations into the complex relationships between auditory and visual art modalities, advancing the development of more sophisticated cross-modal AI systems.
| Main Authors: | Ubaid Ullah, Hyun-Chul Choi |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2024-12-01 |
| Series: | Applied Sciences |
| Subjects: | music–image; cross-modality; neural networks; multi-modality |
| Online Access: | https://www.mdpi.com/2076-3417/14/23/11470 |
| author | Ubaid Ullah; Hyun-Chul Choi |
| collection | DOAJ |
| description | Cross-modality understanding is essential for AI to tackle complex tasks that require both deterministic and generative capabilities, such as correlating music and visual art. The existing state-of-the-art methods of audio-visual correlation often rely on single-dimension information, focusing either on semantic or emotional attributes, thus failing to capture the full depth of these inherently complex modalities. Addressing this limitation, we introduce a novel approach that perceives music–image correlation as multilayered rather than as a direct one-to-one correspondence. To this end, we present a pioneering dataset with two segments: an artistic segment that pairs music with art based on both emotional and semantic attributes, and a realistic segment that links music with images through affective–semantic layers. In modeling emotional layers for the artistic segment, we found traditional 2D affective models inadequate, prompting us to propose a more interpretable hybrid-emotional rating system that serves both experts and non-experts. For the realistic segment, we utilize a web-based dataset with tags, dividing tag information into semantic and affective components to ensure a balanced and nuanced representation of music–image correlation. We conducted an in-depth statistical analysis and user study to evaluate our dataset’s effectiveness and applicability for AI-driven understanding. This work provides a foundation for advanced explorations into the complex relationships between auditory and visual art modalities, advancing the development of more sophisticated cross-modal AI systems. |
| format | Article |
| id | doaj-art-be07e9bb6fed4aaeaac0bcffa8d163ae |
| institution | Kabale University |
| issn | 2076-3417 |
| language | English |
| publishDate | 2024-12-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Applied Sciences |
| doi | 10.3390/app142311470 |
| affiliation | Intelligent Computer Vision Software Laboratory (ICVSLab), Department of Electronic Engineering, Yeungnam University, 280 Daehak-Ro, Gyeongsan 38541, Gyeongbuk, Republic of Korea |
| title | MuIm: Analyzing Music–Image Correlations from an Artistic Perspective |
| topic | music–image; cross-modality; neural networks; multi-modality |
| url | https://www.mdpi.com/2076-3417/14/23/11470 |