MuIm: Analyzing Music–Image Correlations from an Artistic Perspective

Cross-modality understanding is essential for AI to tackle complex tasks that require both deterministic and generative capabilities, such as correlating music and visual art. Existing state-of-the-art audio–visual correlation methods often rely on single-dimensional information, focusing on either semantic or emotional attributes and thus failing to capture the full depth of these inherently complex modalities. Addressing this limitation, we introduce a novel approach that treats music–image correlation as multilayered rather than as a direct one-to-one correspondence. To this end, we present a pioneering dataset with two segments: an artistic segment that pairs music with art based on both emotional and semantic attributes, and a realistic segment that links music with images through affective–semantic layers. In modeling emotional layers for the artistic segment, we found traditional 2D affective models inadequate, prompting us to propose a more interpretable hybrid emotional rating system that serves both experts and non-experts. For the realistic segment, we utilize a web-based tagged dataset, dividing tag information into semantic and affective components to ensure a balanced and nuanced representation of music–image correlation. We conducted an in-depth statistical analysis and a user study to evaluate the dataset’s effectiveness and applicability for AI-driven understanding. This work lays a foundation for deeper exploration of the complex relationships between auditory and visual art modalities, advancing the development of more sophisticated cross-modal AI systems.


Bibliographic Details
Main Authors: Ubaid Ullah, Hyun-Chul Choi
Format: Article
Language: English
Published: MDPI AG, 2024-12-01
Series: Applied Sciences
Subjects: music–image; cross-modality; neural networks; multi-modality
Online Access: https://www.mdpi.com/2076-3417/14/23/11470
_version_ 1846124391164805120
author Ubaid Ullah
Hyun-Chul Choi
author_facet Ubaid Ullah
Hyun-Chul Choi
author_sort Ubaid Ullah
collection DOAJ
description Cross-modality understanding is essential for AI to tackle complex tasks that require both deterministic and generative capabilities, such as correlating music and visual art. Existing state-of-the-art audio–visual correlation methods often rely on single-dimensional information, focusing on either semantic or emotional attributes and thus failing to capture the full depth of these inherently complex modalities. Addressing this limitation, we introduce a novel approach that treats music–image correlation as multilayered rather than as a direct one-to-one correspondence. To this end, we present a pioneering dataset with two segments: an artistic segment that pairs music with art based on both emotional and semantic attributes, and a realistic segment that links music with images through affective–semantic layers. In modeling emotional layers for the artistic segment, we found traditional 2D affective models inadequate, prompting us to propose a more interpretable hybrid emotional rating system that serves both experts and non-experts. For the realistic segment, we utilize a web-based tagged dataset, dividing tag information into semantic and affective components to ensure a balanced and nuanced representation of music–image correlation. We conducted an in-depth statistical analysis and a user study to evaluate the dataset’s effectiveness and applicability for AI-driven understanding. This work lays a foundation for deeper exploration of the complex relationships between auditory and visual art modalities, advancing the development of more sophisticated cross-modal AI systems.
format Article
id doaj-art-be07e9bb6fed4aaeaac0bcffa8d163ae
institution Kabale University
issn 2076-3417
language English
publishDate 2024-12-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj-art-be07e9bb6fed4aaeaac0bcffa8d163ae
Indexed: 2024-12-13T16:24:01Z
Language: eng
Publisher: MDPI AG
Series: Applied Sciences (ISSN 2076-3417)
Published: 2024-12-01, Vol. 14, Iss. 23, Art. 11470
DOI: 10.3390/app142311470
Title: MuIm: Analyzing Music–Image Correlations from an Artistic Perspective
Authors: Ubaid Ullah; Hyun-Chul Choi
Affiliation (both authors): Intelligent Computer Vision Software Laboratory (ICVSLab), Department of Electronic Engineering, Yeungnam University, 280 Daehak-Ro, Gyeongsan 38541, Gyeongbuk, Republic of Korea
Abstract: as given in the description field above
URL: https://www.mdpi.com/2076-3417/14/23/11470
Keywords: music–image; cross-modality; neural networks; multi-modality
spellingShingle Ubaid Ullah
Hyun-Chul Choi
MuIm: Analyzing Music–Image Correlations from an Artistic Perspective
Applied Sciences
music–image
cross-modality
neural networks
multi-modality
title MuIm: Analyzing Music–Image Correlations from an Artistic Perspective
title_full MuIm: Analyzing Music–Image Correlations from an Artistic Perspective
title_fullStr MuIm: Analyzing Music–Image Correlations from an Artistic Perspective
title_full_unstemmed MuIm: Analyzing Music–Image Correlations from an Artistic Perspective
title_short MuIm: Analyzing Music–Image Correlations from an Artistic Perspective
title_sort muim analyzing music image correlations from an artistic perspective
topic music–image
cross-modality
neural networks
multi-modality
url https://www.mdpi.com/2076-3417/14/23/11470
work_keys_str_mv AT ubaidullah muimanalyzingmusicimagecorrelationsfromanartisticperspective
AT hyunchulchoi muimanalyzingmusicimagecorrelationsfromanartisticperspective