User Intent-Based Music Generation Model Combining Actor-Critic Approach With MusicVAE


Bibliographic Details
Main Authors: Soyoung Jang, Jaeho Lee
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects:
Online Access: https://ieeexplore.ieee.org/document/11122520/
Description
Summary: This research proposes a way for music generation models to reflect user intent more clearly. Previous studies have attempted to generate personalized music from inputs such as text and images, but these expressions are semantically ambiguous and difficult to map directly onto music. To address this, the study defines the user's emotional state from color and uses it as a condition for music generation. Colors were decomposed into saturation, luminance, and hue, and the pitch and density information corresponding to each emotional condition was extracted by a Mood Classifier trained on these attributes. This information was incorporated into a loss function that redefines MusicVAE's latent vector, which was then subjected to Actor-Critic-based condition injection. Model performance was evaluated with Spotify Energy-Valence analysis, PCA-based latent-space visualization, and listening tests (200 subjects), and the proposed model outperformed existing models in both condition reflection and musical naturalness.
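The abstract's pipeline (color attributes → emotional condition → latent-vector adjustment scored by a critic) can be sketched minimally as follows. This is a hypothetical illustration, not the paper's trained Mood Classifier or its Actor-Critic objective: the `color_to_condition` mapping, the toy distance-based `critic`, and the hill-climbing `actor_inject` routine are all assumed stand-ins for the learned components described above.

```python
import numpy as np

rng = np.random.default_rng(0)

def color_to_condition(hue, saturation, luminance):
    """Map HSL color attributes to a 2-D (valence, energy) condition.
    Hypothetical mapping: hue drives valence, saturation and
    luminance drive energy (not the paper's trained classifier)."""
    valence = np.cos(np.deg2rad(hue))          # warm hues -> positive valence
    energy = 0.5 * saturation + 0.5 * luminance
    return np.array([valence, energy])

def critic(z, condition):
    """Toy critic: negative distance between the first two latent
    dimensions and the target condition (higher = better match)."""
    return -np.linalg.norm(z[:2] - condition)

def actor_inject(z, condition, steps=50, lr=0.1):
    """Nudge a latent vector toward the condition by ascending the
    critic score (a simple stand-in for Actor-Critic injection)."""
    z = z.copy()
    eps = 1e-4
    for _ in range(steps):
        # finite-difference gradient of the critic w.r.t. z
        grad = np.zeros_like(z)
        for i in range(len(z)):
            dz = np.zeros_like(z)
            dz[i] = eps
            grad[i] = (critic(z + dz, condition)
                       - critic(z - dz, condition)) / (2 * eps)
        z += lr * grad
    return z

cond = color_to_condition(hue=30, saturation=0.8, luminance=0.6)  # warm, bright
z0 = rng.standard_normal(8)      # stand-in for a MusicVAE latent vector
z1 = actor_inject(z0, cond)      # condition-injected latent vector
```

After injection, `critic(z1, cond)` should exceed `critic(z0, cond)`, mirroring the idea that the adjusted latent vector better reflects the color-derived emotional condition before being decoded into music.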
ISSN: 2169-3536