Music videos have traditionally been the domain of experts, but with text-to-video generative AI models, AI artists can now create them more easily. However, accurately reflecting the desired music-visual mise-en-scène remains challenging without specialized knowledge, highlighting the need for supportive tools. To address this, we conducted a design workshop with seven music video experts, identified design goals, and developed MVPrompt—a tool for generating music-visual mise-en-scène prompts. In a user study with 24 AI artists, MVPrompt outperformed the Baseline, effectively supporting the collaborative creative process. Specifically, the Visual Theme stage facilitated the exploration of tone and manner, while the Visual Scene & Grammar stage refined prompts with detailed mise-en-scène elements. By enabling AI artists to specify mise-en-scène creatively, MVPrompt enhances the experience of making music video scenes with text-to-video generative AI.
https://dl.acm.org/doi/10.1145/3706598.3713876
The ACM CHI Conference on Human Factors in Computing Systems (https://chi2025.acm.org/)