VidTune: Creating Video Soundtracks with Generative Music and Video-Based Thumbnails

Abstract

Music shapes the tone of videos, yet creators often struggle to find soundtracks that match their video's mood and narrative. Recent text-to-music models let creators generate music from text prompts, but our formative study (N=8) shows that creators struggle to construct diverse prompts, quickly review and compare tracks, and understand their impact on the video. We present VidTune, a system that supports soundtrack creation by generating diverse music options from a creator's prompt and producing contextual thumbnails for rapid review. VidTune extracts representative video subjects to ground thumbnails in context, maps each track's valence and energy onto visual cues such as color and brightness, and depicts prominent genres and instruments. Creators can refine tracks with natural language edits, which VidTune expands into new generations. In a controlled user study (N=12) and an exploratory case study (N=6), participants found VidTune helpful for efficiently reviewing and comparing music options and described the process as playful and enriching.

Authors
Mina Huh
University of Texas, Austin, Austin, Texas, United States
C. Ailie Fraser
Adobe Research, Seattle, Washington, United States
Ding Li
Adobe Research, Seattle, Washington, United States
Mira Dontcheva
Adobe Research, Seattle, Washington, United States
Bryan Wang
Adobe, Seattle, Washington, United States

Conference: CHI 2026

ACM CHI Conference on Human Factors in Computing Systems

Session: Music to My Ears

P1 - Room 132
7 presentations
2026-04-17, 20:15–21:45