VidTune: Creating Video Soundtracks with Generative Music and Video-Based Thumbnails

Abstract

Music shapes the tone of videos, yet creators often struggle to find soundtracks that match their video's mood and narrative. Recent text-to-music models let creators generate music from text prompts, but our formative study (N=8) shows that creators struggle to construct diverse prompts, quickly review and compare tracks, and understand their impact on the video. We present VidTune, a system that supports soundtrack creation by generating diverse music options from a creator's prompt and producing contextual thumbnails for rapid review. VidTune extracts representative video subjects to ground thumbnails in context, maps each track's valence and energy onto visual cues such as color and brightness, and depicts prominent genres and instruments. Creators can refine tracks with natural language edits, which VidTune expands into new generations. In a controlled user study (N=12) and an exploratory case study (N=6), participants found VidTune helpful for efficiently reviewing and comparing music options and described the process as playful and enriching.

Authors
Mina Huh
University of Texas, Austin, Austin, Texas, United States
C. Ailie Fraser
Adobe Research, Seattle, Washington, United States
Ding Li
Adobe Research, Seattle, Washington, United States
Mira Dontcheva
Adobe Research, Seattle, Washington, United States
Bryan Wang
Adobe, Seattle, Washington, United States

Conference: CHI 2026

ACM CHI Conference on Human Factors in Computing Systems

Session: Music to My Ears

P1 - Room 132
7 presentations
2026-04-17, 20:15–21:45