AMUSE: Human-AI Collaborative Songwriting with Multimodal Inspirations

Abstract

Songwriting is often driven by multimodal inspirations such as imagery, narratives, or existing music, yet current music AI systems offer songwriters little support for incorporating these inspirations into their creative processes. We introduce Amuse, a songwriting assistant that transforms multimodal inputs (images, text, or audio) into chord progressions that can be seamlessly incorporated into a songwriter's creative process. A key feature of Amuse is its novel method for generating chords that are coherent and relevant to music keywords, despite the absence of datasets pairing multimodal inputs with chords. Specifically, we propose a method that leverages multimodal language models to convert multimodal inputs into noisy chord suggestions, then uses a unimodal chord model to filter them. A user study with songwriters shows that Amuse effectively supports transforming multimodal ideas into coherent musical suggestions, enhancing users' agency and creativity throughout the songwriting process.
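The generate-then-filter idea in the abstract can be sketched in a few lines of Python. Everything below is a hypothetical illustration, not the authors' implementation: in the actual system, a multimodal language model would produce the noisy candidates, and a chord model trained on a symbolic music corpus would replace the toy bigram scorer used here.

```python
# Minimal sketch of generate-then-filter chord suggestion:
# noisy candidates (as if from a multimodal language model) are
# reranked by a unimodal "chord model" and only the most musically
# coherent progressions are kept. The transition table is a toy stand-in.

from typing import Dict, List, Tuple

Chords = Tuple[str, ...]

# Toy bigram transition scores for common moves in C major.
TRANSITIONS: Dict[Tuple[str, str], float] = {
    ("C", "F"): 0.9, ("F", "G"): 0.9, ("G", "C"): 0.9,
    ("C", "Am"): 0.8, ("Am", "F"): 0.8, ("G", "Am"): 0.6,
}

def coherence_score(chords: Chords) -> float:
    """Average bigram score; unseen transitions get a small penalty."""
    pairs = list(zip(chords, chords[1:]))
    return sum(TRANSITIONS.get(p, 0.05) for p in pairs) / max(len(pairs), 1)

def filter_suggestions(candidates: List[Chords], k: int = 2) -> List[Chords]:
    """Keep the k candidates the chord model finds most coherent."""
    return sorted(candidates, key=coherence_score, reverse=True)[:k]

if __name__ == "__main__":
    # Noisy candidates, as might come back from a multimodal model
    # prompted with keywords like "warm, nostalgic, sunset".
    noisy = [
        ("C", "Am", "F", "G"),    # coherent pop progression
        ("C", "F#", "Bbm", "E"),  # incoherent in C major
        ("C", "F", "G", "C"),     # coherent cadence
    ]
    for prog in filter_suggestions(noisy):
        print(" - ".join(prog))
```

Running the sketch keeps the two coherent progressions and drops the incoherent one, mirroring how the filtering stage compensates for the multimodal model's noisy musical output.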

Award
Best Paper
Authors
Yewon Kim
KAIST, Daejeon, Republic of Korea
Sung-Ju Lee
KAIST, Daejeon, Republic of Korea
Chris Donahue
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
DOI

10.1145/3706598.3713818

Paper URL

https://dl.acm.org/doi/10.1145/3706598.3713818

Conference: CHI 2025

The ACM CHI Conference on Human Factors in Computing Systems (https://chi2025.acm.org/)

Session: Creative Tools

Annex Hall F205
7 presentations
2025-04-30 01:20:00 – 02:50:00