Toyteller: AI-powered Visual Storytelling Through Toy-Playing with Character Symbols

Abstract

We introduce Toyteller, an AI-powered storytelling system in which users generate a mix of story text and visuals by directly manipulating character symbols, as if playing with toys. Anthropomorphized symbol motions can convey rich and nuanced social interactions; Toyteller leverages these motions (1) to let users steer story text generation and (2) as a visual output format that accompanies story text. We enabled motion-steered text generation and text-steered motion generation by mapping motions and text onto a shared semantic space, which large language models and motion generation models can use as a translational layer. Technical evaluations showed that Toyteller outperforms a competitive baseline, GPT-4o. Our user study identified that toy-playing helps express intentions that are difficult to verbalize. However, motions alone could not express all user intentions, suggesting the value of combining them with other modalities such as language. We discuss the design space of toy-playing interactions and implications for technical HCI research on human-AI interaction.
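The shared semantic space described above can be pictured as two modality-specific encoders projecting into a common vector space, where nearest-neighbor lookup translates between modalities. The sketch below is purely illustrative: the encoders, weights, vocabulary, and feature names are invented assumptions, not Toyteller's actual models.

```python
# Illustrative sketch of a shared semantic space as a translational layer
# between motion and text. All encoders, weights, and vocabulary below are
# made-up assumptions for illustration, not the paper's implementation.
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def encode_motion(features):
    """Stand-in linear projection of motion features (speed, approach,
    jitter) into a 3-d shared space; weights are arbitrary."""
    w = [[0.9, 0.1, 0.0],
         [0.0, 1.0, 0.2],
         [0.1, 0.0, 0.8]]
    return [sum(wi * f for wi, f in zip(row, features)) for row in w]

def encode_text(text):
    """Toy lookup table standing in for a learned text encoder."""
    table = {
        "chase":   [0.9,  0.8, 0.1],
        "embrace": [0.1,  0.9, 0.0],
        "flee":    [0.8, -0.5, 0.3],
    }
    return table[text]

def motion_to_text(features, candidates):
    """Motion-steered text selection: pick the candidate whose text
    embedding lies nearest the motion embedding in the shared space."""
    m = encode_motion(features)
    return max(candidates, key=lambda t: cosine(m, encode_text(t)))

# A fast, approaching, low-jitter motion maps to "chase".
fast_approach = [1.0, 0.9, 0.1]
print(motion_to_text(fast_approach, ["chase", "embrace", "flee"]))  # chase
```

The same space supports the reverse direction (text-steered motion generation) by conditioning a motion generator on a text embedding instead.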

Authors
John Joon Young Chung
Midjourney, San Francisco, California, United States
Melissa Roemmele
Midjourney, San Francisco, California, United States
Max Kreminski
Midjourney, San Francisco, California, United States
DOI

10.1145/3706598.3713435

Paper URL

https://dl.acm.org/doi/10.1145/3706598.3713435

Conference: CHI 2025

The ACM CHI Conference on Human Factors in Computing Systems (https://chi2025.acm.org/)

Session: Digital Storytelling

Room: G304
7 presentations
2025-04-29 01:20:00 – 2025-04-29 02:50:00