Creative Tools

Conference name
CHI 2025
AMUSE: Human-AI Collaborative Songwriting with Multimodal Inspirations
Abstract

Songwriting is often driven by multimodal inspirations, such as imagery, narratives, or existing music, yet songwriters remain unsupported by current music AI systems in incorporating these multimodal inputs into their creative processes. We introduce Amuse, a songwriting assistant that transforms multimodal (image, text, or audio) inputs into chord progressions that can be seamlessly incorporated into songwriters' creative process. A key feature of Amuse is its novel method for generating coherent chords that are relevant to music keywords in the absence of datasets with paired examples of multimodal inputs and chords. Specifically, we propose a method that leverages multimodal language models to convert multimodal inputs into noisy chord suggestions and uses a unimodal chord model to filter the suggestions. A user study with songwriters shows that Amuse effectively supports transforming multimodal ideas into coherent musical suggestions, enhancing users' agency and creativity throughout the songwriting process.
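
The abstract describes a generate-then-filter pipeline: a multimodal language model turns the input into noisy chord suggestions, and a unimodal chord model keeps only the coherent ones. Below is a minimal Python sketch of that idea; the function names, scoring scheme, and toy models are illustrative assumptions, not Amuse's actual implementation.

from typing import Callable, Sequence

def suggest_chords(
    keywords: str,                                   # music keywords derived from the multimodal input
    generate: Callable[[str, int], Sequence[str]],   # multimodal LM: keywords -> noisy chord progressions
    score: Callable[[str], float],                   # unimodal chord model: coherence score of a progression
    n_candidates: int = 20,
    n_keep: int = 5,
) -> list[str]:
    """Generate noisy candidates, then keep the progressions the chord model rates most coherent."""
    candidates = generate(keywords, n_candidates)
    return sorted(candidates, key=score, reverse=True)[:n_keep]

if __name__ == "__main__":
    # Toy stand-ins so the sketch runs end to end.
    def toy_generate(keywords: str, n: int) -> list[str]:
        pool = ["C G Am F", "C E7 Am F", "Am F C G", "C F#dim B G"]
        return (pool * n)[:n]

    def toy_score(progression: str) -> float:
        diatonic = {"C", "Dm", "Em", "F", "G", "Am"}  # pretend diatonic chords in C major score higher
        chords = progression.split()
        return sum(c in diatonic for c in chords) / len(chords)

    print(suggest_chords("wistful autumn evening", toy_generate, toy_score))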

Award
Best Paper
Authors
Yewon Kim
KAIST, Daejeon, Korea, Republic of
Sung-Ju Lee
KAIST, Daejeon, Korea, Republic of
Chris Donahue
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
DOI

10.1145/3706598.3713818

Paper URL

https://dl.acm.org/doi/10.1145/3706598.3713818

Compositional Structures as Substrates for Human-AI Co-creation Environment: A Design Approach and A Case Study
Abstract

It has been increasingly recognized that effective human-AI co-creation requires more than prompts and results: it calls for an environment with empowering structures that facilitate exploration, planning, and iteration, as well as control and inspection of AI generation. Yet a concrete design approach to such an environment has not been established. Our literature analysis highlights that compositional structures, which organize and visualize individual elements into meaningful wholes, are highly effective in granting creators control over the essential aspects of their content. However, efficiently aggregating and connecting these structures to support the full creation process remains challenging. We therefore propose a design approach that leverages compositional structures as substrates and infuses AI within and across these structures to enable a controlled and fluid creation process. We evaluate this approach through a case study of developing a video co-creation environment. User evaluation shows that such an environment allowed users to stay oriented in their creation activity and remain aware and in control of AI's generation, while enabling flexible human-AI collaborative workflows.

Authors
Yining Cao
University of California, San Diego, San Diego, California, United States
Yiyi Huang
University of California, San Diego, San Diego, California, United States
Anh Truong
Adobe Research, San Francisco, California, United States
Hijung Valentina Shin
Adobe Research, Cambridge, Massachusetts, United States
Haijun Xia
University of California, San Diego, San Diego, California, United States
DOI

10.1145/3706598.3713401

Paper URL

https://dl.acm.org/doi/10.1145/3706598.3713401

XCam: Mixed-Initiative Virtual Cinematography for Live Production of Virtual Reality Experiences
Abstract

VR is often utilized for organizing virtual events such as meetings, conferences, and concerts; however, most existing VR tools lack support for live production. We present XCam, a toolkit enabling mixed-initiative control over virtual camera systems, from fully manual control by users to increasingly automated, system-driven control with minimal user intervention. XCam's architectural design separates the concerns of object tracking, camera motion, and scene transition, giving operators more degrees of freedom to adjust the level of automation along all three dimensions. We conducted two studies: (1) interviews with six VR content creators probed which aspects should and should not be automated, based on six applications developed with XCam; (2) three workshops with experts explored XCam's utility in the live production of an interactive VR film sequence, a lecture on cinematography, and an alumni meeting in social VR. Expert feedback from our studies suggests how to balance automation and control, and points to the opportunities and limits of future AI-driven tools.
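
The architecture described above separates object tracking, camera motion, and scene transition, each with an independently adjustable level of automation. A minimal Python sketch of that configuration space follows; the class and enum names are illustrative assumptions, not XCam's actual API.

from dataclasses import dataclass
from enum import Enum

class Automation(Enum):
    MANUAL = 0     # operator drives this dimension entirely
    ASSISTED = 1   # system proposes, operator confirms or overrides
    AUTOMATIC = 2  # system acts with minimal user intervention

@dataclass
class VirtualCameraConfig:
    tracking: Automation    # which object the camera follows
    motion: Automation      # how the camera moves toward its target framing
    transition: Automation  # when and how the output cuts between cameras

# Example: for a lecture, let the system track the speaker and assist with cuts,
# while the operator keeps manual control of camera movement.
lecture_rig = VirtualCameraConfig(
    tracking=Automation.AUTOMATIC,
    motion=Automation.MANUAL,
    transition=Automation.ASSISTED,
)
print(lecture_rig)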

Authors
Michael Nebeling
University of Michigan, Ann Arbor, Michigan, United States
Liwei Wu
University of Waterloo, Waterloo, Ontario, Canada
Hanuma Teja Maddali
University of Maryland, College Park, Maryland, United States
DOI

10.1145/3706598.3713305

Paper URL

https://dl.acm.org/doi/10.1145/3706598.3713305

Paratrouper: Exploratory Creation of Character Cast Visuals Using Generative AI
Abstract

Great characters are critical to the success of many forms of media, such as comics, games, and films. Designing visually compelling casts of characters requires significant skill and consideration, and there is a lack of specialized tools to support this endeavor. We investigate how AI-driven image-generation techniques can empower creatives to explore a variety of visual design possibilities for both individual characters and groups of characters. Informed by interviews with character designers, Paratrouper is a multi-modal system that enables creating and experimenting with multiple permutations of character casts and visualizing them in various contexts as part of a holistic approach to design. We demonstrate how Paratrouper supports different aspects of the character design process, and share insights from its use by eight creators. Our work highlights the interplay between creative agency and serendipity, as well as the visual interrelationships among character aesthetics.

Authors
Joanne Leong
Autodesk Research, Toronto, Ontario, Canada
David Ledo
Autodesk Research, Toronto, Ontario, Canada
Thomas Driscoll
Autodesk Research, Toronto, Ontario, Canada
Tovi Grossman
University of Toronto, Toronto, Ontario, Canada
George Fitzmaurice
Autodesk Research, Toronto, Ontario, Canada
Fraser Anderson
Autodesk Research, Toronto, Ontario, Canada
DOI

10.1145/3706598.3714242

Paper URL

https://dl.acm.org/doi/10.1145/3706598.3714242

VAction: A Lightweight and Integrated VR Training System for Authentic Film-Shooting Experience
Abstract

The film industry exerts significant economic and cultural influence, and its rapid development is contingent upon the expertise of industry professionals, underscoring the critical importance of film-shooting education. However, such education typically requires repeated practice in complex professional venues with expensive equipment, presenting a significant obstacle for ordinary learners who struggle to access such training environments. Although VR technology has already shown its potential in education, existing research has not addressed the crucial learning component of replicating the shooting process. Moreover, the limited functionality of traditional controllers hinders the fulfillment of these educational requirements. We therefore developed VAction, a VR system that combines high-fidelity virtual environments with a custom-designed controller to simulate the real-world camera operation experience. The system's lightweight design ensures cost-effective and efficient deployment. Experimental results demonstrate that VAction significantly outperforms traditional methods in both practice effectiveness and user experience, indicating its potential and usefulness in film-shooting education.

Authors
Shaocong Wang
Tsinghua University, Beijing, China
Che Qu
Beijing Film Academy, Beijing, China
Minjing Yu
Tianjin University, Tianjin, China
Chao Zhou
Institute of Software, Chinese Academy of Sciences, Beijing, China
Yuntao Wang
Tsinghua University, Beijing, China
Yu-Hui Wen
Beijing Jiaotong University, Beijing, China
Yuanchun Shi
Tsinghua University, Beijing, China
Yong-Jin Liu
Tsinghua University, Beijing, China
DOI

10.1145/3706598.3714217

Paper URL

https://dl.acm.org/doi/10.1145/3706598.3714217

4D Bioforming with Bees: An Industry-Compatible Prototyping Method for Polymorphic Honeycomb Creation
Abstract

4D bioforming is a captivating yet often overlooked natural aesthetic phenomenon: the polymorphic transformations that occur as bees construct honeycomb. It presents a significant opportunity for the innovative transformation of traditional apiculture. This paper proposes an industry-compatible prototyping method for polymorphic honeycomb creation that follows the phenomenon of 4D bioforming, aiming to introduce innovation to honeycomb forms while preserving the central role of beekeepers. The method is designed to align with the practical habits of beekeepers and can be outlined in four key steps: scaffold creation, quadrilateral shape division, bee path compilation with an outer mold, and 4D bioforming. The dynamic temporal changes in the honeycomb were successfully demonstrated, enhancing the artistic aspect of honeycomb creation. Evaluation results suggest that the method is compatible with traditional practices and easily adoptable by beekeepers, and that the polymorphic honeycomb meets essential aesthetic standards.

Award
Honorable Mention
Authors
Yixiong Wang
The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China
Huajie Suen
Sichuan Fine Arts Institute, Chongqing, China
Shengfeng Duan
Sichuan Fine Arts Institute, Chongqing, China
Chen Liang
The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China
DOI

10.1145/3706598.3713696

Paper URL

https://dl.acm.org/doi/10.1145/3706598.3713696

Reflecting on Design Paradigms of Animated Data Video Tools
Abstract

Animated data videos have gained significant popularity in recent years. However, authoring data videos remains challenging due to the complexity of creating and coordinating diverse components (e.g., visualization, animation, and audio). Although numerous tools have been developed to streamline the process, there is a lack of comprehensive understanding of, and reflection on, their design paradigms to inform future development. To address this gap, we propose a framework for understanding data video creation tools along two dimensions: what data video components to create and coordinate, including visual, motion, narrative, and audio components, and how the creation and coordination are supported. By applying the framework to analyze 46 existing tools, we summarize the key design paradigms for creating and coordinating each component, based on the varying distribution of work between humans and AI in these tools. Finally, we share our detailed reflections, highlight gaps from a holistic view, and discuss future directions to address them.
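
One way to read the framework is as a coding scheme: each tool is described by which components it covers (visual, motion, narrative, audio) and how creation and coordination work is split between human and AI. The Python sketch below encodes that structure; the enum values and the example profile are illustrative assumptions, not the paper's actual coding of the 46 tools.

from dataclasses import dataclass, field
from enum import Enum

class Component(Enum):
    VISUAL = "visual"
    MOTION = "motion"
    NARRATIVE = "narrative"
    AUDIO = "audio"

class WorkDistribution(Enum):
    HUMAN_LED = "human-led"
    MIXED = "mixed-initiative"
    AI_LED = "AI-led"

@dataclass
class ToolProfile:
    name: str
    creation: dict[Component, WorkDistribution] = field(default_factory=dict)      # how each component is created
    coordination: dict[Component, WorkDistribution] = field(default_factory=dict)  # how components are kept in sync

# Hypothetical profile of a timeline-based authoring tool.
example = ToolProfile(
    name="HypotheticalTool",
    creation={Component.VISUAL: WorkDistribution.HUMAN_LED,
              Component.MOTION: WorkDistribution.MIXED},
    coordination={Component.MOTION: WorkDistribution.AI_LED},
)
print(example.name, sorted(c.value for c in example.creation))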

Authors
Leixian Shen
The Hong Kong University of Science and Technology, Hong Kong, China
Haotian Li
Microsoft Research Asia, Beijing, China
Yun Wang
Microsoft Research Asia, Hong Kong, China
Huamin Qu
The Hong Kong University of Science and Technology, Hong Kong, China
DOI

10.1145/3706598.3713449

Paper URL

https://dl.acm.org/doi/10.1145/3706598.3713449
