Songwriting is often driven by multimodal inspirations, such as imagery, narratives, or existing music, yet songwriters remain unsupported by current music AI systems in incorporating these multimodal inputs into their creative processes. We introduce Amuse, a songwriting assistant that transforms multimodal (image, text, or audio) inputs into chord progressions that can be seamlessly incorporated into songwriters' creative process. A key feature of Amuse is its novel method for generating coherent chords that are relevant to music keywords in the absence of datasets with paired examples of multimodal inputs and chords. Specifically, we propose a method that leverages multimodal language models to convert multimodal inputs into noisy chord suggestions and uses a unimodal chord model to filter the suggestions. A user study with songwriters shows that Amuse effectively supports transforming multimodal ideas into coherent musical suggestions, enhancing users' agency and creativity throughout the songwriting process.
https://dl.acm.org/doi/10.1145/3706598.3713818
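As a rough illustration of the generate-then-filter idea described in the Amuse abstract above, the following minimal Python sketch shows one way the two stages could fit together. It is not the authors' code: the multimodal step is stubbed out, and the toy transition table and threshold merely stand in for the paper's unimodal chord model.

```python
# Minimal sketch of a generate-then-filter chord pipeline (illustrative only).
from typing import List

def multimodal_chord_candidates(keywords: List[str]) -> List[List[str]]:
    """Stand-in for prompting a multimodal LLM with keywords extracted
    from an image/text/audio input. Returns noisy chord suggestions."""
    return [
        ["C", "G", "Am", "F"],
        ["C", "F#", "B", "Eb"],   # plausible-looking but incoherent
        ["Am", "F", "C", "G"],
    ]

# Toy transition scores a unimodal chord model might assign (hypothetical values).
TRANSITION = {
    ("C", "G"): -0.5, ("G", "Am"): -0.7, ("Am", "F"): -0.6,
    ("F", "C"): -0.4, ("C", "F"): -0.6, ("F", "G"): -0.6, ("G", "C"): -0.4,
}
UNSEEN = -5.0  # heavy penalty for transitions the chord model finds unlikely

def coherence(progression: List[str]) -> float:
    """Average log-probability of consecutive chord transitions."""
    pairs = zip(progression, progression[1:])
    return sum(TRANSITION.get(p, UNSEEN) for p in pairs) / (len(progression) - 1)

def suggest(keywords: List[str], threshold: float = -1.0) -> List[List[str]]:
    """Keep only the candidates the chord model judges coherent enough."""
    return [c for c in multimodal_chord_candidates(keywords) if coherence(c) >= threshold]

if __name__ == "__main__":
    print(suggest(["nostalgic", "warm", "sunset"]))
```

With these toy scores, the second candidate is rejected because its transitions are all unseen, which mirrors the abstract's point that the multimodal model alone produces noisy suggestions that a dedicated chord model must filter.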
It has been increasingly recognized that effective human-AI co-creation requires more than prompts and results: it calls for an environment with empowering structures that facilitate exploration, planning, and iteration, as well as control and inspection of AI generation. Yet a concrete design approach to such an environment has not been established. Our literature analysis highlights that compositional structures, which organize and visualize individual elements into meaningful wholes, are highly effective in granting creators control over the essential aspects of their content. However, efficiently aggregating and connecting these structures to support the full creation process remains challenging. We therefore propose a design approach that leverages compositional structures as substrates and infuses AI within and across these structures to enable a controlled and fluid creation process. We evaluate this approach through a case study in which we developed a video co-creation environment. User evaluation shows that such an environment allowed users to stay oriented in their creation activity, remain aware and in control of AI's generation, and engage in flexible human-AI collaborative workflows.
https://dl.acm.org/doi/10.1145/3706598.3713401
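To make the "AI within and across compositional structures" idea above concrete, here is an illustrative Python sketch of a video-creation substrate. All class and function names are our own invention, not the paper's, and the generator call is a stub.

```python
# Illustrative sketch: a compositional structure (storyboard of scenes) as a
# substrate, with AI actions scoped within one element or across the structure.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Scene:                        # one element of the compositional structure
    description: str
    footage: Optional[str] = None   # clip id or path once filled in

@dataclass
class Storyboard:                   # the structure that organizes the elements
    title: str
    scenes: List[Scene] = field(default_factory=list)

def generate_clip(prompt: str) -> str:
    """Stand-in for an AI video/image generator."""
    return f"<generated clip for: {prompt}>"

def ai_fill(scene: Scene) -> None:
    """AI infused *within* one element: fill a single scene while the rest of
    the storyboard stays untouched, keeping the user oriented and in control."""
    scene.footage = generate_clip(scene.description)

def ai_expand(board: Storyboard, outline: List[str]) -> None:
    """AI infused *across* the structure: propose new scenes from an outline,
    which the user can inspect, reorder, or delete before filling them in."""
    board.scenes.extend(Scene(description=step) for step in outline)

if __name__ == "__main__":
    board = Storyboard("Product teaser")
    ai_expand(board, ["open on the logo", "show the feature in use", "call to action"])
    ai_fill(board.scenes[0])
    print(board)
```

The design point, as we read the abstract, is that the structure itself (the storyboard) remains the user's artifact, while AI operations are scoped to it rather than replacing it.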
VR is often used for organizing virtual events such as meetings, conferences, and concerts; however, most existing VR tools lack support for live production. We present XCam, a toolkit enabling mixed-initiative control over virtual camera systems---from fully manual control by users to increasingly automated, system-driven control with minimal user intervention. XCam's architectural design separates the concerns of object tracking, camera motion, and scene transition, giving operators more degrees of freedom to adjust the level of automation along all three dimensions. We used XCam to conduct two studies: (1) interviews with six VR content creators, probing which aspects should and should not be automated, based on six applications developed with XCam; and (2) three workshops with experts exploring XCam's utility in the live production of an interactive VR film sequence, a lecture on cinematography, and an alumni meeting in social VR. Expert feedback from our studies suggests how to balance automation and control, as well as the opportunities and limits of future AI-driven tools.
https://dl.acm.org/doi/10.1145/3706598.3713305
Great characters are critical to the success of many forms of media, such as comics, games, and films. Designing visually compelling casts of characters requires significant skill and consideration, yet there is a lack of specialized tools to support this endeavor. We investigate how AI-driven image-generation techniques can empower creatives to explore a variety of visual design possibilities for individual characters and groups of characters. Informed by interviews with character designers, we built Paratrouper, a multi-modal system for creating and experimenting with multiple permutations of character casts and visualizing them in various contexts as part of a holistic approach to design. We demonstrate how Paratrouper supports different aspects of the character design process and share insights from its use by eight creators. Our work highlights the interplay between creative agency and serendipity, as well as the visual interrelationships among character aesthetics.
https://dl.acm.org/doi/10.1145/3706598.3714242
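One way to read "multiple permutations for character casts" in the Paratrouper abstract is as a combinatorial sweep over shared design axes, applied to every character so the cast stays coherent within each variant. The sketch below is hypothetical: the axes, attribute values, and the image-generation stub are all invented for illustration.

```python
# Hypothetical sketch: enumerate cast-wide design permutations over shared axes.
from itertools import product
from typing import Dict, List

# Design axes a creator might want to vary per permutation (illustrative only).
AXES: Dict[str, List[str]] = {
    "silhouette": ["tall and angular", "short and round"],
    "palette": ["warm earth tones", "cool neon"],
    "costume era": ["medieval", "retro-futurist"],
}

def cast_permutations(characters: List[str], axes: Dict[str, List[str]]):
    """Yield one prompt per character for every combination of design axes,
    so all characters share a style within each permutation."""
    keys = list(axes)
    for combo in product(*(axes[k] for k in keys)):
        style = ", ".join(f"{k}: {v}" for k, v in zip(keys, combo))
        yield {name: f"{name}, {style}" for name in characters}

def render(prompt: str) -> str:
    """Stand-in for a text-to-image call."""
    return f"<image for: {prompt}>"

if __name__ == "__main__":
    cast = ["the captain", "the stowaway"]
    for i, permutation in enumerate(cast_permutations(cast, AXES)):
        print(f"permutation {i}:")
        for name, prompt in permutation.items():
            print("  ", render(prompt))
```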
The film industry exerts significant economic and cultural influence, and its rapid development depends on the expertise of industry professionals, underscoring the critical importance of film-shooting education. However, such training typically requires repeated practice in complex professional venues with expensive equipment, a significant obstacle for ordinary learners who struggle to access such environments. Although VR technology has already shown its potential in education, existing research has not addressed the crucial learning component of replicating the shooting process, and the limited functionality of traditional controllers hinders the fulfillment of these educational requirements. We therefore developed VAction, a VR system combining high-fidelity virtual environments with a custom-designed controller to simulate the real-world camera operation experience. The system's lightweight design ensures cost-effective and efficient deployment. Experimental results demonstrate that VAction significantly outperforms traditional methods in both practice effectiveness and user experience, indicating its potential and usefulness in film-shooting education.
https://dl.acm.org/doi/10.1145/3706598.3714217
4D bioforming is a captivating yet often overlooked natural aesthetic phenomenon: the polymorphic transformations that occur as bees construct honeycomb. It presents a significant opportunity for the innovative transformation of traditional apiculture. This paper proposes an industry-compatible prototyping method for creating polymorphic honeycomb through 4D bioforming, aiming to introduce new honeycomb forms while preserving the central role of beekeepers. The method is designed to align with beekeepers' practical habits and can be outlined in four key steps: scaffold creation, quadrilateral shape division, bee path compilation with an outer mold, and 4D bioforming. The resulting dynamic temporal changes in the honeycomb were successfully demonstrated, enhancing the artistic aspect of honeycomb creation. Evaluation results suggest that the method is compatible with traditional practices and easily adoptable by beekeepers, and that the polymorphic honeycomb meets essential aesthetic standards.
https://dl.acm.org/doi/10.1145/3706598.3713696
Animated data videos have gained significant popularity in recent years. However, authoring data videos remains challenging due to the complexity of creating and coordinating diverse components (e.g., visualization, animation, audio, etc.). Although numerous tools have been developed to streamline the process, there is a lack of comprehensive understanding and reflection of their design paradigms to inform future development. To address this gap, we propose a framework for understanding data video creation tools along two dimensions: what data video components to create and coordinate, including visual, motion, narrative, and audio components, and how to support the creation and coordination. By applying the framework to analyze 46 existing tools, we summarized key design paradigms of creating and coordinating each component based on the varying work distribution for humans and AI in these tools. Finally, we share our detailed reflections, highlight gaps from a holistic view, and discuss future directions to address them.
https://dl.acm.org/doi/10.1145/3706598.3713449
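The framework in the abstract above has two dimensions (which component, and how its creation and coordination are supported), with tools differing in how work is split between humans and AI. The sketch below encodes that structure in Python purely as an illustration; the component and task labels follow the abstract, but the example tool profile and its codings are invented.

```python
# Illustrative encoding of the two-dimensional framework (not the authors' coding scheme).
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Tuple

class Component(Enum):
    VISUAL = "visual"
    MOTION = "motion"
    NARRATIVE = "narrative"
    AUDIO = "audio"

class Task(Enum):
    CREATE = "create"
    COORDINATE = "coordinate"

class WorkSplit(Enum):        # who does most of the work for a component/task pair
    HUMAN = "human-led"
    MIXED = "human-AI mixed"
    AI = "AI-led"

@dataclass
class ToolProfile:
    name: str
    support: Dict[Tuple[Component, Task], WorkSplit]

# A made-up profile showing how one tool could be described under the framework.
example = ToolProfile(
    name="HypotheticalDataVideoTool",
    support={
        (Component.VISUAL, Task.CREATE): WorkSplit.AI,
        (Component.MOTION, Task.CREATE): WorkSplit.MIXED,
        (Component.MOTION, Task.COORDINATE): WorkSplit.HUMAN,
        (Component.AUDIO, Task.CREATE): WorkSplit.HUMAN,
    },
)

if __name__ == "__main__":
    for (component, task), split in example.support.items():
        print(f"{example.name}: {component.value}/{task.value} -> {split.value}")
```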